feature suggestion: cache (lintr, 8 comments, closed)

r-lib commented on September 18, 2024
feature suggestion: cache

from lintr.

Comments (8)

jimhester commented on September 18, 2024

All of the plugins start a new R process each time they lint, so any cache would have to live on disk to persist across sessions.

However, I don't see an easy way to cache without sacrificing correctness in some way (especially if we lint only the diff).

There are some major performance issues with the current linter implementations. I have improved the worst offenders in e4076bd, and I am going to do a full pass over all the linters to make them snappy even on larger files.

daroczig commented on September 18, 2024

Well, improving performance is an even better solution :)

jimhester commented on September 18, 2024

I think some caching strategy has merit, I will look into it some more.

daroczig commented on September 18, 2024

What would IMHO be useful for large files with, e.g., a bunch of R functions is to save partial lint results to a file named after the hash of the corresponding content. For example, some R files in my packages contain tens of functions, and most of the time I'm working on only one of them, so re-linting the rest every time is wasted effort.

From a technical point of view, parse can split the code into the logical parts to cache, such as functions, if conditions, for loops, etc. What I did in the pander package:

  • parse the R expressions into logical parts
  • start a timer before analysing the content of each logical part (separately)
  • check whether results are already in the cache and return them if found
  • do the actual work
  • check whether the time elapsed since the timer started exceeds a threshold, e.g. 0.1 seconds
  • if it does, save the results to a file named after the hash of that logical part's content

More details, if you might find this useful: http://rapporter.github.io/pander/#caching
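
A minimal sketch of the per-part caching steps listed above, written in Python for illustration (lintr itself is R). It assumes the source has already been split into logical parts (in R, parse() would supply these) and that the lint function returns a JSON-serialisable list; the names unit_key and cached_lint and the one-.json-file-per-hash layout are hypothetical, not pander's or lintr's actual code.

```python
import hashlib
import json
import os
import time
from typing import Callable, List


def unit_key(code: str) -> str:
    """Hypothetical cache key: SHA-1 hex digest of one logical part's text."""
    return hashlib.sha1(code.encode("utf-8")).hexdigest()


def cached_lint(
    code: str,
    lint_fn: Callable[[str], List[dict]],
    cache_dir: str,
    min_seconds: float = 0.1,
) -> List[dict]:
    """Lint one logical part, reusing an on-disk result keyed by content hash.

    Results are written to disk only when linting took at least
    `min_seconds`, so cheap parts do not fill up the cache directory.
    """
    path = os.path.join(cache_dir, unit_key(code) + ".json")
    # 1. Return the cached lints if this exact content was seen before.
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # 2. Otherwise time the actual linting.
    start = time.perf_counter()
    lints = lint_fn(code)
    elapsed = time.perf_counter() - start
    # 3. Persist only results that were expensive to compute.
    if elapsed >= min_seconds:
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(lints, f)
    return lints
```

Because the key is the content hash, an edited part simply maps to a new cache file, so stale results are not returned for changed code, as long as the lint results depend only on the part's text (not, say, on linter settings).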

jimhester commented on September 18, 2024

Yeah, caching based on a hash of the content is what I was thinking of doing. Thanks for the link, I will check it out.

jimhester commented on September 18, 2024

One issue with the current caching is that if you add or remove lines from a file, the cached lints will have the wrong line numbers. I'll have to think about how to update them correctly.

jimhester commented on September 18, 2024

I can think of 2 solutions to this.

  1. Store the previous file state in the cache and use a diff utility to figure out what changed, then adjust the lines if needed.
    • problems
      • There is no cross-platform diff utility I can assume Windows will have (unless there is one I'm not aware of). We can't assume users have git installed, for instance.
  2. Search for lines from cached lints in the new file.
    • problems
      • would be slow unless care is taken
        • maybe start at the previous location and alternate searching above and below?
        • or put all linted lines in a hash and do one pass through the file.
      • duplicate lines could create false positives (this is true of 1 as well).
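
Option 2 with the "put all linted lines in a hash and do one pass" variant might look roughly like this. It is a Python sketch, not lintr's implementation; it assumes each cached lint stores the text of the line it was reported on, and the Lint type and remap_by_hash helper are hypothetical. Lints whose line text no longer appears are simply dropped, on the assumption that changed lines get re-linted anyway.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, replace
from typing import Deque, Dict, List


@dataclass(frozen=True)
class Lint:
    line_number: int  # 1-based line number when the lint was cached
    line: str         # text of that line when the lint was cached
    message: str


def remap_by_hash(cached: List[Lint], new_lines: List[str]) -> List[Lint]:
    """Move cached lints to where their line text appears in the new file.

    One pass over the file, using a dict keyed by line text. Identical
    lines are matched in order of appearance, so swapping two identical
    lines cannot be detected (the duplicate-line caveat above).
    """
    # Group lints by old line number (a line may carry several lints),
    # then queue those groups under the line text in old-line order.
    by_old_line: Dict[int, List[Lint]] = defaultdict(list)
    for lint in cached:
        by_old_line[lint.line_number].append(lint)
    pending: Dict[str, Deque[List[Lint]]] = defaultdict(deque)
    for old_no in sorted(by_old_line):
        group = by_old_line[old_no]
        pending[group[0].line].append(group)

    remapped: List[Lint] = []
    for new_no, text in enumerate(new_lines, start=1):
        queue = pending.get(text)  # .get does not create new dict entries
        if queue:
            for lint in queue.popleft():
                remapped.append(replace(lint, line_number=new_no))
    return remapped
```

For example, if a comment is added at the top of a file, a lint cached on line 2 ("y = 2") is reported on line 3 without re-linting that line.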

I think 2 is the way to go: we won't have to parse diff output or figure out how to diff on Windows, and in the basic case (the line is still in the same position) 2 will be plenty fast.
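
The "start at the previous location and alternate searching above and below" idea can be sketched as follows (Python; find_near is a hypothetical helper, not lintr code). If a line has not moved, the very first probe matches; a line shifted by k positions is found within about 2k + 1 probes.

```python
from typing import List, Optional


def find_near(new_lines: List[str], text: str, old_number: int) -> Optional[int]:
    """Return the 1-based line number of `text` in `new_lines` closest to
    `old_number`, probing old, old+1, old-1, old+2, old-2, ...

    Returns None if the text no longer occurs anywhere in the file.
    """
    n = len(new_lines)
    # Far enough to reach both the first and the last line of the file.
    max_offset = max(old_number - 1, n - old_number, 0)
    for distance in range(max_offset + 1):
        # At each distance, try the later position first, then the earlier one.
        for candidate in (old_number + distance, old_number - distance):
            if 1 <= candidate <= n and new_lines[candidate - 1] == text:
                return candidate
    return None
```

Among duplicate lines the copy nearest the old position wins, which is one way the duplicate-line false positives mentioned above show up.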

jimhester commented on September 18, 2024

This was actually much easier than I thought it would be. I implemented option 2 above and it seems to be working fine. There may be some weirdness if you reorder two identical lines, but I think that is rare enough to punt on.