Running lint on large files (like editing an R

feature suggestion: cache (lintr issue, closed)

r-lib commented on September 18, 2024

feature suggestion: cache

from lintr.

Comments (8)

jimhester commented on September 18, 2024

All of the plugins start a new R process each time they lint, so any cache would have to be on disk in order to persist across sessions.

However, I don't see an easy way to perform caching without sacrificing correctness in some way (especially when running only on the diff).

There are some major performance issues with the current linter implementations. I have made some improvements to the worst offenders (e4076bd), and I am going to do a full pass over all the linters to make things snappy even on larger files.

daroczig commented on September 18, 2024

Well, improving performance is an even better solution :)

jimhester commented on September 18, 2024

I think some caching strategy has merit; I will look into it some more.

daroczig commented on September 18, 2024

What would IMHO be useful for large files with e.g. a bunch of R functions is to save the partial lint results to a file named after the hash of the partial content. E.g. I have some R files in my R packages that include tens of functions, and most of the time I'm only working on one of those, so it's useless to lint the rest every time as well.

From a technical point of view, parse can define these logical parts to be cached, like functions, if conditions, for loops etc. What I did in the pander package:

1. parse R expressions to logical parts
2. start a timer each time before analysing the content of these logical parts (separately)
3. check if results are found in the cache and return if found
4. do stuff
5. check if the time elapsed since the timer was started is larger than e.g. 0.1 second
6. if it was larger, save the results to a file named after the hash of the content of the related logical part

More details, if you might find this useful: http://rapporter.github.io/pander/#caching
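The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not code from pander or lintr: the names `cached_analyse`, `analyse`, `cache_dir` and `min_seconds` are hypothetical, results are assumed to be JSON-serialisable, and splitting a file into logical parts (functions, loops, and so on) is left to the caller.

```python
import hashlib
import json
import time
from pathlib import Path


def cached_analyse(chunk, analyse, cache_dir, min_seconds=0.1):
    """Return analyse(chunk), reusing an on-disk result keyed by the chunk's hash.

    All names here are hypothetical; `analyse` stands in for whatever work
    (for example, linting one function) is done on a logical part.
    """
    # The cache file is named after the hash of the chunk's content.
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"

    if path.exists():
        # Cache hit: the chunk text is unchanged, so reuse the stored result.
        return json.loads(path.read_text(encoding="utf-8"))

    start = time.perf_counter()
    result = analyse(chunk)
    elapsed = time.perf_counter() - start

    if elapsed > min_seconds:
        # Only results that were slow to compute are written to disk.
        path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because the cache lives on disk and is keyed only by content, it would survive across separate processes, which matters if each lint run starts a fresh session. The time threshold keeps cheap results from cluttering the cache directory.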

jimhester commented on September 18, 2024

Yeah, caching based on a hash of the content was what I was thinking of doing. Thanks for the link, I will check it out.

jimhester commented on September 18, 2024

One issue with the current caching is that if you add or remove lines from a file the cached lints will have the wrong line numbers. I'll have to think about how to update them correctly.

jimhester commented on September 18, 2024

I can think of 2 solutions to this.

1. Store the previous file state in the cache and use a diff utility to figure out what changed, then adjust the lines if needed.
   - problems
     - There is no cross-platform diff utility I can assume Windows will have (unless I am ignorant of one). We can't assume they will have git installed, for instance.
2. Search for lines from cached lints in the new file.
   - problems
     - would be slow unless care is taken
       - maybe start at the previous location and alternate searching above and below?
       - or put all linted lines in a hash and do one pass through the file.
     - duplicate lines could create false positives (this is true of 1 as well).

I think 2 is the way to go: we won't have to parse diff output or figure out how to diff on Windows, and in the basic case (the line is in the same position) 2 will be plenty fast.
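A minimal sketch of option 2, searching outward from the previous location and alternating below and above, might look like this. It is written in Python for illustration only and is not lintr's actual implementation; `find_line`, `relocate_lints` and the dictionary layout of a cached lint (`line`, `text`, `message`) are assumptions made for the example.

```python
def find_line(text, old_index, new_lines):
    """Return the 0-based index of `text` in `new_lines` nearest to `old_index`.

    Searches outward from the old position, checking below and then above at
    each distance. Returns None if the line no longer exists.
    """
    n = len(new_lines)
    # The old index may lie past the end of a file that has become shorter.
    for offset in range(max(n, old_index + 1)):
        for idx in (old_index + offset, old_index - offset):
            if 0 <= idx < n and new_lines[idx] == text:
                return idx
    return None


def relocate_lints(cached_lints, new_lines):
    """Update 1-based line numbers of cached lints; drop lints whose line is gone."""
    relocated = []
    for lint in cached_lints:
        idx = find_line(lint["text"], lint["line"] - 1, new_lines)
        if idx is not None:
            relocated.append({**lint, "line": idx + 1})
    return relocated
```

In the basic case the line is found at offset 0, so no real search happens. Identical duplicate lines remain ambiguous: the sketch simply picks the nearest match, which is the false-positive risk noted in the list above.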

jimhester commented on September 18, 2024

This was actually much easier than I thought it would be. I implemented 2 above, and it seems to be working fine. There may be some weirdness if you reorder two exactly identical lines, but I think it is rare enough to punt on.
