return the list of new files since a specified point of time (watchman issue, closed)

facebook commented on April 23, 2024

I'm adding inotify support to git using watchman. One of the things I need is to know if

Comments (8)

wez commented on April 23, 2024

FWIW, I believe that @dturner-tw already has a working implementation of git + watchman.

To answer your question about since queries, yes, this is one of the core features in watchman.
We track changes along with an abstract clock identifier that ticks as changes are observed.
The watchman service can maintain a symbolic cursor that remembers, on behalf of your specific tool, the clock value at which you last queried.

For example, you can choose a cursor name like n:mytool (make sure that you don't pick a name that collides with another tool). The first time you issue this query, you'll get information about all the files in the entire tree:

["query", "/path/to/root", {
   "since": "n:mytool",
   "fields": ["name"]
}]

When you issue that query a second time, you'll get just the changes since the last time.
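To make the cursor behaviour concrete, here is a toy, in-memory Python model of a watched root with an abstract clock that ticks on every observed change and named cursors that remember the tick of their last query. It is not how watchman is implemented: `ToyWatch`, `observe` and `query_since` are hypothetical names chosen for illustration, and listing only existing files on a cursor's first use is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWatch:
    """Toy model of one watched root (NOT watchman's real implementation)."""

    tick: int = 0
    # name -> (tick of the last observed change, whether the file exists now)
    changed_at: dict[str, tuple[int, bool]] = field(default_factory=dict)
    # named cursor (e.g. "n:mytool") -> tick at which it was last queried
    cursors: dict[str, int] = field(default_factory=dict)

    def observe(self, name: str, exists: bool = True) -> None:
        """Record a change to `name`, advancing the abstract clock."""
        self.tick += 1
        self.changed_at[name] = (self.tick, exists)

    def query_since(self, cursor: str) -> dict:
        """Return changes since `cursor` was last used, then advance it."""
        fresh = cursor not in self.cursors
        since = self.cursors.get(cursor, 0)
        files = [
            {"name": name, "exists": exists}
            for name, (changed, exists) in sorted(self.changed_at.items())
            # Unknown cursor: list everything that exists. Otherwise: only
            # entries whose last change is newer than the cursor's tick.
            if (fresh and exists) or (not fresh and changed > since)
        ]
        self.cursors[cursor] = self.tick
        return {"is_fresh_instance": fresh, "files": files}
```

Replaying the sequence pclouds tries later in this thread (create a; query twice; delete and recreate a; create b) makes the third `query_since("n:mytool")` report both a and b, because both changed after the cursor's tick.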

In some cases (if watchman got restarted, or you overflowed inotify kernel limits), watchman will tell you that some files have changed, even if they haven't really, to ensure that you don't miss changes.
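A client therefore has to cope both with conservative change reports and with responses that are a full listing rather than a delta, flagged with "is_fresh_instance": true (the first query with a new cursor in the transcript in this thread is one example). A minimal sketch, assuming the response has already been decoded from JSON and carries the `name` and `exists` fields; `update_known` is a hypothetical helper:

```python
def update_known(known: set[str], response: dict) -> set[str]:
    """Fold one decoded query response into the set of files that exist."""
    if response.get("is_fresh_instance"):
        # Not a delta: the response describes the whole tree, so start over.
        result: set[str] = set()
    else:
        result = set(known)  # copy so the caller's set is left untouched
    for f in response.get("files", []):
        if f["exists"]:
            result.add(f["name"])
        else:
            result.discard(f["name"])
    return result
```

Because reports are folded into a set, a file that watchman re-reports without a real change is simply added again, so spurious notifications only cost the extra entries in the response.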

You can read a bit more about this stuff here:

https://facebook.github.io/watchman/docs/cmd/query.html
https://facebook.github.io/watchman/docs/file-query.html
https://facebook.github.io/watchman/docs/clockspec.html

pclouds commented on April 23, 2024

Hi,

Yes I know about David's watchman support. I kinda compete with him in this :)

How does a cursor name show me new files since its time? Suppose I have file A already when I register n:mytool, then I delete A and recreate A (editors and compilers do that) and add B. When I query watchman I'd expect to see B only, not A. I can't rely on cclock because cclock would be updated because of the recreation.

$ ls
a
$ ../watchman watch `pwd`
{
    "version": "3.0.0",
    "watch": "<my path>"
}
$ echo '[ "query", "<my path>", { "since" : "n:mytool" } ]' | ../watchman query -j 
{
    "version": "3.0.0",
    "clock": "c:1415665145:5806:1:3",
    "is_fresh_instance": true,
    "files": [
        {
            "name": "a",
            "size": 0,
            "mode": 33188,
            "new": true,
            "exists": true
        }
    ]
}
$ echo '[ "query", "/home/pclouds/w/watchman/z", { "since" : "n:mytool" } ]' | ../watchman query -j 
{
    "version": "3.0.0",
    "clock": "c:1415665145:5806:1:7",
    "is_fresh_instance": false,
    "files": []
}
$ rm a 
$ echo 3>a
$ echo 3>b
$ ls
a  b
$ echo '[ "query", "/home/pclouds/w/watchman/z", { "since" : "n:mytool" } ]' | ../watchman query -j                                                                                             
{
    "version": "3.0.0",
    "clock": "c:1415665145:5806:1:14",
    "is_fresh_instance": false,
    "files": [
        {
            "name": "b",
            "size": 0,
            "mode": 33188,
            "new": true,
            "exists": true
        },
        {
            "name": "a",
            "size": 0,
            "mode": 33188,
            "new": true,
            "exists": true
        }
    ]
}

wez commented on April 23, 2024

From the perspective of the kernel and the filesystem, A is a new file here, so that is what watchman is reporting to you. What watchman is saying is that something about A changed since you last looked. You can use that signal as a way to figure out whether you need to open the file and look at its content.
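Following that suggestion, the "never seen this name before" semantics asked for above can be computed on the client by diffing watchman's report against the names the client already knew about. A minimal sketch, assuming the client still has that set from an earlier run (keeping it around is exactly the cost discussed next in this thread); `truly_new` is a hypothetical helper, and the sample response is the last one from the transcript above:

```python
def truly_new(known_before: set[str], response: dict) -> list[str]:
    """Names that exist now but were absent from `known_before`.

    Watchman reports a deleted-and-recreated file with "new": true, because
    to the kernel it really is a new file; only the client knows whether it
    has seen that name before.
    """
    return sorted(
        f["name"]
        for f in response.get("files", [])
        if f["exists"] and f["name"] not in known_before
    )


# The last response from the transcript: "b" was added, "a" was recreated.
response = {
    "clock": "c:1415665145:5806:1:14",
    "is_fresh_instance": False,
    "files": [
        {"name": "b", "size": 0, "mode": 33188, "new": True, "exists": True},
        {"name": "a", "size": 0, "mode": 33188, "new": True, "exists": True},
    ],
}
print(truly_new({"a"}, response))  # ['b']
```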

pclouds commented on April 23, 2024

I agree it could be done outside watchman, but that's less efficient. For short-lived programs like git, we would need to keep the list of all files somewhere on disk in order to determine whether a file is "new" or not, and pay the I/O cost for this file list (David did this). But the reason we use watchman in the first place is to reduce I/O.
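The on-disk approach can be sketched as follows: store the last clock string returned by watchman (the "clock" field of a response) together with the full file list, and pass that clock as "since" on the next run. This is a minimal sketch, assuming a JSON state file at a caller-chosen path; the file layout and the helper names `load_state`/`save_state` are hypothetical, and this is not David's implementation. It reads and writes the whole list on every run, which is precisely the I/O cost being objected to.

```python
import json
import os
import tempfile


def load_state(path):
    """Return (clock, files) saved by a previous run, or (None, empty set)."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        return None, set()
    return data["clock"], set(data["files"])


def save_state(path, clock, files):
    """Write the clock and full file list for the next short-lived run."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the old state so a reader never sees a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"clock": clock, "files": sorted(files)}, fh)
    os.replace(tmp, path)
```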

Or we could add yet another daemon to keep the whole file list in memory (so no I/O penalty), duplicating what watchman already keeps in memory. I could go with this, but I was hoping that watchman could be extended somehow to let the user attach custom attributes to entries in its file list. I can try to work out something if you're interested in this option. Otherwise I think we can close this issue.

pclouds commented on April 23, 2024

Sorry, I can't stop thinking about this. Another option, instead of custom attributes, is to support a clockspec in the "find" command. The user would need to ask for this in advance (e.g. at "query" time) so that we can take a snapshot of the file list (basically one more linked list per cursor/clock in struct watchman_file).

sunshowers commented on April 23, 2024

That sounds far too expensive -- O(number of cursors/clocks * number of files) -- and I'm pretty sure you're going to have to resolve Watchman's file list against your own anyway. Having your own daemon sounds like the correct approach.
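As a rough, illustrative estimate of that O(number of cursors × number of files) cost, assume (hypothetically) that each per-cursor snapshot adds a single 8-byte next pointer to every struct watchman_file on a 64-bit machine, ignoring any other bookkeeping. With C cursors and N files:

$$
M = 8\,C\,N\ \text{bytes}, \qquad C = 10,\ N = 10^{6} \;\Rightarrow\; M = 8 \times 10 \times 10^{6} = 8 \times 10^{7}\ \text{bytes} = 80\ \text{MB}.
$$

The numbers are made up for illustration; the point is that the memory overhead grows with the product C·N, and taking each snapshot also means walking all N files.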

wez commented on April 23, 2024

I've thought about custom attributes in watchman in the past, but what it boils down to is that Watchman can't know enough about any specific use case to make optimal choices about slurping data out of files.
This makes it likely that any effort to save I/O in the client will result in a net increase in redundant or unnecessary I/O in Watchman. The next logical step from there is "well, let's add a plugin or extensibility framework so that we can run code in the process itself when files change", and that poses some other challenges: security of dynamic loading if we expose this only in C, embedding scripting language(s) instead of using C, ensuring that those functions return quickly enough that we don't overflow the notification stream, and so on.

I'm not saying that we can't do any of these things, but one of the reasons that we haven't needed to thus far is that the client typically knows more about the nature of the watched root and can make smarter choices about when to incur the I/O cost, and that gain is bigger than we're likely to see by making changes in the watchman service.

The find command is a simple legacy command to find files. The since command is the corresponding simple legacy command to do the same with a clockspec. We recommend that you use the query command; both find and since are internally implemented in terms of query, so you'd be saving some small translation overhead.
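For a client written in Python, the recommended query form can be built as plain data and piped to the watchman CLI. A minimal sketch, assuming the `watchman` binary is on PATH and reads one JSON command from stdin when invoked as `watchman -j`; `since_query` and `run_watchman` are hypothetical helpers, and the default field list is just an example:

```python
import json
import subprocess


def since_query(root, clockspec, fields=("name", "exists", "new")):
    """Build a `query` command asking for changes since `clockspec`.

    `clockspec` may be a named cursor such as "n:mytool" or a clock string
    returned by an earlier query, such as "c:1415665145:5806:1:14".
    """
    return ["query", root, {"since": clockspec, "fields": list(fields)}]


def run_watchman(command):
    """Send one JSON command to the watchman CLI on stdin; decode the reply."""
    proc = subprocess.run(
        ["watchman", "-j"],
        input=json.dumps(command),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

For example, `run_watchman(since_query("/path/to/root", "n:mytool", ["name"]))` sends the same query shown at the top of this thread and returns the decoded response, with "files" and "is_fresh_instance" keys as in the transcript above.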

pclouds commented on April 23, 2024

I finally agree there's no elegant way to put this in watchman. I guess I'll have to live with another daemon. Thank you for making watchman.
