Comments (7)
Thanks for the feedback! This is certainly something to consider, at least on some platforms. I'm not sure it would be a huge resource saver, and many of watchman's features (such as coalescing filesystem events) also exist in Mutagen, but it might actually help with platforms that are difficult to watch with a Go-based implementation.
For macOS and Windows, we'd probably spend more on the resources to shell out to `watchman-wait` or connect to the watchman daemon than we spend at the moment on the native recursive watches provided by FSEvents/ReadDirectoryChangesW, so I'm not sure this makes sense there.
For Linux, the case is a little less clear. At the moment, Mutagen has a hybrid mechanism where it uses polling subsidized with a limited number of `inotify` watches on the most recently updated contents to maintain low synchronization latency without exhausting watch descriptors. The limit is currently 50 `inotify` directory watches per synchronization session, and I suspect that most people won't hit even that number. I usually only work in a few directories a day, maybe 10-20 at most, but I'd need more data points to understand the right limit. I think the default user watch descriptor limit on Linux is 8192, so Mutagen shouldn't go too far towards exhausting this limit, even if other watch services are running. But, yes, if you're already spending the resources on watching via watchman, I can see why you'd want to avoid double spending.
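To make the "limited number of watches on the most recently updated contents" idea concrete, here is a minimal sketch of the bookkeeping such a scheme might use. It is not Mutagen's actual implementation: the class name `RecentWatchSet` and its methods are hypothetical, and the sketch only tracks which directories would hold a watch; a real watcher would add and remove `inotify` watches at the marked points and keep polling everything else.

```python
from collections import OrderedDict


class RecentWatchSet:
    """Hypothetical helper: remembers the most recently updated directories,
    up to a fixed limit (50 per session in the description above)."""

    def __init__(self, limit=50):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        # Insertion order doubles as recency order: oldest first, newest last.
        self._dirs = OrderedDict()

    def touch(self, directory):
        """Record that `directory` was just updated.

        Returns the directory that lost its slot, or None if nothing was evicted.
        """
        if directory in self._dirs:
            # Already watched: just mark it as the most recent.
            self._dirs.move_to_end(directory)
            return None
        # A real implementation would add an inotify watch here.
        self._dirs[directory] = True
        if len(self._dirs) > self.limit:
            # A real implementation would remove the inotify watch here.
            evicted, _ = self._dirs.popitem(last=False)
            return evicted
        return None

    def watched(self):
        """Directories currently holding a watch, oldest first."""
        return list(self._dirs)
```

With a limit of 2, touching `a`, `b`, `a`, `c` in that order evicts `b`, because `a` was refreshed more recently. Directories that fall out of the set are still covered by the polling loop, just with higher latency.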
The only major downside to Mutagen's hybrid approach is that it has a polling loop, but I'm not sure this is really avoidable. Even watchman has a polling mechanism for its `inotify` implementation for cases where changes are dropped due to rate limits. The benefit of watchman is that I think it only runs this polling if it detects these sorts of drops.
I guess the other benefit of watchman on Linux would be that, if you really want to, you can do closer-to-real-time monitoring on every directory in your hierarchy, though it's really, really hard to get right with `inotify` (and `kqueue` et al.). If anyone has done it correctly, I have no doubt that it's watchman, so I certainly see the benefit of enabling that for Mutagen users.
My only real concern with it is that people inevitably run into the reality of limited watch descriptors as soon as they try to watch a `node_modules` directory or something similar, and they usually go to the highest-level project available to air their grievance. I'd like to avoid this if possible.
Actually, I'd be interested in hearing about your workflow with watchman, especially if you're on Linux, in terms of how you manage dealing with limited watch descriptors, whether or not you've run into issues with watch descriptor exhaustion, whether you ignore directories like `node_modules`, how your other tools interact with watchman, etc. Any experience reports would be hugely useful in designing integration.
For the BSDs and things like Solaris, this would probably be a good route to take. Mutagen currently just uses polling on these platforms because I haven't found a good `kqueue` implementation in Go that doesn't try to be recursive and haven't had time to write one. Solaris' FEN also requires cgo, which is a bit of a pain, though Go's syscall implementation on Solaris goes through libc somehow, so it might be possible to piggyback on that infrastructure to avoid cgo. In any case, these aren't large user cross-sections, so I haven't prioritized them. Being able to shell out to watchman on these platforms to provide some semblance of real-time watching would probably provide a lot of benefit for significantly less effort than a native Go implementation.
from mutagen.
Thanks for the feedback! This is certainly something to consider, at least on some platforms. I'm not sure it would be a huge resource saver, and many of watchman's features (such as coalescing filesystem events) also exist in Mutagen, but it might actually help with platforms that are difficult to watch with a Go-based implementation.
For macOS and Windows, we'd probably spend more on the resources to shell out to `watchman-wait` or connect to the watchman daemon than we spend at the moment on the native recursive watches provided by FSEvents/ReadDirectoryChangesW, so I'm not sure this makes sense there.
For Linux, the case is a little less clear. At the moment, Mutagen has a hybrid mechanism where it uses polling subsidized with a limited number of `inotify` watches on the most recently updated contents to maintain low synchronization latency without exhausting watch descriptors. The limit is currently 50 `inotify` directory watches per synchronization session, and I suspect that most people won't hit even that number. I usually only work in a few directories a day, maybe 10-20 at most, but I'd need more data points to understand the right limit. I think the default user watch descriptor limit on Linux is 8192, so Mutagen shouldn't go too far towards exhausting this limit, even if other watch services are running. But, yes, if you're already spending the resources on watching via watchman, I can see why you'd want to avoid double spending.
Yeah, I am on Linux as my OS, so the appeal of using watchman would be using those resources only once, and getting a more live sync compared to the one handled by the polling loop Mutagen uses.
The only major downside to Mutagen's hybrid approach is that it has a polling loop, but I'm not sure this is really avoidable. Even watchman has a polling mechanism for its `inotify` implementation for cases where changes are dropped due to rate limits. The benefit of watchman is that I think it only runs this polling if it detects these sorts of drops.
Yeah, the advantage of this for me would be 2-fold:
- Not spending resources polling very large directories every 10 seconds when they aren't being changed
- Taking advantage of the polling & watch work that is already being done, and avoiding duplicated effort.
I guess the other benefit of watchman on Linux would be that, if you really want to, you can do closer-to-real-time monitoring on every directory in your hierarchy, though it's really, really hard to get right with `inotify` (and `kqueue` et al.). If anyone has done it correctly, I have no doubt that it's watchman, so I certainly see the benefit of enabling that for Mutagen users.
My only real concern with it is that people inevitably run into the reality of limited watch descriptors as soon as they try to watch a `node_modules` directory or something similar, and they usually go to the highest-level project available to air their grievance. I'd like to avoid this if possible.
I definitely understand that. I've increased my watch descriptor count as I am often watching multiple copies of mozilla-central with watchman, so I need far, far more than 8192 watches. Perhaps with good documentation, and making `watchman` an opt-in feature, you'd be able to avoid the watch descriptor exhaustion bugs.
Actually, I'd be interested in hearing about your workflow with watchman, especially if you're on Linux, in terms of how you manage dealing with limited watch descriptors, whether or not you've run into issues with watch descriptor exhaustion, whether you ignore directories like `node_modules`, how your other tools interact with watchman, etc. Any experience reports would be hugely useful in designing integration.
I have my `/proc/sys/fs/inotify/max_user_watches` set to 1,000,000 right now, and I haven't run into issues with it recently, but when I was using only ~30k watches I would occasionally hit the limit and have to raise it.
I try to ignore giant directories when possible, such as the build output directory or `node_modules`, but with the size of some repositories I work on that isn't enough to avoid exhausting the default watch limit.
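For anyone wanting to check their own limit before pointing a watcher at a large repository, a small sketch along these lines might help. The file path is the one mentioned above; the function names `parse_watch_limit`, `read_watch_limit`, and `has_headroom` are made up for illustration, and the "needed" figure is something you would have to estimate yourself (roughly one watch per directory).

```python
from pathlib import Path

# Path quoted in the comment above; it only exists on Linux.
WATCH_LIMIT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def parse_watch_limit(text):
    """Parse the contents of max_user_watches (a single integer plus newline)."""
    return int(text.strip())


def read_watch_limit(path=WATCH_LIMIT_PATH):
    """Read the current per-user inotify watch limit from procfs."""
    return parse_watch_limit(Path(path).read_text())


def has_headroom(needed_watches, limit):
    """Return True if `needed_watches` fits within `limit`.

    This ignores watches already held by other tools, so treat it as an
    optimistic check rather than a guarantee.
    """
    return needed_watches <= limit
```

For example, `has_headroom(30000, 8192)` is false, which matches the experience above of outgrowing the default limit. The limit is normally raised through the `fs.inotify.max_user_watches` sysctl.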
For the BSDs and things like Solaris, this would probably be a good route to take. Mutagen currently just uses polling on these platforms because I haven't found a good `kqueue` implementation in Go that doesn't try to be recursive and haven't had time to write one. Solaris' FEN also requires cgo, which is a bit of a pain, though Go's syscall implementation on Solaris goes through libc somehow, so it might be possible to piggyback on that infrastructure to avoid cgo. In any case, these aren't large user cross-sections, so I haven't prioritized them. Being able to shell out to watchman on these platforms to provide some semblance of real-time watching would probably provide a lot of benefit for significantly less effort than a native Go implementation.
That is true. I think I would generally agree that watchman shouldn't be the default backend, but if a platform is not supported otherwise, etc. it could be nice to have it as an option on that platform, to get the wide support base & near-real-time watching with less effort than a fresh implementation.
The extra dependency is unfortunate though
from mutagen.
Thanks for all of the info and feedback - all of that makes sense. I didn't even know you could set the watch descriptor limit that high, so that's useful information. I agree that, with proper documentation, it's possible to make the limitations and workarounds clear.
The extra dependency is unfortunate though
I think it's probably not too much of a concern. I like to think of it as an "integration" rather than a "dependency" - it's all about the marketing spin! People who run BSDs and Solaris and Linux as their daily drivers usually don't mind a little extra effort if it allows them greater flexibility.
As a thought - what if this integration was done a bit differently...
Instead of Mutagen shelling out or connecting to watchman, is it possible that Mutagen could:
- Add a "no watch" watching mode that did no polling or scanning, and
- Add a `flush` command (and possibly some other scripting) as you suggested
Then, sessions could be created without any polling/watching overhead and watchman's hooks could be used to drive Mutagen synchronization. I feel like watchman might be in a better position to be the commanding process in this case anyway, especially since Mutagen won't really know anything about its configuration. Some documentation of how to set this up could be added, since people will probably want to customize it anyway, and a naive Mutagen-watchman integration might not leave that much flexibility.
from mutagen.
Instead of Mutagen shelling out or connecting to watchman, is it possible that Mutagen could:
1. Add a "no watch" watching mode that did no polling or scanning, and
2. Add a `flush` command (and possibly some other scripting) as you suggested
Then, sessions could be created without any polling/watching overhead and watchman's hooks could be used to drive Mutagen synchronization. I feel like watchman might be in a better position to be the commanding process in this case anyway, especially since Mutagen won't really know anything about its configuration. Some documentation of how to set this up could be added, since people will probably want to customize it anyway, and a naive Mutagen-watchman integration might not leave that much flexibility.
Hmm, that's an interesting idea. With the flush command, it'd be nice if there was some mechanism to pass in the list of changed files, as watchman would be able to provide it.
One thing which would be nice if that strategy was taken would be to allow the remote machine to still use the 'portable' approach even though polling had been disabled locally.
from mutagen.
With the flush command, it'd be nice if there was some mechanism to pass in the list of changed files, as watchman would be able to provide it.
It's an interesting idea. I'm not exactly sure how to integrate this into Mutagen's algorithm, which expects to do a full (but fast) scan and three-way merge on every synchronization cycle. If the paths passed to Mutagen didn't thoroughly represent all of the changes, Mutagen would still work, but would report errors when it went to do a compare-and-swap operation on disk for a path that had been modified but unreported. For watchman this probably wouldn't be an issue, but I'd want to avoid giving people a footgun.
There's also the syntax for reporting changes to figure out. Does watchman have a standard(ish) format that it uses? I'd prefer not to reinvent the wheel.
One thing which would be nice if that strategy was taken would be to allow the remote machine to still use the 'portable' approach even though polling had been disabled locally.
Noted and agreed. I'll add endpoint-specific watch mode settings.
from mutagen.
There's also the syntax for reporting changes to figure out. Does watchman have a standard(ish) format that it uses? I'd prefer not to reinvent the wheel.
IIRC it uses JSON? I think something like:
{
  "files": [
    "path/to/a.txt",
    "b.txt"
  ]
}
(e.g. that seems to be what the template git fsmonitor hook does: https://github.com/git/git/blob/c4df23f7927d8d00e666a3c8d1b3375f1dc8a3c1/templates/hooks--fsmonitor-watchman.sample#L63-L113)
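If the payload really does look like the example above, consuming it would be simple. The sketch below assumes exactly that shape (a JSON object with a `files` array of path strings), which is recalled from memory in this thread rather than confirmed against watchman's documentation; `parse_changed_files` is a hypothetical name.

```python
import json


def parse_changed_files(payload):
    """Extract the list of changed paths from a JSON payload.

    Assumes the shape {"files": ["path/to/a.txt", ...]} shown above.
    Raises ValueError if the payload does not match that shape.
    """
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    files = data.get("files", [])
    if not isinstance(files, list) or not all(isinstance(f, str) for f in files):
        raise ValueError('expected "files" to be a list of strings')
    return files
```

Rejecting anything that isn't a plain list of strings keeps a malformed or partial report from being silently treated as "nothing changed", which is the footgun mentioned earlier.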
from mutagen.
Although it's only tangentially related, I've added a `no-watch` file watching mode in v0.8.0 (currently in beta) which allows users to manually manage synchronization via the new `mutagen flush` command. In theory, sessions could be created with `--watch-mode=no-watch` and Watchman could invoke `mutagen flush <session-or-path>` or `mutagen flush --all` when it sees a change. This is still a far cry from more extensive integration, where Mutagen would connect to Watchman, but it might be a stop-gap solution for some users.
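As a rough sketch of that stop-gap, a small script could sit between the watcher and Mutagen. It only uses the two invocations mentioned above (`mutagen flush <session-or-path>` and `mutagen flush --all`); the function names are hypothetical, and how Watchman is configured to call such a script is left out because that depends on your setup.

```python
import subprocess


def flush_command(session=None):
    """Build the argv for a flush.

    With no session, flush everything; otherwise flush the named
    session (or path), as described in the comment above.
    """
    if session is None:
        return ["mutagen", "flush", "--all"]
    return ["mutagen", "flush", session]


def flush(session=None):
    """Run the flush and return mutagen's exit code.

    Assumes `mutagen` is on PATH and the session was created with
    --watch-mode=no-watch.
    """
    return subprocess.run(flush_command(session), check=False).returncode
```

A hook would call `flush("my-session")` (the session name here is a placeholder) each time the watcher reports a change. Since Mutagen still does a full scan on every flush, the hook doesn't need to pass along which files changed.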
from mutagen.