Comments (22)
Ah, had to do `into_iter`.
from fdir.
I considered going native before, but the benefit of a zero-dependency, pure JavaScript library is far greater: I don't have to deal with platform differences or worry about building the module on every platform. NAPI is a nightmare. Right now, fdir just works, and it is fast. Aside from that, I am also not proficient enough in Rust, C, or C++ to write high-performance code. I am sure native would be faster.
I would love to benchmark `fdir` against other native options. That would actually be pretty awesome. PRs are welcome 😄
---
Alright, curiosity got the better of me and I benchmarked `jwalk` against `fdir` using `napi-rs`.
Code

I am no expert in Rust, so this code is probably sub-par.

```rust
#![deny(clippy::all)]

#[macro_use]
extern crate napi_derive;

use napi::{CallContext, JsObject, JsUndefined, Result};
use jwalk::{Parallelism, WalkDir};

#[module_exports]
fn init(mut exports: JsObject) -> Result<()> {
    exports.create_named_method("jwalk", jwalk)?;
    Ok(())
}

#[js_function(1)]
fn jwalk(ctx: CallContext) -> Result<JsUndefined> {
    // let dir = ctx.get::<JsString>(0)?.into_utf8()?;
    let mut entries: Vec<String> = vec![];
    for entry in WalkDir::new("/")
        .parallelism(Parallelism::RayonNewPool(8))
        .skip_hidden(false)
    {
        match entry {
            Ok(entry) => entries.push(entry.path().display().to_string()),
            Err(_err) => {
                println!("failed to access entry");
            }
        }
    }
    ctx.env.get_undefined()
}
```
I tried to be as fair as possible, but my limited knowledge of Rust probably doesn't help. Maybe @Brooooooklyn can take a look and suggest some improvements to the Rust code.
The benchmark code:

```js
const { fdir } = require("./index.js");
const { jwalk } = require("fdir-bench");

(async () => {
  console.time("fdir");
  for (let i = 0; i < 5; ++i) {
    await new fdir()
      .withBasePath()
      .withDirs()
      .crawl("/")
      .withPromise();
  }
  console.timeEnd("fdir");

  console.time("jwalk");
  for (let i = 0; i < 5; ++i) {
    jwalk("/");
  }
  console.timeEnd("jwalk");
})();
```
Of course, this is very simplistic but it does the job.
Results

- fdir: 7.795s
- jwalk: 6.958s

Number of entries found:

- fdir: 1685964 (dirs & files)
- jwalk: 1689508 (dirs & files)
Thoughts
Well, `fdir` is pretty fast. Less than a second slower than `jwalk`. I am certain that after a few optimizations `fdir` can beat `jwalk`.
---
Try this code to avoid re-allocation in Rust:
```rust
let result: Vec<String> = WalkDir::new("/")
    .parallelism(Parallelism::RayonNewPool(8))
    .skip_hidden(false)
    .filter_map(|dir_entry| dir_entry.ok().map(|entry| entry.path().display().to_string()))
    .collect();
```
---
Unfortunately, I'm getting this:

```
error[E0599]: the method `filter_map` exists for struct `WalkDirGeneric<((), ())>`, but its trait bounds were not satisfied
  --> src/lib.rs:25:10
   |
25 |     .filter_map(|dir_entry| {
   |      ^^^^^^^^^^ method cannot be called on `WalkDirGeneric<((), ())>` due to unsatisfied trait bounds
   |
  ::: /home/thecodrr/.cargo/registry/src/github.com-1ecc6299db9ec823/jwalk-0.5.1/src/lib.rs:158:1
   |
158 | pub struct WalkDirGeneric<C: ClientState> {
    | ----------------------------------------- doesn't satisfy `WalkDirGeneric<((), ())>: Iterator`
   |
   = note: the following trait bounds were not satisfied:
           `WalkDirGeneric<((), ())>: Iterator`
           which is required by `&mut WalkDirGeneric<((), ())>: Iterator`
```
---
Any difference in results?
Also, it might be useful to add a pattern filter such as `**/package.json`.
---
Did some testing of my own.

MacBook Pro 16 - 2.4 GHz 8-Core Intel Core i9

- jwalk - `cargo run` (unoptimized + debuginfo)
  - 16 threads = 3738ms
- jwalk - `cargo run --release`
  - 8 threads = 2363ms
  - 16 threads = 1732ms
- jwalk - napi (returning results) - 817,100 files
  - 16 threads = 2369ms
- fdir - 773,467 files
  - 2340ms
- jwalk - `cargo run` - filter node_modules
  - 300ms
- jwalk - `cargo run` - filter node_modules + only package.json
  - ...
- fdir - filter node_modules
  - 968ms
- fdir - filter node_modules + only package.json
  - 1020ms
It's actually very impressive how close they are for a large number of files, considering one is running natively. Although when globbing is involved, `jwalk` is winning by quite a bit. I tried using both `.glob` and `.filter` to search for `package.json`, and I couldn't get it below 900ms.

I also found that using the Rust tool `fd` I can get 180ms, which is impressive. It seems like all these libraries are adding a lot of overhead somewhere.
```
time fd package.json > /dev/null
fd package.json > /dev/null  0.17s user 0.18s system 824% cpu 0.043 total
```
`fd` seems to use the `ignore` package to walk dirs (which is actually used in `ripgrep`), which `jwalk` is supposed to be faster than.

My use case is speeding up the `pnpm` package manager reading all `package.json` files on startup. Currently it uses `fast-glob`, which uses `micromatch`. Because this has to run on startup, 200ms makes a big difference compared to 1s.
Going to try out `ignore` as a native lib. Seeing if there is a good API to use: BurntSushi/ripgrep#2028

We could also re-use the glob libs from `ignore`/`ripgrep` with `jwalk`, and it should be fast enough. `jwalk` has a simple parallel API.

Maybe it's worth looking at using `tiny-glob` (`globrex`) for `fdir`, as it claims to be a lot faster.
---
@vjpr Can you share your benchmark code? I'd like to take a look and see what is causing that huge bottleneck.
---
After a couple of "optimizations", here are the results:

- jwalk: 2.6862s
- ignore: 5.747s
- fdir: 3.2936s

These results are the average of 5 iterations on the root directory, which contains 1.6 million files & about 300k directories.
`jwalk` is very fast. `fdir` gets really close but not quite there. I haven't tried using `worker_threads` for directory traversal yet, so perhaps there is an opportunity for performance there. `ignore` is slow. I have no idea how `fd` is getting such performance; they say they are doing it in parallel, which is also what `jwalk` (via rayon) & `fdir` (via libuv) do.
---
@vjpr So I went ahead and did a benchmark with `fd`, which you claimed does everything within 200ms. Here are the results:

```
time fdfind --hidden --no-ignore package.json / > /dev/null

real    0m1.203s
user    0m4.639s
sys     0m4.378s
```

And I did the same thing with `fdir`:

```
time UV_THREADPOOL_SIZE=2 NODE_ENV=production node test.js

real    0m1.226s
user    0m2.097s
sys     0m1.464s
```
Both returned around the same amount (26k) of results.
Now, maybe I did it wrong, but clearly `fdir` is faster than or about the same as `fd` in terms of speed. Of course, `fd` is doing plenty of other things, so it isn't exactly fair, but still...
Here's the code for the above `fdir` script:

```js
await new fdir()
  .withBasePath()
  .filter((path) => path.indexOf("package.json") > -1)
  .crawl("/")
  .withPromise();
```
One thing I noticed: using `UV_THREADPOOL_SIZE=2` gave the best performance. It wasn't much worse with the default threadpool size either, but if I increased it to 8, the crawling speed went down 2x.
---
@thecodrr What repo were you using to test?

Also, just to note, the default is `UV_THREADPOOL_SIZE=4`.
---
I didn't use any repo; I ran it on my root. Yes, that is 1.6 million files & 300k directories, bigger than 99% of repositories, which means more pattern failures. If you run it inside the `fdir` repo, it completes in 12-13 ms.

Which repo did you test on?

> Also just to note, default UV_THREADPOOL_SIZE=4.

Yes, I know. That's what surprised me. No idea why a size of 2 gave slightly better performance. The difference is negligible on small directories, though.
---
I'm using a large internal monorepo. We should find a nice and large public Node.js monorepo to use for tests.

For me, with `UV_THREADPOOL_SIZE=8`, `fdir` starts beating `jwalk`, which is cool!

What machine are you using?

I've still yet to test out `ignore` with crossbeam parallelism, which seems to get 200ms.
---
> I'm using a large internal monorepo. We should find a nice and large public Node.js monorepo to use for tests.

What do you suggest? React? Jest?

> So for me, increasing UV_THREADPOOL_SIZE=8, fdir starts beating jwalk, which is cool!

Very interesting.

> What machine are you using?

Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz with 16 GB RAM and, of course, an SSD (a mediocre one). And I am on Linux.
---
So `ignore` with parallelism is about 200ms faster than `fdir`. 12 threads.

```rust
use std::{cmp, sync::mpsc};

use anyhow::anyhow;
use ignore::overrides::{Override, OverrideBuilder};
use ignore::WalkBuilder;

fn get_overrides(source: String) -> Override {
    let mut override_builder = OverrideBuilder::new(source.clone());
    let exclude_patterns = vec!["**/package.json", "!**/node_modules/**"];
    for pattern in exclude_patterns {
        override_builder
            .add(pattern)
            .map_err(|e| anyhow!("Malformed exclude pattern: {}", e))
            .unwrap();
    }
    let overrides = override_builder
        .build()
        .map_err(|_| anyhow!("Mismatch in exclude patterns"));
    overrides.unwrap()
}

pub fn walk_parallel(source: String) -> Vec<String> {
    let overrides = get_overrides(source.clone());
    let (tx, rx) = mpsc::channel();
    WalkBuilder::new(source)
        .hidden(false)
        .overrides(overrides)
        .standard_filters(false)
        .threads(cmp::min(12, num_cpus::get()))
        .build_parallel()
        .run(move || {
            let tx = tx.clone();
            Box::new(move |dir_entry_result| {
                if let Ok(dir_entry) = dir_entry_result {
                    tx.send(dir_entry.file_name().to_owned()).unwrap();
                }
                ignore::WalkState::Continue
            })
        });
    let mut metadatas: Vec<_> = rx.into_iter().map(|dir| dir.into_string().unwrap()).collect();
    // metadatas.sort_by(|a, b| a.len().cmp(&b.len()));
    metadatas
}
```
I'm putting together a repo now.
---
> Linux

There was some discussion here about macOS APFS being slower:
https://users.rust-lang.org/t/whats-the-fastest-way-to-read-a-lot-of-files/39743/4?u=vjpr

> ...leads me to the conclusion that APFS obtains a global kernel lock during read-only operations such as readdir(). In other words, APFS slows down when attempting to perform parallel read-only I/O.
>
> ...
>
> It is apparent that macOS 10.14 Mojave has received performance work relative to macOS 10.13

https://gregoryszorc.com/blog/2018/10/29/global-kernel-locks-in-apfs/
https://news.ycombinator.com/item?id=18332260

Although it could just be a per-directory lock for `readdir`, as someone mentions.
---
> So ignore with parallel is about 200ms faster than fdir. 12 threads.

That's a lot of effort 😆. I put together a very, very rough parallel directory crawler in Node using `cluster` to get 1.4s. Without clusters, I was getting around 6 seconds. `fdir` can touch 1.4s without clusters. Guess where `fdir` would be after proper parallelism...
---
Nice. Cluster seems like the way to go to reach Node.js's max perf. Although for CLI cold-start use cases, it would depend on how long the process pool takes to spin up, which might negate the benefit.

My particular use case is the `pnpm` package manager. Many commands need to read the entire tree on startup, so if I could get that down to <500ms for large monorepos it would be great.

FYI, not sure workers will help:

> Workers (threads) are useful for performing CPU-intensive JavaScript operations. They do not help much with I/O-intensive work. The Node.js built-in asynchronous I/O operations are more efficient than Workers can be.

https://nodejs.org/api/worker_threads.html#worker-threads
---
> so if i could get that down to <500ms for large monorepos it would be great.

You'll have to give an example of a large monorepo so I can test it. To the best of my knowledge, you can easily traverse a fairly large monorepo in under 500ms. That's excluding symlink resolution; I haven't tested with it, but it should be much, much slower.

> FYI, not sure workers will help.

Yes, but `child_process` might. I'll have to check and see. I don't expect a huge difference, mostly because libuv has all the threads maxed out most of the time, but it doesn't hurt to try.

> Although for CLI cold-start use cases, would depend on how long it takes the process pool to spin up

This is a real concern.

The main bottleneck is `readdir`, and as far as I know there are no faster alternatives in Node yet. At the end of the day, everything is limited by IO, so there is a point after which the quest for performance becomes foolish.
---
> The main bottleneck is readdir and as far as I know, there are no faster alternatives in node yet.

`fs.readdir` is the most direct mapping to the underlying readdir(3) syscall, so this will always be a limit in a Node env.

The only potential alternative would be WASI, but the Node.js implementation just uses libuv under the hood anyway. However, with WASI the regex logic could be moved into WASM, which might improve things. AssemblyScript could be used to write the WASM in TypeScript rather than needing a syntax change.
If we don't restrict ourselves to cold-start, we could use a persistent daemon that watches the file system for changes. This is a way to get instant speed.

- a. On file change, re-walk the entire tree. The CLI queries the daemon for an instant response.
  - High background CPU load.
- b. On file change, run matchers on only the changed files. The CLI queries the daemon for an instant response.
  - Less background CPU load.
- c. On CLI start, load a cache, query the daemon for file changes, and apply them. I use FB's `watchman` package for this at the moment.
  - No background CPU load.

I think that covers the entire solution space for reading dir trees :D
---
My benchmarks, on a volume with around 1,000,000 files:

- robocopy.exe with `/log:output.txt` - 7s
- fdir - 2 minutes
---
@PlkMarudny What's your code for `fdir`, and what are you benchmarking with?