Comments (20)
test succeeds if run in a vm with ext4 fs. machine it is failing on is using zfs and btrfs.
This was a great hint and I was able to figure it out now. The test code sorts each individual data_tree (nested/, flat/, empty-dir/), but not the root data_tree.
So, why was it working on most systems and only failing on some? That seems to be because different file systems report different sizes for the SampleWorkspace. I inspected the actual output of the test on a system with an ext4 filesystem using cargo test multiple_names -- --show-output. This is the result:
---- multiple_names stdout ----
ACTUAL:
40 ┌──empty-dir│██████████ │ 14%
3 │ ┌──3 │█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 1%
126 ├─┴flat │████████████████████████████████ │ 43%
6 │ ┌──1 │██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░░ │ 2%
66 │ ┌─┴0 │█████████████████░░░░░░░░░░░░░░░ │ 23%
126 ├─┴nested │████████████████████████████████ │ 43%
292 ┌─┴(total) │█████████████████████████████████████████████████████████████████████████│100%
EXPECTED:
40 ┌──empty-dir│██████████ │ 14%
3 │ ┌──3 │█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 1%
126 ├─┴flat │████████████████████████████████ │ 43%
6 │ ┌──1 │██▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒░░░░░░░░░░░░░░░ │ 2%
66 │ ┌─┴0 │█████████████████░░░░░░░░░░░░░░░ │ 23%
126 ├─┴nested │████████████████████████████████ │ 43%
292 ┌─┴(total) │█████████████████████████████████████████████████████████████████████████│100%
Note that the two folders nested/ and flat/ are actually reported to have the exact same size! Therefore, they already are in the correct order and no sorting is required. The test succeeds. In comparison, in the failing test output, you can see that these folders have different sizes (due to a different file system), and therefore the EXPECTED value is not sorted correctly.
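The equal-size case can be sketched in isolation. This is a minimal illustration, not pdu's code: the helper `names_by_size_desc` is hypothetical, and it assumes entries are compared by size only, in descending order. Because Rust's `sort_by` is a stable sort, entries that compare equal keep their insertion order, so a missing sort goes unnoticed whenever the sizes tie.

```rust
/// Hypothetical helper: returns entry names ordered by descending size.
/// `sort_by` is stable, so entries with equal sizes keep their input order.
fn names_by_size_desc(entries: &[(&'static str, u64)]) -> Vec<&'static str> {
    let mut sorted = entries.to_vec();
    // Compare sizes only; names play no part in the ordering.
    sorted.sort_by(|a, b| b.1.cmp(&a.1));
    sorted.into_iter().map(|(name, _)| name).collect()
}
```

With sizes 126 and 126 the result equals the input order, whichever order that is. With sizes 10 (nested) and 14 (flat), as in the failing log, sorting actually changes the order, and that is where skipping the sort becomes visible.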
That also means you can actually reproduce this failure on any system by making sure that flat/ is larger than nested/ everywhere, e.g. by just adding a single byte:
--- a/tests/_utils.rs
+++ b/tests/_utils.rs
@@ -79,7 +79,7 @@ impl Default for SampleWorkspace {
"0" => file!("")
"1" => file!("a")
"2" => file!("ab")
- "3" => file!("abc")
+ "3" => file!("abcd")
}
"nested" => dir! {
"0" => dir! {
I will add a PR with a fix in a minute :)
from parallel-disk-usage.
This is weird. pdu always sorts its output by size (IIRC; I haven't touched this in a long time). What stopped working on Nix CI?
from parallel-disk-usage.
Anyway, I use rayon's into_par_sorted to sort the results, so this is a rayon bug. If this bug has already been fixed in rayon, then fixing it here is as simple as updating the rayon version. If not, we would have to forward this issue to the rayon repo and fall back to a regular sort on aarch64-linux.
from parallel-disk-usage.
What stopped working on Nix CI?
the package was just added to nixpkgs (PR submitted 3 weeks ago) and always failed on the aarch64 linux CI.
from parallel-disk-usage.
Can you add a patch to the Nix aarch64 build that replaces rayon's sort with the regular Vec sort, then tell me if it passes?
from parallel-disk-usage.
Actually, pdu's rayon is outdated. You should try updating the rayon version first to see if it passes. If it does, I will update rayon and release a new version.
from parallel-disk-usage.
if i run cargo test on master my machine fails the same test -- AMD Ryzen 7 5700X 8-Core Processor
just updating the Cargo.lock will fail shell completion tests. just updating the rayon and rayon-core by copy / pasting the ones from the new lock file to the old lock file fails in the same way.
from parallel-disk-usage.
just updating the Cargo.lock will fail shell completion tests
This is trivial, just execute ./generate-completions.sh.
just updating the rayon and rayon-core by copy / pasting the ones from the new lock file to the old lock file fails in the same way.
I don't know how cargo actually works, but I suspect that it didn't actually update because it wrongly detected that the lockfile was up to date.
from parallel-disk-usage.
tried again, same issue. verified that the new rayon was used by noting that the build printed:
Compiling rayon-core v1.12.1
Compiling rayon v1.8.1
from parallel-disk-usage.
test succeeds if run in a vm with ext4 fs. machine it is failing on is using zfs and btrfs. not sure if that matters. not sure what the aarch64 nix CI is using as a filesystem.
from parallel-disk-usage.
also tried taskset 01 cargo test to just run on one core and test still fails.
from parallel-disk-usage.
It still sorts incorrectly then?
So I reexamined the log viewer (from the link you posted) and see that in the failed test, there's one output called ACTUAL and one called EXPECTED. It's actually the EXPECTED that is sorted wrong, because it has 42% (nested) under 58% (flat).
The EXPECTED value was generated by this code:
parallel-disk-usage/tests/usual_cli.rs
Lines 632 to 639 in 8e29f89
In short, it's the test code that is buggy; the main code works fine.
from parallel-disk-usage.
yeah, still incorrect -- test fails. i thought that the failure was posted at the top of this message -- guess not. it's the same as the link but pasting it here too. I guess we can just disable the test then for nix.
---- multiple_names stdout ----
ACTUAL:
6 ┌──1 │███████████████████▒▒▒▒▒▒▒░░░░░░ │ 25%
8 ┌─┴0 │██████████████████████████░░░░░░ │ 33%
10 ┌─┴nested│████████████████████████████████ │ 42%
1 │ ┌──1 │███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 4%
2 │ ├──2 │██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 8%
3 │ ├──3 │██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 13%
14 ├─┴flat │█████████████████████████████████████████████ │ 58%
24 ┌─┴(total) │█████████████████████████████████████████████████████████████████████████████│100%
EXPECTED:
1 ┌──1 │███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 4%
2 ├──2 │██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 8%
3 ├──3 │██████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ 13%
14 ┌─┴flat │█████████████████████████████████████████████ │ 58%
6 │ ┌──1 │███████████████████▒▒▒▒▒▒▒░░░░░░ │ 25%
8 │ ┌─┴0 │██████████████████████████░░░░░░ │ 33%
10 ├─┴nested│████████████████████████████████ │ 42%
24 ┌─┴(total) │█████████████████████████████████████████████████████████████████████████████│100%
thread 'multiple_names' panicked at 'assertion failed: `(left == right)`
from parallel-disk-usage.
The EXPECTED value was sorted by this line:
parallel-disk-usage/tests/usual_cli.rs
Line 620 in 8e29f89
which calls Vec::sort_by recursively:
parallel-disk-usage/src/data_tree/sort.rs
Lines 11 to 17 in 8e29f89
(It turns out I didn't use a rayon method for sorting, only for iterating.)
I wonder what the difference is between the test code and the compiled binary. Could it be a race condition?
Anyway, can you try replacing par_iter_mut with iter_mut to see if it still fails?
parallel-disk-usage/src/data_tree/sort.rs
Line 14 in 8e29f89
from parallel-disk-usage.
modified code as shown by the following diff -- test still fails with the same error. I am not fluent in rust.
--- a/src/data_tree/sort.rs
+++ b/src/data_tree/sort.rs
@@ -1,6 +1,6 @@
use super::DataTree;
use crate::size::Size;
-use rayon::prelude::*;
+//use rayon::prelude::*;
use std::cmp::Ordering;
impl<Name, Data> DataTree<Name, Data>
@@ -11,7 +11,7 @@ where
/// Sort all descendants recursively, in parallel.
pub fn par_sort_by(&mut self, compare: impl Fn(&Self, &Self) -> Ordering + Copy + Sync) {
self.children
- .par_iter_mut()
+ .iter_mut()
.for_each(|child| child.par_sort_by(compare));
self.children.sort_by(compare);
}
from parallel-disk-usage.
@a-n-n-a-l-e-e Can you restore the code (back to par_iter_mut) then run cargo test --release instead?
from parallel-disk-usage.
@a-n-n-a-l-e-e Can you restore the code (back to par_iter_mut) then run cargo test --release instead?
done -- test still fails.
from parallel-disk-usage.
At this point, I'm out of ideas.
I guess you can make a little patch for your special build that adds #[ignore] above the failing tests and call it a day, since it is the test that got it wrong anyway.
from parallel-disk-usage.
@peret One thing I don't understand: both the test code and the main code are called on the same SampleWorkspace, so they should work on the same filesystem. How is it possible that the same filesystem reports different results?
from parallel-disk-usage.
@KSXGitHub, it's because the test code and the main code do slightly different things. The main code builds the entire DataTree first and then sorts that tree. The test code sorts each individual sub_tree first (flat/, nested/, empty-dir/) and then constructs the overall DataTree from those children. It doesn't, however, sort the overall tree again.
At least that's how I read the code, and it seems to make sense to me.
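The difference between the two code paths can be sketched with a simplified tree. This is only an illustration under stated assumptions: `Tree`, `by_size_desc`, `sample_subtrees`, `main_code_order` and `test_code_order` are hypothetical stand-ins rather than pdu's real `DataTree` API, the sort order is assumed to be descending by size, and the sizes are chosen to match the failing log (nested = 10, flat = 14).

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified stand-in for pdu's `DataTree`.
#[derive(Debug, Clone)]
struct Tree {
    name: &'static str,
    size: u64,
    children: Vec<Tree>,
}

impl Tree {
    fn file(name: &'static str, size: u64) -> Self {
        Tree { name, size, children: Vec::new() }
    }

    /// A directory's size is its own size plus the sizes of its children.
    fn dir(name: &'static str, own_size: u64, children: Vec<Tree>) -> Self {
        let size = own_size + children.iter().map(|child| child.size).sum::<u64>();
        Tree { name, size, children }
    }

    /// Sort all descendants recursively, then this node's own children.
    fn sort_by(&mut self, compare: impl Fn(&Tree, &Tree) -> Ordering + Copy) {
        self.children.iter_mut().for_each(|child| child.sort_by(compare));
        self.children.sort_by(compare);
    }

    fn child_names(&self) -> Vec<&'static str> {
        self.children.iter().map(|child| child.name).collect()
    }
}

/// Assumed ordering: largest entry first.
fn by_size_desc(a: &Tree, b: &Tree) -> Ordering {
    b.size.cmp(&a.size)
}

/// Subtrees in insertion order: nested (10) first, then flat (14).
fn sample_subtrees() -> Vec<Tree> {
    vec![
        Tree::dir("nested", 2, vec![Tree::dir("0", 2, vec![Tree::file("1", 6)])]),
        Tree::dir(
            "flat",
            8,
            vec![Tree::file("1", 1), Tree::file("2", 2), Tree::file("3", 3)],
        ),
    ]
}

/// Main-code path: build the whole tree, then sort it from the root down.
fn main_code_order() -> Vec<&'static str> {
    let mut root = Tree::dir("(total)", 0, sample_subtrees());
    root.sort_by(by_size_desc);
    root.child_names()
}

/// Test-code path: sort each subtree, then assemble the root without sorting it.
fn test_code_order() -> Vec<&'static str> {
    let mut subtrees = sample_subtrees();
    for subtree in &mut subtrees {
        subtree.sort_by(by_size_desc);
    }
    Tree::dir("(total)", 0, subtrees).child_names()
}
```

With these sizes, `main_code_order()` puts flat before nested, while `test_code_order()` keeps the insertion order (nested, then flat). The two only agree when the insertion order happens to be sorted already, for example when both directories report the same size.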
from parallel-disk-usage.