
ksxgithub / parallel-disk-usage

Highly parallelized, blazing fast directory tree analyzer

Home Page: https://crates.io/crates/parallel-disk-usage

License: Apache License 2.0

Rust 85.22% Shell 3.66% PowerShell 1.18% TypeScript 7.83% Python 1.50% Elvish 0.59%
rust chart filesystem size dust du disk-usage graph pdu

parallel-disk-usage's Introduction

Parallel Disk Usage (pdu)

Highly parallelized, blazing fast directory tree analyzer.

Description

pdu is a CLI program that renders a graphical chart of the disk usage of files and directories. It is an alternative to dust and dutree.

Benchmark

The benchmark was generated by a GitHub Workflow and uploaded to the release page.

Programs

[Chart: benchmark results (lower is better)]

Demo

[Screenshot and asciicast recordings of the pdu command, including a run on /usr]

Features

  • Fast.
  • Relative comparison of separate files.
  • Extensible via the library crate or JSON interface.
  • Optional progress report.
  • Customize tree depth.
  • Customize chart size.

Limitations

  • Ignorant of hard links: All hard links are counted as real files.
  • Does not follow symbolic links.
  • Does not differentiate filesystems: Mounted folders are counted as normal folders.
  • The runtime is optimized at the expense of binary size.

Development

Prerequisites

Test

./test.sh && ./test.sh --release
Environment Variables
name         type           default value  description
FMT          true or false  true           Whether to run cargo fmt
LINT         true or false  true           Whether to run cargo clippy
DOC          true or false  false          Whether to run cargo doc
BUILD        true or false  true           Whether to run cargo build
TEST         true or false  true           Whether to run cargo test
BUILD_FLAGS  string         (empty)        Space-separated list of flags for cargo build
TEST_FLAGS   string         (empty)        Space-separated list of flags for cargo test

Run

./run pdu "${arguments[@]}"
  • "${arguments[@]}": List of arguments to pass to pdu.

Build

Debug build

cargo build --bin pdu

The resulting executable is located at target/debug/pdu.

Release build

cargo build --bin pdu --release

The resulting executable is located at target/release/pdu.

Update shell completion files

./generate-completions.sh

Extending parallel-disk-usage

The parallel-disk-usage crate is both a binary crate and a library crate. If you desire features that pdu itself lacks (that is, after you have asked the maintainer(s) of pdu for the features but they refused), you may use the library crate to build a tool of your own. The documentation for the library crate can be found in docs.rs.

Alternatively, the pdu command provides the --json-input and --json-output flags. The --json-output flag converts disk usage data into JSON, and the --json-input flag turns said JSON into a visualization. These 2 flags allow integration with other CLI tools (via pipe, as per the UNIX philosophy).

Beware that the structure of the JSON tree differs depending on the number of file/directory names that were provided (as CLI arguments):

  • If there are 0 or 1 file/directory names, the name of the tree root would be a real path (either . or the provided name).
  • If there are 2 or more file/directory names, the name of the tree root would be (total) (which is not a real path), and the provided names would correspond to the children of the tree root.
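
A consumer of the JSON output has to handle both shapes. The following is a minimal sketch of that branching, not pdu's actual schema: the `Node` struct, its field names, and the `provided_entries` helper are hypothetical stand-ins for whatever a JSON parser would produce. It only assumes what is stated above, namely that a root named (total) is synthetic and its children are the provided names.

```rust
/// Hypothetical in-memory mirror of one node of the JSON tree.
/// The field names are assumptions for illustration, not pdu's schema.
struct Node {
    name: String,
    size: u64,
    children: Vec<Node>,
}

/// Returns the nodes that correspond to the names given on the command line.
fn provided_entries(root: &Node) -> Vec<&Node> {
    if root.name == "(total)" {
        // 2 or more names: the root is synthetic, its children are the real entries.
        root.children.iter().collect()
    } else {
        // 0 or 1 name: the root itself is a real path.
        vec![root]
    }
}

/// Builds a node without children.
fn leaf(name: &str, size: u64) -> Node {
    Node { name: name.to_string(), size, children: Vec::new() }
}

fn main() {
    let single = Node {
        name: ".".to_string(),
        size: 30,
        children: vec![leaf("a", 10), leaf("b", 20)],
    };
    let multi = Node {
        name: "(total)".to_string(),
        size: 30,
        children: vec![leaf("foo", 10), leaf("bar", 20)],
    };
    for tree in [&single, &multi] {
        for entry in provided_entries(tree) {
            println!("{} ({})", entry.name, entry.size);
        }
    }
}
```

Note that this sketch would misclassify a single directory that is literally named (total); a real consumer may want to track how many names it passed to pdu instead of inspecting the root name.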

Installation

Any Desktop OS

From GitHub

Go to the GitHub Release Page and download a binary.

From Crates.io

Prerequisites:

  • cargo

cargo install parallel-disk-usage --bin pdu

Arch Linux

Prerequisites:

  • An AUR helper, such as paru
paru -S parallel-disk-usage-bin
paru -S parallel-disk-usage

Follow the installation instructions, then run the following command:

sudo pacman -S parallel-disk-usage

Distributions

Packaging Status

Similar programs

License

Apache 2.0 © Hoàng Văn Khải.

parallel-disk-usage's People

Contributors

byron, dependabot[bot], ksxgithub, peret, renovate-bot


parallel-disk-usage's Issues

Customize the alignment of the bars

The current and only alignment of the bars is to the right. This alignment disconnects smaller bars from their names, making visual look-up (using the human eyes to connect an item to its bar) harder. For this reason, left alignment is perhaps more ergonomic.
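
To make the difference concrete, here is a small sketch of the two alignments using Rust's standard format padding. It is not taken from pdu's renderer; the `render_bar` function and its parameters are hypothetical.

```rust
/// Renders a bar of `filled` cells inside a column that is `width` cells wide.
/// Hypothetical helper for illustration; not pdu's rendering code.
fn render_bar(filled: usize, width: usize, align_left: bool) -> String {
    // Clamp so that the bar never exceeds the column.
    let bar = "█".repeat(filled.min(width));
    if align_left {
        // Pad on the right: the bar starts at the left edge of the column.
        format!("{bar:<width$}")
    } else {
        // Pad on the left: the bar ends at the right edge of the column.
        format!("{bar:>width$}")
    }
}

fn main() {
    for (name, filled) in [("big", 8), ("small", 2)] {
        println!("{name:<6}|{}|", render_bar(filled, 10, true));
        println!("{name:<6}|{}|", render_bar(filled, 10, false));
    }
}
```

With left alignment a short bar starts at the left edge of the chart column, whereas with right alignment it is pushed to the far edge and padded with blanks in between.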

Possible improvements?

Hello,

Thank you for the (super fast and) very useful utility. I can think of a couple of optional improvements:

  • There could be a default pager when the output is more than one screen (similar to bat)
  • Colors could also be a nice option

But again, good job for the useful util 👍

Informing @Byron about the reversion of a change

Hello @Byron, it looks like your commit at c8493b9 causes the benchmark to fail. When I ran gdu locally, the error was: unknown flag: --count-links. I haven't updated the gdu version at all, so I'm not sure what's going on. But I'll be removing this flag.

Integration tests

  • Make sure that all variants are covered.
  • Make sure that it creates correct JSON.
  • Make sure that it creates correct visualization.
  • Error messages

Test case usual_cli::multiple_names sorts expected value incorrectly on some file systems

It looks like the test case usual_cli::multiple_names() is failing on aarch64-linux devices. See this log: https://logs.ofborg.org/?attempt_id=ea8d0c05-ebcb-4177-bf7a-b618bd8f8ff8&key=nixos%2Fnixpkgs.280371

Unfortunately, I don't have an AArch64 device capable of compiling the project and can't reproduce this myself.

What's interesting to me is that the actual value seems to be the correct (sorted) output, whereas the expected value seems to be in the wrong order.

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses.

Feature Request: Filter files by file extension or regex

Referencing questions raised here:

Do you only filter files? What about directories?

I would like to get size information only for files matching the provided extension(s), where the files are spread throughout multiple nested directories alongside other file content whose size I want to ignore.

My primary interest is total file size by extension in my case. Being able to get additional metrics like breakdown per directory or largest file locations (or largest directories of this content) are nice to haves.

Some tools provide ways to exclude files or directories, that can still be understandable and desirable in this context.


What is the syntax?

I'm not familiar with this tool's CLI syntax as it's not documented and I've not yet downloaded/installed it to try it out.

I have used dutree with its --aggr=500M option on directories that contain only the content I am interested in. Filtering is only useful there for a breakdown by extension, e.g. for all image content I want to know how much is PNG and what the top 10 sizes/paths are.

dutree also lacks filtering support. It does have an exclusion syntax, -x, but it requires repeating the flag for every path you want to exclude, rather than accepting a single string of delimited values.

In some software like Caddy I use a regex pattern (\.(jpeg|jpg|gif|png|webp|avif|svg)$) for caching requests; it also allows specifying multiple domains with a comma delimiter (example.com,www.example.com,example.org). Anything like that as a value to some arg like --regex/--match/--filter-by/--pattern/etc would be good.

For an interactive TUI, some users may find it convenient to filter results interactively instead.


I'm personally just interested in the ability to filter by file extension. Presently I am using the following shell script:

find . -type f -printf "%f %s\n" |
  awk '{
      PARTSCOUNT=split( $1, FILEPARTS, "." );
      EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
      FILETYPE_MAP[EXTENSION]+=$2
    }
   END {
     for( FILETYPE in FILETYPE_MAP ) {
       print FILETYPE_MAP[FILETYPE], FILETYPE;
      }
   }' | sort -n | numfmt --field=1 --to=iec-i --format "%8f" --suffix B

Which outputs results like this:

  9.5MiB css
   19MiB psd
   22MiB json
   24MiB md
   75MiB jpeg
  158MiB js
  174MiB php
  228MiB webp
  2.4GiB gif
  4.8GiB bsp
  4.8GiB pdmod
   12GiB jpg
   15GiB 7z
   16GiB png
   80GiB rar
   97GiB zip

That's a tad limited, both in its output and in what can easily be done, compared with what the nicer CLI tools for disk usage offer. In this case it's not filtering by specific extensions, but for the given output that's not a concern: I can easily identify the extensions I am interested in and their total size, with no further detailed information/breakdown to sift through.

Current tools allow me to exclude dirs or scan specific dirs. If the file content is mixed however, it limits the usefulness of insights beyond overview of top file/dir sizes (or aggregated dir sizes).
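
For reference, the aggregation that the shell script above performs can be sketched in Rust with only the standard library. This is an illustration of the requested behaviour, not a feature of pdu; the `sum_by_extension` function is hypothetical. It is also not an exact port: `Path::extension` treats a dotfile such as .bashrc as having no extension, whereas the awk script would report bashrc.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Adds the size of every regular file under `dir` to `totals`, keyed by extension.
/// Files without an extension are grouped under "NULL", as in the awk script.
fn sum_by_extension(dir: &Path, totals: &mut BTreeMap<String, u64>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // DirEntry::metadata does not follow symbolic links.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            sum_by_extension(&path, totals)?;
        } else if meta.is_file() {
            let ext = path
                .extension()
                .map(|e| e.to_string_lossy().into_owned())
                .unwrap_or_else(|| "NULL".to_string());
            *totals.entry(ext).or_insert(0) += meta.len();
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut totals = BTreeMap::new();
    sum_by_extension(Path::new("."), &mut totals)?;
    let mut rows: Vec<(String, u64)> = totals.into_iter().collect();
    rows.sort_by_key(|(_, size)| *size); // ascending, like `sort -n`
    for (ext, size) in rows {
        println!("{size:>12} {ext}");
    }
    Ok(())
}
```

Sizes are printed in bytes; the human-readable formatting that numfmt provides is left out to keep the sketch short.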


Related feature request for dua

Improve documentation

  • How to build a DataTree.
  • How to visualize a DataTree.
  • How to use a DataTreeReflection.
  • How to use serde to serialize/deserialize data.

Lots of errors on runtime on macOS by default

The macOS file system is very picky about reads that go through a user's home directory, so after approving the terminal app to access Reminders, Photos, etc., pdu spits out these errors:

[error] read_dir "./Library/SafariTechnologyPreview": Operation not permitted (os error 1)
[error] read_dir "./.Trash": Operation not permitted (os error 1)
[error] read_dir "./Library/DuetExpertCenter": Operation not permitted (os error 1)
[error] read_dir "./Library/Autosave Information": Operation not permitted (os error 1)
[error] read_dir "./Library/IdentityServices": Operation not permitted (os error 1)
[error] read_dir "./Library/Accounts": Operation not permitted (os error 1)
[error] read_dir "./Library/Safari": Operation not permitted (os error 1)
[error] read_dir "./Library/Biome": Operation not permitted (os error 1)
[error] read_dir "./Library/Shortcuts": Operation not permitted (os error 1)
[error] read_dir "./Library/Messages": Operation not permitted (os error 1)
[error] read_dir "./Library/HomeKit": Operation not permitted (os error 1)
[error] read_dir "./Library/Sharing": Operation not permitted (os error 1)
[error] read_dir "./Library/Mail": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.VoiceMemos": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.archiveutility": Operation not permitted (os error 1)
[error] read_dir "./Library/CoreFollowUp": Operation not permitted (os error 1)
[error] read_dir "./Library/StatusKit": Operation not permitted (os error 1)
[error] read_dir "./Library/Cookies": Operation not permitted (os error 1)
[error] read_dir "./Library/Caches/com.apple.HomeKit": Operation not permitted (os error 1)
[error] read_dir "./Library/Caches/CloudKit": Operation not permitted (os error 1)
[error] read_dir "./Library/Caches/com.apple.Safari": Operation not permitted (os error 1)
[error] read_dir "./Library/Suggestions": Operation not permitted (os error 1)
[error] read_dir "./Library/Metadata/CoreSpotlight": Operation not permitted (os error 1)
[error] read_dir "./Library/Metadata/com.apple.IntelligentSuggestions": Operation not permitted (os error 1)
[error] read_dir "./Library/Group Containers/group.com.apple.secure-control-center-preferences": Operation not permitted (os error 1)
[error] read_dir "./Library/PersonalizationPortrait": Operation not permitted (os error 1)
[error] read_dir "./Library/Group Containers/group.com.apple.notes": Operation not permitted (os error 1)
[error] read_dir "./Library/Caches/FamilyCircle": Operation not permitted (os error 1)
[error] read_dir "./Library/Caches/com.apple.homed": Operation not permitted (os error 1)
[error] read_dir "./Library/Caches/com.apple.ap.adprivacyd": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/MobileSync": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/CallHistoryTransactions": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/com.apple.AuthenticationServices/CredentialProviders": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.Notes": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.Home": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.Safari": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.news": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/CloudDocs/session/db": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/com.apple.sharedfilelist": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/Knowledge": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/com.apple.TCC": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/FileProvider": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/com.apple.avfoundation/Frecents": Operation not permitted (os error 1)
[error] read_dir "./Library/Application Support/CallHistoryDB": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.CloudDocs.MobileDocumentsFileProvider": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.mail": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.stocks": Operation not permitted (os error 1)
[error] read_dir "./Library/Containers/com.apple.SafariTechnologyPreview": Operation not permitted (os error 1)
[error] read_dir "./Library/Assistant/SiriVocabulary": Operation not permitted (os error 1)

In contrast, dust spits out this error:

Did not have permissions for all directories

And both GNU du and BSD du on macOS spit out these errors:

du: cannot read directory './Library/SafariTechnologyPreview': Operation not permitted
du: cannot read directory './Library/Application Support/MobileSync': Operation not permitted
du: cannot read directory './Library/Application Support/CallHistoryTransactions': Operation not permitted
du: cannot read directory './Library/Application Support/com.apple.AuthenticationServices/CredentialProviders': Operation not permitted
du: cannot read directory './Library/Application Support/CloudDocs/session/db': Operation not permitted
du: cannot read directory './Library/Application Support/com.apple.sharedfilelist': Operation not permitted
du: cannot read directory './Library/Application Support/Knowledge': Operation not permitted
du: cannot read directory './Library/Application Support/com.apple.TCC': Operation not permitted
du: cannot read directory './Library/Application Support/FileProvider': Operation not permitted
du: cannot read directory './Library/Application Support/com.apple.avfoundation/Frecents': Operation not permitted
du: cannot read directory './Library/Application Support/CallHistoryDB': Operation not permitted
du: cannot read directory './Library/Assistant/SiriVocabulary': Operation not permitted
du: cannot read directory './Library/Autosave Information': Operation not permitted
du: cannot read directory './Library/IdentityServices': Operation not permitted
du: cannot read directory './Library/Messages': Operation not permitted
du: cannot read directory './Library/HomeKit': Operation not permitted
du: cannot read directory './Library/Sharing': Operation not permitted
du: cannot read directory './Library/Mail': Operation not permitted
du: cannot read directory './Library/DuetExpertCenter': Operation not permitted
du: cannot read directory './Library/Accounts': Operation not permitted
du: cannot read directory './Library/Safari': Operation not permitted
du: cannot read directory './Library/Biome': Operation not permitted
du: cannot read directory './Library/Shortcuts': Operation not permitted
du: cannot read directory './Library/Suggestions': Operation not permitted
du: cannot read directory './Library/Group Containers/group.com.apple.secure-control-center-preferences': Operation not permitted
du: cannot read directory './Library/Group Containers/group.com.apple.notes': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.VoiceMemos': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.archiveutility': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.Home': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.Safari': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.CloudDocs.MobileDocumentsFileProvider': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.mail': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.Notes': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.news': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.corerecents.recentsd/Data/Library/Recents': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.stocks': Operation not permitted
du: cannot read directory './Library/Containers/com.apple.SafariTechnologyPreview': Operation not permitted
du: cannot read directory './Library/PersonalizationPortrait': Operation not permitted
du: cannot read directory './Library/Metadata/CoreSpotlight': Operation not permitted
du: cannot read directory './Library/Metadata/com.apple.IntelligentSuggestions': Operation not permitted
du: cannot read directory './Library/Cookies': Operation not permitted
du: cannot read directory './Library/CoreFollowUp': Operation not permitted
du: cannot read directory './Library/StatusKit': Operation not permitted
du: cannot read directory './Library/Caches/com.apple.HomeKit': Operation not permitted
du: cannot read directory './Library/Caches/CloudKit': Operation not permitted
du: cannot read directory './Library/Caches/com.apple.Safari': Operation not permitted
du: cannot read directory './Library/Caches/FamilyCircle': Operation not permitted
du: cannot read directory './Library/Caches/com.apple.homed': Operation not permitted
du: cannot read directory './Library/Caches/com.apple.ap.adprivacyd': Operation not permitted

Can you Ignore duplicate inodes?

Firstly good job, pdu is really fast.

If you create a hard link like this: ln file hard_link, it appears twice in pdu, whereas it only takes up one 'space' on disk. Both file and hard_link have the same inode, hence I only 'count' the disk space for one of them. Therefore, in dust I chose to ignore the duplicate file, as removing either 'file' or 'hard_link' wouldn't recover disk space until both were removed.

I'm curious to see how you would solve the above problem. I have tried using channels (slow) and locks around a shared hashmap (even slower).

I'm in the middle of a dust re-write to see if I can make it faster based on some of your ideas. :-).

You can see me doing the check for duplicate inodes here:
https://github.com/bootandy/dust/blob/master/src/utils/mod.rs#L253
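
The deduplication idea described in this issue can be sketched as follows. This is a single-threaded, Unix-only illustration and is neither dust's nor pdu's code; the `size_without_duplicates` function is hypothetical. It deliberately leaves open the question asked above, which is how to share the set of seen inodes across threads without losing speed.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: provides dev() and ino()
use std::path::Path;

/// Sums file sizes under `path`, counting each (device, inode) pair only once.
fn size_without_duplicates(path: &Path, seen: &mut HashSet<(u64, u64)>) -> io::Result<u64> {
    // symlink_metadata does not follow symbolic links.
    let meta = fs::symlink_metadata(path)?;
    if meta.is_dir() {
        let mut total = 0;
        for entry in fs::read_dir(path)? {
            total += size_without_duplicates(&entry?.path(), seen)?;
        }
        Ok(total)
    } else if seen.insert((meta.dev(), meta.ino())) {
        // First time this inode is seen: count it.
        Ok(meta.len())
    } else {
        // Another hard link to an inode that was already counted.
        Ok(0)
    }
}

fn main() -> io::Result<()> {
    let mut seen = HashSet::new();
    let total = size_without_duplicates(Path::new("."), &mut seen)?;
    println!("{total} bytes in {} unique inodes", seen.len());
    Ok(())
}
```

The inode number alone is not enough as a key because inode numbers are only unique within one filesystem, which is why the device number is part of the key.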

Pager functionality

Getting the idea from bat, it's useful to have a pager when the output is more than one screen

stack overflow error

PDU is generally very fast. I ran pdu on a rather large filesystem and I get the following error

thread '<unknown>' has overflowed its stack
fatal runtime error: stack overflow
[1]    37874 abort (core dumped)  pdu

Preserve root paths

  • pdu should display . as root.
  • pdu $(pwd) should display full path of current directory as root.

Colored output

It would make the output nicer (and maybe easier to interpret) if there's colored output

exclude path option

It would be great to be able to exclude some paths, for example: pdu $HOME -E subdir1 -E subdir2, or maybe even something more fine-grained like specific extensions or filenames (but path is my main use case).
For completeness an include could also be useful when it's easier to filter by inclusion than exclusion.
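
A path-based exclusion check of the kind requested here could look like the sketch below. The -E flag in the example above is the requester's proposal, not an existing pdu option, and the `is_excluded` helper is likewise hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Returns true when `path` is one of the excluded paths or lies beneath one.
/// Hypothetical helper for illustration; not part of pdu.
fn is_excluded(path: &Path, excluded: &[PathBuf]) -> bool {
    // Path::starts_with compares whole components,
    // so "subdir10" is not treated as being inside "subdir1".
    excluded.iter().any(|prefix| path.starts_with(prefix))
}

fn main() {
    let excluded = vec![PathBuf::from("home/subdir1"), PathBuf::from("home/subdir2")];
    for candidate in ["home/subdir1/file", "home/subdir10/file", "home/other"] {
        println!("{candidate}: excluded = {}", is_excluded(Path::new(candidate), &excluded));
    }
}
```

An inclusion filter would be the same check with the result negated, applied to files only so that directories are still traversed.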

HDD performance is poor

pdu performs about 2x worse on my HDD than single-threaded du. I'm testing on an old home directory of mine on a mechanical hard drive, with about 712 gigabytes of data in around 150,000 files. The size difference reported by the two programs is due to hard links.

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time pdu
...
765.0G ┌─┴.
pdu  0.69s user 2.93s system 4% cpu 1:18.21 total

Compared to du:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time du -sh .
712G	.
du -sh .  0.28s user 1.46s system 3% cpu 47.405 total

I'm not positive on the source of this difference, but I believe it's due to the directory traversal order used by the two programs. du uses a depth-first search whereas pdu seems to use breadth-first search through rayon, although I can't tell for sure. Interestingly, pdu is comparable to du when manually limited to a single thread:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time RAYON_NUM_THREADS=1 pdu
...
765.0G ┌─┴.
RAYON_NUM_THREADS=1 pdu  0.46s user 1.87s system 5% cpu 46.078 total
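
The two traversal orders mentioned in this report differ as sketched below on a tiny in-memory tree. This only illustrates the reporter's hypothesis; it does not show how pdu or du actually traverse, and the `Dir` type and the function names are hypothetical.

```rust
use std::collections::VecDeque;

/// A tiny in-memory stand-in for a directory tree (hypothetical, for illustration).
struct Dir {
    name: &'static str,
    children: Vec<Dir>,
}

/// Depth-first: finish one subtree before moving on to its sibling.
fn depth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut stack = vec![root];
    while let Some(dir) = stack.pop() {
        order.push(dir.name);
        // Push in reverse so that the first child is visited first.
        stack.extend(dir.children.iter().rev());
    }
    order
}

/// Breadth-first: visit every directory of one level before descending.
fn breadth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(dir) = queue.pop_front() {
        order.push(dir.name);
        queue.extend(dir.children.iter());
    }
    order
}

/// Builds the sample tree: "." contains "a" (with a/x, a/y) and "b" (with b/z).
fn sample() -> Dir {
    let leaf = |name| Dir { name, children: Vec::new() };
    Dir {
        name: ".",
        children: vec![
            Dir { name: "a", children: vec![leaf("a/x"), leaf("a/y")] },
            Dir { name: "b", children: vec![leaf("b/z")] },
        ],
    }
}

fn main() {
    let tree = sample();
    println!("depth-first:   {:?}", depth_first(&tree));
    println!("breadth-first: {:?}", breadth_first(&tree));
}
```

On a mechanical drive, the order in which directories are read determines how far the head has to seek between reads, which is why the reporter suspects the traversal order.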

Better documentation for CLI usage

While searching for tools like this one, I noticed that although it seems impressive and interesting, it lacks CLI docs and examples, or at least they are not as easy to find as in other projects, which show them in the README.

I know that if I install the binary or browse the source I should have an easier time finding such information, but it would be better UX to surface it in the project README or link to a separate document, unless the main focus of the project is on using it as a crate.


Personally, I'm looking for a CLI that lets me break the analysis down further by filtering on file types/extensions. That seems to be a rare feature, as most tools only offer the opposite, exclude rules. I can raise a separate issue as a feature request for that if it interests you.

Fix the benchmark CI

The current implementation attempts to download the pdu binary from the Release Page. Unfortunately, for some reason, it fails to authenticate.
