flamegraph-rs / flamegraph

Easy flamegraphs for Rust projects and everything else, without Perl or pipes <3

License: Apache License 2.0

flamegraph's Introduction

[cargo-]flamegraph

colorized flamegraph output

A Rust-powered flamegraph generator with additional support for Cargo projects! It can be used to profile anything, not just Rust projects! No perl or pipes required <3

How to use flamegraphs: what's a flamegraph, and how can I use it to guide systems performance work?

Relies on perf on Linux and dtrace otherwise. Built on top of @jonhoo's wonderful Inferno all-rust flamegraph generation library!

Windows is getting dtrace support, so if you try this out please let us know how it goes :D

Note: If you're using lld on Linux, you must use the --no-rosegment flag; otherwise perf will not be able to generate accurate stack traces (explanation). For example, in .cargo/config.toml:

[target.x86_64-unknown-linux-gnu]
linker = "/usr/bin/clang"
rustflags = ["-Clink-arg=-fuse-ld=lld", "-Clink-arg=-Wl,--no-rosegment"]

Installation

cargo install flamegraph

This will make the flamegraph and cargo-flamegraph binaries available in your cargo binary directory. On most systems this is something like ~/.cargo/bin.

Requirements on Linux:

Debian (x86 and aarch)

Note: Debian bullseye (the current stable version as of 2022) packages an outdated version of Rust which does not meet flamegraph's requirements. You should use rustup to install an up-to-date version of Rust, or upgrade to Debian bookworm (the current testing version) or newer.
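
If you don't already have rustup, it can be installed with its standard installer:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh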

sudo apt install -y linux-perf

Ubuntu (x86)

Not working on aarch; use a Debian distribution, or make a PR with your solution for Ubuntu.

sudo apt install linux-tools-common linux-tools-generic linux-tools-`uname -r`

Ubuntu/Ubuntu MATE (Raspberry Pi)

sudo apt install linux-tools-raspi

Pop!_OS

sudo apt install linux-tools-common linux-tools-generic

Shell auto-completion

At the moment, only flamegraph supports auto-completion. Supported shells are bash, fish, zsh, powershell and elvish. cargo-flamegraph does not support auto-completion because it is not as straightforward to implement for custom cargo subcommands. See #153 for details.

How you enable auto-completion depends on your shell, e.g.

flamegraph --completions bash > $XDG_CONFIG_HOME/bash_completion # or /etc/bash_completion.d/
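
For zsh, a similar approach might look like this (assuming a completions directory such as ~/.zfunc that is on your fpath):

flamegraph --completions zsh > ~/.zfunc/_flamegraph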

Examples

# if you'd like to profile an arbitrary executable:
flamegraph [-o my_flamegraph.svg] -- /path/to/my/binary --my-arg 5

# or if the executable is already running, you can provide the PID via `-p` (or `--pid`) flag:
flamegraph [-o my_flamegraph.svg] --pid 1337

# NOTE: By default, perf tries to compute which functions are
# inlined at every stack frame for every sample. This can take
# a very long time (see https://github.com/flamegraph-rs/flamegraph/issues/74).
# If you don't want this, you can pass --no-inline to flamegraph:
flamegraph --no-inline [-o my_flamegraph.svg] /path/to/my/binary --my-arg 5

# cargo support provided through the cargo-flamegraph binary!
# defaults to profiling cargo run --release
cargo flamegraph

# by default, `--release` profile is used,
# but you can override this:
cargo flamegraph --dev

# if you'd like to profile a specific binary:
cargo flamegraph --bin=stress2

# if you want to pass arguments as you would with cargo run:
cargo flamegraph -- my-command --my-arg my-value -m -f

# if you want to use interesting perf or dtrace options, use `-c`
# this is handy for correlating things like branch-misses, cache-misses,
# or anything else available via `perf list` or dtrace for your system
cargo flamegraph -c "record -e branch-misses -c 100 --call-graph lbr -g"

# Run criterion benchmark
# Note that the last --bench is required for `criterion 0.3` to run in benchmark mode, instead of test mode.
cargo flamegraph --bench some_benchmark --features some_features -- --bench

cargo flamegraph --example some_example --features some_features

# Profile unit tests.
# Note that a separating `--` is necessary if `--unit-test` is the last flag.
cargo flamegraph --unit-test -- test::in::package::with::single::crate
cargo flamegraph --unit-test crate_name -- test::in::package::with::multiple::crate
cargo flamegraph --unit-test --dev test::may::omit::separator::if::unit::test::flag::not::last::flag

# Profile integration tests.
cargo flamegraph --test test_name

Usage

flamegraph is quite simple. cargo-flamegraph is more sophisticated:

Usage: cargo flamegraph [OPTIONS] [-- <TRAILING_ARGUMENTS>...]

Arguments:
  [TRAILING_ARGUMENTS]...  Trailing arguments passed to the binary being profiled

Options:
      --dev                            Build with the dev profile
      --profile <PROFILE>              Build with the specified profile
  -p, --package <PACKAGE>              package with the binary to run
  -b, --bin <BIN>                      Binary to run
      --example <EXAMPLE>              Example to run
      --test <TEST>                    Test binary to run (currently profiles the test harness and all tests in the binary)
      --unit-test [<UNIT_TEST>]        Crate target to unit test, <unit-test> may be omitted if crate only has one target (currently profiles the test harness and all tests in the binary; test selection can be passed as trailing arguments after `--` as separator)
      --bench <BENCH>                  Benchmark to run
      --manifest-path <MANIFEST_PATH>  Path to Cargo.toml
  -f, --features <FEATURES>            Build features to enable
      --no-default-features            Disable default features
  -r, --release                        No-op. For compatibility with `cargo run --release`
  -v, --verbose                        Print extra output to help debug problems
  -o, --output <OUTPUT>                Output file [default: flamegraph.svg]
      --open                           Open the output .svg file with default program
      --root                           Run with root privileges (using `sudo`)
  -F, --freq <FREQUENCY>               Sampling frequency in Hz [default: 997]
  -c, --cmd <CUSTOM_CMD>               Custom command for invoking perf/dtrace
      --deterministic                  Colors are selected such that the color of a function does not change between runs
  -i, --inverted                       Plot the flame graph up-side-down
      --reverse                        Generate stack-reversed flame graph
      --notes <STRING>                 Set embedded notes in SVG
      --min-width <FLOAT>              Omit functions smaller than <FLOAT> pixels [default: 0.01]
      --image-width <IMAGE_WIDTH>      Image width in pixels
      --palette <PALETTE>              Color palette [possible values: hot, mem, io, red, green, blue, aqua, yellow, purple, orange, wakeup, java, perl, js, rust]
      --skip-after <FUNCTION>          Cut off stack frames below <FUNCTION>; may be repeated
      --flamechart                     Produce a flame chart (sort by time, do not merge stacks)
      --ignore-status                  Ignores perf's exit code
      --no-inline                      Disable inlining for perf script because of performance issues
      --post-process <POST_PROCESS>    Run a command to process the folded stacks, taking the input from stdin and outputting to stdout
  -h, --help                           Print help
  -V, --version                        Print version
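
For example, the --post-process option pipes the folded stacks through any command that reads from stdin and writes to stdout before the SVG is rendered. A minimal sketch (the filter pattern and binary name are hypothetical):

# drop every folded stack that mentions "alloc" before rendering
cargo flamegraph --post-process 'grep -v alloc' --bin my-binary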

Then open the resulting flamegraph.svg with a browser, because most image viewers do not support interactive SVG files.

Enabling perf for use by unprivileged users

To enable perf without running as root, you may lower the perf_event_paranoid value in /proc to an appropriate level for your environment. The most permissive value is -1, but it may not be acceptable for your security needs.

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
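
This change does not survive a reboot. To make it persistent, a common approach is a sysctl drop-in (the file name here is arbitrary):

echo 'kernel.perf_event_paranoid = -1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl --system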

DTrace on macOS

On macOS, there is no alternative to running as superuser in order to enable DTrace. This should be done by invoking sudo flamegraph ... or cargo flamegraph --root .... Do not do sudo cargo flamegraph ...; this can cause problems due to Cargo's build system being run as root.

Be aware that if the binary being tested is user-aware, this does change its behaviour.
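
A typical invocation would look like this (the binary name is a placeholder):

cargo flamegraph --root --bin my-binary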

Improving output when running with --release

Due to compiler optimizations, the quality of the information presented in the flamegraph can suffer when profiling release builds.

To counter this to some extent, you may either set the following in your Cargo.toml file:

[profile.release]
debug = true

Or set the environment variable CARGO_PROFILE_RELEASE_DEBUG=true.

Please note that tests, unit tests and benchmarks use the bench profile in release mode (see here).

Usage with benchmarks

In order to profile existing benchmarks with perf, a little configuration is needed. Set the following in your Cargo.toml file so that benchmark builds include debug symbols:

[profile.bench]
debug = true

Use custom paths for perf and dtrace

If the PERF or DTRACE environment variable is set, it will be used as the command for the corresponding tool. For example, to use perf from ~/bin:

env PERF=~/bin/perf flamegraph /path/to/my/binary

Systems Performance Work Guided By Flamegraphs

Flamegraphs are used to visualize where time is being spent in your program. Many times per second, the threads in a program are interrupted and the current location in your code (based on the thread's instruction pointer) is recorded, along with the chain of functions that were called to get there. This is called stack sampling. These samples are then processed and stacks that share common functions are added together. Then an SVG is generated showing the call stacks that were measured, widened to the proportion of all stack samples that contained them.
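
As an illustration, the collapsed ("folded") stack format that the SVG is generated from puts one semicolon-separated call chain per line, followed by the number of samples in which that chain was observed (the function names below are made up):

main;parse_input;read_file 105
main;parse_input 12
main;compute;hash 310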

The y-axis shows the stack depth. When looking at a flamegraph, the main function of your program will be closer to the bottom, and the called functions will be stacked on top, with the functions that they call stacked on top of them, and so on.

The x-axis spans all of the samples. It does not show the passing of time from left to right. The left to right ordering has no meaning.

The width of each box shows the total time that that function is on the CPU or is part of the call stack. If a function's box is wider than others, that means that it consumes more CPU per execution than other functions, or that it is called more often.

The color of each box isn't significant, and is chosen at random.

Flamegraphs are good for visualizing where the most expensive parts of your program are at runtime, which is wonderful because...

Humans are terrible at guessing about performance!

People coming to Rust from C and C++ in particular will often over-optimize things in code that LLVM is able to optimize away on its own. It's always better to write Rust in a clear and obvious way before beginning micro-optimizations, allocation minimization, and so on.

Lots of things that would seem like they would have terrible performance are actually cheap or free in Rust. Closures are fast. Initialization on the stack before moving to a Box is often compiled away. Clones are often compiled away. So, clone() away instead of fighting for too long to get the compiler to be happy about ownership!

Then make a flamegraph to see if any of that was actually expensive.

Flamegraphs Are the Beginning, Not the End

Flamegraphs show you the things that are taking up time, but they are a sampling technique for high-level and initial looks at the system under measurement. They are great for finding things to look into more closely, and often it will be obvious how to improve something based on its flamegraph, but they are really a tool for choosing optimization targets rather than a measurement tool for the optimizations themselves. They are coarse-grained, and difficult to diff (although this may be supported soon). Also, because flamegraphs are based on the proportion of total runtime that something takes, if you accidentally make something else really slow, everything else will appear smaller on the flamegraph, even though the entire program runtime is now much slower.

It is a good idea to use Flamegraphs to figure out what you want to optimize, and then set up a measurement environment that allows you to determine that an improvement has actually happened.

  • use flamegraphs to find a set of optimization targets
  • create benchmarks for these optimization targets, and where appropriate use something like cachegrind and cg_diff to measure CPU instructions and diff them against the previous version (a sketch of this workflow follows this list)
  • measuring CPU instructions is often better than measuring the wall-clock time of a workload, because a background task on your machine may slow things down in physical time, whereas a genuinely faster implementation correlates strongly with a reduced total CPU instruction count
  • time spent on the CPU is not the full picture, as time is also spent waiting for IO to complete, which is not accounted for by tools like perf that only measure what's consuming time on the CPU. Check out Brendan Gregg's article on Off-CPU Accounting for more information about this!
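
A minimal sketch of the cachegrind/cg_diff workflow mentioned above (the binary and output file names are placeholders):

valgrind --tool=cachegrind --cachegrind-out-file=before.out ./target/release/my-bench
# ...apply the optimization and rebuild...
valgrind --tool=cachegrind --cachegrind-out-file=after.out ./target/release/my-bench
cg_diff before.out after.out  # per-function instruction-count deltas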

Performance Theory 101: Basics of Quantitative Engineering

  • Use realistic workloads on realistic hardware; otherwise your data won't necessarily correspond to what happens in production
  • All of our guesses are wrong to some extent, so we have to measure the effects of our work. Often the simple code that doesn't seem like it should be fast is actually way faster than code that looks optimized. We need to measure our optimizations to make sure that we didn't make our code both harder to read AND slower.
  • Measure before you change anything, and save the results in a safe place! Many profiling tools will overwrite their old output when you run them again, so make sure you take care to save the data before you begin so that you can compare before and after.
  • Take measurements on a warmed up machine that isn't doing anything else, and has had time to cool off from the last workload. CPUs will fall asleep and drop into power-saving modes when idle, and they will also throttle if they get too hot (sometimes SIMD can cause things to run slower because it heats things up so much that the core has to throttle).

Performance Theory 202: USE Method

The USE Method is a way to very quickly locate performance problems while minimizing discovery efforts. It's more about finding production issues than flamegraphs directly, but it's a great technique to have in your toolbox if you are going to be doing performance triage, and flamegraphs can be helpful for identifying the components to then drill into with queue analysis.

Everything in a computer can be thought of as a resource with a queue in front of it, which can serve one or more requests at a time. The various systems in our computers and programs can do a certain amount of work over time before requests start to pile up and wait in a line until they are able to be serviced.

Some resources can handle more and more work without degrading in performance until they hit their maximum utilization point. Network devices can be thought of as working in this way to a large extent. Other resources start to saturate long before they hit their maximum utilization point, like disks.

Disks (especially spinning disks, but even SSDs) will do more and more work if you allow more work to queue up until they hit their maximum throughput for a workload, but the latency per request will go up before it hits 100% utilization because the disk will take longer before it can begin servicing each request. Tuning disk performance often involves measuring the various IO queue depths to make sure they are high enough to get nice throughput but not so high that latency becomes undesirable.

Anyway, nearly everything in our systems can be broken down and analyzed based on three high-level characteristics:

  • Utilization is the amount of time the system under measurement is actually doing useful work servicing a request, and can be measured as the percent of available time spent servicing requests
  • Saturation is when requests have to wait before being serviced. This can be measured as the queue depth over time
  • Errors are when things start to fail, such as when queues can no longer accept new requests - for example, when a TCP connection is rejected because the system's TCP backlog is already full of connections that have not yet been accept()ed by the userspace program.

This forms the necessary background to start applying the USE Method to locate the performance-related issue in your complex system!

The approach is:

  1. Enumerate the various resources that might be behaving poorly - maybe by creating a flamegraph and looking for functions that are taking more of the total runtime than expected
  2. Pick one of them
  3. (Errors) Check for errors like TCP connection failures, other IO failures, bad things in logs etc...
  4. (Utilization) Measure the utilization of the system and see if its throughput is approaching the known maximum, or the point at which it is known to experience saturation
  5. (Saturation) Is saturation actually happening? Are requests waiting in lines before being serviced? Is latency going up while throughput is staying the same?

These probing questions serve as a piercing flashlight for rapidly identifying the underlying issue most of the time.

If you want to learn more about this, check out Brendan Gregg's blog post on it. I tend to recommend that anyone who is becoming an SRE should make Brendan's Systems Performance book one of the first things they read to understand how to measure these things quickly in production systems.

The USE Method derives from an area of study called queueing theory, which has had a huge impact on the world of computing, as well as many other logistical endeavors that humans have undertaken.

Performance Laws

If you want to drill more into theory, know the law(s)!

  • Universal Law of Scalability is about the relationship between concurrency gains, queuing, and coordination costs.
  • Amdahl's Law is about the theoretical maximum gain that can be made for a workload by parallelization.
  • Little's Law is a deceptively simple law from queueing theory with some subtle implications; it allows us to reason about appropriate queue lengths for our systems (see the formula below).
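
For reference, Little's Law states that the long-run average number of items in a stable system (L) equals the average arrival rate (λ) multiplied by the average time each item spends in the system (W): L = λW. For example, a service receiving 100 requests per second with an average latency of 0.05 s holds about 100 × 0.05 = 5 requests in flight on average.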

flamegraph's Issues

Support for name unmangling

The flamegraph contains a lot of special characters that can be replaced to make the graph more readable.
Yamakaky wrote a script that fixes this.
We could sed over the svg directly, except for this line: s/[^\.]\.[^\.]/./g which breaks the svg format.

Original: [image]

Improved: [image]

Can we support this?

Support diff mode

The perl flamegraph has a mode to generate diff graphs; it would be useful to have it.

Panics on OSX: unable to generate a flame graph

First of all, thanks so much for adding support to dtrace and OSX!

I'm trying to use cargo flamegraph on bench output on OSX. This is how I run it:

sudo cargo flamegraph --exec="target/release/bench-aba573ea464f3f67"

(where bench-aba... is the executable produced by running cargo bench first)

The error:

unpack delta u64s       time:   [73.363 ns 73.823 ns 74.303 ns]
                        change: [-22.503% -20.920% -19.423%] (p = 0.00 < 0.05)
                        Performance has improved.

dtrace: pid 87448 has exited
thread 'main' panicked at 'unable to generate a flamegraph from the collapsed stack data: Io(Custom { kind: InvalidData, error: StringError("No stack counts found") })', src/libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

NOTE: this is how I used to run dtrace to get a flame graph:

sudo dtrace -c './target/release/bench-2022f41cf9c87baf --profile-time 120' -o out.stacks -n 'profile-997 /pid == $target/ { @[ustack(100)] = count(); }'
~/src/github/FlameGraph/stackcollapse.pl out.stacks | ~/src/github/FlameGraph/flamegraph.pl >rust-bench.svg

I did not add --profile-time this time but I don't think that would make a difference.

Two issues on macOS

Outline

  • WARNING: building without debuginfo. appears even though debuginfo is enabled.
  • command line arguments meant for the program are passed to dtrace, which fails with dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: extraneous argument 'me' ($1 is not referenced)

How to reproduce

The first one

  • on macOS
  • cargo install flamegraph
  • cargo new flamegraph_test; cd flamegraph_test
  • Add to Cargo.toml
[profile.release]
debug = true
  • Ensure debug symbols exist: cargo build --release; ls target/release/flamegraph_test.dSYM/
  • sudo cargo flamegraph → see warning even if debug symbols exist

Btw. flamegraph.svg contains the function names, so apparently it does find the debug symbols, and the warning is a false alarm.

The second one

  • echo 'fn main() { println!("Hello, world! By: {:?}", std::env::args().nth(1)); }' > src/main.rs
  • cargo run -- me
  • Prints out:
   Compiling flamegraph_test v0.1.0 (/Users/um003415/repos/flamegraph_test)
    Finished dev [unoptimized + debuginfo] target(s) in 3.92s
     Running `target/debug/flamegraph_test me`
Hello, world! By: Some("me")
  • sudo cargo flamegraph -- me
  • Prints out:
dtrace: invalid probe specifier profile-997 /pid == $target/ { @[ustack(100)] = count(); }: extraneous argument 'me' ($1 is not referenced)
Hello, world! By: None
failed to sample program

Readme should mention the SVG is interactive

The default image viewer on some systems shows it as a simple image, and the function names are too long to fit. It took me a while to figure out that I can open it in e.g. Firefox and hover over them to see the full name.

Flag for running perf or dtrace as root

It'd be handy to have a flag like --root that called perf and dtrace using sudo to avoid the need for changing system-wide permissions. It also (from memory) gives better visibility into kernel symbols, so a double win!

Cannot run benchmarks for library crate

Some strange oddities. I'm currently developing a quick Python module in Rust to do curve simplification, and I'm trying to use flamegraph to help me profile the code.

Running a specific benchmark works fine:

[shoobs@fabiana curved]$ cargo bench --bench rdp
   Compiling curved v0.1.0 (/home/shoobs/Code/curved)
    Finished bench [optimized + debuginfo] target(s) in 3.30s
     Running target/release/deps/rdp-6098376c514929c6

running 1 test
test large_2d ... bench:  75,888,252 ns/iter (+/- 10,891,873)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out

Trying to run flamegraph for the same benchmark:

[shoobs@fabiana curved]$ cargo flamegraph --bench rdp
   Compiling pyo3 v0.9.0
   Compiling numpy v0.8.0
   Compiling curved v0.1.0 (/home/shoobs/Code/curved)
    Finished release [optimized] target(s) in 9.50s
could not find desired target rdp in the bench targets for this crate

It's also odd that running flamegraph forces a recompile of a few libraries even after running the benchmark immediately before.

flamegraph sometimes runs the wrong binary in `--test` mode

Command:
cargo flamegraph --test fill_integ

Binary that gets run (snooped via strace):
rust-wasm/target/release/fill_integ-0d31b301ef6d061y

contents of folder:

➜  rust-wasm git:(rust-wasm) ✗ ls -latr target/release/fill_integ-*
-rw-rw-r-- 1 russell russell      584 Mar 15 23:08 target/release/fill_integ-0d31b301ef6d0617.d
-rwxrwxr-x 3 russell russell 11542904 Mar 16 01:03 target/release/fill_integ-0d31b301ef6d0617
-rw-rw-r-- 1 russell russell      584 Mar 16 23:45 target/release/fill_integ-dc6d5509fba706c0.d
-rwxrwxr-x 2 russell russell 12489304 Mar 18 11:25 target/release/fill_integ-dc6d5509fba706c0

-dc6.... is the new binary that should be run, but flamegraph is running 0d31b.

Workaround is deleting the old binary.

Profiling with workspace projects

[profile.release]
debug = true

The profile seems to be ignored for my workspace projects when set in the root Cargo.toml:

Finished release [optimized + debuginfo] target(s) in 2m 28s

WARNING: building without debuginfo. Enable symbol information by adding the following lines to Cargo.toml:

[profile.release]
debug = true

Support for running tests

Given that the tool already supports running binaries and examples (and people seem to want benchmark support too), would there be interest in having the same functionality for tests? The only issue I see with this kind of functionality would be the fact that one would profile the test harness and/or multiple tests in the same binary in some cases.

Additionally, one can obviously just find the binary cargo has built and run that using flamegraph, but that's somewhat annoying.

I have some hacky code for this around, but before I clean it up and open a PR I wanted to sample what opinions the authors and maintainers have.

How to run an example with command line arguments?

I have an example examples/read.rs and I'm trying to run it with flamegraph:

$ cargo flamegraph --example read            
    Finished release [optimized] target(s) in 0.04s

WARNING: building without debuginfo. Enable symbol information by adding the following lines to Cargo.toml:

[profile.release]
debug = true

thread 'main' panicked at 'could not spawn perf: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:997:5

Can not pass parameters to dtrace

I need to pass parameters to my binary,
I've tried

sudo cargo flamegraph -- ./target/release/minos -f ./3minirefparams.yaml

and

sudo flamegraph -- ./target/release/minos -f ./3minirefparams.yaml

and

sudo flamegraph ./target/release/minos -f ./3minirefparams.yaml

All of these would complain with
dtrace: invalid probe specifier ./3minirefparams.yaml: syntax error near "minirefparams"

or a similar error. It does not seem to like any parameters.

Trying this on macOS with the dtrace backend.

Whitespace in program arguments is parsed incorrectly

If there's any whitespace in the arguments intended for the profiled program, it looks like those arguments are incorrectly split into multiple individual parts.

src/main.rs:

fn main() {
    println!("{:?}", std::env::args().skip(1).collect::<Vec<_>>());
}

Running normally:

$ cargo run 'space cadet'
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/foo 'space cadet'`
["space cadet"]

Running with cargo-flamegraph:

$ cargo flamegraph 'space cadet'
    Finished release [optimized + debuginfo] target(s) in 0.00s
["space", "cadet"]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,018 MB perf.data (5 samples) ]
writing flamegraph to "flamegraph.svg"

Note how the program is incorrectly getting two arguments instead of just one. In practice this makes it impossible to pass any arguments with whitespace to the program.

$ cargo-flamegraph --version
cargo-flamegraph 0.1.9

Option to use `sample` on macos instead of `dtrace`

dtrace is not available on macOS when "system integrity protection is on"

% sudo flamegraph ls
dtrace: system integrity protection is on, some features will not be available

dtrace: failed to execute ls: dtrace cannot control executables signed with restricted entitlements
failed to sample program

It would be helpful if flamegraph could also use macOS's sample program for collecting samples.

windows install error

error[E0433]: failed to resolve: could not find `unix` in `os`
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:55:14
   |
55 | use std::os::unix::io::AsRawFd;
   |              ^^^^ could not find `unix` in `os`

error[E0433]: failed to resolve: could not find `unix` in `os`
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:56:14
   |
56 | use std::os::unix::net::UnixStream;
   |              ^^^^ could not find `unix` in `os`

error[E0433]: failed to resolve: could not find `unix` in `os`
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\pipe.rs:78:14
   |
78 | use std::os::unix::io::{AsRawFd, RawFd};
   |              ^^^^ could not find `unix` in `os`

error[E0432]: unresolved imports `libc::SIGALRM`, `libc::SIGBUS`, `libc::SIGCHLD`, `libc::SIGCONT`, `libc::SIGHUP`, `libc::SIGIO`, `libc::SIGKILL`, `libc::SIGPIPE`, `libc::SIGPROF`, `libc::SIGQUIT`, `libc::SIGSTOP`, `libc::SIGSYS`, `libc::SIGTRAP`, `libc::SIGUSR1`, `libc::SIGUSR2`, `libc::SIGWINCH`
   --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\lib.rs:137:14
    |
137 |     SIGABRT, SIGALRM, SIGBUS, SIGCHLD, SIGCONT, SIGFPE, SIGHUP, SIGILL, SIGINT, SIGIO, SIGKILL,
    |              ^^^^^^^  ^^^^^^  ^^^^^^^  ^^^^^^^          ^^^^^^                  ^^^^^  ^^^^^^^ no `SIGKILL` in the root
    |              |        |       |        |                |                       |
    |              |        |       |        |                |                       no `SIGIO` in the root
    |              |        |       |        |                no `SIGHUP` in the root
    |              |        |       |        no `SIGCONT` in the root
    |              |        |       no `SIGCHLD` in the root
    |              |        no `SIGBUS` in the root
    |              no `SIGALRM` in the root
138 |     SIGPIPE, SIGPROF, SIGQUIT, SIGSEGV, SIGSTOP, SIGSYS, SIGTERM, SIGTRAP, SIGUSR1, SIGUSR2,
    |     ^^^^^^^  ^^^^^^^  ^^^^^^^           ^^^^^^^  ^^^^^^           ^^^^^^^  ^^^^^^^  ^^^^^^^
    |     |        |        |
    |     |        |        no `SIGQUIT` in the root
    |     |        no `SIGPROF` in the root
    |     no `SIGPIPE` in the root
139 |     SIGWINCH,
    |     ^^^^^^^^
help: a similar name exists in the module
    |
137 |     SIGABRT, SIGTERM, SIGBUS, SIGCHLD, SIGCONT, SIGFPE, SIGHUP, SIGILL, SIGINT, SIGIO, SIGKILL,
    |              ^^^^^^^
help: a similar name exists in the module
    |
137 |     SIGABRT, SIGALRM, SIGBUS, SIGCHLD, SIGINT, SIGFPE, SIGHUP, SIGILL, SIGINT, SIGIO, SIGKILL,
    |                                        ^^^^^^
help: a similar name exists in the module
    |
137 |     SIGABRT, SIGALRM, SIGBUS, SIGCHLD, SIGCONT, SIGFPE, SIGHUP, SIGILL, SIGINT, SIGIO, SIGILL,
    |                                                                                        ^^^^^^
help: a similar name exists in the module
    |
138 |     SIGFPE, SIGPROF, SIGQUIT, SIGSEGV, SIGSTOP, SIGSYS, SIGTERM, SIGTRAP, SIGUSR1, SIGUSR2,
    |     ^^^^^^

error[E0433]: failed to resolve: use of undeclared type or module `UnixStream`
   --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:166:29
    |
166 |         let (read, write) = UnixStream::pair()?;
    |                             ^^^^^^^^^^ use of undeclared type or module `UnixStream`

error[E0412]: cannot find type `UnixStream` in this scope
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:73:11
   |
73 |     read: UnixStream,
   |           ^^^^^^^^^^ not found in this scope

error[E0412]: cannot find type `UnixStream` in this scope
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:74:12
   |
74 |     write: UnixStream,
   |            ^^^^^^^^^^ not found in this scope

error[E0425]: cannot find function `recv` in module `libc`
   --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:240:19
    |
240 |             libc::recv(
    |                   ^^^^ not found in `libc`

error[E0425]: cannot find value `MSG_DONTWAIT` in module `libc`
   --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\iterator.rs:244:44
    |
244 |                 if wait { 0 } else { libc::MSG_DONTWAIT },
    |                                            ^^^^^^^^^^^^ not found in `libc`

error[E0412]: cannot find type `RawFd` in this scope
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\pipe.rs:84:26
   |
84 | pub(crate) fn wake(pipe: RawFd) {
   |                          ^^^^^ not found in this scope

error[E0425]: cannot find function `send` in module `libc`
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\pipe.rs:96:15
   |
96 |         libc::send(pipe, b"X" as *const _ as *const _, 1, libc::MSG_DONTWAIT);
   |               ^^^^ not found in `libc`

error[E0425]: cannot find value `MSG_DONTWAIT` in module `libc`
  --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\pipe.rs:96:65
   |
96 |         libc::send(pipe, b"X" as *const _ as *const _, 1, libc::MSG_DONTWAIT);
   |                                                                 ^^^^^^^^^^^^ not found in `libc`

error[E0412]: cannot find type `RawFd` in this scope
   --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\pipe.rs:104:42
    |
104 | pub fn register_raw(signal: c_int, pipe: RawFd) -> Result<SigId, Error> {
    |                                          ^^^^^ not found in this scope

error[E0405]: cannot find trait `AsRawFd` in this scope
   --> C:\Users\Dell\.cargo\registry\src\github.com-1ecc6299db9ec823\signal-hook-0.1.10\src\pipe.rs:116:8
    |
116 |     P: AsRawFd + Send + Sync + 'static,
    |        ^^^^^^^ not found in this scope

error: aborting due to 14 previous errors

Some errors have detailed explanations: E0405, E0412, E0425, E0432, E0433.
For more information about an error, try `rustc --explain E0405`.
error: Could not compile `signal-hook`.

windows install error

OS: win10
Rust: 1.38 stable amd64
Running cargo install flamegraph, I get an error like this:

Compiling inferno v0.4.1
Compiling cargo_metadata v0.7.4
Compiling flamegraph v0.1.13
error[E0425]: cannot find value exit_status in this scope
--> d:\Program\cargo\registry\src\mirrors.ustc.edu.cn-61ef6e0cd06fb9b8\flamegraph-0.1.13\src\lib.rs:134:6
|
134 | !exit_status.success()
| ^^^^^^^^^^^ not found in this scope

error: aborting due to previous error

For more information about this error, try rustc --explain E0425.
error: Could not compile flamegraph.
warning: build failed, waiting for other jobs to finish...
error: failed to compile flamegraph v0.1.13, intermediate artifacts can be found at C:\Temp\cargo-installwtC1Id

Caused by:
build failed

Could flamegraph follow forks?

So I just ran cargo-flamegraph on my own project and found the perf data stopped after a call to fork. I understand not following the child, but the parent, which keeps the same PID, is where all my heavy processing is done.

New release

Is there a set of things that need to be implemented before the next release? Just asking because I'm installing via git so I can get features not yet published, i.e. --bench.

unable to generate a flamegraph from the collapsed stack data

I'm receiving this error within an Ubuntu container running on a VM (VirtualBox) atop macOS Mojave (via multiple runs on 0.1.13 and the current git/master version):

cargo flamegraph --bin=skeleton -- --file=skeleton.toml

Then everything runs as expected, but then:

...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.119 MB perf.data (8 samples) ]
writing flamegraph to "flamegraph.svg"
thread 'main' panicked at 'unable to generate a flamegraph from the collapsed stack data: Io(Custom { kind: InvalidData, error: "No stack counts found" })', src/libcore/result.rs:1165:5

Seemingly related to #16, but this is now within a container on a VM instead.

Thanks.

problem with environment variables

My program works with environment variables, but it seems like it can't find them when I run flamegraph.

So how can I pass all the environment variables to my program? I'm working on macOS.

extend flamegraph to take an optional pid and timespan

Adding a timeout to flamegraph would be nice for long running commands.

Example of a command printing output once per second forever:
# flamegraph -o example.svg vfsstat 1

In this scenario, being able to pass a 30s timeout value that results in tick-30s { exit(0) } would be great.
# flamegraph -t 30s -o example.svg vfsstat 1

Additionally being able to run flamegraph against an already running process by specifying a pid and timeout would be nice.
flamegraph -p $(pgrep my-server) -t 30s -o server.svg

flamegraph incompatible with macOS dtrace

When running flamegraph on version 0.2.0, it only results in dtrace showing me its usage message, indicating that the flamegraph tool used a wrong invocation.

For reference, I use macOS Mojave (Version 10.14.6) with the stock dtrace.

New release?

Hey 👋,

I wanted to use flamegraph for some tests which seems to have been added in #27 however I think the most recent release was cut before this PR was merged.

Could you release a new version please?

Cannot pass --bench argument to the executable

I want to benchmark foo --bench like this:

cargo flamegraph -e foo --bench

Unfortunately, cargo-flamegraph is trying to interpret the --bench argument rather than passing it to foo:

error: Found argument '--bench' which wasn't expected, or isn't valid in this context

USAGE:
    cargo-flamegraph flamegraph --exec <exec>

For more information try --help

I tried cargo flamegraph -e foo -- --bench and cargo flamegraph -e -- foo --bench but no luck :(

Fails when spaces are in project path

So far I'm really enjoying flamegraph! It's a better solution than what I'd cobbled together on my own. However, I have a bit of a problem:

mymachine:busy_work me$ sudo cargo flamegraph
   Compiling busy_work v0.1.0 (/Users/me/Documents/Code/rust/Research and Development/busy_work)
    Finished release [optimized] target(s) in 0.78s

WARNING: building without debuginfo. Enable symbol information by adding the following lines to Cargo.toml:

[profile.release]
debug = true

dtrace: system integrity protection is on, some features will not be available

dtrace: failed to execute /Users/me/Documents/Code/rust/Research: No such file or directory
failed to sample program

The project path is /Users/me/Documents/Code/rust/Research and Development/busy_work and so it looks like arguments are being supplied wrong here. For good measure I installed the latest according to #61 and I'm still seeing the issue.

Allow user to specify language for coloring flame graph

Inferno supports setting a color palette for the flame graph (just like flamegraph.pl --colors), and some of the palettes (see MultiPalette) are specifically designed to highlight particular languages well. It'd be cool if cargo flamegraph let you pass a --lang parameter to select the language to color the flame graph as! Speaking of, it'd be really neat if we also had support for a Rust coloring that, say, colors functions from std and core differently from user functions!

`cargo install` error

When trying to install via Cargo, I get the following error:

error[E0425]: cannot find value `exit_status` in this scope
   --> C:\Users\Omen\.cargo\registry\src\github.com-1ecc6299db9ec823\flamegraph-0.1.13\src\lib.rs:134:6
    |
134 |     !exit_status.success()
    |      ^^^^^^^^^^^ not found in this scope

error: aborting due to previous error

I am on Windows 10, I tried installing using CMD.exe and Git Bash. I also tried with and without admin privileges. Any help is appreciated!

Add support for profiling benchmarks

This looks like a useful project! I've had a few folks ask for profiling support to be added to Criterion.rs; now I can direct them to cargo-flamegraph.

That being the case, it seems like it would be useful to have the ability to run cargo bench inside a profiler with cargo-flamegraph. Is that supported? I didn't see it in the documentation. Naturally, it should be possible to pass arguments to the benchmark executables as well, probably using the same -- separator that Cargo itself uses.

Ability to create graph from existing data

Hello

Sometimes it happens that I've already gathered the perf.data, or that I want to pass additional options to perf (like tuning the events if I want to see e.g. a flamegraph of branch misses, not total runtime). In such cases I think it would be nice if cargo flamegraph or flamegraph could process the already existing data instead of insisting on running the binary again and profiling it once more.

Would it be possible to add some kind of --existing perf.data (or whatever the file is for other profilers/systems)?

I know there's inferno and that I can build the graph myself, but using it seems to be a multi-step process. I kind of like the easy-to-use nature of flamegraph.

Thank you

addr2line taking an exorbitant amount of time

Hi,
I've recently been using flamegraph on a Linux system, and once the recording is done it takes an extremely long time. It seems to be invoking addr2line over and over again, making the whole process quite slow.

The output is:

[ perf record: Woken up 682 times to write data ]
[ perf record: Captured and wrote 170,633 MB perf.data (21198 samples) ]

21k samples doesn't sound like much, but if it's invoking a program for every sample, that seems to become very expensive.

Does `cargo bench` work with flamegraph?

I have benchmarks set up in benches directory, connected to the main project with a [[bench]] section in Cargo.toml. The benches are simple function calls invoked by their own main(), without Criterion or other benchmark-related dependencies.

Currently we simply run cargo bench to compile benchmarks, which produces a benchmark_name_hash executable file in target/release, for which we manually draw a flamegraph with

flamegraph benchmark_name_hash

This produces usable flamegraphs.

However, can this be simplified?

We tried this command:

cargo flamegraph --bin=benchmark-name

It doesn't work. Error message is

error: no bin target named `benchmark-name`


WARNING: building without debuginfo. Enable symbol information by adding the following lines to Cargo.toml:

[profile.release]
debug = true

Adding the profile.release section does not help.

Operating system is OSX 10.14.

Default to `--release` builds?

Instead of providing --release to turn on release mode, I suggest defaulting to release mode and providing a --dev option for development builds.

This would have the downside of not matching cargo run, but I am confident that release modes are the correct default here: I can count on one hand the number of times I've ever wanted to profile a dev build.

perf: Segmentation fault

perf --version     
perf version 5.3.10
rustc --version      
rustc 1.39.0 (4560ea788 2019-11-04)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0,066 MB perf.data (5 samples) ]
perf: Segmentation fault
Obtained 12 stack frames.
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x274b87) [0x55d4c0f50b87]
/lib/x86_64-linux-gnu/libc.so.6(+0x4646f) [0x7fb64426b46f]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x2fafbc) [0x55d4c0fd6fbc]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x27d98f) [0x55d4c0f5998f]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x286b43) [0x55d4c0f62b43]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x276fb0) [0x55d4c0f52fb0]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x2821df) [0x55d4c0f5e1df]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x1b5268) [0x55d4c0e91268]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x222fb2) [0x55d4c0efefb2]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x19e313) [0x55d4c0e7a313]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf2) [0x7fb64424c1e2]
/usr/lib/linux-tools/5.3.0-24-generic/perf(+0x19e56d) [0x55d4c0e7a56d]
failed to sample program

Moving the ownership of crate cargo-flamegraph

Hi, I'm the current owner of crate cargo-flamegraph.

A few years back, I developed a flamegraph tool for cargo at my work. I reserved the crate cargo-flamegraph to eventually be able to publish my work. However, publishing OSS in enterprise environments isn't always easy, and I didn't have the energy to drive that home.

I think the current flamegraph tool in this repo exceeds my work in functionality and code quality, and apparently it's confusing for users that there exists an empty crate cargo-flamegraph, a common naming pattern for cargo-related tools. So it would be better for you to have the crate. What do you think?

(@spacejam actually sent me an email around a year back about a transfer, but at that time I wasn't sure whether I was going to give up on publishing. Now I think that it's better for you to have the crate.)

unable to collapse generated profile data: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }

Randomly panics on my Mac:

thread 'main' panicked at 'unable to collapse generated profile data: Custom { kind: InvalidData, error: StringError("stream did not contain valid UTF-8") }', src/libcore/result.rs:997:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::continue_panic_fmt
   6: rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::result::unwrap_failed
   9: flamegraph::generate_flamegraph_by_running_command
  10: cargo_flamegraph::main
  11: std::rt::lang_start::{{closure}}
  12: std::panicking::try::do_call
  13: __rust_maybe_catch_panic
  14: std::rt::lang_start_internal
  15: main

Detaching from programs

I encountered an issue: the current implementation doesn't produce an svg file (only perf.data) for programs that finish with a non-zero status. My case is the following: I'm trying to sample a program that never ends by itself (i.e. it only stops when killed, without a graceful shutdown). The perf.data itself is ok, and stackcollapse-perf.pl && flamegraph.pl produce the expected result. I think a quick workaround for this problem might be to just skip exiting at the point above and leave a warning/error; a proper solution would be to detach from the program being traced (I guess that's possible, right?)
