janestreet / magic-trace

magic-trace collects and displays high-resolution traces of what a process is doing

Home Page: https://magic-trace.org

License: MIT License

Makefile 0.01% OCaml 99.69% C 0.30% Shell 0.01%
intel x86 visualizer tracing profile performance-tools introspection

magic-trace's Introduction


magic-trace

Overview

magic-trace collects and displays high-resolution traces of what a process is doing. People have used it to:

  • figure out why an application running in production handles some requests slowly while simultaneously handling a sea of uninteresting requests,
  • look at what their code is actually doing instead of what they think it's doing,
  • get a history of what their application was doing before it crashed, instead of a mere stacktrace at that final instant,
  • ...and much more!

magic-trace:

  • has 2%-10% overhead,
  • doesn't require application changes to use,
  • traces every function call with ~40ns resolution, and
  • renders a timeline of call stacks going back (a configurable) ~10ms.

You use it like perf: point it to a process and off it goes. The key difference from perf is that instead of sampling call stacks throughout time, magic-trace uses Intel Processor Trace to snapshot a ring buffer of all control flow leading up to a chosen point in time [1]. Then, you can explore an interactive timeline of what happened.

You can point magic-trace at a function such that when your application calls it, magic-trace takes a snapshot. Alternatively, attach it to a running process and detach it with Ctrl+C, to see a trace of an arbitrary point in your program.

Testimonials

"Magic-trace is one of the simplest command-line debugging tools I have ever used."

  • Francis Ricci, Jane Street

"Magic-trace is not just for performance. The tool gives insight directly into what happens in your program, when, and why. Consider using it for all your introspective goals!"

  • Andrew Hunter, Jane Street

"I use perf a ton, and I think that both perf and magic-trace give perspectives that the other doesn't. The benefit I got from magic-trace was entirely based on the fact that it works in slices at any zoom level, so I was able to see all the function calls that a 70ns function was performing, which was invisible in perf."

  • Doug Patti, Jane Street

more testimonials...

Install

  1. Make sure the system you want to trace is supported. The constraints that most commonly trip people up are: VMs are mostly not supported, Intel only (Skylake [2] or later), Linux only.

  2. Grab a release binary from the latest release page.

    1. If downloading the prebuilt binary (not package), chmod +x magic-trace [3]
    2. If downloading the package, run sudo dpkg -i magic-trace*.deb

    Then, test it by running magic-trace -help, which should bring up some help text.

Getting started

  1. Here's a sample C program to try out. It's a slightly modified version of the example in man 3 dlopen. Download that, build it with gcc demo.c -ldl -o demo, then leave it running ./demo. We're going to use that program to learn how dlopen works.

  2. Run magic-trace attach -pid $(pidof demo). When you see the message that it's successfully attached, wait a couple seconds and Ctrl+C magic-trace. It will output a file called trace.fxt in your working directory.

  3. Open magic-trace.org, click "Open trace file" in the top-left-hand corner and give it the trace file generated in the previous step.

  4. That should have expanded into a trace. Zoom in until you can see an individual loop through dlopen/dlsym/cos/printf/dlclose.
    • W zooms into wherever your mouse cursor is pointed (you'll need to zoom in a bunch to see anything useful),
    • S zooms out,
    • A moves left,
    • D moves right, and
    • scroll wheel moves your viewport up and down the stack. You'll only need to scroll to see particularly deep stack traces; it's probably not useful for this example.

  5. Click and drag on the white space around the call stacks to measure. Plant flags by clicking in the timeline along the top. Using the measurement tool, measure how long it takes to run cos. On my screen it takes ~5.7us.

Congratulations, you just magically traced your first program!

In contrast to traditional perf workflows, magic-trace excels at hypothesis generation. For example, you might notice that taking 6us to run cos is a really long time! If you zoom in even more, you'll see that there are actually five pink "[untraced]" cells in there. If you re-run magic-trace as root and pass it -trace-include-kernel, you'll see stacktraces for those. They're page fault handlers! The demo program actually calls cos twice. If you zoom in even more near the end of the 6us cos call, you'll see that the second call takes far less time and does not page fault.

How to use it

magic-trace continuously records control flow into a ring buffer. Upon some sort of trigger, it takes a snapshot of that buffer and reconstructs call stacks.

There are two ways to take a snapshot:

  1. We just did this one: Ctrl+C magic-trace. If magic-trace terminates without already having taken a snapshot, it takes a snapshot of the end of the program.

  2. You can also trigger snapshots when the application calls a function. To do so, pass magic-trace the -trigger flag.

  • -trigger '?' brings up a fuzzy-finding selector that lets you choose from all symbols in your executable,
  • -trigger SYMBOL selects a specific, fully mangled, symbol you know ahead of time, and
  • -trigger . selects the default symbol magic_trace_stop_indicator.

Stop indicators are powerful. Here are some ideas for where you might want to place one:

  • If you're using an asynchronous runtime, any time a scheduler cycle takes too long.
  • In a server, when a request takes a surprisingly long time.
  • After the garbage collector runs, to see what it's doing and what it interrupted.
  • After a compiler pass has completed.

You may leave the stop indicator in production code. It is just an empty, but not inlined, function; it doesn't need to do anything in particular, because magic-trace only needs its name. It will cost ~10us to call, but only when magic-trace actually uses it to take a snapshot.

Documentation

More documentation is available on the magic-trace wiki.

Discussion

Join us on Discord to chat synchronously, or the GitHub discussion group to do so asynchronously.

Contributing

If you'd like to contribute:

  1. read the build instructions,
  2. set up your editor,
  3. take a quick tour through the codebase, then
  4. hit up the issue tracker for a good starter project.

Privacy policy

magic-trace does not send your code or derivatives of your code (including traces) anywhere.

magic-trace.org is a lightly modified fork of Perfetto, and runs entirely in your browser. As far as we can tell, it does not send your trace anywhere. If you're worried about that changing one day, set up your own local copy of the Perfetto UI and use that instead.

Acknowledgements

Tristan Hume is the original author of magic-trace. He wrote it while working at Jane Street, who currently maintains it.

Intel PT is the foundational technology upon which magic-trace rests. We'd like to thank the people at Intel for their years-long efforts to make it available, despite its slow uptake in the greater software community.

magic-trace would not be possible without perf's extensive support for Intel PT. perf does most of the work in interpreting Intel PT's output, and magic-trace likely wouldn't exist were it not for their efforts. Thank you, perf developers.

magic-trace.org is a fork of Perfetto, with minor modifications. We'd like to thank the people at Google responsible for it. It's a high quality codebase that solves a hard problem well.

The ideas behind magic-trace are in no way unique. We've written down a list of prior art that has influenced its design.

Footnotes

  1. perf can do this too, but that's not how most people use it. In fact, if you peek under the hood you'll see that magic-trace uses perf to drive Intel PT.

  2. Strictly speaking, anything newer than Broadwell, but this is not a platform we regularly test on, and timing resolution is worse (~1us).

  3. https://github.com/actions/upload-artifact/issues/38

magic-trace's People

Contributors

aalekseyev, billduff, cgaebel, clyvegassant, ecatmur, gretay-js, hlian, int-y1, lamoreauxaj, pbaumbacher, quantum5, robberth, shoffmeister, theothornhill, v-gb, xyene

magic-trace's Issues

Don't crash on the v8 demo

The v8 demo is in demo/demo.js

Right now, attempts to magic-trace it crash while trying to parse perf output:

   "3435768/3435768 1283684.240499276:   int                      556263730105 v8::base::OS::Abort+0x15 =>     556263608d40 v8::internal::Snapshot::DefaultSnapshotBlob+0x57e00"))

or even:

   "3435768/3435768 1283760.029165481:   int                      55626368b83f v8::internal::Snapshot::DefaultSnapshotBlob+0xda8ff =>     5562635f7e80 v8::internal::Snapshot::DefaultSnapshotBlob+0x46f40"))

At this point, I'd normally go add parsing for software interrupts (int). But I don't understand what's going on here. Looking at V8's source code, OS::Abort is supposed to abort the process. But the process didn't abort! Also, DefaultSnapshotBlob is a pretty trivial function that doesn't look like it should be generating software interrupts.

Handle `perf` failing to start more gracefully

If perf fails to start for any reason, magic-trace hangs and does not propagate the error up. An easy way to test this is to intentionally mess up the flags to perf record, or unintentionally by running on a too-old perf version.
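
One way to surface the failure instead of hanging is to check whether the child exits during a short grace period right after launch. The sketch below is in Python for brevity (magic-trace itself is OCaml); `start_or_fail` and `grace_seconds` are hypothetical names, and the child command is a stand-in for a `perf record` invocation with bad flags.

```python
import subprocess
import sys


def start_or_fail(argv, grace_seconds=0.5):
    """Start argv; raise if it exits within grace_seconds, else return the Popen."""
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE, text=True)
    try:
        code = proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        return proc  # still running, so startup presumably succeeded
    stderr = proc.stderr.read().strip()
    proc.stderr.close()
    raise RuntimeError(f"{argv[0]} exited early with status {code}: {stderr}")


if __name__ == "__main__":
    # Stand-in for a recorder given bad flags: a child that fails immediately.
    try:
        start_or_fail([sys.executable, "-c", "import sys; sys.exit('bad flag')"])
    except RuntimeError as err:
        print(err)
```

A fixed grace period is a heuristic; a recorder that fails later than that would still need to be noticed by whatever waits on it.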

`trace` is race-y with process startup

If a process takes less than ~500ms to execute, then trace will produce an empty tracefile. We should synchronize process startup with perf, probably through some SIGSTOP/SIGCONT dance.
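
As a rough illustration of what such a SIGSTOP/SIGCONT dance could look like, here is a Python sketch (not magic-trace's OCaml code). The child stops itself before exec, the parent waits for the stop, runs a hypothetical `attach` callback where the tracer would be started, and only then resumes the child. `run_synchronized` and `attach` are made-up names for this example.

```python
import os
import signal
import sys


def run_synchronized(argv, attach):
    """Fork, stop the child before exec, call attach(pid), then let it run."""
    pid = os.fork()
    if pid == 0:
        try:
            os.kill(os.getpid(), signal.SIGSTOP)  # park until the parent is ready
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    _, status = os.waitpid(pid, os.WUNTRACED)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited before it could be stopped")
    attach(pid)  # assumption: the tracer is started and confirmed attached here
    os.kill(pid, signal.SIGCONT)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -os.WTERMSIG(status)


if __name__ == "__main__":
    code = run_synchronized([sys.executable, "-c", "pass"], lambda pid: None)
    print("child exit status:", code)
```

Because the child has not exec'd yet when `attach` runs, no part of the traced program can execute before the tracer is in place, however short-lived the program is.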

Support increasing the snapshot buffer size

The snapshot buffer can be ~arbitrarily sized, and is passed as the argument to --snapshot in perf. Larger buffers would allow for more data to be captured and displayed.

Ref https://man7.org/linux/man-pages/man1/perf-intel-pt.1.html:

   To select snapshot mode a new option has been added:

       -S

   Optionally it can be followed by the snapshot size e.g.

       -S0x100000

   The default snapshot size is the auxtrace mmap size. If neither
   auxtrace mmap size nor snapshot size is specified, then the
   default is 4MiB for privileged users (or if
   /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for
   unprivileged users. If an unprivileged user does not specify mmap
   pages, the mmap pages will be reduced as described in the new
   auxtrace mmap size option section below.

   The snapshot size is displayed if the option -vv is used e.g.

       Intel PT snapshot size: %zu

Fail gracefully

When magic-trace first starts up, it should detect conditions that we know will never work and bail out early with a good error message when they're hit. This wiki page contains a list of constraints we know about. Let's automate as much as we can.

common events trigger too fast

If I run a particular trace for an event I expect to trigger very often (say, an expected 1000Hz), then as soon as magic-trace starts recording, it stops. As a consequence, I get a very limited history.

While my software trigger could be written to trigger only probabilistically, I'd like it if magic-trace had an option to "refuse to trigger until we've been recording for TIME-SPAN".
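
The requested option amounts to a small gate in front of the trigger. A minimal Python sketch, assuming a hypothetical `TriggerGate` class (nothing like it is confirmed to exist in magic-trace) and an injectable clock so it can be tested:

```python
import time


class TriggerGate:
    """Suppress triggers until recording has run for at least min_seconds."""

    def __init__(self, min_seconds, clock=time.monotonic):
        self._min_seconds = min_seconds
        self._clock = clock
        self._started_at = clock()  # recording is assumed to start now

    def should_snapshot(self):
        # Triggers that arrive too early are simply ignored by the caller.
        return self._clock() - self._started_at >= self._min_seconds
```

With `TriggerGate(2.0)`, a 1000Hz event would be ignored for the first two seconds and would take the snapshot on its first occurrence after that.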

`setjmp`/`longjmp` support

We cannot support arbitrary _setjmp/longjmp, but if we see a longjmp we can probably avoid totally breaking the trace by resetting the stack, and ignoring future rets underflowing our call stack (because we wouldn't know how many stack frames should have gotten popped).

They appear as Call "_setjmp" and Call "longjmp" tokens in the stream.
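
A sketch of the proposed recovery, in Python rather than magic-trace's OCaml: on a longjmp call the stack is reset, and later returns that would underflow it are ignored instead of treated as errors. The `CallStack` class and its return values are illustrative assumptions.

```python
class CallStack:
    """Call stack that survives longjmp by resetting and tolerating extra returns."""

    def __init__(self):
        self.frames = []
        self.after_longjmp = False

    def on_call(self, symbol):
        if symbol == "longjmp":
            # We cannot know how many frames longjmp unwound, so drop them all.
            self.frames.clear()
            self.after_longjmp = True
        else:
            self.frames.append(symbol)

    def on_return(self):
        """Return 'popped', 'ignored' (underflow after longjmp), or 'underflow'."""
        if self.frames:
            self.frames.pop()
            return "popped"
        return "ignored" if self.after_longjmp else "underflow"
```

This loses the frames that were still live after the jump, but it keeps the rest of the trace from drifting.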

Any support planned for managed languages?

From what I can tell magic-trace right now only works with native binaries as you need to select a symbol address up front.

Do you guys have any plans for managed languages? Maybe via the perf symbol map files?

Support more perf event kinds

Ref https://github.com/torvalds/linux/blob/7ee022567bf9e2e0b3cd92461a2f4986ecc99673/tools/perf/builtin-script.c#L1546:

static struct {
	u32 flags;
	const char *name;
} sample_flags[] = {
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL, "call"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN, "return"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL, "jcc"},
	{PERF_IP_FLAG_BRANCH, "jmp"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_INTERRUPT, "int"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_INTERRUPT, "iret"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_SYSCALLRET, "syscall"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN | PERF_IP_FLAG_SYSCALLRET, "sysret"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_ASYNC, "async"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |	PERF_IP_FLAG_INTERRUPT, "hw int"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT, "tx abrt"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_BEGIN, "tr strt"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TRACE_END, "tr end"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMENTRY, "vmentry"},
	{PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL | PERF_IP_FLAG_VMEXIT, "vmexit"},
	{0, NULL}
};
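
To see which of these event kinds matter for call-stack reconstruction, the table can be transcribed into a lookup keyed by name. The Python sketch below copies the flag combinations from the C table above (with the `PERF_IP_FLAG_` prefix dropped); treating every kind that carries CALL as a push and every kind that carries RETURN as a pop is an assumption about how a consumer might use them, not a description of magic-trace's decoder.

```python
# Flag sets transcribed from perf's sample_flags table.
SAMPLE_FLAGS = {
    "call": {"BRANCH", "CALL"},
    "return": {"BRANCH", "RETURN"},
    "jcc": {"BRANCH", "CONDITIONAL"},
    "jmp": {"BRANCH"},
    "int": {"BRANCH", "CALL", "INTERRUPT"},
    "iret": {"BRANCH", "RETURN", "INTERRUPT"},
    "syscall": {"BRANCH", "CALL", "SYSCALLRET"},
    "sysret": {"BRANCH", "RETURN", "SYSCALLRET"},
    "async": {"BRANCH", "ASYNC"},
    "hw int": {"BRANCH", "CALL", "ASYNC", "INTERRUPT"},
    "tx abrt": {"BRANCH", "TX_ABORT"},
    "tr strt": {"BRANCH", "TRACE_BEGIN"},
    "tr end": {"BRANCH", "TRACE_END"},
    "vmentry": {"BRANCH", "CALL", "VMENTRY"},
    "vmexit": {"BRANCH", "CALL", "VMEXIT"},
}


def stack_effect(kind):
    """Classify an event kind as 'push', 'pop', or 'none' (assumed policy)."""
    flags = SAMPLE_FLAGS[kind]
    if "CALL" in flags:
        return "push"
    if "RETURN" in flags:
        return "pop"
    return "none"
```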

magic-trace trace decoding fails on OCaml programs compiled on my system

$ cat test.ml
let () = print_endline "Hello World"

$ ~/.opam/4.13.1/bin/ocamlopt -o test ./test.ml

$ magic-trace trace -o test.trace ./test
[Couldn't find symbol. Will still snapshot on end]
Hello World
[ perf record: Woken up 1 times to write data ]
[Finished recording!]
[Snapshot taken!]
[ perf record: Captured and wrote 0.002 MB /tmp/magic_trace.tmp.e5ca22/perf.data ]
[Decoding, this may take 30s or so...]
(monitor.ml.Error
 ("Owee_buf.Invalid_format(\"unknown .debug_line version\")")
 ("Raised at Owee_buf.invalid_format in file \"src/owee_buf.ml\", line 22, characters 25-51"
  "Called from Owee_buf.assert_format in file \"src/owee_buf.ml\" (inlined), line 26, characters 4-22"
  "Called from Owee_debug_line.read_header in file \"src/owee_debug_line.ml\", line 54, characters 2-80"
  "Called from Owee_debug_line.read_chunk in file \"src/owee_debug_line.ml\", line 82, characters 12-27"
  "Called from Magic_trace_core__Elf.addr_table.(fun).load_table_next in file \"core/elf.ml\", line 103, characters 12-45"
  "Called from Base__Option.iter in file \"src/option.ml\" (inlined), line 68, characters 14-17"
  "Called from Magic_trace_core__Elf.addr_table in file \"core/elf.ml\", line 90, characters 2-1023"
  "Called from Magic_trace_lib__Trace.Make_commands.decode_to_trace.(fun) in file \"src/trace.ml\", line 70, characters 25-43"
  "Called from Tracing__Tool_output.write_and_view in file \"src/tool_output.ml\", line 32, characters 16-19"
  "Called from Async_kernel__Deferred0.bind.(fun) in file \"src/deferred0.ml\", line 54, characters 64-69"
  "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 167, characters 6-47"
  "Caught by monitor Monitor.protect"))

this is magic-trace v0.15.0 as packaged on opam
System: Fedora 34
owee is version 0.4 (I don't know if 0.5 would fix it, but the magic-trace package on opam requires owee 0.4)

Filter events from after the stop indicator

Magic-trace doesn't stop recording exactly when the stop indicator is hit. That can be quite confusing, especially if you get unlucky and the stop event is nowhere near the right hand side of the trace.

To fix this, let's filter out events from the trace that happen after the stop indicator returns.

It's conceivable that people will want the current behavior behind a flag, but I weakly think it's not worth the extra UI complexity.

Trace state tracking is broken by hardware interrupts

Perf outputs two events magic-trace uses to construct its [untraced] spans: tr strt and tr end.

As we discovered after a deep dive today, magic-trace isn't handling these events properly, and that is the cause of some staircase traces.

Subtleties include:

  • tr end doesn't need to be accompanied by a tr strt. For example, there's an implicit tr end during a hardware interrupt (hw int). But when tracing userspace only, you don't even see hardware interrupts.
  • Due to an Intel PT bug (?), there are sometimes two tr strts instead of one.

and they make it challenging to figure out what the correct trace state should be.

We propose the following algorithm for tracking trace state, instead of what magic-trace does today:

  • Explicitly track trace state per-thread, one of Tracing | Not_tracing.
  • Initial state is Tracing.
  • Tracing -> Not_tracing on tr end
  • Not_tracing -> Tracing on tr strt
  • On tr end while Not_tracing, print a warning and disbelieve it.
  • On tr strt while Tracing, print a warning and disbelieve it.
  • There's one exception: A tr strt is permitted as the first event of a thread, even though it's prohibited by the other rules.

This usually, but not always, happens around a call to memmove. I've left some example perf script output below:

 1139/1139  428146.916343395:   jcc                            40af03 itch_bbo::book::Book::add_order+0x3b3 =>           40b012 itch_bbo::book::Book::add_order+0x4c2
 1139/1139  428146.916343397:   call                           40b06c itch_bbo::book::Book::add_order+0x51c =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916343397:   jmp                      7ffff732924a __memmove_ssse3_back+0x2a =>     7ffff732ba10 __memmove_ssse3_back+0x27f0
 1139/1139  428146.916343398:   return                   7ffff732ba16 __memmove_ssse3_back+0x27f6 =>           40b072 itch_bbo::book::Book::add_order+0x522
 1139/1139  428146.916343398:   call                           40b093 itch_bbo::book::Book::add_order+0x543 =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916343445:   tr strt                             0 [unknown] =>     7ffff732bbd0 __memmove_ssse3_back+0x29b0
 1139/1139  428146.916343561:   return                   7ffff732bbd4 __memmove_ssse3_back+0x29b4 =>           40b099 itch_bbo::book::Book::add_order+0x549
 1139/1139  428146.916343592:   jmp                            40b0a8 itch_bbo::book::Book::add_order+0x558 =>           40b1ba itch_bbo::book::Book::add_order+0x66a
 1139/1139  428146.916343592:   jmp                            40b1ce itch_bbo::book::Book::add_order+0x67e =>           40b62d itch_bbo::book::Book::add_order+0xadd
 1139/1139  428146.916323767:   jmp                            40d53e itch_bbo::main+0x9ee =>           40d540 itch_bbo::main+0x9f0
 1139/1139  428146.916323767:   jcc                            40d54e itch_bbo::main+0x9fe =>           40d63f itch_bbo::main+0xaef
 1139/1139  428146.916324004:   hw int                         40d64f itch_bbo::main+0xaff => ffffffff8ad90750 [unknown]
 1139/1139  428146.916324247:   tr strt                             0 [unknown] => ffffffff8ad90762 [unknown]
 1139/1139  428146.916324607:   tr strt                             0 [unknown] =>           40d64f itch_bbo::main+0xaff
 1139/1139  428146.916324732:   jmp                            40d679 itch_bbo::main+0xb29 =>           40d730 itch_bbo::main+0xbe0
 1139/1139  428146.916324732:   call                           40d78b itch_bbo::main+0xc3b =>           40c650 itch_bbo::maybe_sanity_check_execution+0x0
 1139/1139  428146.916294568:   call                           40a5b2 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x62 =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916294568:   jcc                      7ffff7329226 __memmove_ssse3_back+0x6 =>     7ffff732924e __memmove_ssse3_back+0x2e
 1139/1139  428146.916294569:   jmp                      7ffff732926c __memmove_ssse3_back+0x4c =>     7ffff732b4b0 __memmove_ssse3_back+0x2290
 1139/1139  428146.916294569:   return                   7ffff732b4b6 __memmove_ssse3_back+0x2296 =>           40a5b5 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x65
 1139/1139  428146.916294570:   call                           40a5d6 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x86 =>     7ffff7329220 __memmove_ssse3_back+0x0
 1139/1139  428146.916294615:   tr strt                             0 [unknown] =>     7ffff732924e __memmove_ssse3_back+0x2e
 1139/1139  428146.916294690:   jmp                      7ffff732926c __memmove_ssse3_back+0x4c =>     7ffff732b5c0 __memmove_ssse3_back+0x23a0
 1139/1139  428146.916294690:   return                   7ffff732b5c8 __memmove_ssse3_back+0x23a8 =>           40a5d9 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x89
 1139/1139  428146.916294690:   jmp                            40a65c alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x10c =>           40a6c0 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x170
 1139/1139  428146.916294690:   call                           40a6c3 alloc::collections::btree::remove::<impl alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut,K,V,alloc::collections::btree::node::marker::Leaf>,alloc::collections::btree::node::marker::KV>>::remove_leaf_kv+0x173 =>           40a1d0 alloc::collections::btree::node::BalancingContext<K,V>::merge_tracking_child_edge+0x0
 1139/1139  428146.916294707:   call                           40a297 alloc::collections::btree::node::BalancingContext<K,V>::merge_tracking_child_edge+0xc7 =>     7ffff7329220 __memmove_ssse3_back+0x0
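
The proposed state-tracking rules can be sketched as a small per-thread state machine. This is a Python illustration of the rules listed in this issue, not the OCaml implementation; the class names and warning strings are made up for the example.

```python
from collections import defaultdict
from enum import Enum


class TraceState(Enum):
    TRACING = "Tracing"
    NOT_TRACING = "Not_tracing"


class ThreadTraceState:
    def __init__(self):
        self.state = TraceState.TRACING  # initial state is Tracing
        self.seen_event = False
        self.warnings = []

    def on_event(self, kind):
        first_event = not self.seen_event
        self.seen_event = True
        if kind == "tr end":
            if self.state is TraceState.TRACING:
                self.state = TraceState.NOT_TRACING
            else:
                self.warnings.append("ignored tr end while Not_tracing")
        elif kind == "tr strt":
            if self.state is TraceState.NOT_TRACING:
                self.state = TraceState.TRACING
            elif not first_event:
                # A tr strt as a thread's very first event is the one exception.
                self.warnings.append("ignored tr strt while Tracing")
        return self.state


# One state machine per thread id, created on first use.
states = defaultdict(ThreadTraceState)
```

Fed the doubled tr strt from the output above (after the hw int), this keeps the thread in Tracing and records one warning rather than opening a second [untraced] span.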

Clean up the CLI surface

The CLI surface of magic-trace could use some polish. Here is a small list of minor complaints I'd like to address before 1.0:

  • It could use a stable v1 subcommand with a limited set of features we're comfortable committing to forever.
  • We should either bundle our fork of perfetto or always point people to the web ui, instead of providing the optional -perfetto-ui-base-directory flag.
  • Document the decode command and maybe deprioritize it in the command hierarchy. It's useful if another app wants to do something with perf that magic-trace doesn't support but does want to use magic-trace to visualize the result.
  • If immediate-stop mode can truly cause system crashes, let's remove it.
  • There are two "trigger modes" that one can run magic-trace in: trigger when the app calls a function, or trigger when ctrl+c is hit. Let's find a way to hint that directly, maybe with separate subcommand or maybe by rejiggering the flags.
  • duration_thresh references mark_start, but doesn't explain what that is
  • serve should probably take the http port as an argument (probably with a default), instead of having two separate arguments
  • consider copying perf and outputting trace.data into the working directory, moving any existing file there into trace.data.old
  • Does full-execution even work?
  • If -multi-thread works, run in that mode by default and turn the flag into -single-thread... and maybe accept a tid?

support non-standard characters in symbol names on command line

I have a binary with a symbol whose C name, according to e.g. readelf, looks like:

foo_bar_$5_baz$3_qux_12345

This fails:

magic-trace attach <stuff> -symbol 'foo_bar_$5_baz$3_qux_12345'

Surprisingly if I do:

magic-trace attach <stuff> -symbol foo_bar

I see the right symbol in my fzf, can select it, and everything works (it does not appear to have any escapes etc.). I wonder if we're not escaping the characters in a regex? I also wonder if we should separate symbol-regex from this-is-the-symbol-i-know-it.
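
If the symbol is indeed being treated as a regex (that is only the guess made above), the failure is easy to reproduce in any regex engine: an unescaped `$` is an end-of-string anchor, so the pattern can never match the literal name. A Python illustration, with `find_by_regex` and `find_literal` as hypothetical helpers:

```python
import re

symbol = "foo_bar_$5_baz$3_qux_12345"
symbols = [symbol, "foo_bar_other"]


def find_by_regex(pattern, names):
    """Treat the user's input as a regular expression."""
    return [n for n in names if re.search(pattern, n)]


def find_literal(text, names):
    """Treat the user's input as an exact piece of text by escaping it first."""
    return find_by_regex(re.escape(text), names)


print(find_by_regex(symbol, symbols))     # [] because '$' acts as an anchor
print(find_literal(symbol, symbols))      # ['foo_bar_$5_baz$3_qux_12345']
print(find_by_regex("foo_bar", symbols))  # both names, as seen in fzf
```

Keeping the two lookups separate matches the suggestion to split symbol-regex from this-is-the-symbol-i-know-it.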

Fix broken stack traces on Go code

In this simple Go example, it's clear that Go's stack switch causes stacktraces to wander off the right hand side of the screen. I think this is easy to fix: when trace_writer.ml sees the symbol runtime.newstack, it should mark all currently-open stack frames as closed.

I'm not sure if there is any more custom control flow in Go code. e.g. do Go stacks shrink? Please file more bugs if you notice any.

Support tracing into kernel mode

perf can already trace into the kernel, given sufficient perms. Adding support to magic trace involves (at least) supporting iret and hw int decoding / state machine updating, but after that it might "just work".

Changing c-states breaks magic-trace / IPT

I was experiencing issues where my traces were completely empty around the snapshot. After investigating the perf file it hinted at "instruction trace errors" which led me to

https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace

which mentions

It is not uncommon to get overflows when transitioning to a C-state, so these errors are not significant.

I was testing this on a TGL laptop and after disabling turbo boost I got pretty stable traces again.

I am wondering whether other people share the same experience with switching c-states or whether there is maybe something else behind it?

If not, it might be worth mentioning disabling C-states in the readme / tutorial? Disabling turbo boost was enough for me, but something like the max_cstate kernel flags would probably work as well.

This is on 5.15.17.

trace-many-times

I frequently want as many traces of $CONDITION as I can get. Right now I build a command line, watch it go, wait until it fires, wait for it to decode, and rerun.

I can write a shell loop around this, but you know what sounds great? A flag to "once you're done, re-arm and do it again with a fresh filename".
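
The re-arm behaviour could look roughly like the Python sketch below. `run_once` is a hypothetical callback standing in for "trace until the trigger fires and write the result to this path"; the `trace-N.fxt` naming scheme is likewise an assumption.

```python
import itertools
import os
import tempfile


def fresh_name(directory, stem="trace", ext=".fxt"):
    """Return the first '<stem>-<n><ext>' path in directory that does not exist."""
    for n in itertools.count(1):
        path = os.path.join(directory, f"{stem}-{n}{ext}")
        if not os.path.exists(path):
            return path


def trace_many_times(run_once, directory, max_traces):
    """Call run_once(output_path) repeatedly, re-arming with a fresh filename."""
    outputs = []
    for _ in range(max_traces):
        path = fresh_name(directory)
        run_once(path)  # assumed to create the file at path before returning
        outputs.append(path)
    return outputs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Fake tracer that just creates an empty output file.
        print(trace_many_times(lambda p: open(p, "w").close(), d, 3))
```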

[Question] Breakpoints using perf hardware breakpoints

Thanks for this interesting project especially its amazingly written accompanying blog post on the janestreet tech blog!

I was intrigued by the following line in the post:

It turns out that perf_event_open can use hardware breakpoints and notify you when a memory address is executed or accessed

Very cool! So I understand that (1) You get notified (probably via a fd) that a hardware breakpoint has been reached (2) You enable intel processor trace for that thread (3) You resume the thread paused on the hardware breakpoint

My question is how do you do (3)? How do you resume the thread? Do you send it a SIGCONT or something like that?

Perf_tool_backend doesn't recognize "tr strt tr end"

This shows up for me in our Go demo. E.g. we fail to parse this perf line:

"2118573/2118573 770614.599007116: tr strt tr end 0 [unknown] => 4591e1 [unknown]"

I included a commented-out test for this in my recent pull request to the line-parsing code, for convenience of whoever fixes this.
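
For reference, here is one way a line parser could accept several event kinds in a row. It is a Python sketch using a regular expression built from the kind names perf prints; it is not the Perf_tool_backend code, and the field names in the returned dict are invented for the example.

```python
import re

KINDS = [
    "call", "return", "jcc", "jmp", "int", "iret", "syscall", "sysret",
    "async", "hw int", "tx abrt", "tr strt", "tr end", "vmentry", "vmexit",
]
# Longest names first so multi-word kinds are tried before shorter ones.
_KIND = "|".join(re.escape(k) for k in sorted(KINDS, key=len, reverse=True))
_LINE = re.compile(
    r"^\s*(?P<pid>\d+)/(?P<tid>\d+)\s+(?P<time>\d+\.\d+):\s+"
    rf"(?P<kinds>(?:{_KIND})(?:\s+(?:{_KIND}))*)\s+"
    r"(?P<src_addr>[0-9a-f]+)\s+(?P<src_sym>.+?)\s+=>\s+"
    r"(?P<dst_addr>[0-9a-f]+)\s+(?P<dst_sym>.+)$"
)


def parse_line(line):
    """Parse one perf script line; return None if it does not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    return {
        "tid": int(m["tid"]),
        "time": m["time"],
        "kinds": re.findall(_KIND, m["kinds"]),  # one or more kinds, in order
        "src": (int(m["src_addr"], 16), m["src_sym"]),
        "dst": (int(m["dst_addr"], 16), m["dst_sym"]),
    }
```

On the failing line above, `parse_line` yields the kinds `tr strt` followed by `tr end`, which a state tracker can then apply in order.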

magic-trace cannot find fzf executable even though it's in $PATH

Installed magic-trace as described in https://blog.janestreet.com/magic-trace/.

~ $ magic-trace attach -output magic.ftf
(monitor.ml.Error
 (Unix.Unix_error "No such file or directory" execvp
  "((prog /usr/bin/fzf) (argv (/usr/bin/fzf)))")
 ("Raised at Base__Result.ok_exn in file \"src/result.ml\", line 249, characters 17-26"
  "Called from Async_kernel__Deferred1.M.map.(fun) in file \"src/deferred1.ml\", line 17, characters 40-45"
  "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 167, characters 6-47"))
No pid selected

~ $ which fzf
/home/omer/rcbackup/nvim/pack/plugins/start/fzf/bin/fzf

Support bigger traces at lower resolution

My understanding is that one of the main blockers preventing taking traces for longer time periods is that perfetto can't handle the size of traces that it produces. I would happily settle for longer traces which only showed function calls that took more than a certain amount of time -- to give a high-level view of where the time was spent. Combined with some option to filter a trace down to a given time range this should allow for exploring large traces reasonably ergonomically.
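
The duration filter described here is simple once calls have been decoded into spans. A Python sketch under that assumption (`Span` and `keep_long_spans` are hypothetical names, and this works on decoded spans rather than on the trace file itself):

```python
from typing import NamedTuple


class Span(NamedTuple):
    name: str
    start_ns: int
    end_ns: int


def keep_long_spans(spans, min_duration_ns):
    """Drop completed calls shorter than min_duration_ns."""
    return [s for s in spans if s.end_ns - s.start_ns >= min_duration_ns]
```

Because a callee's span is nested inside its caller's, any span that survives the filter also keeps all of its ancestors, so the remaining call stacks stay consistent.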

Rewrite the README

It should have the following sections:

  • standard github badges
  • the magic trace logo
  • an overview of what this is and why anyone should care, with a picture
  • installation, including how to turn off perf paranoid mode
  • examples
  • supported platforms etc.
  • links to documentation
  • how to contribute, including a quick tour of the tree and a request to please squash/rebase PRs

If `-symbol` is specified, the name should also be printed

We currently print the address (as an int; it should probably be printed as a pointer). We should also print the name, in case the user misselected their symbol. A name @ addr format sounds reasonable.

(Ran into this while showing someone how to use magic-trace for the first time.)

Run tests in CI

We have a couple of tests, we should figure out how to run them in the CI.

Binaries with DWARF5 debug info fail at decoding time

It seems that magic-trace struggles with DWARF5 which gcc11 now uses by default. Testing a simple program compiled with DWARF5 info gives:

[Attaching to 4199088]
[Snapshot taken!]
...
[ perf record: Woken up 2 times to write data ]
[Finished recording!]
[ perf record: Captured and wrote 4.012 MB /tmp/magic_trace.tmp.848576/perf.data ]
[Decoding, this may take 30s or so...]
(monitor.ml.Error
 ("Owee_buf.Invalid_format(\"unknown .debug_line version\")")
 ("Raised at Owee_buf.invalid_format in file \"src/owee_buf.ml\", line 22, characters 25-51"
  "Called from Owee_buf.assert_format in file \"src/owee_buf.ml\" (inlined), line 26, characters 4-22"
  "Called from Owee_debug_line.read_header in file \"src/owee_debug_line.ml\", line 54, characters 2-80"
  "Called from Owee_debug_line.read_chunk in file \"src/owee_debug_line.ml\", line 82, characters 12-27"
  "Called from Magic_trace_core__Elf.addr_table.(fun).load_table_next in file \"core/elf.ml\", line 103, characters 12-45"
  "Called from Base__Option.iter in file \"src/option.ml\" (inlined), line 68, characters 14-17"
  "Called from Magic_trace_core__Elf.addr_table in file \"core/elf.ml\", line 90, characters 2-1023"
  "Called from Magic_trace_lib__Trace.Make_commands.decode_to_trace.(fun) in file \"src/trace.ml\", line 70, characters 25-43"
  "Called from Tracing__Tool_output.write_and_view in file \"src/tool_output.ml\", line 32, characters 16-19"
  "Called from Async_kernel__Deferred0.bind.(fun) in file \"src/deferred0.ml\", line 54, characters 64-69"
  "Called from Async_kernel__Job_queue.run_jobs in file \"src/job_queue.ml\", line 167, characters 6-47"
  "Caught by monitor Monitor.protect"))

Passing -gdwarf-4 to gcc makes the issue go away.

Provide a better error message when fzf isn't found

When fzf isn't available in the user's PATH, we crash with an OCaml exception. It's a pretty jarring experience for someone who's never seen one before, and a terrible first impression.

Instead, let's give them a human-readable error message that invites the user to either:

  • install fzf, or
  • pass -pid to perf
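A minimal sketch of checking for fzf up front and returning a readable message instead of raising. `find_in_path` and `fzf_or_error` are hypothetical names, the message wording is only a suggestion, and the lookup assumes a colon-separated PATH as on Linux.

```ocaml
(* Look for [exe] in each directory of a colon-separated PATH string.
   This only checks that the file exists, not that it is executable. *)
let find_in_path ~path exe =
  String.split_on_char ':' path
  |> List.filter (fun dir -> dir <> "")
  |> List.map (fun dir -> Filename.concat dir exe)
  |> List.find_opt Sys.file_exists

(* Hypothetical entry point: return the fzf path, or a message for the user. *)
let fzf_or_error () =
  let path = Option.value (Sys.getenv_opt "PATH") ~default:"" in
  match find_in_path ~path "fzf" with
  | Some fzf -> Ok fzf
  | None ->
    Error
      "Could not find fzf in your PATH. Either install fzf, or pass -pid \
       to choose the process directly."
```

Returning a `result` lets the caller print the message and exit cleanly rather than surfacing a backtrace.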

Build static binaries in CI

This would require:

  • building against musl
  • possibly different dune targets

...but would allow the attached binaries to be run on anything, not just Ubuntu 20.04-glibc-equivalent systems.

time range filtering

As we know, Perfetto falls over at some point with large enough traces. Can we write a tool that slices our output .ftf files by time range so we can look at parts of a trace successfully?

This isn't a good solution but it would be great for usability.

Admittedly, it's also a pure Fuchsia trace format problem rather than a magic-trace one, but there might not be a better place for this to live.
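The slicing logic itself could look roughly like the sketch below. It works on a hypothetical in-memory `span` list rather than on the real trace file format, so parsing and re-emitting the file is left out entirely; `slice` and the record fields are invented for this example.

```ocaml
(* Hypothetical in-memory view of one call; not the on-disk trace format. *)
type span = { name : string; start_ns : int; end_ns : int }

(* Keep spans that overlap the window [from_ns, to_ns], clamping each
   one to the window so the output never extends outside it. *)
let slice ~from_ns ~to_ns spans =
  List.filter_map
    (fun s ->
      if s.end_ns < from_ns || s.start_ns > to_ns then None
      else
        Some
          { s with
            start_ns = max s.start_ns from_ns;
            end_ns = min s.end_ns to_ns })
    spans
```

Clamping, rather than dropping, calls that straddle the window boundary keeps the enclosing call stack visible in the sliced trace.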

Demangle symbol names in fzf

When showing the user an fzf to select a trigger symbol, show demangled symbols instead of mangled ones.

Symbols specified at the command line, I think, should still be mangled to ease copy+paste between apps and to avoid forcing us to write a name mangler too.

The message we show after a user selects a symbol should print its mangled name so the user can copy+paste it into future magic-trace invocations.
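One way to show demangled names while still recovering the mangled one is to feed fzf two tab-separated columns per symbol. The sketch below assumes that approach; `fzf_line` and `mangled_of_selection` are hypothetical names, and `demangle` is a stand-in passed by the caller, since the actual demangler is not specified here.

```ocaml
(* Build one fzf input line: "<demangled>\t<mangled>".
   [demangle] is supplied by the caller; it is a placeholder here. *)
let fzf_line ~demangle mangled = demangle mangled ^ "\t" ^ mangled

(* The mangled name is the last tab-separated field of the selected line. *)
let mangled_of_selection line =
  match List.rev (String.split_on_char '\t' line) with
  | mangled :: _ -> mangled
  | [] -> line
```

fzf's `--delimiter` and `--with-nth` options could then be used to display only the first column, so the user sees demangled names while the tool prints the mangled one for copy+paste.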

Create some demos

Using magic-trace is a visual experience. We should create example programs and demonstrations of magic-tracing them in a few different formats for people's varying attention spans.

  • a < 10s gif like speedscope has, demonstrating the value magic-trace can provide
  • a 2 minute video quickly walking through a short debugging session
  • a text-based walkthrough where we spell out exactly what commands to run at each step
