iovisor / bpftrace


High-level tracing language for Linux eBPF

License: Apache License 2.0

CMake 1.64% Shell 0.40% C++ 46.12% Lex 0.41% Yacc 0.78% C 2.44% Python 1.12% LLVM 46.87% Nix 0.22%

bpf ebpf tracing kprobes uprobes tracepoints usdt bcc


bpftrace's Issues

add rand builtin

rand should return BPF_FUNC_get_prandom_u32.

user address space writes

Perhaps we should have a way to call BPF_FUNC_probe_write_user()

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=96ae52279594470622ff0585621a13e96b700600

This could either be a call, eg, write(), or it could just happen if you write to a user address. As an example of the latter, imagine if this worked (needs #4):

./src/bpftrace -e 'uprobe:/bin/bash:readline { *uaddr("ps1_prompt") = "BPF-says-hi> "; }'

So your bash prompt becomes "BPF-says-hi> " the next time you hit enter.

hist to truncate zero range

Ignore the missing ASCII bars (that's #50); this ticket is about trimming the zero range:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = hist(pid); }'
Attaching 1 probe...
^C

@:
[0, 1]                 0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               0 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              0 |                                                    |
[1K, 2K)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             1 |@@@@@@@@@@@@@                                       |

should be:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = hist(pid); }'
Attaching 1 probe...
^C

@:
[1K, 2K)               4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)               0 |                                                    |
[4K, 8K)               0 |                                                    |
[8K, 16K)              0 |                                                    |
[16K, 32K)             1 |@@@@@@@@@@@@@                                       |

Similar to the lhist() changes. See #1 and #49
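
A minimal sketch of the trimming, written in Python rather than bpftrace's C++ and assuming the histogram is available as an ordered list of (label, count) pairs. The function name and the data layout are hypothetical. It drops only the empty buckets below the first populated one, since the expected output above keeps the empty buckets that sit between populated ones.

```python
def trim_leading_zero_buckets(buckets):
    """Drop empty buckets that precede the first populated bucket.

    `buckets` is an ordered list of (label, count) pairs. This layout is
    an assumption for illustration, not bpftrace's internal representation.
    """
    for index, (_label, count) in enumerate(buckets):
        if count != 0:
            return buckets[index:]
    return []  # every bucket was empty


buckets = [("[512, 1K)", 0), ("[1K, 2K)", 4), ("[2K, 4K)", 0),
           ("[4K, 8K)", 0), ("[8K, 16K)", 0), ("[16K, 32K)", 1)]
for label, count in trim_leading_zero_buckets(buckets):
    print(f"{label:<20} {count}")
```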

A kill() call (could also be called raise(), which sounds less alarming) can be used to send a signal to the current process. I imagine it would be used by security teams in some situations, as a zero-day workaround. Eg, to issue a kill(9) after a certain kprobe/uprobe program.

I don't see a kernel function for this[1], so this probably involves adding the BPF kernel function first.

[1] https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md

String comparisons in filters always return true

# ./src/bpftrace -e 'kretprobe:vfs_read /comm == "sshd"/ { @[comm] = count(); }'
Attaching 1 probe...
^C

@[systemd-journal]: 1
@[systemd]: 4
@[sshd]: 10
@[snmpd]: 14
@[snmp-pass]: 30

That should only be matching "sshd".

I'm guessing something is wrong when strings are used in boolean operations. You can dump the llvm instructions using -d. Also check how string comparisons were done in bcc (eg, I think trace.py has an implementation).

filter make '/' optional

Filters (aka predicates) are wrapped in "/"s, like awk. But awk was using them to identify string comparisons, and they weren't needed for int comparisons. I'm wondering if we can make them completely optional, since we already have a type system.

Eg:

./src/bpftrace -e 'kretprobe:vfs_read /pid > 1000/ { @[pid] = count(); }'
./src/bpftrace -e 'kretprobe:vfs_read pid > 1000 { @[pid] = count(); }'

The first works, the second doesn't. Could the second work? This is a nice-to-have if it's easy, but not that important.

I imagine it'd involve editing lexer.l and parser.yy. Also see ast/codegen_llvm.cpp and start with:

      case bpftrace::Parser::token::BAND:  expr_ = b_.CreateAnd    (lhs, rhs); break;

usdt arguments

USDT probes have arguments that should be made available as arg0, arg1, etc.

You can see the arguments using readelf, eg:

# readelf -n ~/tick

Displaying notes found at file offset 0x00000254 with length 0x00000020:
  Owner                 Data size	Description
  GNU                  0x00000010	NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 2.6.32

Displaying notes found at file offset 0x00000274 with length 0x00000024:
  Owner                 Data size	Description
  GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 4f07e5e5b4d993414247062758f16a6f41f92b40

Displaying notes found at file offset 0x00001078 with length 0x00000044:
  Owner                 Data size	Description
  stapsdt              0x0000002e	NT_STAPSDT (SystemTap probe descriptors)
    Provider: tick
    Name: loop
    Location: 0x000000000040057e, Base: 0x000000000040064c, Semaphore: 0x0000000000000000
    Arguments: -4@-4(%rbp)

Last time I discussed this in detail was here: iovisor/bcc#327

Since usdt argument support was added to bcc, and bpftrace uses functions from bcc, that's how we should do it here. Eg, see bcc/src/cc/usdt/usdt_args.cc

Struct support will depend on #31.

all abort()s should print an error message

src # grep -R abort *
ast/ast.cpp:    default: abort();
ast/ast.cpp:    default: abort();
ast/codegen_llvm.cpp:    abort();
ast/codegen_llvm.cpp:      abort();
ast/codegen_llvm.cpp:    abort();
ast/codegen_llvm.cpp:        abort();
ast/codegen_llvm.cpp:      case bpftrace::Parser::token::LAND:  abort(); // Handled earlier
ast/codegen_llvm.cpp:      case bpftrace::Parser::token::LOR:   abort(); // Handled earlier
ast/codegen_llvm.cpp:      default: abort();
ast/codegen_llvm.cpp:    default: abort();
ast/irbuilderbpf.cpp:        abort();
ast/semantic_analyser.cpp:      abort();
ast/semantic_analyser.cpp:          abort();
attached_probe.cpp:    default: abort();
attached_probe.cpp:    default: abort();
attached_probe.cpp:      abort();
attached_probe.cpp:      abort();
attached_probe.cpp:      abort();
attached_probe.cpp:      abort();
attached_probe.cpp:      abort();
attached_probe.cpp:    abort();
attached_probe.cpp:    abort();
attached_probe.cpp:    abort();
attached_probe.cpp:    abort();
bpftrace.cpp:        abort();
bpftrace.cpp:      abort();
bpftrace.cpp:        abort();
map.cpp:    abort();
map.cpp:        abort();
mapkey.cpp:  abort();
types.cpp:    default: abort();
types.cpp:  abort();

To simulate one of these, you can try:

diff --git a/src/ast/semantic_analyser.cpp b/src/ast/semantic_analyser.cpp
index 64c9411..869e2f0 100644
--- a/src/ast/semantic_analyser.cpp
+++ b/src/ast/semantic_analyser.cpp
@@ -87,7 +87,7 @@ void SemanticAnalyser::visit(Call &call)
     }
   }

-  if (call.func == "hist") {
+  if (call.func == "his") {
     check_assignment(call, true, false);
     check_nargs(call, 1);
     check_arg(call, Type::integer, 0);

Then:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = his(pid); }'
Error: missing codegen for function "his"
Aborted

Oh, right, I forgot -- this abort() was so annoying that I already added that error message!

Without it, you'd just get the abort:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = his(pid); }'
Aborted

Then you'd need to:

# ulimit -c unlimited
# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = his(pid); }'
Error: missing codegen for function "his"
Aborted (core dumped)
# gdb src/bpftrace core
[...]
Core was generated by `./src/bpftrace -e kprobe:do_nanosleep { @ = his(pid); }'.
Program terminated with signal SIGABRT, Aborted.

warning: Unexpected size of section `.reg-xstate/29593' in core file.
#0  0x00007fade3675428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fade3675428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fade367702a in __GI_abort () at abort.c:89
#2  0x00000000007d46f6 in bpftrace::ast::CodegenLLVM::visit (this=0x7ffc6c2ff6b0, call=...) at /mnt/src/bpftrace-rand/src/ast/codegen_llvm.cpp:506
#3  0x00000000007ce422 in bpftrace::ast::Call::accept (this=0x36fc7e0, v=...) at /mnt/src/bpftrace-rand/src/ast/ast.cpp:20
#4  0x00000000007d5d88 in bpftrace::ast::CodegenLLVM::visit (this=0x7ffc6c2ff6b0, assignment=...)
    at /mnt/src/bpftrace-rand/src/ast/codegen_llvm.cpp:672
#5  0x00000000007ce5e4 in bpftrace::ast::AssignMapStatement::accept (this=0x36fc860, v=...) at /mnt/src/bpftrace-rand/src/ast/ast.cpp:56
#6  0x00000000007d64d8 in bpftrace::ast::CodegenLLVM::visit (this=0x7ffc6c2ff6b0, probe=...) at /mnt/src/bpftrace-rand/src/ast/codegen_llvm.cpp:758
#7  0x00000000007ce6b0 in bpftrace::ast::Probe::accept (this=0x36fc8c0, v=...) at /mnt/src/bpftrace-rand/src/ast/ast.cpp:72
#8  0x00000000007d6d33 in bpftrace::ast::CodegenLLVM::visit (this=0x7ffc6c2ff6b0, program=...) at /mnt/src/bpftrace-rand/src/ast/codegen_llvm.cpp:812
#9  0x00000000007ce718 in bpftrace::ast::Program::accept (this=0x36fc930, v=...) at /mnt/src/bpftrace-rand/src/ast/ast.cpp:80
#10 0x00000000007da194 in bpftrace::ast::CodegenLLVM::compile (this=0x7ffc6c2ff6b0, debug=false, out=...)
    at /mnt/src/bpftrace-rand/src/ast/codegen_llvm.cpp:1110
#11 0x00000000007c3a98 in main (argc=3, argv=0x7ffc6c2ffa58) at /mnt/src/bpftrace-rand/src/main.cpp:144

Just to find out where it came from.

All abort() paths should print an error message, implemented in whatever way makes the most sense in C++ (you can do it much better than I did for that path).

scripts should be stand-alone executable

test.bt:

#!/mnt/src/bpftrace/build/src/bpftrace
/*
 * test program
 */

BEGIN
{
	printf("Tracing... Ctrl-C to end.\n");
	@epoch = nsecs;
}

tracepoint:sched:sched_process_exec
{
	printf("exec at: %d ms\n", (nsecs - @epoch) / 1000000);
}

END
{
	delete(@epoch);
}

(Modify /mnt/src/bpftrace/build/src/bpftrace to be your location of bpftrace.)

This should work:

# chmod 755 test.bt
# ./test.bt
1.1: invalid character '#'
1.1-2: syntax error, unexpected !, expecting #include or builtin or identifier

I guess bpftrace needs some lexer.l/parser.yy to identify when the program starts with '#!'?

This should continue to work, also:

./src/bpftrace test.bt
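
One possible approach, sketched in Python for illustration only: blank out a leading "#!" line before the source reaches the parser. The ticket suggests handling this in lexer.l/parser.yy instead, so treat this as an alternative under that assumption. The function name is hypothetical.

```python
def strip_shebang(source):
    """Blank out a leading '#!' line so the parser never sees it.

    The line is replaced by an empty line rather than removed, so that
    line numbers in later error messages still match the file.
    """
    if source.startswith("#!"):
        newline = source.find("\n")
        if newline == -1:
            return ""  # the file is only a shebang line
        return source[newline:]  # keeps the "\n", so line 1 becomes empty
    return source


script = "#!/mnt/src/bpftrace/build/src/bpftrace\nBEGIN { }\n"
print(repr(strip_shebang(script)))
```

A script without a shebang passes through unchanged, so running it as ./src/bpftrace test.bt keeps working.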

add ppid builtin

ppid for parent process ID.

This may require a kernel addition to do in a sane way: a ppid version of BPF_FUNC_get_current_pid_tgid(). Without it, we'd be digging the ppid from the curtask builtin (is there a sane way to know the right offset?).

runtime test suite

The /tests currently test parsing and BPF code generation, but do not test tracing functionality at runtime (actually running bpftrace and tracing an event). This is necessary to ensure everything actually works. It's a little tricky, as any given event (kprobe, tracepoint, etc) may or may not fire on its own, and may need to be triggered by the test suite.

I could imagine using the tracepoint:syscalls:sys_exit_nanosleep event as much as possible, as it's a stable API and can be easily triggered (from the shell: sleep 1).

Here's a very basic hacky example:

test=pid; sleep 1 & sleep 15 & ./src/bpftrace -e 'tracepoint:syscalls:sys_exit_nanosleep { printf("SUCCESS '$test' %d\n", pid); exit(); }' | grep '^SUCCESS '$test' [0-9][0-9]*$' || echo "FAILURE $test"
test=uid; sleep 1 & sleep 15 & ./src/bpftrace -e 'tracepoint:syscalls:sys_exit_nanosleep { printf("SUCCESS '$test' %d\n", uid); exit(); }' | grep '^SUCCESS '$test' [0-9][0-9]*$' || echo "FAILURE $test"
test=reg; sleep 1 & sleep 15 & ./src/bpftrace -e 'tracepoint:syscalls:sys_exit_nanosleep { printf("SUCCESS '$test' %d\n", reg("ip")); exit(); }' | grep '^SUCCESS '$test' [0-9][0-9]*$' || echo "FAILURE $test"

The idea of spawning a "sleep 1" and a "sleep 15" is to allow the test to complete quickly (sleep 1), but if bpftrace was slow to initialize, then there is a 15 second timeout (or shorter, since prior tests that complete in 1 second leave 15 second sleepers still running).

Running these produces the output

SUCCESS pid 1446
SUCCESS uid 0
SUCCESS reg 0

So the output says "SUCCESS" or "FAILURE", the name of the test, and some argument to aid debugging.

I think it would be better to write this in C++, or C, or Perl, or Python, and have better control over the sleeps.

The functionality that should be tested should be everything from the reference guide.
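
Here is a rough Python version of the same idea, as a sketch only. It assumes the bpftrace binary lives at ./src/bpftrace and that a "sleep 1" child fires the nanosleep tracepoint, as in the shell example. The names check_output and run_test are made up for this example. The timeout replaces the "sleep 15" trick, but the race remains: if bpftrace attaches after the sleeper has finished, the test times out and reports FAILURE.

```python
import re
import subprocess


def check_output(test, output):
    """Return True if any output line is exactly 'SUCCESS <test> <digits>'."""
    pattern = re.compile(r"SUCCESS " + re.escape(test) + r" [0-9]+")
    return any(pattern.fullmatch(line) for line in output.splitlines())


def run_test(test, expr, bpftrace="./src/bpftrace", timeout=15):
    """Trigger a nanosleep, trace it, and report SUCCESS or FAILURE."""
    program = ('tracepoint:syscalls:sys_exit_nanosleep { printf("SUCCESS '
               + test + ' %d\\n", ' + expr + '); exit(); }')
    sleeper = subprocess.Popen(["sleep", "1"])  # meant to fire the tracepoint
    try:
        result = subprocess.run([bpftrace, "-e", program],
                                capture_output=True, text=True,
                                timeout=timeout)
        passed = check_output(test, result.stdout)
    except (subprocess.TimeoutExpired, OSError):
        passed = False  # bpftrace hung, or could not be started
    finally:
        sleeper.wait()
    return ("SUCCESS " if passed else "FAILURE ") + test


# Checking captured output does not need root or a bpftrace binary:
print(check_output("pid", "Attaching 1 probe...\nSUCCESS pid 1446\n"))
```

Running the real thing would be run_test("pid", "pid") and run_test("uid", "uid") as root.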

probe type short names

The following aliases for the full type names can be implemented:

t: tracepoint
k: kprobe
kr: kretprobe
u: uprobe
ur: uretprobe
p: profile
h: hardware
s: software
i: interval

(Note that I didn't include an abbreviation for "usdt" yet. We can add one later.)

So both of these should work:

./src/bpftrace -e 'tracepoint:sched:sched_switch { @[stack] = count(); }'
./src/bpftrace -e 't:sched:sched_switch { @[stack] = count(); }'

Note that these are referred to as either probe types, probe libraries, or probe providers.
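
The alias expansion amounts to a lookup on the first colon-delimited field. A small Python sketch of that lookup follows; the table is copied from the list above, and the function name is hypothetical. The real change would live in bpftrace's C++ parser or semantic analyser.

```python
PROBE_ALIASES = {
    "t": "tracepoint", "k": "kprobe", "kr": "kretprobe",
    "u": "uprobe", "ur": "uretprobe", "p": "profile",
    "h": "hardware", "s": "software", "i": "interval",
}


def expand_probe_type(probe):
    """Replace a short probe type with its full name; leave others alone."""
    probe_type, sep, rest = probe.partition(":")
    return PROBE_ALIASES.get(probe_type, probe_type) + sep + rest


print(expand_probe_type("t:sched:sched_switch"))
```

Full names and types without an alias (such as usdt) pass through unchanged.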

add system() call

This ticket is to add a system() call (like awk's), so that you can do this:

./src/bpftrace -e 'kprobe:do_nanosleep { system("ps -p %d\n", pid); }'

I'd study how printf() is currently implemented, and base it on that.

printf() currently works by turning its format string into an identifier, the printf_id, which is generated in ast/codegen_llvm.cpp. That integer printf_id is passed back to the user-level bpftrace program, which can turn it back into the string. This is all to avoid having to return a string from the BPF program itself, since BPF programs are limited in instruction count and stack depth. It is much better to map that constant string to an integer identifier and pass back the int.

printf() also has a bunch of code for passing back the arguments, which will be the same for system().

It's unlikely that a bpftrace program will have more than a few dozen printf() statements, so the highest printf_id we'd expect would be about 50. Certainly much lower than, say, 10000.

So... printf_ids that are 10000 and higher are used for a different special purpose. See this from types.h:

enum class AsyncAction
{
  // printf reserves 0-9999 for printf_ids
  exit = 10000,
  print,
  clear,
  zero,
  time,
  join,
};

printf() is an async action: the in-kernel BPF trace program populates the arguments and passes them back through the perf_event buffer (via the b_.CreatePerfEventOutput() call in codegen), but the bpftrace user-level program does the actual printing. The other actions (print(), clear(), zero(), time(), and join()) are similar: the in-kernel BPF program schedules them to be executed by the bpftrace user-level program via b_.CreatePerfEventOutput(). Each basically becomes a special printf() action with a reserved printf_id, which is picked up by perf_event_printer() in bpftrace.cpp.

So perhaps system() could be implemented like this: If the printf_id is between 0 and 9999, it's printf(). If it's between 10000 and 19999, it's system(). And then the other AsyncActions are edited to start at 20000 (exit = 20000,).

I'd start by copy-n-pasting the whole (call.func == "printf") block in codegen, and changing:

-    static int printf_id = 0;
+    static int system_id = 10000;

Of course, it may be possible to avoid so much code duplication, but that'd be a start.
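
To make the proposed ID ranges concrete, here is a Python sketch of the dispatch the user-level side would need. The constant and function names are invented for the example; the real logic would be C++ in perf_event_printer(). It assumes the layout proposed above: printf below 10000, system() from 10000 to 19999, and the other AsyncActions from 20000.

```python
PRINTF_ID_LIMIT = 10000   # printf_ids are 0..9999
SYSTEM_ID_LIMIT = 20000   # system_ids would be 10000..19999


def classify_async_id(async_id):
    """Map an ID read from the perf buffer to the kind of action it names."""
    if async_id < 0:
        raise ValueError("async IDs are non-negative")
    if async_id < PRINTF_ID_LIMIT:
        return "printf"
    if async_id < SYSTEM_ID_LIMIT:
        return "system"
    return "other"  # exit, print, clear, zero, time, join


for async_id in (0, 9999, 10000, 19999, 20000):
    print(async_id, classify_async_id(async_id))
```

For system(), the format string would be recovered with async_id - 10000 as the index, the same way printf() uses the printf_id directly.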

hist missing ASCII histogram bars

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = hist(pid); }'
Attaching 1 probe...
^C

@:
[0, 1]                 0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               0 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1k)              0 |                                                    |
[1k, 2k)               2 |                                                    |
[2k, 4k)               0 |                                                    |
[4k, 8k)               0 |                                                    |
[8k, 16k)              2 |                                                    |

What happened to the ASCII bars? ("@"s) Here they are in lhist():

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = lhist(pid, 0, 65000, 10000); }'
Attaching 1 probe...
^C

@:
[0, 10000)             4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[10000, 20000)         1 |@@@@@@@@@@@@@                                       |

lhist human readable units

Compare this (lhist):

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = lhist(pid, 0, 65000, 10000); }'
Attaching 1 probe...
^C

@:
(...,0]                0 |                                                    |
[0, 10000)             7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[10000, 20000)         1 |@@@@@@@                                             |
[20000, 30000)         0 |                                                    |
[30000, 40000)         0 |                                                    |
[40000, 50000)         0 |                                                    |
[50000, 60000)         0 |                                                    |
[65000,...)            0 |                                                    |

To this (hist):

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = hist(pid); }'
Attaching 1 probe...
^C

@:
[0, 1]                 0 |                                                    |
[2, 4)                 0 |                                                    |
[4, 8)                 0 |                                                    |
[8, 16)                0 |                                                    |
[16, 32)               0 |                                                    |
[32, 64)               0 |                                                    |
[64, 128)              0 |                                                    |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1k)              0 |                                                    |
[1k, 2k)               3 |                                                    |
[2k, 4k)               0 |                                                    |
[4k, 8k)               0 |                                                    |
[8k, 16k)              4 |                                                    |

Note that the hist() version uses "k" once it gets to 1024. Perhaps we want the same behavior for lhist(). But, for hist(), k == 1024, so for consistency we may want to keep that.

Hmm.

Perhaps the behavior should be to only use "k" and "m" if the range value is divisible by either 1024 (k) or 1024*1024 (m).

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = lhist(pid, 0, 65000, 10000); }'
Attaching 1 probe...
^C

@:
(...,0]                0 |                                                    |
[0, 10000)             7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[10000, 20000)         1 |@@@@@@@                                             |
[20000, 30000)         0 |                                                    |
[30000, 40000)         0 |                                                    |
[40000, 50000)         0 |                                                    |
[50000, 60000)         0 |                                                    |
[65000,...)            0 |                                                    |

This is already working as intended, since the parameters are not divisible by either 1024 or 1024*1024.

But if you used, say, 10240 (10 * 1024), it could do this:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = lhist(pid, 0, 65000, 10240); }'
Attaching 1 probe...
^C

@:
(...,0]                0 |                                                    |
[0, 10k)               7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[10k, 20k)             1 |@@@@@@@                                             |
[20k, 30k)             0 |                                                    |
[30k, 40k)             0 |                                                    |
[40k, 50k)             0 |                                                    |
[50k, 60k)             0 |                                                    |
[65000,...)            0 |                                                    |

Since you've used 10240, it's inserted the "k"s as appropriate.

So this ticket is: if numbers divisible by 1024 or 1024*1024 are used in lhist() parameters, then use "k" and "m" as appropriate.
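
The rule can be sketched in a few lines of Python. The function names are hypothetical and this is not the bpftrace implementation. Zero is printed as plain "0" even though it is divisible by everything, which matches the "[0, 10k)" label in the example above.

```python
KILO = 1024
MEGA = 1024 * 1024


def format_bound(value):
    """Use "m" or "k" only when the value divides evenly; else print it raw."""
    if value != 0 and value % MEGA == 0:
        return f"{value // MEGA}m"
    if value != 0 and value % KILO == 0:
        return f"{value // KILO}k"
    return str(value)


def format_range(low, high):
    """Format a half-open lhist bucket label such as "[10k, 20k)"."""
    return f"[{format_bound(low)}, {format_bound(high)})"


print(format_range(0, 10240))       # [0, 10k)
print(format_range(10000, 20000))   # [10000, 20000)
```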

USDT probes are not firing

The USDT example below is not working:

#include <sys/sdt.h>
#include <stdio.h>
#include <unistd.h>

int main() {
  char myStr[] = "My string";
  while (1) {
    printf("%s\n", myStr);
    DTRACE_PROBE(example, first_probe);
    DTRACE_PROBE1(example, second_probe, myStr);
    printf("%s\n", myStr);
    sleep(1);
  }
  return 0;
}

$ sudo bpftrace -e 'usdt:/path/to/example:second_probe { printf("probe fired!\n"); }'
Attaching 1 probe...

...and it never fires.

Using bcc's trace tool it works:

$ sudo /usr/share/bcc/tools/trace 'u:/path/to/example:second_probe'
PID     TID     COMM            FUNC
15925   15925   example         second_probe
15925   15925   example         second_probe
15925   15925   example         second_probe
15925   15925   example         second_probe
15925   15925   example         second_probe

Am I doing something wrong?

script --custom options

See #18 first.

For advanced scripting, this should be possible:

# ./src/bpftrace slower.bt --verbose --file=file1 --minlatency=10
# ./slower.bt --verbose --file=file1 --minlatency=10

I picked the double dash ("--") so that these can be differentiated from bpftrace's own options which use a single dash (and so far are "-d", "-e", "-l", and "-v").

These custom options could then be provided in the script as $verbose (0 or 1), $file, and $minlatency.

See #18 for a discussion on type detection. If approach (B) is used, they would then be usable from the script as: $verbose, str($file), and $minlatency.
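
A sketch of how the double-dash options might be split out of the command line, in Python for brevity. The function name is made up, and the values are left as strings because the type question is the one discussed in #18. A bare flag like --verbose becomes 1.

```python
def parse_script_options(args):
    """Collect --name and --name=value arguments into a dict.

    Single-dash arguments are skipped, on the assumption that they
    belong to bpftrace itself.
    """
    options = {}
    for arg in args:
        if not arg.startswith("--"):
            continue  # not a script option (eg, bpftrace's own -v)
        name, sep, value = arg[2:].partition("=")
        options[name] = value if sep else 1
    return options


print(parse_script_options(["--verbose", "--file=file1", "--minlatency=10"]))
```

Each key would then be exposed to the script as a variable of the same name, eg $verbose, $file, and $minlatency.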

if-else statements

bpftrace currently supports ternary operators:

# ./src/bpftrace -e 'kprobe:do_nanosleep { printf("%d is %s\n", pid, pid > 10000 ? "high" : "low") }'
Attaching 1 probe...
9608 is low
1446 is low
23735 is high
9608 is low
1446 is low

But it could also support if-else statements: if () { ... } else { ... } and if () { ... }.

These should work:

# ./src/bpftrace -e 'kprobe:do_nanosleep { if (pid > 10000) { $s = "high"; } else { $s = "low"; } printf("%d is %s\n", pid, $s) }'
# ./src/bpftrace -e 'kprobe:do_nanosleep { if (pid > 10000) { printf("%d is high\n", pid); } else { printf("%d is low\n", pid) } }'
# ./src/bpftrace -e 'kprobe:do_nanosleep { if (pid > 10000) { printf("%d is high\n", pid); } }'

See how ternary operators are implemented in parser.yy:

#  grep -iC3 ternary parser.yy
%type <ast::ProbeList *> probes
%type <ast::Probe *> probe
%type <ast::Predicate *> pred
%type <ast::Ternary *> ternary
%type <ast::StatementList *> block stmts
%type <ast::Statement *> stmt
%type <ast::Expression *> expr
--
     |                  { $$ = nullptr; }
     ;

ternary : expr QUES expr COLON expr { $$ = new ast::Ternary($1, $3, $5); }
     ;

block : "{" stmts "}"     { $$ = $2; }
--
expr : INT             { $$ = new ast::Integer($1); }
     | STRING          { $$ = new ast::String($1); }
     | BUILTIN         { $$ = new ast::Builtin($1); }
     | ternary         { $$ = $1; }
     | map             { $$ = $1; }
     | var             { $$ = $1; }
     | call            { $$ = $1; }

This will be different since the if-else statements are dealing with stmt and not expr.

There are plenty of examples of if-else parsing in lex/yacc out there. Eg, the ANSI C grammar:

http://www.lysator.liu.se/c/ANSI-C-grammar-l.html
http://www.lysator.liu.se/c/ANSI-C-grammar-y.html

add join() optional delimiter

Currently join() will take a **char argument and print the array of strings:

bpftrace -e 'kprobe:sys_execve { join(arg1); }'

Note that kprobe:sys_execve no longer works in the most recent kernels (~4.18), as the syscalls changed. We'll switch this example to tracepoint:syscalls:sys_enter_execve when we have argument support.

This ticket is to add an optional second argument, that is the delimiter. The delimiter is currently a space, " ". You should be able to customize it, eg, to make it a comma: join(arg1, ",").

update install instructions

@ajor's structs patch doesn't build on the Ubuntu servers I've tried, I think due to a problem with how llvm is currently packaged.

This ticket is to update the install instructions for Ubuntu. A follow up can be to see if we can get the official Ubuntu packages fixed as well.

Here are my notes from my last install:

# remove any prior llvm/clang, if necessary, eg:
apt-get remove llvm-3.9 llvm-3.9-dev llvm-3.9-runtime libllvm3.9 libclang1-3.9 libclang-common-3.9-dev libclang-3.9-dev
vi /etc/apt/sources.list
---append---
# from https://apt.llvm.org/:
deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial main
deb-src http://apt.llvm.org/xenial/ llvm-toolchain-xenial main
# 5.0
deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-5.0 main
deb-src http://apt.llvm.org/xenial/ llvm-toolchain-xenial-5.0 main
# 6.0
deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-6.0 main
deb-src http://apt.llvm.org/xenial/ llvm-toolchain-xenial-6.0 main
---append---
apt-get update
apt-get install -y bison cmake flex g++ git libelf-dev zlib1g-dev libfl-dev
apt-get install clang-5.0 libclang-5.0-dev libclang-common-5.0-dev libclang1-5.0 libllvm5.0 llvm-5.0 llvm-5.0-dev llvm-5.0-runtime
git clone https://github.com/ajor/bpftrace
cd bpftrace
mkdir build; cd build; cmake -DCMAKE_BUILD_TYPE=DEBUG ..
make -j8
./tests/bpftrace_test
./src/bpftrace -e 'kprobe:do_nanosleep { printf("sleep by %s\n", comm); }'

improve bpftrace -l

The current bpftrace -l implementation does partial matches only, and ditches wildcards. It should handle wildcards properly, based on file globbing (not regular expressions).

For example, currently:

# ./src/bpftrace -l 'sleep*'
tracepoint:xfs:xfs_log_grant_sleep
tracepoint:sunrpc:rpc_task_sleep
tracepoint:block:block_sleeprq
tracepoint:compaction:mm_compaction_kcompactd_sleep
tracepoint:vmscan:mm_vmscan_kswapd_sleep
tracepoint:sched:sched_stat_sleep
tracepoint:syscalls:sys_enter_clock_nanosleep
tracepoint:syscalls:sys_exit_clock_nanosleep
[...]

That's probably not doing what people expect.

Since we have multiple fields colon delimited, it's a bit trickier than it might look. Here's my proposed solution:

"sleep*" would match any field that begins with "sleep", followed by anything.

But you must also be able to include the colons in your search. So "tracepoint:*" would list all tracepoints, and "syscalls:*" would list all "tracepoint:syscalls:*" plus anywhere else that "syscalls:" is a match. And "do_nano*" would match kprobe:do_nanosleep, etc. "do_nano" by itself would match nothing. "*nano*" would match anything that contains "nano".
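
One way to read the proposal is: split both the pattern and the probe name on colons, then glob-match the pattern's fields against any run of consecutive fields in the probe name. The Python sketch below implements that reading with the standard fnmatch module. It is my interpretation of the rules above, not a settled design, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def probe_matches(pattern, probe):
    """Glob-match colon-delimited pattern fields against consecutive probe fields."""
    want = pattern.split(":")
    have = probe.split(":")
    # Slide the pattern over every run of len(want) consecutive fields.
    for start in range(len(have) - len(want) + 1):
        window = have[start:start + len(want)]
        if all(fnmatchcase(field, glob) for field, glob in zip(window, want)):
            return True
    return False


probes = [
    "tracepoint:xfs:xfs_log_grant_sleep",
    "tracepoint:syscalls:sys_enter_clock_nanosleep",
    "kprobe:do_nanosleep",
]
for pattern in ("sleep*", "do_nano*", "do_nano", "syscalls:*"):
    print(pattern, [p for p in probes if probe_matches(pattern, p)])
```

With this reading, "sleep*" no longer lists xfs_log_grant_sleep, because no single field begins with "sleep".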

TODOs in the code

This is a placeholder ticket to cover all the TODOs in the code, and will require several pull requests to clean up. These (so far):

# grep -R TODO
ast/codegen_llvm.cpp:     * TODO: consider stashing top & div in a printf_args_ like struct, so we don't need to pass
ast/codegen_llvm.cpp:  // TODO
ast/codegen_llvm.cpp:  // TODO
ast/irbuilderbpf.cpp:  AllocaInst *alloca = CreateAlloca(ty, arraysize, name); // TODO dodgy
Binary file ast/.semantic_analyser.cpp.swp matches
ast/semantic_analyser.cpp:     * TODO: this code ensures that map keys are consistent, but
ast/semantic_analyser.cpp:    err_ << call.func << "() should take " << expected_nargs << " arguments ("; // TODO plural
ast/semantic_analyser.cpp:    err_ << call.func << "() requires at least " << min_nargs << " argument ("; // TODO plural
imap.h:  // used by lhist(). TODO: move to separate Map object.
bpftrace.cpp:  // TODO: deal with these:
bpftrace.cpp:  // TODO: deal with process exit and clearing its psyms entry
list.cpp:  // TODO: glob searching instead of discarding wildcards
list.cpp:  // TODO: add here
list.cpp:  // TODO: add here
main.cpp:    // TODO: allow both
attached_probe.cpp:  // TODO: fn_name may need a unique suffix for each attachment on the same probe:

add inet_ntop() call: IPv6 support

This will be useful for all networking one-liners/scripts. See the man page for inet_ntop(3), but maybe we should just call it ntop() for short (or something else).

Both of these should work, and print "127.0.0.1" and "::1":

bpftrace -e 'BEGIN { printf("%s\n", ntop(AF_INET, 1)); }'
bpftrace -e 'BEGIN { printf("%s\n", ntop(AF_INET6, 1)); }'

IPv4 should be shown as a dotted-quad, and IPv6 in the most-shortened address representation: https://tools.ietf.org/html/rfc5952
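
For reference, this is the output format being asked for, shown with Python's standard ipaddress module. The ntop() wrapper here is a stand-in written for this example and takes packed address bytes, which is an assumption; how the bpftrace call would receive the address is not decided in this ticket.

```python
import ipaddress
import socket


def ntop(af, packed):
    """Format packed address bytes: dotted-quad for IPv4, compressed for IPv6."""
    if af == socket.AF_INET:
        return str(ipaddress.IPv4Address(packed))   # needs 4 bytes
    if af == socket.AF_INET6:
        return str(ipaddress.IPv6Address(packed))   # needs 16 bytes
    raise ValueError("unsupported address family")


print(ntop(socket.AF_INET, bytes([127, 0, 0, 1])))    # 127.0.0.1
print(ntop(socket.AF_INET6, bytes(15) + b"\x01"))     # ::1
```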

kprobe arguments

This depends on #31. One of these should work:

bpftrace -e 'kprobe:do_nanosleep { printf("expires %lld\n", arg0->_softexpires); }'
bpftrace -e 'kprobe:do_nanosleep { printf("expires %lld\n", ((struct hrtimer_sleeper *)arg0)->_softexpires); }'

Or both. The first, where the kernel is self-aware of the arg0 type, may depend on the presence of BTF or debuginfo. In their absence, with kernel headers only, it may be required to specify the type.

This is likely a significant amount of work, and may involve sourcing struct information from these locations, if any are present:

kernel headers
kernel debuginfo
BTF

scripts should accept positional parameters

Fix #17 beforehand.

We should be able to provide parameters to the script:

# ./src/bpftrace fsslower.bt 10 file1
# ./fsslower.bt 10 file1

They should be available as the variables $1, $2, etc.

I see this could be processed in at least one of two ways:

A) type detection

In this case, "10" would be available as the integer $1, and "file1" would be available as the string $2. Integers can be identified as those that match the regexp /^[0-9]+$/.

If $1 and $2 (etc) are accessed and were not supplied on the command line, they should default to 0 or "", depending on their use as an integer or a string. That may be tricky to implement. So,

B) integers

All positional parameters, $1, $2, etc, are provided as integer types (uint64_t). Strings are provided as the pointer to the string. If they are not provided, they are zero.

In the script with the invocation example above, we can then use:

$1 to refer to "10"
str($2) to refer to "file1"

I think this (B) approach is easier and probably more sane.
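For comparison, here is a small sketch of the classification step from (A) combined with the "missing means zero" default from (B). The names Param, is_integer_param and positional_param are made up for illustration, and the sketch keeps the string text alongside the integer rather than handing back a pointer as (B) proposes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type: either an integer or a string parameter.
struct Param {
  bool is_int;
  uint64_t n;     // value if is_int, otherwise 0
  std::string s;  // original text ("" if the parameter was not supplied)
};

// A parameter counts as an integer if it matches /^[0-9]+$/.
bool is_integer_param(const std::string &text)
{
  static const std::regex digits("^[0-9]+$");
  return std::regex_match(text, digits);
}

// params holds the arguments after the script name; index is 1-based like $1.
Param positional_param(const std::vector<std::string> &params, std::size_t index)
{
  if (index == 0 || index > params.size())
    return {true, 0, ""};  // not supplied: reads as zero
  const std::string &text = params[index - 1];
  if (is_integer_param(text))
    return {true, static_cast<uint64_t>(std::strtoull(text.c_str(), nullptr, 10)), text};
  return {false, 0, text};
}
```

With the invocation example above, index 1 yields the integer 10, index 2 yields the string "file1", and index 3 yields zero.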

struct support

@ajor is working on this. This will be used by tracepoint, usdt, and kprobe arguments.

Add support for kprobe at custom offsets

(from ajor/bpftrace#44)

New option added to libbcc

Possibly something like: kprobe:sys_open+8 { ... }

add uaddr() call

This is the same as kaddr() from ticket #3, but for user-level symbols and variables.

For example:

# objdump -tT /bin/bash | grep prompt
0000000000422ab0 g    DF .text	0000000000000c81  Base        decode_prompt_string
00000000006fce18 g    DO .bss	0000000000000008  Base        prompt_string_pointer
00000000004a8110 g    DF .text	000000000000001b  Base        _rl_reset_prompt
0000000000700850 g    DO .bss	0000000000000008  Base        rl_display_prompt
0000000000701920 g    DO .bss	0000000000000008  Base        ps1_prompt
00000000007003d0 g    DO .bss	0000000000000008  Base        rl_prompt
00000000007003c8 g    DO .bss	0000000000000004  Base        rl_visible_prompt_length
0000000000422a80 g    DF .text	0000000000000023  Base        set_current_prompt_level
00000000004a8d50 g    DF .text	00000000000000e1  Base        rl_restore_prompt
00000000006fce08 g    DO .bss	0000000000000008  Base        current_readline_prompt
00000000006f4ae0 g    DO .data	0000000000000008  Base        secondary_prompt
0000000000458580 g    DF .text	0000000000000111  Base        expand_prompt_string
00000000004a8a20 g    DF .text	00000000000000b7  Base        rl_save_prompt
[...]

Ok, let's say I want to print the contents of ps1_prompt. This ticket is to make the following one-liner work:

# ./src/bpftrace -e 'uprobe:/bin/bash:readline { printf("%s\n", str(*uaddr("ps1_prompt"))); }'

Here's a demonstration by copy-n-pasting the address manually (this actually works):

# ./src/bpftrace -e 'uprobe:/bin/bash:readline { printf("%s\n", str(*0x0000000000701920)); }'
Attaching 1 probe...
\n\[\033[1;31m\]${NETFLIX_ACCOUNT:-${NETFLIX_ENVIRONMENT:-no en
\n\[\033[1;31m\]${NETFLIX_ACCOUNT:-${NETFLIX_ENVIRONMENT:-no en
^C

bpftrace -l should list software and hardware types

The software and hardware probe types are not listed in -l. They should be.

This might also be a chance to refactor code like this, from attached_probe.cpp:

  else if (probe_.path == "cache-misses")
  {
    type = PERF_COUNT_HW_CACHE_MISSES;
  }
  else if (probe_.path == "branch-instructions" || probe_.path == "branches")
  {
    type = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    defaultp = 100000;
  }
  else if (probe_.path == "bus-cycles")
  {
    type = PERF_COUNT_HW_BUS_CYCLES;
    defaultp = 100000;
  }

Perhaps there should just be one lookup table that has the types, paths, and counts, and all other code (including the new listing code) can refer to that one lookup table.
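One possible shape for such a table, as a sketch only. HwProbe, HW_PROBES, find_hw_probe and list_hw_probes are hypothetical names, and only the two entries whose default counts appear in the snippet above are filled in; cache-misses is left out because its default count is not shown there.

```cpp
#include <linux/perf_event.h>  // PERF_COUNT_HW_* constants
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical table row: probe path, optional alias, perf type, default count.
struct HwProbe {
  std::string path;
  std::string alias;  // empty if the probe has no alias
  uint32_t type;
  uint64_t defaultp;
};

// Single source of truth for attaching, validating and listing.
const std::vector<HwProbe> HW_PROBES = {
  {"branch-instructions", "branches", PERF_COUNT_HW_BRANCH_INSTRUCTIONS, 100000},
  {"bus-cycles", "", PERF_COUNT_HW_BUS_CYCLES, 100000},
};

// Look a probe up by path or alias; returns nullptr if unknown.
const HwProbe *find_hw_probe(const std::string &path)
{
  for (const auto &probe : HW_PROBES)
  {
    if (probe.path == path || (!probe.alias.empty() && probe.alias == path))
      return &probe;
  }
  return nullptr;
}

// Paths that the -l listing code could print.
std::vector<std::string> list_hw_probes()
{
  std::vector<std::string> paths;
  for (const auto &probe : HW_PROBES)
    paths.push_back(probe.path);
  return paths;
}
```

The if/else chain in attached_probe.cpp would then collapse to one find_hw_probe() call, and the listing code would iterate the same table.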

stack/ustack per-event printing

This works, and frequency-counts stack traces:

# ./src/bpftrace -e 'kprobe:decay_load { @[stack] = count(); }'
Attaching 1 probe...

But neither of these work, which should just print out each stack trace as it happens:

# ./src/bpftrace -e 'kprobe:do_nanosleep { printf("%s", stack); }'
printf: %s specifier expects a value of type string (stack supplied)

# ./src/bpftrace -e 'kprobe:do_nanosleep { stack; }'
Attaching 1 probe...
^C

Maybe they should both work? Maybe just the printf() one? Maybe printf() needs a different format code, like %S for stack trace? See printf_format_types.h.

This ticket is to make at least one of these work.

It might involve adding a %S, then having printf.cpp call BPFtrace::get_stack from bpftrace.cpp.

If the printf() method is insane, then the standalone "stack" may be fine, similar to how join() currently works. I think I prefer the printf() method.

Perhaps this should use the new BPF_FUNC_get_stack() support in 4.18? https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=de2ff05f48afcde816ff4edb217417f62f624ab5

print BPF bytecode for debugging

A failed program will print the BPF bytecode. But this doesn't appear with either the "-v" or "-d" debugging flags for a working program. It should.

Here's an example:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = reg("ip"); }'
Attaching 1 probe...
Error loading program: kprobe:do_nanosleep (try -v)

# ./src/bpftrace -v -e 'kprobe:do_nanosleep { @ = reg("ip"); }'
Attaching 1 probe...

Error log:
0: (bf) r3 = r1
1: (07) r3 += 128
2: (bf) r6 = r10
3: (07) r6 += -8
4: (bf) r1 = r6
5: (b7) r2 = 8
6: (85) call bpf_probe_read#4
7: (b7) r1 = 0
8: (7b) *(u64 *)(r10 -16) = r1
9: (7b) *(u64 *)(r10 -8) = r6
10: (18) r1 = 0xffff97b4f5aa9800
12: (bf) r2 = r10
13: (07) r2 += -16
14: (bf) r3 = r10
15: (07) r3 += -8
16: (b7) r4 = 0
17: (85) call bpf_map_update_elem#2
invalid indirect read from stack off -8+0 size 8

Error loading program: kprobe:do_nanosleep

I induced this failure by introducing the following error in ast/codegen_llvm.cpp (I'm modifying the (call.func == "reg") block):

diff --git a/src/ast/codegen_llvm.cpp b/src/ast/codegen_llvm.cpp
index 27fa477..bfbd081 100644
--- a/src/ast/codegen_llvm.cpp
+++ b/src/ast/codegen_llvm.cpp
@@ -352,7 +363,7 @@ void CodegenLLVM::visit(Call &call)
     AllocaInst *dst = b_.CreateAllocaBPF(call.type, call.func+"_"+reg_name);
     Value *src = b_.CreateGEP(ctx_, b_.getInt64(offset * sizeof(uintptr_t)));
     b_.CreateProbeRead(dst, 8, src);
-    expr_ = b_.CreateLoad(dst);
+    expr_ = dst;
     b_.CreateLifetimeEnd(dst);
   }
   else if (call.func == "printf")

I include that diff just so you can temporarily introduce this error and see what it looks like.

This ticket is to emit that BPF bytecode for a working program, eg:

./src/bpftrace -d -e 'kprobe:do_nanosleep { @ = reg("ip"); }'

The bytecode should appear at the end of the current output when the debug option -d is used. Here is a mock-up:

# ./src/bpftrace -d -e 'kprobe:do_nanosleep { @ = reg("ip"); }'
Program
 kprobe:do_nanosleep
  =
   map: @
   call: reg
    string: ip

; ModuleID = 'bpftrace'
source_filename = "bpftrace"
target datalayout = "e-m:e-p:64:64-i64:64-n32:64-S128"
target triple = "bpf-pc-linux"

; Function Attrs: nounwind
declare i64 @llvm.bpf.pseudo(i64, i64) #0

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1

define i64 @"kprobe:do_nanosleep"(i8*) local_unnamed_addr section "s_kprobe:do_nanosleep" {
entry:
  %"@_val" = alloca i64, align 8
  %"@_key" = alloca i64, align 8
  %reg_ip = alloca i64, align 8
  %1 = bitcast i64* %reg_ip to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
  %2 = getelementptr i8, i8* %0, i64 128
  %probe_read = call i64 inttoptr (i64 4 to i64 (i8*, i64, i8*)*)(i64* nonnull %reg_ip, i64 8, i8* %2)
  %3 = load i64, i64* %reg_ip, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
  %4 = bitcast i64* %"@_key" to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %4)
  store i64 0, i64* %"@_key", align 8
  %5 = bitcast i64* %"@_val" to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %5)
  store i64 %3, i64* %"@_val", align 8
  %pseudo = call i64 @llvm.bpf.pseudo(i64 1, i64 1)
  %update_elem = call i64 inttoptr (i64 2 to i64 (i8*, i8*, i8*, i64)*)(i64 %pseudo, i64* nonnull %"@_key", i64* nonnull %"@_val", i64 0)
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %4)
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %5)
  ret i64 0
}

; Function Attrs: argmemonly nounwind
declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1

attributes #0 = { nounwind }
attributes #1 = { argmemonly nounwind }

0: (bf) r3 = r1
1: (07) r3 += 128
2: (bf) r6 = r10
3: (07) r6 += -8
4: (bf) r1 = r6
5: (b7) r2 = 8
6: (85) call bpf_probe_read#4
7: (b7) r1 = 0
8: (7b) *(u64 *)(r10 -16) = r1
9: (7b) *(u64 *)(r10 -8) = r6
10: (18) r1 = 0xffff97b4f5aa9800
12: (bf) r2 = r10
13: (07) r2 += -16
14: (bf) r3 = r10
15: (07) r3 += -8
16: (b7) r4 = 0
17: (85) call bpf_map_update_elem#2

If it's too tricky to add to -d, it could be added to -v instead.

add path() call

prototype:

char *path(struct file *file)

So that this would work:

bpftrace -e 'kprobe:vfs_read { printf("reading path: %s\n", path(arg0)); }'

(or trace kprobe:__vfs_read) and it would print the full path.

For this to work, it may require a new BPF function added to the kernel to make this sane.

Custom map printing: printf() for maps

This ticket is to support custom ways to print maps. Perhaps by enhancing either print() or printf() or both.

Consider this one-liner, and how the output has extra new lines now and then:

# ./src/bpftrace -e 'tracepoint:sched:sched_process_fork { @ = count(); } 
    interval:s:1 { printf("pids/sec: "); print(@); clear(@); }'
Attaching 2 probes...
pids/sec:
pids/sec:
pids/sec: @: 1

pids/sec:
pids/sec: @: 425

pids/sec: @: 1105

pids/sec: @: 1105

pids/sec: @: 746

pids/sec:
pids/sec:
pids/sec: @: 2

pids/sec:
pids/sec:
^C

The extra newlines are due to std::endl's in BPFtrace::print_map(), and its behavior when the map is unpopulated. There's also the "@" which is unnecessary to print in this case.

I might fix the newlines in a separate PR, but I can't remove the "@" or do other customizations. The following are suggestions for custom map printing.

Zero keys

Perhaps if the map has zero keys, it can be treated as the following types in printf():

A) integer == integer
B) string == string
C) count(), min(), max(), avg() == integer

(A) and (B) already work. Look:

# ./src/bpftrace -e 'BEGIN { @ = 42; } interval:s:1 { printf("pids/sec: %d\n", @); }'
Attaching 2 probes...
pids/sec: 42
pids/sec: 42
[...]
# ./src/bpftrace -e 'BEGIN { @ = "hi there"; } interval:s:1 { printf("pids/sec: %s\n", @); }'
Attaching 2 probes...
pids/sec: hi there
pids/sec: hi there
[...]

But (C) does not:

# bpftrace -e 'tracepoint:sched:sched_process_fork { @ = count(); }
    interval:s:1 { printf("pids/sec: %d\n", @); clear(@); }'
printf: %d specifier expects a value of type integer (count supplied)

I'd explore making this work. If a Type::count,min,max,avg map has zero keys, it can be printed as an integer in printf().

Note that multiple maps should also work, as they currently do for (A) and (B):

# ./src/bpftrace -e 'BEGIN { @a = 42; @b = "hi there"; }
    interval:s:1 { printf("test: %d %s\n", @a, @b); }'
Attaching 2 probes...
test: 42 hi there
test: 42 hi there

One or more keys

This may be supported in an additional PR. Since we're formatting the output, I'm tempted to make printf() work, and leave print() for printing things without extra formatting (or to print things with a default formatting).

Imagine this working (this is a mock up):

# bpftrace -e 'kprobe:do_nanosleep { @[pid, comm] = count(); }
END { printf("%6s %12s %s\n", "PID", "COMM", "SLEEPS"); printf("%6d %12s %d\n", @); }'
...
   PID        COMM SLEEPS
    81        sshd 5
   181        bash 15
   361         tar 162

Because @ was a multi-key map, it uses the format string for each key in the map. So bpftrace effectively expands @ to be key1, key2, integer:

printf("%6d %12s %d\n", @)
   ^ becomes:
printf("%6d %12s %d\n", key1, key2, integer);
   ^ for each key

add username builtin

This works (UID):

# ./src/bpftrace -e 'kprobe:do_nanosleep { printf("%d\n", uid); }'
Attaching 1 probe...
0
0
0

This doesn't (username):

# ./src/bpftrace -e 'kprobe:do_nanosleep { printf("%s\n", username); }'
1.46: syntax error, unexpected ), expecting (

Here's my guess on how to implement it (this could be wrong, or there could be a better way):

Add "username" to this line from lexer.l:

lexer.l:pid|tid|uid|gid|nsecs|cpu|comm|stack|ustack|arg[0-9]|retval|func|name|curtask {

Also add username to enum class Type from types.h, and to types.cpp.
Now ast/semantic_analyser.cpp can have:

  else if (builtin.ident == "username") {
    builtin.type = SizedType(Type::username, 8);
  }

What that's doing is really saying we're only going to use 8 bytes for it (uint64_t), but decorate it with "Type::username" so we know to map it later.

ast/codegen_llvm.cpp could have the uid/gid block modified to be:

  else if (builtin.ident == "uid" || builtin.ident == "gid" || builtin.ident == "username")
  {
    Value *uidgid = b_.CreateGetUidGid();
    if (builtin.ident == "uid" || builtin.ident == "username")
    {
      expr_ = b_.CreateAnd(uidgid, 0xffffffff);
    }
    else if (builtin.ident == "gid")
    {
      expr_ = b_.CreateLShr(uidgid, 32);
    }
  }

So we're just storing the UID, whether you used "uid" or "username".

This will confuse the printf() verifier, but we can fix that. Modify these lines in printf.cpp to add Type::username:

    if (arg_type == Type::sym || arg_type == Type::usym || arg_type == Type::name || arg_type == Type::username)
      arg_type = Type::string; // Symbols should be printed as strings

Now see perf_event_printer() in bpftrace.cpp, and how it handles things like Type::sym (which is an integer that we map back to a string, and add to a cache for future lookups). Add code like this:

  std::vector<std::unique_ptr<char>> resolved_usernames;
[...]
      case Type::username:
        resolved_usernames.emplace_back(strdup(
              bpftrace->resolve_uid(*(uint64_t*)arg_data).c_str()));
        arg_values.push_back((uint64_t)resolved_usernames.back().get());
        break;

Then you need to add a BPFtrace::resolve_uid() function to bpftrace.cpp, that loads /etc/passwd and turns the UID into the username. Type::username -> resolve_uid will also need to happen in BPFtrace::print_map. And resolve_uid will need to be added to bpftrace.h.
Add tests to tests/codegen.cpp, tests/parser.cpp, tests/semantic_analyser.cpp.
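
A possible starting point for the lookup itself, written as a free function over a stream so it can be tried without touching /etc/passwd. resolve_uid_sketch is a hypothetical name, and falling back to the numeric UID when no entry matches is an assumption, not something decided above.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch: scan passwd(5)-format lines
// (name:password:UID:GID:GECOS:directory:shell) for a matching UID.
std::string resolve_uid_sketch(std::istream &passwd, uint64_t uid)
{
  const std::string wanted = std::to_string(uid);
  std::string line;
  while (std::getline(passwd, line))
  {
    std::istringstream fields(line);
    std::string name, password, uid_field;
    if (!std::getline(fields, name, ':') ||
        !std::getline(fields, password, ':') ||
        !std::getline(fields, uid_field, ':'))
      continue;  // malformed line, skip it
    if (uid_field == wanted)
      return name;
  }
  return wanted;  // assumption: print the number if no entry matches
}
```

A real BPFtrace::resolve_uid() would open /etc/passwd with std::ifstream and probably cache results, in the same way the Type::sym handling caches symbol lookups.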

stack/ustack to accept limit argument

stack and ustack are currently implemented as builtins, and return the full kernel and user stack traces. This ticket is to change them to accept an argument: the top number of frames to print. So instead of "stack" for the full stack, we can use "stack(3)" to print the top three frames only, and "stack()" to print all frames.

This should work:

# ./src/bpftrace -e 'kprobe:decay_load { @[stack(3)] = count(); }'
BDG: value_size 8
Attaching 1 probe...
[...]
]: 172
@[
decay_load+1
update_load_avg+1609
enqueue_entity+104
]: 172
@[
decay_load+1
update_blocked_averages+1334
run_rebalance_domains+114
]: 173

So that's showing the traced function (decay_load) and two more frames only, for the maximum of 3 frames.

This will mean changing them from a builtin to a call, and pulling in the argument in ast/codegen_llvm.cpp. There's more than one way to pull in an argument. Eg:

call.vargs->front()->accept(*this);   // it becomes expr_

or, from lhist:

Integer &value_arg = static_cast<Integer&>(*call.vargs->at(0));
Integer &min_arg = static_cast<Integer&>(*call.vargs->at(1));
Integer &max_arg = static_cast<Integer&>(*call.vargs->at(2));
Integer &step_arg = static_cast<Integer&>(*call.vargs->at(3));

Now the tricky part: how do we limit frames? Ideally, this is done in-kernel. However, I don't know how we do that. I've discussed it in:

iovisor/bcc#1103

So this might involve editing CreateGetStackId from ast/irbuilderbpf.cpp, or, it might involve setting and restoring the sysctl kernel.perf_event_max_stack. Or something else: maybe this requires a kernel change to support it (the reserved bits).

stack()/ustack() with no arguments should print the full stack.

Later on (and as a separate ticket) we can add a 2nd argument for skip frames, which bpf_get_stackid() supports.

add unroll() loops

Sometimes we'd like to run the same block of code a small number of times. This ticket is to add an unroll() loop operator for this purpose.

Eg, this:

# ./src/bpftrace -e 'kprobe:do_nanosleep { $i = 0; unroll(5) { printf("i: %d\n", $i); $i = $i + 1; } }'

Should run the block of code 5 times, and print:

i: 0
i: 1
i: 2
i: 3
i: 4

(It may print out-of-order if do_nanosleep() executes on different CPUs simultaneously.)

I imagine this will involve editing lexer.l and parser.yy, and the codegen may call "accept" on the same "stmt" node in a loop. I'd do if-else #8 first, since it will show how to wrap stmts in codegen, and builds upon the existing ternary code that wraps exprs.

There should be a limit to the unroll number, eg, 20, enforced in semantic_analyser.cpp, since otherwise it'll be too easy for people to bust the BPF instruction limit.

Note that BPF recently (Linux 4.18-ish?) added better support for unrolled loops, so this may become a BPF call if that capability is detected, and support a much higher unroll number.

Note I'm also assuming that we'll only want to support constant ints as the unroll() argument, at least to begin with, and not an expression like unroll($max).
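
To make the idea concrete, here is a toy model of the two pieces: a limit check (as semantic_analyser.cpp might do) and a loop that visits the same statements N times (as codegen might do). Everything here is hypothetical: Stmt stands in for an AST statement node, MAX_UNROLL uses the example limit of 20, and rejecting counts below 1 is an assumption.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

constexpr int MAX_UNROLL = 20;  // example limit from the text above

// Stand-in for a statement node: "visiting" it just runs a callback.
using Stmt = std::function<void()>;

// Semantic check: only small constant counts are allowed.
void check_unroll(int count)
{
  if (count < 1 || count > MAX_UNROLL)
    throw std::invalid_argument("unroll count out of range");
}

// Codegen: emit the body once per iteration by visiting the same
// statements repeatedly, so no loop remains in the generated program.
void emit_unrolled(int count, const std::vector<Stmt> &body)
{
  check_unroll(count);
  for (int i = 0; i < count; i++)
  {
    for (const auto &stmt : body)
      stmt();
  }
}
```

In the real codegen the inner call would be stmt->accept(*this), and the check would report an error through the semantic analyser rather than throw.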

print buffer issue with type sizes

With #46, the following script, biocompletion.bt:

struct tracepoint__block__block_rq_complete {
	long __do_not_use__;
	int dev;
	long sector;
	int nr_sector;
	int error;
	char rwbs[8];
	int data_loc_cmd;
}

tracepoint:block:block_rq_complete
{
	$p = ctx;
	$tparg = (tracepoint__block__block_rq_complete *)ctx;
	printf("manual offsets -: %d %d %d %s\n", *($p+8), *($p+16), *($p+24), str($p+32));
	printf("struct offsets 1: %d %d %d %s\n", $tparg->dev, $tparg->sector, $tparg->nr_sector, $tparg->rwbs);
	printf("struct offsets 2: %d ", $tparg->dev);
	printf("%d ", $tparg->sector);
	printf("%d ", $tparg->nr_sector);
	printf("%s\n\n", $tparg->rwbs);
}

produces the following output:

# ./src/bpftrace biocompletion.bt
Attaching 1 probe...
manual offsets -: 271581185 4088456 8 WM
struct offsets 1: 271581185 0 0
struct offsets 2: 271581185 4088456 8 WM

manual offsets -: 271581185 7751432 8 WM
struct offsets 1: 271581185 0 0
struct offsets 2: 271581185 7751432 8 WM

manual offsets -: 271581185 12477968 8 WM
struct offsets 1: 271581185 0 0
struct offsets 2: 271581185 12477968 8 WM

So the middle line (when passing all struct members to printf) is messed up, but the others are fine. I guess something is going wrong when packing those types into the perf buffer.

add hardware:branch-misses:

From attached_probe.cpp:

  else if (probe_.path == "branch-instructions" || probe_.path == "branches")
  {
    type = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
    defaultp = 100000;
  }
  else if (probe_.path == "bus-cycles")
  {
    type = PERF_COUNT_HW_BUS_CYCLES;
    defaultp = 100000;
  }

Oops. I missed PERF_COUNT_HW_BRANCH_MISSES by mistake -- it should be after branch-instructions. It should have the probe name "branch-misses", and the same defaultp of 100000. ast/semantic_analyser.cpp also needs it added.

test failures: LLVM 5 vs 6

From bpftrace_test:

[  FAILED  ] 3 tests, listed below:
[  FAILED  ] codegen.map_assign_string
[  FAILED  ] codegen.map_key_string
[  FAILED  ] codegen.string_propagation

They all look like a minor change in LLVM, which now inserts "nonnull":

@@ -28,5 +28,5 @@
   %str.repack8 = getelementptr inbounds [64 x i8], [64 x i8]* %str, i64 0, i64 4
   %2 = bitcast i64* %\"@x_key\" to i8*
-  call void @llvm.memset.p0i8.i64(i8* %str.repack8, i8 0, i64 60, i32 1, i1 false)
+  call void @llvm.memset.p0i8.i64(i8* nonnull %str.repack8, i8 0, i64 60, i32 1, i1 false)
   call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %2)
   store i64 0, i64* %\"@x_key\", align 8

maybe this changed because of the llvm/clang version I'm using? I usually use 5.0, but looks like I'm on 6.0 on this system:

# dpkg -l | egrep 'llvm|clang'
ii  clang-6.0                           1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        C, C++ and Objective-C compiler
ii  libclang-6.0-dev                    1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        clang library - Development package
ii  libclang-common-6.0-dev             1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        clang library - Common development package
ii  libclang1-6.0:amd64                 1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        C interface to the clang library
ii  libllvm6.0:amd64                    1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        Modular compiler and toolchain technologies, runtime library
ii  llvm-6.0                            1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        Modular compiler and toolchain technologies
ii  llvm-6.0-dev                        1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        Modular compiler and toolchain technologies, libraries and headers
ii  llvm-6.0-runtime                    1:6.0.1~svn334776-1~exp1~20180726133222.87 amd64        Modular compiler and toolchain technologies, IR interpreter

add kaddr() call (bpftrace internals tutorial)

Since I was half way through coding this, it would be a good starting point for anyone else to learn some bpftrace internals.

The aim is to have a call, kaddr(), that takes a kernel symbol name and returns the address. For example, kaddr("avenrun") should return 0xffffffff8dd0bcd0 on my system, because that's its address:

# grep -w avenrun /proc/kallsyms
ffffffff8dd0bcd0 B avenrun

So in bpftrace, this program should work (and not print zero):

# ./src/bpftrace -e 'kprobe:do_nanosleep { printf("%x\n", kaddr("avenrun")); }'
Attaching 1 probe...
0
0
0

I'm just using do_nanosleep() as a dummy function to trace. If you get no output, you might need to run "sleep 1" in another window.

Here is my current diff:

diff --git a/src/ast/codegen_llvm.cpp b/src/ast/codegen_llvm.cpp
index 27fa477..307053b 100644
--- a/src/ast/codegen_llvm.cpp
+++ b/src/ast/codegen_llvm.cpp
@@ -342,6 +342,14 @@ void CodegenLLVM::visit(Call &call)
     b_.CreateStore(pid, pid_offset);
     expr_ = buf;
   }
+  else if (call.func == "kaddr")
+  {
+    uint64_t addr;
+    auto &name = static_cast<String&>(*call.vargs->at(0)).str;
+    addr = bpftrace_.resolve_kname(name.c_str());
+printf("BDG: name %s got %llx\n", name.c_str(), addr);
+    expr_ = b_.getInt64(addr);
+  }
   else if (call.func == "reg")
   {
     auto &reg_name = static_cast<String&>(*call.vargs->at(0)).str;
diff --git a/src/ast/semantic_analyser.cpp b/src/ast/semantic_analyser.cpp
index 8eb5744..c2060d3 100644
--- a/src/ast/semantic_analyser.cpp
+++ b/src/ast/semantic_analyser.cpp
@@ -206,6 +206,14 @@ void SemanticAnalyser::visit(Call &call)

     call.type = SizedType(Type::integer, 8);
   }
+  else if (call.func == "kaddr") {
+    if (check_nargs(call, 1)) {
+      if (check_arg(call, Type::string, 0, true)) {
+         ;
+      }
+    }
+    call.type = SizedType(Type::integer, 8);
+  }
   else if (call.func == "printf") {
     check_assignment(call, false, false);
     if (check_varargs(call, 1, 7)) {
diff --git a/src/bpftrace.cpp b/src/bpftrace.cpp
index 939d692..daaabaa 100644
--- a/src/bpftrace.cpp
+++ b/src/bpftrace.cpp
@@ -1062,6 +1062,16 @@ std::string BPFtrace::resolve_sym(uintptr_t addr, bool show_offset)
   return symbol.str();
 }

+uint64_t BPFtrace::resolve_kname(const char *name)
+{
+  uint64_t addr = 0;
+// BDG TODO I was hoping ksyms_ would support this (which maps to bcc library functions),
+// BDG but it doesn't. For now, we'll need to do it ourselves here. Eg, read /proc/kallsyms.
+// BDG There's a file I/O example in find_wildcard_matches. Later on, we could add this
+// BDG to bcc too, and call it via ksyms_.
+  return addr;
+}
+
 std::string BPFtrace::resolve_usym(uintptr_t addr, int pid, bool show_offset)
 {
   struct bcc_symbol sym;
diff --git a/src/bpftrace.h b/src/bpftrace.h
index 24d47fc..7cfd109 100644
--- a/src/bpftrace.h
+++ b/src/bpftrace.h
@@ -37,6 +37,7 @@ public:
   std::string get_stack(uint64_t stackidpid, bool ustack, int indent=0);
   std::string resolve_sym(uintptr_t addr, bool show_offset=false);
   std::string resolve_usym(uintptr_t addr, int pid, bool show_offset=false);
+  uint64_t resolve_kname(const char *name);
   std::string resolve_name(uint64_t name_id);
   int pid_;

diff --git a/tests/codegen.cpp b/tests/codegen.cpp
index 38918ca..c4e0bf3 100644
--- a/tests/codegen.cpp
+++ b/tests/codegen.cpp
@@ -880,6 +880,23 @@ attributes #1 = { argmemonly nounwind }
 )EXPECTED");
 }

+TEST(codegen, call_kaddr)
+{
+  test("kprobe:f { @x = kaddr(\"avenrun\") }",
+
+R"EXPECTED(; Function Attrs: nounwind
+declare i64 @llvm.bpf.pseudo(i64, i64) #0
+
+BDG run bpftrace_test, and copy-n-paste the '+' diff output here until bpftrace_test passes this test
+
+; Function Attrs: argmemonly nounwind
+declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1
+
+attributes #0 = { nounwind }
+attributes #1 = { argmemonly nounwind }
+)EXPECTED");
+}
+
 TEST(codegen, call_hist)
 {
   test("kprobe:f { @x = hist(pid) }",
diff --git a/tests/parser.cpp b/tests/parser.cpp
index cff201b..b94b7cb 100644
--- a/tests/parser.cpp
+++ b/tests/parser.cpp
@@ -296,6 +296,17 @@ TEST(Parser, call_unknown_function)
       "  call: myfunc\n");
 }

+TEST(Parser, call_kaddr)
+{
+  test("kprobe:f { @ = kaddr(\"avenrun\") }",
+      "Program\n"
+      " kprobe:f\n"
+      "  =\n"
+      "   map: @\n"
+      "   call: kaddr\n"
+      "    string: avenrun\n");
+}
+
 TEST(Parser, multiple_probes)
 {
   test("kprobe:sys_open { 1; } kretprobe:sys_open { 2; }",
diff --git a/tests/semantic_analyser.cpp b/tests/semantic_analyser.cpp
index d6e26b8..730b0ea 100644
--- a/tests/semantic_analyser.cpp
+++ b/tests/semantic_analyser.cpp
@@ -260,6 +260,14 @@ TEST(semantic_analyser, call_usym)
   test("kprobe:f { usym(\"hello\"); }", 10);
 }

+TEST(semantic_analyser, call_kaddr)
+{
+  test("kprobe:f { kaddr(\"avenrun\"); }", 0);
+  test("kprobe:f { @x = kaddr(\"avenrun\"); }", 0);
+  test("kprobe:f { kaddr(); }", 1);
+  test("kprobe:f { kaddr(123); }", 1);
+}
+
 TEST(semantic_analyser, call_reg)
 {
   test("kprobe:f { reg(\"ip\"); }", 0);

Delete lines that contain "BDG" -- that's my own debugging and working comments. Note that I also already added the tests.

This capability will ultimately be used to read kernel variables. For example, this works (on my system):

# ./src/bpftrace -e 'kprobe:do_nanosleep { printf("%x\n", *0xffffffff8dd0bcd0); }'
Attaching 1 probe...
16
14
14
[...]

I'm dereferencing the avenrun location, which contains metrics used for load average calculations. Of course, I'd much rather type kaddr("avenrun") than the raw (kernel/server-specific) address.

For this ticket, the above kaddr() one-liner should print the address, and ./build/tests/bpftrace_test should not introduce any new failures.
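
As a hint for the missing body of resolve_kname(), the lookup could be a linear scan over /proc/kallsyms-style lines. The sketch below is a free function over a stream (resolve_kname_sketch is a hypothetical name) so it can be tried on sample text; it returns 0 when the symbol is not found, matching the stub in the diff.

```cpp
#include <cstdint>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sketch: each line looks like "ffffffff8dd0bcd0 B avenrun"
// (hex address, type letter, symbol name, optionally a module name).
uint64_t resolve_kname_sketch(std::istream &kallsyms, const std::string &name)
{
  std::string line;
  while (std::getline(kallsyms, line))
  {
    std::istringstream fields(line);
    std::string addr, type, sym;
    if (!(fields >> addr >> type >> sym))
      continue;  // malformed line, skip it
    if (sym == name)
      return static_cast<uint64_t>(std::strtoull(addr.c_str(), nullptr, 16));
  }
  return 0;  // not found
}
```

In BPFtrace::resolve_kname() the stream would be a std::ifstream on /proc/kallsyms. Note that the kernel may report zeroed addresses to unprivileged readers, so a zero result does not always mean the symbol is missing.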

improve cast errors

Just to simulate one of these, I used:

diff --git a/src/ast/codegen_llvm.cpp b/src/ast/codegen_llvm.cpp
index 27fa477..f97467f 100644
--- a/src/ast/codegen_llvm.cpp
+++ b/src/ast/codegen_llvm.cpp
@@ -47,7 +47,7 @@ void CodegenLLVM::visit(Builtin &builtin)
     Value *pidtgid = b_.CreateGetPidTgid();
     if (builtin.ident == "pid")
     {
-      expr_ = b_.CreateLShr(pidtgid, 32);
+      expr_ = b_.CreateLoad(b_.CreateLShr(pidtgid, 32));
     }
     else if (builtin.ident == "tid")
     {

and then:

# ./src/bpftrace -v -e 'kprobe:do_nanosleep { printf("%d\n", pid); }'
bpftrace: /usr/lib/llvm-6.0/include/llvm/Support/Casting.h:255: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::PointerType; Y = llvm::Type; typename llvm::cast_retty<X, Y*>::ret_type = llvm::PointerType*]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.
Aborted

This error message is not very readable. Is there any way to improve this?

64-bit int constants become 32 bit

I had something weird with a script that I've resolved to this:

# bpftrace -e 'BEGIN { $a = 0x4444444412345678; printf("%llx %x\n", $a, $a); }'
Attaching 1 probe...
12345678 12345678
^C

The first should print the full 64-bit number, right?

Looking at the -d output for the first shows it's only keeping 32 bits (decimal 305419896):

define i64 @BEGIN(i8*) local_unnamed_addr section "s_BEGIN" {
entry:
  %printf_args = alloca %printf_t, align 8
  %1 = bitcast %printf_t* %printf_args to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* nonnull %1)
  %2 = getelementptr inbounds %printf_t, %printf_t* %printf_args, i64 0, i32 1
  %3 = getelementptr inbounds %printf_t, %printf_t* %printf_args, i64 0, i32 0
  store i64 0, i64* %3, align 8
  store i64 305419896, i64* %2, align 8
  %4 = getelementptr inbounds %printf_t, %printf_t* %printf_args, i64 0, i32 2
  store i64 305419896, i64* %4, align 8
  %pseudo = tail call i64 @llvm.bpf.pseudo(i64 1, i64 1)
  %get_cpu_id = tail call i64 inttoptr (i64 8 to i64 ()*)()
  %perf_event_output = call i64 inttoptr (i64 25 to i64 (i8*, i8*, i64, i8*, i64)*)(i8* %0, i64 %pseudo, i64 %get_cpu_id, %printf_t* nonnull %printf_args, i64 24)
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* nonnull %1)
  ret i64 0
}

Here's a different, and I think related, issue:

# bpftrace -e 'BEGIN { $a = 0x123456789abcdef; printf("%llx %x\n", $a, $a); }'
Attaching 1 probe...
ffffffff89abcdef 89abcdef
^C

I think it's keeping 0x89abcdef, but then treating it as a signed 32-bit int, hence the output of 0xffffffff89abcdef.

Just taking a quick look, I can see by the time it hits this in ast/codegen_llvm.cpp:

void CodegenLLVM::visit(Integer &integer)
{
  expr_ = b_.getInt64(integer.n);
}

integer.n is already the 32-bit version. As it is in the AST shown by -d:

Program
 BEGIN
  =
   variable: $a
   int: 305419896
  call: printf
   string: %llx %x\n
   variable: $a
   variable: $a

More debugging...

I checked, and it's still 64 bit in this line in lexer.l:

{int}                   { return Parser::make_INT(strtoul(yytext, NULL, 0), loc); }

Yes, even if I change that to strtoull(), it later on becomes 32 bit.

But by parser.yy, it's 32 bit:

expr : INT             { $$ = new ast::Integer($1); }

So it's being lost in between lexer.l and parser.yy.

I thought it might be due to this in ast/ast.h:

class Integer : public Expression {
public:
  explicit Integer(int n) : n(n) { is_literal = true; }
  int n;

  void accept(Visitor &v) override;
};

But changing int to something else (eg, long unsigned int) didn't fix it. hmm. Maybe I was doing it wrong, or missed something.
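
For reference, both symptoms above are what you get if the value passes through a 32-bit signed int anywhere on the way to getInt64(). The sketch below only reproduces the arithmetic; through_int is a hypothetical name, and it does not identify where in the lexer/parser path the narrowing happens. It assumes the usual two's-complement behavior for the narrowing conversion.

```cpp
#include <cstdint>

// Narrow a 64-bit literal to a 32-bit signed int (as an `int n` member
// would), then widen it back to 64 bits (as passing it to a uint64_t
// parameter would).
uint64_t through_int(uint64_t literal)
{
  int32_t narrowed = static_cast<int32_t>(literal);  // keeps the low 32 bits
  // Widening a signed value sign-extends it: a set top bit fills the
  // upper 32 bits with ones.
  return static_cast<uint64_t>(static_cast<int64_t>(narrowed));
}
```

through_int(0x4444444412345678) gives 0x12345678, and through_int(0x123456789abcdef) gives 0xffffffff89abcdef, matching the two outputs above. So any intermediate int in the chain (token type, constructor parameter, or member) would be enough to cause this, and all of them would need to be widened together.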

name vector::_M_range_check error with multiple events and printf()

This fails:

# ./src/bpftrace -e 'tracepoint:syscalls:sys_enter_sync,tracepoint:syscalls:sys_enter_syncfs { printf("%s\n", name); }'
terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
Aborted

But a single event works:

# ./src/bpftrace -e 'tracepoint:syscalls:sys_enter_sync { printf("%s\n", name); }'
Attaching 1 probe...
tracepoint:syscalls:sys_enter_sync

Or, as a map, it also works:

# ./src/bpftrace -e 'tracepoint:syscalls:sys_enter_sync,tracepoint:syscalls:sys_enter_syncfs { @[name] = count(); }'
Attaching 2 probes...
^C

@[tracepoint:syscalls:sys_enter_sync]: 1

So something is wrong with the combination of: multiple events + printf() + name

lhist truncate zero ranges

lhist() currently works like this:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = lhist(pid, 0, 65000, 1000); }'
Attaching 1 probe...
^C

@:
(...,0]                0 |                                                    |
[0, 1000)              0 |                                                    |
[1000, 2000)           4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2000, 3000)           0 |                                                    |
[3000, 4000)           0 |                                                    |
[4000, 5000)           0 |                                                    |
[5000, 6000)           0 |                                                    |
[6000, 7000)           0 |                                                    |
[7000, 8000)           0 |                                                    |
[8000, 9000)           0 |                                                    |
[9000, 10000)          2 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
[10000, 11000)         0 |                                                    |
[11000, 12000)         0 |                                                    |
[12000, 13000)         0 |                                                    |
[13000, 14000)         0 |                                                    |
[14000, 15000)         0 |                                                    |
[15000, 16000)         0 |                                                    |
[16000, 17000)         0 |                                                    |
[17000, 18000)         0 |                                                    |
[18000, 19000)         0 |                                                    |
[19000, 20000)         0 |                                                    |
[20000, 21000)         0 |                                                    |
[21000, 22000)         0 |                                                    |
[22000, 23000)         0 |                                                    |
[23000, 24000)         0 |                                                    |
[24000, 25000)         0 |                                                    |
[25000, 26000)         0 |                                                    |
[26000, 27000)         0 |                                                    |
[27000, 28000)         0 |                                                    |
[28000, 29000)         0 |                                                    |
[29000, 30000)         0 |                                                    |
[30000, 31000)         0 |                                                    |
[31000, 32000)         0 |                                                    |
[32000, 33000)         0 |                                                    |
[33000, 34000)         0 |                                                    |
[34000, 35000)         0 |                                                    |
[35000, 36000)         0 |                                                    |
[36000, 37000)         0 |                                                    |
[37000, 38000)         0 |                                                    |
[38000, 39000)         0 |                                                    |
[39000, 40000)         0 |                                                    |
[40000, 41000)         0 |                                                    |
[41000, 42000)         0 |                                                    |
[42000, 43000)         0 |                                                    |
[43000, 44000)         0 |                                                    |
[44000, 45000)         0 |                                                    |
[45000, 46000)         0 |                                                    |
[46000, 47000)         0 |                                                    |
[47000, 48000)         0 |                                                    |
[48000, 49000)         0 |                                                    |
[49000, 50000)         0 |                                                    |
[50000, 51000)         0 |                                                    |
[51000, 52000)         0 |                                                    |
[52000, 53000)         0 |                                                    |
[53000, 54000)         0 |                                                    |
[54000, 55000)         0 |                                                    |
[55000, 56000)         0 |                                                    |
[56000, 57000)         0 |                                                    |
[57000, 58000)         0 |                                                    |
[58000, 59000)         0 |                                                    |
[59000, 60000)         0 |                                                    |
[60000, 61000)         0 |                                                    |
[61000, 62000)         0 |                                                    |
[62000, 63000)         0 |                                                    |
[63000, 64000)         0 |                                                    |
[64000, 65000)         0 |                                                    |
[65000,...)            0 |                                                    |

Outside the range of values, we don't need to print all those zero lines. The output should be this:

# ./src/bpftrace -e 'kprobe:do_nanosleep { @ = lhist(pid, 0, 65000, 1000); }'
Attaching 1 probe...
^C

@:
[1000, 2000)           4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2000, 3000)           0 |                                                    |
[3000, 4000)           0 |                                                    |
[4000, 5000)           0 |                                                    |
[5000, 6000)           0 |                                                    |
[6000, 7000)           0 |                                                    |
[7000, 8000)           0 |                                                    |
[8000, 9000)           0 |                                                    |
[9000, 10000)          2 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |

I imagine this involves fixing BPFtrace::print_lhist() in bpftrace.cpp.
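
The trimming itself can be sketched as finding the first and last non-zero bucket and printing only that inclusive range, so zero rows between populated buckets are kept. This is a minimal sketch, not the existing `print_lhist()` code: the `Range` struct and `nonzero_range` function are names made up for illustration, and the bucket vector is assumed to hold the underflow bucket first and the overflow bucket last.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive index range of buckets worth printing.
struct Range {
  bool found;
  std::size_t first;
  std::size_t last;
};

// Scans once and records the first and last bucket with a non-zero count.
// found is false when every bucket is zero.
Range nonzero_range(const std::vector<std::uint64_t> &buckets) {
  Range r = {false, 0, 0};
  for (std::size_t i = 0; i < buckets.size(); ++i) {
    if (buckets[i] == 0)
      continue;
    if (!r.found) {
      r.found = true;
      r.first = i;
    }
    r.last = i;
  }
  return r;
}
```

For the example above there are 67 buckets, with counts only in [1000, 2000) and [9000, 10000), so the range covers exactly the nine rows shown in the desired output.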

hardware custom PMC probes

The Linux perf command allows custom PMCs (performance monitoring counters) to be specified. For example, CPU_CLK_UNHALTED.THREAD_P, which is accessed via Event Select = 0x3c and Umask = 0x0:

perf stat -e r003c -a         # r == raw mode
perf stat -e cpu/event=0x3c,umask=0x0/ -a

Flags can also be added, eg:

perf stat -e r003c:u -a       # user-mode only
perf stat -e r003c:up -a      # user-mode and first level of precise
perf stat -e r003c:upp -a     # user-mode and second level of precise

All the PMCs are listed in the Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide, Part 2 and the BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors. I last wrote about them here: http://www.brendangregg.com/perf.html#HardwareEvents

For this addition, we can discuss how best to present them, but here's a starting suggestion, where both forms are supported:

hardware:event=0x3c,umask=0x0:            # default count
hardware:event=0x3c,umask=0x0:1000000     # specified count
hardware:event=0x3c,umask=0x0,flags=up:   # example with flags
hardware:r003c:            # default count
hardware:r003c:1000000     # specified count
hardware:r003c,flags=up:   # example with flags

I think as version 1 of this change we should do the verbose version ("event=..."), and only do the raw version ("rMMEE") later on if people really need it.
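
The two forms describe the same value. On x86 the raw event number places the event select in bits 0-7 and the umask in bits 8-15, which is why event=0x3c,umask=0x0 and r003c are equivalent. A minimal sketch of that mapping follows; the function names are hypothetical, flags and counts are not handled, and only the two fields discussed here are parsed.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// x86 raw event encoding: event select in bits 0-7, umask in bits 8-15.
std::uint64_t raw_config(std::uint8_t event, std::uint8_t umask) {
  return (static_cast<std::uint64_t>(umask) << 8) | event;
}

// Renders the pair in the raw "rMMEE" notation, e.g. "r003c".
std::string raw_name(std::uint8_t event, std::uint8_t umask) {
  char buf[8];
  std::snprintf(buf, sizeof(buf), "r%02x%02x", static_cast<unsigned>(umask),
                static_cast<unsigned>(event));
  return buf;
}

// Parses the verbose form, "event=0x3c,umask=0x0". Returns false on any
// other shape or if either value does not fit in 8 bits.
bool parse_spec(const std::string &spec, std::uint8_t &event,
                std::uint8_t &umask) {
  unsigned e = 0, u = 0;
  if (std::sscanf(spec.c_str(), "event=%x,umask=%x", &e, &u) != 2)
    return false;
  if (e > 0xff || u > 0xff)
    return false;
  event = static_cast<std::uint8_t>(e);
  umask = static_cast<std::uint8_t>(u);
  return true;
}
```

Because the verbose form converts mechanically to the raw one, supporting only "event=..." in version 1 would not lose any expressiveness for these two fields.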

not resolving dev_t type properly

In issue #47, I had modified the types to get that far. Here are the real types from /sys:

# cat /sys/kernel/debug/tracing/events/block/block_rq_complete/format
name: block_rq_complete
ID: 1084
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:dev_t dev;        offset:8;       size:4; signed:0;
        field:sector_t sector;  offset:16;      size:8; signed:0;
        field:unsigned int nr_sector;   offset:24;      size:4; signed:0;
        field:int error;        offset:28;      size:4; signed:1;
        field:char rwbs[8];     offset:32;      size:8; signed:1;
        field:__data_loc char[] cmd;    offset:40;      size:4; signed:1;

print fmt: "%d,%d %s (%s) %llu + %u [%d]", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), REC->rwbs, __get_str(cmd), (unsigned long long)REC->sector, REC->nr_sector, REC->error

If we use those:

struct tracepoint__block__block_rq_complete {
        u64 __do_not_use__;
        dev_t dev;
        sector_t sector;
        unsigned int nr_sector;
        int error;
        char rwbs[8];
        int data_loc_cmd;
}

tracepoint:block:block_rq_complete
{
        $p = ctx;
        $tparg = (tracepoint__block__block_rq_complete *)ctx;
        printf("manual offsets -: %d %d %d %s\n", *($p+8), *($p+16), *($p+24), str($p+32));
        printf("struct offsets 1: %d %d %d %s\n", $tparg->dev, $tparg->sector, $tparg->nr_sector, $tparg->rwbs);
        printf("struct offsets 2: %d ", $tparg->dev);
        printf("%d ", $tparg->sector);
        printf("%d ", $tparg->nr_sector);
        printf("%s\n\n", $tparg->rwbs);
}

we get:

# ./src/bpftrace biocompletion.bt
Attaching 1 probe...
manual offsets -: 271581185 18035560 8 WS
struct offsets 1: 1893854416 1893854416 1893854416 ���p����
struct offsets 2: 1893854416 1893854416 1893854416 ���p����

manual offsets -: 271581184 3227776 56 W
struct offsets 1: 1893854416 1893854416 1893854416 ���p����
struct offsets 2: 1893854416 1893854416 1893854416 ���p����

I suspect it's not resolving u64 or dev_t etc, so is returning offset zero for everything, and therefore returning the internal "__do_not_use__" member in all locations.
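
For reference, the offsets the struct should resolve to can be checked outside bpftrace. The sketch below mirrors the record with fixed-width types, assuming a 64-bit target and that dev_t is a 32-bit unsigned and sector_t a 64-bit unsigned, which agrees with the size: values in the format file. The struct and helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Mirror of the tracepoint record using fixed-width stand-ins for the
// kernel typedefs (assumed: dev_t = 32-bit unsigned, sector_t = 64-bit).
struct block_rq_complete_args {
  std::uint64_t common;       // the 8 bytes of common_* fields
  std::uint32_t dev;          // offset 8, size 4
  std::uint64_t sector;       // offset 16, after 4 bytes of padding
  std::uint32_t nr_sector;    // offset 24
  std::int32_t error;         // offset 28
  char rwbs[8];               // offset 32
  std::int32_t data_loc_cmd;  // offset 40
};

static_assert(offsetof(block_rq_complete_args, dev) == 8, "dev offset");
static_assert(offsetof(block_rq_complete_args, sector) == 16, "sector offset");
static_assert(offsetof(block_rq_complete_args, nr_sector) == 24, "nr_sector");
static_assert(offsetof(block_rq_complete_args, error) == 28, "error offset");
static_assert(offsetof(block_rq_complete_args, rwbs) == 32, "rwbs offset");
static_assert(offsetof(block_rq_complete_args, data_loc_cmd) == 40, "cmd");

// Same split as the print fmt line: major above bit 20, minor in the low 20.
unsigned dev_major(std::uint32_t dev) { return dev >> 20; }
unsigned dev_minor(std::uint32_t dev) { return dev & ((1U << 20) - 1); }
```

These offsets match the "manual offsets" line, and the dev value printed there, 271581185, decodes to major 259, minor 1. If the typedefs are not resolved, none of these offsets can be computed, which would be consistent with every member reading from offset zero.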

compound assignment operators

This should work:

$x += 1;

as a shortcut for $x = $x + 1;.

The same for -=, *=, /=. This might be doable just from additions to parser.yy.

A separate ticket can track pre- and post-increment (++$x, $x++), if we want them too.
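
The rewrite is mechanical, which is why a grammar-only change seems plausible. As a rough illustration, the sketch below performs it on strings; a real implementation would build AST nodes in the parser action instead, and the `desugar` function is hypothetical. The right-hand side is parenthesised so that something like `$x *= 1 + 2` keeps its meaning.

```cpp
#include <string>

// Rewrites "<var> <op>= <expr>" as "<var> = <var> <op> (<expr>)".
// Returns an empty string for anything other than +=, -=, *= and /=.
std::string desugar(const std::string &var, const std::string &op,
                    const std::string &expr) {
  if (op.size() != 2 || op[1] != '=')
    return "";
  if (std::string("+-*/").find(op[0]) == std::string::npos)
    return "";
  return var + " = " + var + " " + op[0] + " (" + expr + ")";
}
```

For example, `desugar("$x", "+=", "1")` yields `$x = $x + (1)`, which is the expansion described above with the added parentheses.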

Multiple map failure

# ./src/bpftrace -e 'kprobe:do_nanosleep { @a = pid; @b = pid; @c = pid; @d = pid; @e = pid; }'
Error creating map: '@e'
Attaching 1 probe...
Error loading program: kprobe:do_nanosleep (try -v)

Why does this fail? -v doesn't help that much. Maybe -d is more useful, but I still don't see it.

If you delete one of the maps, it works. So it sounds like it's hitting a limit, such as the BPF instruction limit or the BPF stack limit. What's the issue, can it be fixed, or can the error message be more meaningful?
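
On the error message, one low-cost improvement would be to include the errno text from the failed map creation. The sketch below is hypothetical and not the current bpftrace code. The hint it adds rests on an assumption that is not confirmed for this report: on kernels that charge BPF map memory against the locked-memory limit (RLIMIT_MEMLOCK), exceeding that limit is reported as EPERM, so it is one more limit worth ruling out alongside the instruction and stack limits.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Builds a map-creation error that carries the errno description.
// The EPERM hint is an assumption about one possible cause, not a diagnosis.
std::string map_error(const std::string &name, int err) {
  std::string msg =
      "Error creating map: '" + name + "': " + std::strerror(err);
  if (err == EPERM)
    msg += " (possibly the locked-memory limit; see ulimit -l)";
  return msg;
}
```

Printing the errno would at least distinguish a resource limit from an invalid map definition.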

join() as a printf() argument

This doesn't work:

bpftrace -e 'kprobe:sys_execve { printf("%s\n", join(arg1)); }'

(See #25 for notes about kprobe:sys_execve going away.)

Is it possible to have join() be used as a printf() argument? This may be more trouble than it's worth.

tracepoint arguments

@ajor is working on this. This depends on #31.

These should work:

bpftrace -e 'tracepoint:block:block_rq_complete { printf("%d %s %d", pid, comm, args->nr_sector * 512); }'
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @ = sum(args->ret) }'

args should be the tracepoint argument struct from /sys, eg:

# cat /sys/kernel/debug/tracing/events/block/block_rq_complete/format
name: block_rq_complete
ID: 1084
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:dev_t dev;	offset:8;	size:4;	signed:0;
	field:sector_t sector;	offset:16;	size:8;	signed:0;
	field:unsigned int nr_sector;	offset:24;	size:4;	signed:0;
	field:int error;	offset:28;	size:4;	signed:1;
	field:char rwbs[8];	offset:32;	size:8;	signed:1;
	field:__data_loc char[] cmd;	offset:40;	size:4;	signed:1;

print fmt: "%d,%d %s (%s) %llu + %u [%d]", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), REC->rwbs, __get_str(cmd), (unsigned long long)REC->sector, REC->nr_sector, REC->error
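
Building args from this file comes down to parsing each field: line into a type, a name, an offset, a size and a signedness flag. A minimal sketch of that step follows; the `Field` struct and `parse_field` function are hypothetical, the layout of the line is taken from the output above, and array suffixes such as rwbs[8] are left attached to the name rather than decoded.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

struct Field {
  std::string type;
  std::string name;
  int offset = 0;
  int size = 0;
  bool is_signed = false;
};

// Parses one "field:<type> <name>; offset:N; size:N; signed:N;" line.
// Returns false for lines of any other shape (name:, ID:, print fmt:, ...).
bool parse_field(const std::string &line, Field &out) {
  const std::string tag = "field:";
  std::size_t start = line.find(tag);
  if (start == std::string::npos)
    return false;
  std::size_t semi = line.find(';', start);
  if (semi == std::string::npos)
    return false;
  // The declaration is everything between "field:" and the first ';'.
  std::string decl =
      line.substr(start + tag.size(), semi - start - tag.size());
  std::size_t space = decl.rfind(' ');
  if (space == std::string::npos)
    return false;
  out.type = decl.substr(0, space);   // e.g. "__data_loc char[]"
  out.name = decl.substr(space + 1);  // e.g. "cmd"
  int sign = 0;
  // A space in the format matches any run of whitespace, including tabs.
  if (std::sscanf(line.c_str() + semi + 1, " offset:%d; size:%d; signed:%d;",
                  &out.offset, &out.size, &sign) != 3)
    return false;
  out.is_signed = sign != 0;
  return true;
}
```

The common_* fields parse the same way, so a caller would need to skip them, or fold them into a padding member, before exposing the rest as args.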
