Comments (3)
Hi.
I investigated the code.
First, the bug does not occur every time; I would say it occurs roughly 1 time in 30, when we are unlucky.
Regarding the code, I think the following:
- I would say that the BPF code is correct.
- The problem comes from our handling in events.go.
More specifically, I think this snippet causes the problem:
    if e.Typ == 0 && i+e.ContNr < len(events) {
        for j := 0; j < e.ContNr; j++ {
            param := events[i+1+j].Param
            paramIdx := events[i+1+j].ParamIdx
            argsStr[paramIdx] = &param
        }
        i += e.ContNr
    }
Indeed, we do not check the type of events[i+1+j] to see whether that event continues the one before it.
With multiple CPUs, I think it is possible that the events following a given event do not continue it but belong to something entirely different.
So I would suggest changing it like this:
diff --git a/pkg/straceback/event.go b/pkg/straceback/event.go
index 7f942fc..ff96e1d 100644
--- a/pkg/straceback/event.go
+++ b/pkg/straceback/event.go
@@ -139,10 +139,16 @@ func eventsToString(events []Event) (ret string) {
switch e.Typ {
case 0:
var argsStr [6]*string
- if e.Typ == 0 && i+e.ContNr < len(events) {
+ if i+e.ContNr < len(events) {
for j := 0; j < e.ContNr; j++ {
- param := events[i+1+j].Param
- paramIdx := events[i+1+j].ParamIdx
+ continuedEvent := events[i + 1 + j]
+
+ if continuedEvent.Typ != 2 {
+ continue
+ }
+
+ param := continuedEvent.Param
+ paramIdx := continuedEvent.ParamIdx
ret += fmt.Sprintf("Cont.type: %d Param: %s ParamIdx: %d ", events[i+1+j].Typ, param, paramIdx)
argsStr[paramIdx] = &param
}
But I am not sure this would suffice as I was not able to test it thoroughly.
Best regards.
I think it is a good idea to check the type of the event before processing it. But I don't think it would fix this:
- After fetching events from ring buffers of each cpu, the events are re-ordered by the timestamp field in userspace.
- The timestamps of SYSCALL_EVENT_TYPE_CONT events are set to the same value as the event SYSCALL_EVENT_TYPE_ENTER that created them.
- So SYSCALL_EVENT_TYPE_CONT events from the same syscall are next to each other.
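The re-ordering described above can be sketched roughly as follows. This is only an illustration, not the actual traceloop code: the `Event` struct, the `mergeByTimestamp` helper, and the use of a stable sort are assumptions, and only type values 0 (enter) and 2 (cont) come from this thread; 1 for exit is a guess. It shows that when "cont" events carry the timestamp of their "enter" event, a stable sort by timestamp keeps them directly behind it, even when another CPU's events are merged in.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a simplified, hypothetical stand-in for the real event type.
type Event struct {
	Timestamp uint64
	CPU       int
	Typ       int // 0 = enter, 2 = cont (from the thread); 1 = exit is assumed
	Name      string
}

// mergeByTimestamp concatenates the per-CPU slices and sorts the result by
// timestamp. The sort is stable, so events with equal timestamps keep the
// relative order they had in their ring buffer.
func mergeByTimestamp(perCPU [][]Event) []Event {
	var all []Event
	for _, evs := range perCPU {
		all = append(all, evs...)
	}
	sort.SliceStable(all, func(a, b int) bool {
		return all[a].Timestamp < all[b].Timestamp
	})
	return all
}

func main() {
	cpu0 := []Event{
		{Timestamp: 100, CPU: 0, Typ: 0, Name: "write enter"},
		{Timestamp: 100, CPU: 0, Typ: 2, Name: "write cont"},
		{Timestamp: 130, CPU: 0, Typ: 1, Name: "write exit"},
	}
	cpu1 := []Event{
		{Timestamp: 110, CPU: 1, Typ: 1, Name: "futex exit"},
	}
	for _, e := range mergeByTimestamp([][]Event{cpu0, cpu1}) {
		fmt.Printf("%d cpu#%d typ=%d %s\n", e.Timestamp, e.CPU, e.Typ, e.Name)
	}
}
```

In this sketch the "cont" event stays immediately after its "enter" event, and the cpu#1 event lands between the write enter/cont pair and the write exit, which matches the interleaving seen in the trace.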
Example (source):
00:00.039139009 cpu#0 pid 42218 [runc:[2:INIT]] ...openat() = 3
00:00.039139009 cpu#0 pid 42218 [runc:[2:INIT]] ...openat() = 3
00:00.039172211 cpu#0 pid 42218 [runc:[2:INIT]] futex(94492690465560, 129, 1, 0, 0, 0)...
00:00.039172211 cpu#0 pid 42218 [runc:[2:INIT]] futex(94492690465560, 129, 1, 0, 0, 0) = 1
00:00.039182512 cpu#0 pid 42218 [runc:[2:INIT]] ...futex() = 1
00:00.039189913 cpu#0 pid 42218 [runc:[2:INIT]] write("", 824634585143, 1)...
00:00.039189913 "0"
00:00.039189913 "0"
00:00.039193813 cpu#1 pid 42219 [runc:[2:INIT]] ...futex() = 0
00:00.039210214 cpu#0 pid 42218 [runc:[2:INIT]] ...write() = 1
00:00.039210214 cpu#0 pid 42218 [runc:[2:INIT]] ...write() = 1
00:00.039212015 cpu#1 pid 42219 [runc:[2:INIT]] epoll_pwait(7, 140642389710816, 128, 0, 0, 973)...
00:00.039216815 cpu#0 pid 42218 [runc:[2:INIT]] close(5)...
00:00.039216815 cpu#0 pid 42218 [runc:[2:INIT]] close(5)...
00:00.039219015 cpu#1 pid 42219 [runc:[2:INIT]] ...epoll_pwait() = 0
00:00.039222615 cpu#0 pid 42218 [runc:[2:INIT]] ...close() = 0
If you add the test if continuedEvent.Typ != 2, I think it should break out of the loop instead of continuing, to avoid consuming events.
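A minimal sketch of the "break instead of continue" idea, under stated assumptions: the `Event` struct is reduced to the fields mentioned in this thread, and `collectConts` and the `typEnter`/`typCont` constants are hypothetical names, not part of traceloop. Unlike the original condition, the helper also tolerates a truncated tail, and it returns how many events it actually consumed so the caller can advance by that amount rather than by `e.ContNr`.

```go
package main

import "fmt"

const (
	typEnter = 0 // value taken from the snippet in this thread
	typCont  = 2 // value taken from the proposed diff
)

// Event is a reduced, hypothetical version of the real event type.
type Event struct {
	Typ      int
	ContNr   int
	Param    string
	ParamIdx int
}

// collectConts gathers the parameters of the continuation events that
// directly follow events[i]. It stops at the first event that is not a
// continuation, so unrelated events are never consumed.
func collectConts(events []Event, i int) (args [6]*string, consumed int) {
	e := events[i]
	for j := 0; j < e.ContNr && i+1+j < len(events); j++ {
		c := events[i+1+j]
		if c.Typ != typCont {
			break
		}
		if c.ParamIdx >= 0 && c.ParamIdx < len(args) {
			param := c.Param
			args[c.ParamIdx] = &param
		}
		consumed++
	}
	return args, consumed
}

func main() {
	// An enter event announcing two continuations, but only one follows
	// before another (duplicated) enter event shows up.
	events := []Event{
		{Typ: typEnter, ContNr: 2},
		{Typ: typCont, Param: "0", ParamIdx: 1},
		{Typ: typEnter, ContNr: 2},
		{Typ: typCont, Param: "0", ParamIdx: 1},
	}
	args, consumed := collectConts(events, 0)
	fmt.Println("consumed:", consumed, "arg1 set:", args[1] != nil)
}
```

The caller would then do `i += consumed`, leaving the second enter event to be processed on its own in the next iteration.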
It seems the root cause is that all events for cpu#0 are present twice in the array. This would cause the printing problem because the second "write" event would be consumed by eventsToString() while trying to get the "cont" event. Then, the 2 "cont" events are printed out of context.
Closing as traceloop was totally reworked in inspektor-gadget/inspektor-gadget#1023.
I am wondering if we should even archive this repository?