Comments (7)
I tried to install OpenJDK with debug information via debuginfo-install, and that didn't seem to help (I still don't get line numbers on Java's functions), but strangely, now I get a different stack trace - the bug is now in _callout_thread():
#13 0x000000000023f74a in _callout_thread () at ../../bsd/porting/callout.cc:166
Line 166 of callout.cc is
fn(arg);
but fn==0, causing a crash.
from osv.
Hi @nyh,
Can you please turn on the tracing for the callouts and attach the part of the trace that is relevant?
sudo ./scripts/run.py -n -e "--trace=callout* " -c1 -m1G
Then we can figure out the flow that led to this crash.
We should check fn != nullptr in any case, but it seems to me that this is a bug with a freed callout or something similar.
Some new information:
- The bug still exists with -c1 (one CPU).
- In one run the crash was a bit earlier, in mtx_lock(c_mtx), a few lines before calling the function.
- In another run, where the crash was again in running fn, fn was not zero but was rather tcp_timer_delack() (tcp_timer.c:180). This function called rwlock, which called mutex lock, and that crashed.
I am guessing there's some sort of race between running this "delack" callout and deleting the socket (?) where this "delack" is allocated, or something? As I said I'm not familiar with this code, so it takes me a while to understand what's going on here.
I'll provide traces later - right now I get an error trying to show the trace, and I'm trying to debug this problem:
(gdb) osv trace
Python Exception <class 'gdb.error'> There is no member named sig.:
Error occurred in Python command: There is no member named sig.
Here is one example trace as you asked for
(note to self: before "osv trace" I need to do "p trace_callout_init.sig". I don't know why...)
0xffffc00033c8d000 0 1371986488.081629 callout_stop C=0xffffc00032620b08 flags=6, is_drain=0
0xffffc00033c8d000 0 1371986488.081632 callout_stop C=0xffffc00032620b50 flags=6, is_drain=0
0xffffc00033c8d000 0 1371986488.081636 callout_stop C=0xffffc00032620b98 flags=0, is_drain=0
0xffffc00033c8d000 0 1371986488.085358 callout_reset C=0xffffc0003422a810 to_ticks=2999999 fn=0x000000000028cca0 arg=0xffffc0003422a7d0
0xffffc00033c8d000 0 1371986488.085363 callout_stop C=0xffffc0003422a810 flags=6, is_drain=0
0xffffc00033c8d000 0 1371986488.085594 callout_init C=0xffffc00032620a78
0xffffc00033c8d000 0 1371986488.085596 callout_init C=0xffffc00032620ac0
0xffffc00033c8d000 0 1371986488.085596 callout_init C=0xffffc00032620b08
0xffffc00033c8d000 0 1371986488.085597 callout_init C=0xffffc00032620b50
0xffffc00033c8d000 0 1371986488.085598 callout_init C=0xffffc00032620b98
0xffffc00033c8d000 0 1371986488.085669 callout_reset C=0xffffc00032620b08 to_ticks=75000000 fn=0x000000000028fce0 arg=0xffffc00032620800
0xffffc00033c8d000 0 1371986488.085671 callout_stop C=0xffffc00032620b08 flags=0, is_drain=0
0xffffc00033c8d000 0 1371986488.085699 callout_reset C=0xffffc00032620b08 to_ticks=2905032704 fn=0x000000000028fce0 arg=0xffffc00032620800
0xffffc00033c8d000 0 1371986488.085700 callout_stop C=0xffffc00032620b08 flags=6, is_drain=0
0xffffc00033c8d000 0 1371986488.085706 callout_stop C=0xffffc00032620a78 flags=0, is_drain=0
0xffffc00033c8d000 0 1371986488.087241 callout_reset C=0xffffc00032620b08 to_ticks=2905032704 fn=0x000000000028fce0 arg=0xffffc00032620800
0xffffc00033c8d000 0 1371986488.087244 callout_stop C=0xffffc00032620b08 flags=6, is_drain=0
0xffffc00033c8d000 0 1371986488.087260 callout_reset C=0xffffc00032620b98 to_ticks=100000 fn=0x000000000028f8d4 arg=0xffffc00032620800
0xffffc00033c8d000 0 1371986488.087261 callout_stop C=0xffffc00032620b98 flags=0, is_drain=0
0xffffc00034320000 0 1371986488.089337 callout_thread_dispatching C=0xffffc00032625b98 fn=0x0000000000000000
It looks like a race between callout_stop and callout_thread_dispatching? Guy, does this help?
Yes, no point in continuing this discussion both here and on email, let's continue via email.
Guy found the bug.
_callout_thread() finds the first callout, waits the necessary time, and then runs it. But it's possible that by the time the wait ends the callout has already been deleted, in which case we must not run it.
Guy's fix:
diff --git a/bsd/porting/callout.cc b/bsd/porting/callout.cc
index 92ed775..196a06f 100644
--- a/bsd/porting/callout.cc
+++ b/bsd/porting/callout.cc
@@ -135,7 +135,7 @@ static void _callout_thread(void)
expired = t.expired();
}
- if (!expired) {
+ if ((!expired) || (!callouts::have_callout(c))) {
trace_callout_thread_retry(c);
continue;
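To illustrate the idea behind the fix (this is only a simplified, single-threaded sketch, not the actual OSv code): the dispatcher picks a callout, waits, and must re-check that the callout is still registered before running it. Everything below is hypothetical and made up for the illustration, including the callout struct, the callout_registry class, dispatch_once() and runs_after_wait(); only the name have_callout is borrowed from the diff above, and its real implementation may differ.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>

// Hypothetical stand-in for a callout; NOT OSv's struct callout.
struct callout {
    std::function<void()> fn;
};

// Hypothetical registry of pending callouts, guarded by a mutex.
class callout_registry {
public:
    void add(callout* c) {
        std::lock_guard<std::mutex> guard(mtx_);
        pending_.insert(c);
    }
    void remove(callout* c) {
        std::lock_guard<std::mutex> guard(mtx_);
        pending_.erase(c);
    }
    bool have_callout(callout* c) {
        std::lock_guard<std::mutex> guard(mtx_);
        return pending_.count(c) > 0;
    }
private:
    std::mutex mtx_;
    std::set<callout*> pending_;
};

// One dispatcher iteration. 'c' was chosen before the wait; 'during_wait'
// models what another thread might do while the dispatcher sleeps.
// Returns true if the callout was run, false if the dispatcher should retry.
bool dispatch_once(callout_registry& reg, callout* c,
                   const std::function<void()>& during_wait, bool recheck)
{
    during_wait();              // stands in for the timed wait
    const bool expired = true;  // assume the timer expired normally

    // With recheck enabled, a callout removed during the wait is skipped.
    if (!expired || (recheck && !reg.have_callout(c))) {
        return false;
    }
    reg.remove(c);
    c->fn();
    return true;
}

// Returns how many times the callout ran in one dispatcher iteration.
int runs_after_wait(bool stop_during_wait, bool recheck)
{
    callout_registry reg;
    int runs = 0;
    callout c{[&runs] { ++runs; }};
    reg.add(&c);
    dispatch_once(reg, &c,
                  [&] { if (stop_during_wait) reg.remove(&c); },
                  recheck);
    return runs;
}
```

In this sketch, runs_after_wait(true, false) runs the stopped callout anyway, which corresponds to the crash seen here (in the real code the callout's memory may already be freed, so fn can be 0 or garbage), while runs_after_wait(true, true) skips it. A real dispatcher would also need to hold the appropriate lock between the check and the call; the sketch ignores that.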
Sorry, it's one of those very stupid "How did it work before?" bugs.
Thanks for finding and investigating this!