rustedsword commented on June 29, 2024

So, this is how I understand what is happening:

  1. write_handler: nxt_conn_io_write() is called with only one buffer in the buffer chain (i.e. when b->next == NULL), and its size equals sb.limit, 10485760 bytes (1024 * 1024 * 10).
  2. nxt_conn_io_write() sends the entire contents of this single buffer by calling c->io->sendbuf() (nxt_conn_io_sendbuf()).
  3. nxt_conn_io_write() then marks this buffer as 'used' by calling nxt_sendbuf_update(), which sets mem.pos = mem.end.
  4. nxt_conn_io_write() is called again before c->write_state->ready_handler (the nxt_h1p_conn_sent() function) is called. That handler is what should trim the buffer list by removing 'used' buffers from the front of the list.
  5. So we are back inside nxt_conn_io_write() with the same buffer chain: one buffer that was already marked as 'used' during the previous call to nxt_conn_io_write().
  6. When c->io->sendbuf() (nxt_conn_io_sendbuf()) is called with a single buffer that was already marked as 'used' (mem.pos = mem.end) on the previous run, nxt_sendbuf_mem_coalesce0() correctly returns 0. However, nxt_conn_io_sendbuf() does not handle the case where sb->sync == 0, and goes on to dereference sb->buf, which is NULL at this point.

If nxt_conn_io_write() is called again with more than one buffer in the list, and the first buffer was already processed during the previous call (it has mem.pos = mem.end), then nxt_sendbuf_mem_coalesce0() moves on to the next buffer and returns the number of iovecs to send.

So, I believe the proper fix would be to:

  1. Not queue nxt_conn_io_write() more than once. The function should re-queue itself only if there are more buffers to send.
  2. Remove buffers from the list as soon as they are used, rather than deferring that to ready_handler.

But... for a simple fix, it seems enough to just check for (niov == 0 && sb->buf == NULL) in nxt_conn_io_sendbuf().

from unit.

ac000 commented on June 29, 2024

Hi @rustedsword, thanks for your report and initial investigation, and for getting the backtrace from the core dump.

I'll see if I can reproduce this; thanks for the script! (Though I probably won't be using Docker.)

Tatikoma commented on June 29, 2024

It only reproduces when PHP reads a file. It doesn't fault with the following code:

<?php
$chunkSize = 1048576;
$iterations = 1024;
for ($i = 0; $i < $iterations; $i++) {
    echo openssl_random_pseudo_bytes($chunkSize);
}

ac000 commented on June 29, 2024

Just a quick note to say that I can indeed reproduce this...

EDIT: It is by no means 100% reproducible, though; it happens roughly once every couple of dozen to fifty attempts...

EDIT 2: Reliably reproduces when running unit under strace -f...

ac000 commented on June 29, 2024

The failure scenario seems to be when we get into the following situation in nxt_conn_io_sendbuf().

After the call to nxt_sendbuf_mem_coalesce0():

...
sb->buf : (nil), niov : 1, sb->sync : 0
sb->buf : (nil), niov : 0, sb->sync : 0

When both niov and sb->sync are 0, we then try to dereference the NULL pointer sb->buf.

By comparison, here is the tail end of the working state:

...
sb->buf : (nil), niov : 1, sb->sync : 1
sb->buf : (nil), niov : 0, sb->sync : 1

ac000 commented on June 29, 2024

We get into the above situation because b->mem.pos is never NULL in nxt_sendbuf_mem_coalesce0().

b is a linked list of buffers, and the last one should look like this:

(gdb) p *b->next->next->next->next->next->next
$10 = {
  data = 0x0,
  completion_handler = 0x433bba <nxt_router_http_request_done>,
  parent = 0x7fb454003850,
  next = 0x0,
  retain = 0,
  cache_hint = 0 '\000',
  is_file = 0 '\000',
  is_mmap = 0 '\000',
  is_port_mmap = 0 '\000',
  is_sync = 1 '\001',
  is_nobuf = 0 '\000',
  is_flush = 0 '\000',
  is_last = 1 '\001',
  is_port_mmap_sent = 0 '\000',
  is_ts = 0 '\000',
  mem = {
    pos = 0x0,
    free = 0x7fb454003d30 "",
    start = 0x800000018 <error: Cannot access memory at address 0x800000018>,
    end = 0x7fb454003790 "\220:"
  },
  file = 0x0,
  file_pos = 3,
  file_end = 60196710123
}

However, we sometimes end up in a situation like this:

(gdb) p *b
$1 = {
  data = 0x7f677c000c30,
  completion_handler = 0x40f15c <nxt_port_mmap_buf_completion>,
  parent = 0x7f677c005180,
  next = 0x0,
  retain = 0,
  cache_hint = 0 '\000',
  is_file = 0 '\000',
  is_mmap = 0 '\000',
  is_port_mmap = 1 '\001',
  is_sync = 0 '\000',
  is_nobuf = 0 '\000',
  is_flush = 0 '\000',
  is_last = 0 '\000',
  is_port_mmap_sent = 0 '\000',
  is_ts = 1 '\001',
  mem = {
    pos = 0x7f677a1fd000 "\001",
    free = 0x7f677a1fd000 "\001",
    start = 0x7f67797fd000 "",
    end = 0x7f677a1fd000 "\001"
  },
  file = 0x0,
  file_pos = 4302261,
  file_end = 5252496
}

Trying to track down where these buffers are created...

ac000 commented on June 29, 2024

At the moment I'm not seeing anything that makes this obviously a PHP-specific thing.

When trying to replicate this from something else, e.g. a WASI 0.2.0 component that simply attempts to send a file, small files (maybe < 256MiB) seem to work reliably. A 256MiB file works sometimes but often hangs, and I've seen the router process crash once.

Here's the crash backtrace:

#0  0x00007feffac5ec71 in nxt_nncq_head (q=0x7feffb3f500c) at src/nxt_nncq.h:27
27          return q->head;
[Current thread is 1 (Thread 0x7feff98006c0 (LWP 220568))]
(gdb) bt
#0  0x00007feffac5ec71 in nxt_nncq_head (q=0x7feffb3f500c) at src/nxt_nncq.h:27
#1  0x00007feffac5ee8f in nxt_nncq_dequeue (q=0x7feffb3f500c)
    at src/nxt_nncq.h:138
#2  0x00007feffac6c31a in nxt_port_queue_recv (p=0x5df5a0, q=0x7feffb3e5000)
    at src/nxt_port_queue.h:84
#3  nxt_unit_port_queue_recv (port=0x6673f0, rbuf=0x5df540)
    at src/nxt_unit.c:6261
#4  0x00007feffac6be44 in nxt_unit_ctx_port_recv (ctx=0x5f31e8, port=0x6673f0, 
    rbuf=0x5df540) at src/nxt_unit.c:6062
#5  0x00007feffac663d1 in nxt_unit_wait_shm_ack (ctx=0x5f31e8)
    at src/nxt_unit.c:3660
#6  0x00007feffac6624d in nxt_unit_mmap_get (ctx=0x5f31e8, port=0x634930, 
    c=0x7feff97fea40, n=0x7feff97fea44, min_n=1) at src/nxt_unit.c:3589
#7  0x00007feffac66f0e in nxt_unit_get_outgoing_buf (ctx=0x5f31e8, 
    port=0x634930, size=10485760, min_size=16384, mmap_buf=0x7feff97feeb0, 
    local_buf=0x7feff97feaa0 "") at src/nxt_unit.c:3968
#8  0x00007feffac6498e in nxt_unit_response_write_nb (req=0x5f73c0, 
    start=0x7fedbbe00010, size=27262976, min_size=27262976)
    at src/nxt_unit.c:2937
#9  0x00007feffac6478f in nxt_unit_response_write (req=0x5f73c0, 
    start=0x7fedbbe00010, size=268435456) at src/nxt_unit.c:2879
#10 0x00007feffac29a50 in nxt_wasmtime::GlobalState::run::{{closure}}::{{closure}} () from /opt/unit/modules/wasm_wasi_component.unit.so
#11 0x00007feffac22805 in tokio::runtime::task::core::Core<T,S>::poll ()
   from /opt/unit/modules/wasm_wasi_component.unit.so
#12 0x00007feff97ff730 in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x0000000000000004 in ?? ()
#15 0xfffffffe00000001 in ?? ()
#16 0x00007feffb32cca0 in ?? ()
   from /opt/unit/modules/wasm_wasi_component.unit.so
#17 0x0000000000000000 in ?? ()

When it hangs, the router threads are sitting in epoll_wait(2):

(gdb) info threads
  Id   Target Id                                  Frame 
* 1    Thread 0x7ffbf9747940 (LWP 220906) "unitd" 0x00007ffbf985fc12 in epoll_wait (epfd=3, events=0x9349d0, maxevents=32, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  2    Thread 0x7ffbf8a006c0 (LWP 220916) "unitd" 0x00007ffbf985fc12 in epoll_wait (epfd=22, events=0x953820, maxevents=32, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  3    Thread 0x7ffbf80006c0 (LWP 220917) "unitd" 0x00007ffbf985fc12 in epoll_wait (epfd=24, events=0x95ced0, maxevents=32, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  4    Thread 0x7ffbf76006c0 (LWP 220918) "unitd" 0x00007ffbf985fc12 in epoll_wait (epfd=26, events=0x966580, maxevents=32, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
  5    Thread 0x7ffbf6c006c0 (LWP 220919) "unitd" 0x00007ffbf985fc12 in epoll_wait (epfd=28, events=0x96fc30, maxevents=32, timeout=-1)
    at ../sysdeps/unix/sysv/linux/epoll_wait.c:30

And the wasm module is sitting in poll(2):

(gdb) bt
#0  0x00007ffbf9851bed in __GI___poll (fds=0x7ffec0b679a0, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007ffbf8e6830a in nxt_unit_read_buf (ctx=0xaa5d08, rbuf=0xaa5e80)
    at src/nxt_unit.c:4675
#2  0x00007ffbf8e680df in nxt_unit_run_once_impl (ctx=0xaa5d08)
    at src/nxt_unit.c:4586
#3  0x00007ffbf8e67fa3 in nxt_unit_run (ctx=0xaa5d08) at src/nxt_unit.c:4546
#4  0x00007ffbf8df304d in nxt_wasmtime::start ()
   from /opt/unit/modules/wasm_wasi_component.unit.so
#5  0x0000000000449c2f in nxt_app_setup (task=0x92bc10, process=0x93ed70)
    at src/nxt_application.c:1020
#6  0x000000000040a602 in nxt_process_do_start (task=0x92bc10, 
    process=0x93ed70) at src/nxt_process.c:722
#7  0x000000000040ab38 in nxt_process_whoami_ok (task=0x92bc10, 
    msg=0x7ffec0b68090, data=0x93ed70) at src/nxt_process.c:850
#8  0x0000000000411989 in nxt_port_rpc_handler (task=0x92bc10, 
    msg=0x7ffec0b68090) at src/nxt_port_rpc.c:347
#9  0x00000000004125db in nxt_port_handler (task=0x92bc10, msg=0x7ffec0b68090)
    at src/nxt_port.c:184
#10 0x000000000040ebc8 in nxt_port_read_msg_process (task=0x92bc10, 
    port=0x93ef10, msg=0x7ffec0b68090) at src/nxt_port_socket.c:1271
#11 0x000000000040d910 in nxt_port_read_handler (task=0x92bc10, obj=0x93ef10, 
    data=0x0) at src/nxt_port_socket.c:778
#12 0x000000000041ddae in nxt_event_engine_start (engine=0x92bc10)
    at src/nxt_event_engine.c:542
#13 0x0000000000407ce7 in main (argc=2, argv=0x7ffec0b68338)
    at src/nxt_main.c:35

I'm currently thinking there's a general problem with sending large amounts of data through a single call to nxt_unit_response_write().

ac000 commented on June 29, 2024

I tested with our current wasm language module, where transfers between the language module and the Unit core happen in chunks of at most 32MiB, but I was limiting the transfers to 2MiB chunks.

It all looked good up until a 256MiB file, and then I got exactly the same crash as originally reported...

#0  0x000000000045cc65 in nxt_conn_io_sendbuf (task=0x7f776c003968, 
    sb=0x7f77717ffad0) at src/nxt_conn_write.c:175
175         if (niov == 0 && nxt_buf_is_file(sb->buf)) {
[Current thread is 1 (Thread 0x7f77718006c0 (LWP 228604))]
(gdb) bt
#0  0x000000000045cc65 in nxt_conn_io_sendbuf (task=0x7f776c003968, 
    sb=0x7f77717ffad0) at src/nxt_conn_write.c:175
#1  0x000000000045c933 in nxt_conn_io_write (task=0x7f776c003968, 
    obj=0x7f776c002840, data=0x7f776c003380) at src/nxt_conn_write.c:58
#2  0x000000000041ddae in nxt_event_engine_start (engine=0xc06810)
    at src/nxt_event_engine.c:542
#3  0x000000000042f364 in nxt_router_thread_start (data=0xc035b0)
    at src/nxt_router.c:3641
#4  0x000000000041b50c in nxt_thread_trampoline (data=0xc035b0)
    at src/nxt_thread.c:126
#5  0x00007f77725c4897 in start_thread (arg=<optimized out>)
    at pthread_create.c:444
#6  0x00007f777264b80c in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

ac000 commented on June 29, 2024

And I reproduced it from an "external" C application, again running under strace(1) for reliable reproduction... maybe some race condition...

#0  0x000000000045cca3 in nxt_conn_io_sendbuf (task=0x7f12b4003968, 
    sb=0x7f12c1dffad0) at src/nxt_conn_write.c:175
175         if (niov == 0 && nxt_buf_is_file(sb->buf)) {
[Current thread is 1 (Thread 0x7f12c1e006c0 (LWP 273175))]
(gdb) bt
#0  0x000000000045cca3 in nxt_conn_io_sendbuf (task=0x7f12b4003968, 
    sb=0x7f12c1dffad0) at src/nxt_conn_write.c:175
#1  0x000000000045c971 in nxt_conn_io_write (task=0x7f12b4003968, 
    obj=0x7f12b40014f0, data=0x7f12b4003380) at src/nxt_conn_write.c:58
#2  0x000000000041ddcb in nxt_event_engine_start (engine=0xc1b5c0)
    at src/nxt_event_engine.c:542
#3  0x000000000042f381 in nxt_router_thread_start (data=0xc104a0)
    at src/nxt_router.c:3641
#4  0x000000000041b529 in nxt_thread_trampoline (data=0xc104a0)
    at src/nxt_thread.c:126
#5  0x00007f12c2fd9897 in start_thread (arg=<optimized out>)
    at pthread_create.c:444
#6  0x00007f12c306080c in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

ac000 commented on June 29, 2024

Somewhere to put this...

When running Unit under Valgrind we get lots of errors. Just starting Unit with a single external C application produces:

==274664== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==274664==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274664==    by 0x4537DB: nxt_sendmsg (nxt_socket_msg.c:32)
==274664==    by 0x453593: nxt_socketpair_send (nxt_socketpair.c:96)
==274664==    by 0x40CD41: nxt_port_write_handler (nxt_port_socket.c:446)
==274664==    by 0x40C74F: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274664==    by 0x40B284: nxt_port_socket_write (nxt_port.h:345)
==274664==    by 0x40B284: nxt_process_send_ready (nxt_process.c:1066)
==274664==    by 0x40A65B: nxt_process_do_start (nxt_process.c:734)
==274664==    by 0x40A585: nxt_process_setup (nxt_process.c:701)
==274664==    by 0x40A212: nxt_process_create (nxt_process.c:579)
==274664==    by 0x4098BB: nxt_process_start (nxt_process.c:216)
==274664==    by 0x40975C: nxt_process_init_start (nxt_process.c:170)
==274664==    by 0x4222E2: nxt_main_process_start (nxt_main_process.c:103)
==274664==  Address 0x1ffefff547 is on thread 1's stack
==274664==  in frame #4, created by nxt_port_socket_write2 (nxt_port_socket.c:163)
==274664==  Uninitialised value was created by a stack allocation
==274664==    at 0x40C2F5: nxt_port_socket_write2 (nxt_port_socket.c:163)
==274664== 
==274660== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==274660==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274660==    by 0x4537DB: nxt_sendmsg (nxt_socket_msg.c:32)
==274660==    by 0x453593: nxt_socketpair_send (nxt_socketpair.c:96)
==274660==    by 0x40CD41: nxt_port_write_handler (nxt_port_socket.c:446)
==274660==    by 0x40C74F: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274660==    by 0x424A11: nxt_port_socket_write (nxt_port.h:345)
==274660==    by 0x424A11: nxt_main_port_modules_handler (nxt_main_process.c:1439)
==274660==    by 0x4125DA: nxt_port_handler (nxt_port.c:184)
==274660==    by 0x40EBC7: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274660==    by 0x40D90F: nxt_port_read_handler (nxt_port_socket.c:778)
==274660==    by 0x41DDAD: nxt_event_engine_start (nxt_event_engine.c:542)
==274660==    by 0x407CE6: main (nxt_main.c:35)
==274660==  Address 0x1ffefff567 is on thread 1's stack
==274660==  in frame #4, created by nxt_port_socket_write2 (nxt_port_socket.c:163)
==274660==  Uninitialised value was created by a stack allocation
==274660==    at 0x40C2F5: nxt_port_socket_write2 (nxt_port_socket.c:163)
==274660== 
==274664== 
==274664== HEAP SUMMARY:
==274664==     in use at exit: 54,184 bytes in 35 blocks
==274664==   total heap usage: 260 allocs, 225 frees, 150,688 bytes allocated
==274664== 
==274664== LEAK SUMMARY:
==274664==    definitely lost: 528 bytes in 1 blocks
==274664==    indirectly lost: 0 bytes in 0 blocks
==274664==      possibly lost: 1,664 bytes in 13 blocks
==274664==    still reachable: 51,992 bytes in 21 blocks
==274664==         suppressed: 0 bytes in 0 blocks
==274664== Rerun with --leak-check=full to see details of leaked memory
==274664== 
==274664== For lists of detected and suppressed errors, rerun with: -s
==274664== ERROR SUMMARY: 2 errors from 1 contexts (suppressed: 0 from 0)
==274660== Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)
==274660==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274660==    by 0x4537DB: nxt_sendmsg (nxt_socket_msg.c:32)
==274660==    by 0x453593: nxt_socketpair_send (nxt_socketpair.c:96)
==274660==    by 0x40CD41: nxt_port_write_handler (nxt_port_socket.c:446)
==274660==    by 0x40C74F: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274660==    by 0x412761: nxt_port_send_port (nxt_port.c:253)
==274660==    by 0x412A56: nxt_port_send_new_port (nxt_port.c:221)
==274660==    by 0x412A56: nxt_port_process_ready_handler (nxt_port.c:340)
==274660==    by 0x4125DA: nxt_port_handler (nxt_port.c:184)
==274660==    by 0x40EBC7: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274660==    by 0x40D90F: nxt_port_read_handler (nxt_port_socket.c:778)
==274660==    by 0x41DDAD: nxt_event_engine_start (nxt_event_engine.c:542)
==274660==    by 0x407CE6: main (nxt_main.c:35)
==274660==  Address 0x4bed6fa is 122 bytes inside a block of size 1,024 alloc'd
==274660==    at 0x484A83D: posix_memalign (vg_replace_malloc.c:2099)
==274660==    by 0x4083A4: nxt_memalign (nxt_malloc.c:134)
==274660==    by 0x4146F6: nxt_mp_alloc_cluster (nxt_mp.c:685)
==274660==    by 0x414610: nxt_mp_alloc_page (nxt_mp.c:652)
==274660==    by 0x414350: nxt_mp_alloc_small (nxt_mp.c:578)
==274660==    by 0x413E84: nxt_mp_alloc (nxt_mp.c:397)
==274660==    by 0x41A04C: nxt_buf_mem_ts_alloc (nxt_buf.c:62)
==274660==    by 0x41268C: nxt_port_send_port (nxt_port.c:235)
==274660==    by 0x412A56: nxt_port_send_new_port (nxt_port.c:221)
==274660==    by 0x412A56: nxt_port_process_ready_handler (nxt_port.c:340)
==274660==    by 0x4125DA: nxt_port_handler (nxt_port.c:184)
==274660==    by 0x40EBC7: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274660==    by 0x40D90F: nxt_port_read_handler (nxt_port_socket.c:778)
==274660==  Uninitialised value was created by a heap allocation
==274660==    at 0x484A83D: posix_memalign (vg_replace_malloc.c:2099)
==274660==    by 0x4083A4: nxt_memalign (nxt_malloc.c:134)
==274660==    by 0x4146F6: nxt_mp_alloc_cluster (nxt_mp.c:685)
==274660==    by 0x414610: nxt_mp_alloc_page (nxt_mp.c:652)
==274660==    by 0x414350: nxt_mp_alloc_small (nxt_mp.c:578)
==274660==    by 0x413E84: nxt_mp_alloc (nxt_mp.c:397)
==274660==    by 0x41A04C: nxt_buf_mem_ts_alloc (nxt_buf.c:62)
==274660==    by 0x41268C: nxt_port_send_port (nxt_port.c:235)
==274660==    by 0x412A56: nxt_port_send_new_port (nxt_port.c:221)
==274660==    by 0x412A56: nxt_port_process_ready_handler (nxt_port.c:340)
==274660==    by 0x4125DA: nxt_port_handler (nxt_port.c:184)
==274660==    by 0x40EBC7: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274660==    by 0x40D90F: nxt_port_read_handler (nxt_port_socket.c:778)
==274660== 
==274660== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==274660==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274660==    by 0x4537DB: nxt_sendmsg (nxt_socket_msg.c:32)
==274660==    by 0x453593: nxt_socketpair_send (nxt_socketpair.c:96)
==274660==    by 0x40CD41: nxt_port_write_handler (nxt_port_socket.c:446)
==274660==    by 0x40C74F: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274660==    by 0x412761: nxt_port_send_port (nxt_port.c:253)
==274660==    by 0x412A56: nxt_port_send_new_port (nxt_port.c:221)
==274660==    by 0x412A56: nxt_port_process_ready_handler (nxt_port.c:340)
==274660==    by 0x4125DA: nxt_port_handler (nxt_port.c:184)
==274660==    by 0x40EBC7: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274660==    by 0x40D90F: nxt_port_read_handler (nxt_port_socket.c:778)
==274660==    by 0x41DDAD: nxt_event_engine_start (nxt_event_engine.c:542)
==274660==    by 0x407CE6: main (nxt_main.c:35)
==274660==  Address 0x1ffeffeb2c is on thread 1's stack
==274660==  in frame #2, created by nxt_socketpair_send (nxt_socketpair.c:88)
==274660==  Uninitialised value was created by a stack allocation
==274660==    at 0x453470: nxt_socketpair_send (nxt_socketpair.c:88)
==274660== 
==274666== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==274666==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274666==    by 0x4537DB: nxt_sendmsg (nxt_socket_msg.c:32)
==274666==    by 0x453593: nxt_socketpair_send (nxt_socketpair.c:96)
==274666==    by 0x40CD41: nxt_port_write_handler (nxt_port_socket.c:446)
==274666==    by 0x40C74F: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274666==    by 0x4261CD: nxt_port_socket_write (nxt_port.h:345)
==274666==    by 0x4261CD: nxt_controller_conf_send (nxt_controller.c:642)
==274666==    by 0x425B8C: nxt_controller_send_current_conf (nxt_controller.c:434)
==274666==    by 0x425C9E: nxt_controller_router_ready_handler (nxt_controller.c:480)
==274666==    by 0x4125DA: nxt_port_handler (nxt_port.c:184)
==274666==    by 0x40EBC7: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274666==    by 0x40D90F: nxt_port_read_handler (nxt_port_socket.c:778)
==274666==    by 0x41DDAD: nxt_event_engine_start (nxt_event_engine.c:542)
==274666==  Address 0x1ffeffeabc is on thread 1's stack
==274666==  in frame #2, created by nxt_socketpair_send (nxt_socketpair.c:88)
==274666==  Uninitialised value was created by a stack allocation
==274666==    at 0x453470: nxt_socketpair_send (nxt_socketpair.c:88)
==274666== 
==274667== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==274667==    at 0x4B03B6B: __libc_sendmsg (sendmsg.c:28)
==274667==    by 0x4B03B6B: sendmsg (sendmsg.c:25)
==274667==    by 0x4537DB: nxt_sendmsg (nxt_socket_msg.c:32)
==274667==    by 0x453593: nxt_socketpair_send (nxt_socketpair.c:96)
==274667==    by 0x40CD41: nxt_port_write_handler (nxt_port_socket.c:446)
==274667==    by 0x40C74F: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274667==    by 0x42B5FE: nxt_port_socket_write (nxt_port.h:345)
==274667==    by 0x42B5FE: nxt_router_conf_send (nxt_router.c:1403)
==274667==    by 0x42B2B6: nxt_router_conf_ready (nxt_router.c:1320)
==274667==    by 0x42B26B: nxt_router_conf_wait (nxt_router.c:1303)
==274667==    by 0x41DDAD: nxt_event_engine_start (nxt_event_engine.c:542)
==274667==    by 0x407CE6: main (nxt_main.c:35)
==274667==  Address 0x1ffefff797 is on thread 1's stack
==274667==  in frame #4, created by nxt_port_socket_write2 (nxt_port_socket.c:163)
==274667==  Uninitialised value was created by a stack allocation
==274667==    at 0x40C2F5: nxt_port_socket_write2 (nxt_port_socket.c:163)

This patch reduces them to

diff --git a/src/nxt_port_socket.c b/src/nxt_port_socket.c
index 5752d5ab..432bb012 100644
--- a/src/nxt_port_socket.c
+++ b/src/nxt_port_socket.c
@@ -164,7 +164,7 @@ nxt_port_socket_write2(nxt_task_t *task, nxt_port_t *port, nxt_uint_t type,
     int                  notify;
     uint8_t              qmsg_size;
     nxt_int_t            res;
-    nxt_port_send_msg_t  msg;
+    nxt_port_send_msg_t  msg = {};
     struct {
         nxt_port_msg_t   pm;
         uint8_t          buf[NXT_PORT_MAX_ENQUEUE_BUF_SIZE];
==274796== Syscall param sendmsg(msg.msg_iov[1]) points to uninitialised byte(s)
==274796==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274796==    by 0x4537F8: nxt_sendmsg (nxt_socket_msg.c:32)
==274796==    by 0x4535B0: nxt_socketpair_send (nxt_socketpair.c:96)
==274796==    by 0x40CD5E: nxt_port_write_handler (nxt_port_socket.c:446)
==274796==    by 0x40C76C: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274796==    by 0x41277E: nxt_port_send_port (nxt_port.c:253)
==274796==    by 0x412A73: nxt_port_send_new_port (nxt_port.c:221)
==274796==    by 0x412A73: nxt_port_process_ready_handler (nxt_port.c:340)
==274796==    by 0x4125F7: nxt_port_handler (nxt_port.c:184)
==274796==    by 0x40EBE4: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274796==    by 0x40D92C: nxt_port_read_handler (nxt_port_socket.c:778)
==274796==    by 0x41DDCA: nxt_event_engine_start (nxt_event_engine.c:542)
==274796==    by 0x407CE6: main (nxt_main.c:35)
==274796==  Address 0x4bed6fa is 122 bytes inside a block of size 1,024 alloc'd
==274796==    at 0x484A83D: posix_memalign (vg_replace_malloc.c:2099)
==274796==    by 0x4083A4: nxt_memalign (nxt_malloc.c:134)
==274796==    by 0x414713: nxt_mp_alloc_cluster (nxt_mp.c:685)
==274796==    by 0x41462D: nxt_mp_alloc_page (nxt_mp.c:652)
==274796==    by 0x41436D: nxt_mp_alloc_small (nxt_mp.c:578)
==274796==    by 0x413EA1: nxt_mp_alloc (nxt_mp.c:397)
==274796==    by 0x41A069: nxt_buf_mem_ts_alloc (nxt_buf.c:62)
==274796==    by 0x4126A9: nxt_port_send_port (nxt_port.c:235)
==274796==    by 0x412A73: nxt_port_send_new_port (nxt_port.c:221)
==274796==    by 0x412A73: nxt_port_process_ready_handler (nxt_port.c:340)
==274796==    by 0x4125F7: nxt_port_handler (nxt_port.c:184)
==274796==    by 0x40EBE4: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274796==    by 0x40D92C: nxt_port_read_handler (nxt_port_socket.c:778)
==274796==  Uninitialised value was created by a heap allocation
==274796==    at 0x484A83D: posix_memalign (vg_replace_malloc.c:2099)
==274796==    by 0x4083A4: nxt_memalign (nxt_malloc.c:134)
==274796==    by 0x414713: nxt_mp_alloc_cluster (nxt_mp.c:685)
==274796==    by 0x41462D: nxt_mp_alloc_page (nxt_mp.c:652)
==274796==    by 0x41436D: nxt_mp_alloc_small (nxt_mp.c:578)
==274796==    by 0x413EA1: nxt_mp_alloc (nxt_mp.c:397)
==274796==    by 0x41A069: nxt_buf_mem_ts_alloc (nxt_buf.c:62)
==274796==    by 0x4126A9: nxt_port_send_port (nxt_port.c:235)
==274796==    by 0x412A73: nxt_port_send_new_port (nxt_port.c:221)
==274796==    by 0x412A73: nxt_port_process_ready_handler (nxt_port.c:340)
==274796==    by 0x4125F7: nxt_port_handler (nxt_port.c:184)
==274796==    by 0x40EBE4: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274796==    by 0x40D92C: nxt_port_read_handler (nxt_port_socket.c:778)
==274796== 
==274796== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==274796==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274796==    by 0x4537F8: nxt_sendmsg (nxt_socket_msg.c:32)
==274796==    by 0x4535B0: nxt_socketpair_send (nxt_socketpair.c:96)
==274796==    by 0x40CD5E: nxt_port_write_handler (nxt_port_socket.c:446)
==274796==    by 0x40C76C: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274796==    by 0x41277E: nxt_port_send_port (nxt_port.c:253)
==274796==    by 0x412A73: nxt_port_send_new_port (nxt_port.c:221)
==274796==    by 0x412A73: nxt_port_process_ready_handler (nxt_port.c:340)
==274796==    by 0x4125F7: nxt_port_handler (nxt_port.c:184)
==274796==    by 0x40EBE4: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274796==    by 0x40D92C: nxt_port_read_handler (nxt_port_socket.c:778)
==274796==    by 0x41DDCA: nxt_event_engine_start (nxt_event_engine.c:542)
==274796==    by 0x407CE6: main (nxt_main.c:35)
==274796==  Address 0x1ffeffeb2c is on thread 1's stack
==274796==  in frame #2, created by nxt_socketpair_send (nxt_socketpair.c:88)
==274796==  Uninitialised value was created by a stack allocation
==274796==    at 0x45348D: nxt_socketpair_send (nxt_socketpair.c:88)
==274796== 
==274801== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==274801==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274801==    by 0x4537F8: nxt_sendmsg (nxt_socket_msg.c:32)
==274801==    by 0x4535B0: nxt_socketpair_send (nxt_socketpair.c:96)
==274801==    by 0x40CD5E: nxt_port_write_handler (nxt_port_socket.c:446)
==274801==    by 0x40C76C: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274801==    by 0x4261EA: nxt_port_socket_write (nxt_port.h:345)
==274801==    by 0x4261EA: nxt_controller_conf_send (nxt_controller.c:642)
==274801==    by 0x425BA9: nxt_controller_send_current_conf (nxt_controller.c:434)
==274801==    by 0x425CBB: nxt_controller_router_ready_handler (nxt_controller.c:480)
==274801==    by 0x4125F7: nxt_port_handler (nxt_port.c:184)
==274801==    by 0x40EBE4: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274801==    by 0x40D92C: nxt_port_read_handler (nxt_port_socket.c:778)
==274801==    by 0x41DDCA: nxt_event_engine_start (nxt_event_engine.c:542)
==274801==  Address 0x1ffeffeabc is on thread 1's stack
==274801==  in frame #2, created by nxt_socketpair_send (nxt_socketpair.c:88)
==274801==  Uninitialised value was created by a stack allocation
==274801==    at 0x45348D: nxt_socketpair_send (nxt_socketpair.c:88)

This patch further reduces them

diff --git a/src/nxt_malloc.c b/src/nxt_malloc.c
index 5ea7322f..911896f9 100644
--- a/src/nxt_malloc.c
+++ b/src/nxt_malloc.c
@@ -132,6 +132,7 @@ nxt_memalign(size_t alignment, size_t size)
     nxt_err_t   err;
 
     err = posix_memalign(&p, alignment, size);
+    nxt_memzero(p, size);
 
     if (nxt_fast_path(err == 0)) {
         nxt_thread_log_debug("posix_memalign(%uz, %uz): %p",
==274888== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==274888==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274888==    by 0x45380D: nxt_sendmsg (nxt_socket_msg.c:32)
==274888==    by 0x4535C5: nxt_socketpair_send (nxt_socketpair.c:96)
==274888==    by 0x40CD73: nxt_port_write_handler (nxt_port_socket.c:446)
==274888==    by 0x40C781: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274888==    by 0x412793: nxt_port_send_port (nxt_port.c:253)
==274888==    by 0x412A88: nxt_port_send_new_port (nxt_port.c:221)
==274888==    by 0x412A88: nxt_port_process_ready_handler (nxt_port.c:340)
==274888==    by 0x41260C: nxt_port_handler (nxt_port.c:184)
==274888==    by 0x40EBF9: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274888==    by 0x40D941: nxt_port_read_handler (nxt_port_socket.c:778)
==274888==    by 0x41DDDF: nxt_event_engine_start (nxt_event_engine.c:542)
==274888==    by 0x407CE6: main (nxt_main.c:35)
==274888==  Address 0x1ffeffeb2c is on thread 1's stack
==274888==  in frame #2, created by nxt_socketpair_send (nxt_socketpair.c:88)
==274888==  Uninitialised value was created by a stack allocation
==274888==    at 0x4534A2: nxt_socketpair_send (nxt_socketpair.c:88)
==274888== 
==274893== Syscall param sendmsg(msg.msg_control) points to uninitialised byte(s)
==274893==    at 0x4B03B34: sendmsg (sendmsg.c:28)
==274893==    by 0x45380D: nxt_sendmsg (nxt_socket_msg.c:32)
==274893==    by 0x4535C5: nxt_socketpair_send (nxt_socketpair.c:96)
==274893==    by 0x40CD73: nxt_port_write_handler (nxt_port_socket.c:446)
==274893==    by 0x40C781: nxt_port_socket_write2 (nxt_port_socket.c:244)
==274893==    by 0x4261FF: nxt_port_socket_write (nxt_port.h:345)
==274893==    by 0x4261FF: nxt_controller_conf_send (nxt_controller.c:642)
==274893==    by 0x425BBE: nxt_controller_send_current_conf (nxt_controller.c:434)
==274893==    by 0x425CD0: nxt_controller_router_ready_handler (nxt_controller.c:480)
==274893==    by 0x41260C: nxt_port_handler (nxt_port.c:184)
==274893==    by 0x40EBF9: nxt_port_read_msg_process (nxt_port_socket.c:1271)
==274893==    by 0x40D941: nxt_port_read_handler (nxt_port_socket.c:778)
==274893==    by 0x41DDDF: nxt_event_engine_start (nxt_event_engine.c:542)
==274893==  Address 0x1ffeffeabc is on thread 1's stack
==274893==  in frame #2, created by nxt_socketpair_send (nxt_socketpair.c:88)
==274893==  Uninitialised value was created by a stack allocation
==274893==    at 0x4534A2: nxt_socketpair_send (nxt_socketpair.c:88)

ac000 commented on June 29, 2024

@rustedsword Thanks for your explanation; it'll take a little time to go through that!

I do wonder, though, whether that can explain the intermittent behaviour, or why it's much more readily reproducible under strace(1).

I gave your simple suggested fix a try:

diff --git a/src/nxt_conn_write.c b/src/nxt_conn_write.c
index 714a3e15..06f4db7f 100644
--- a/src/nxt_conn_write.c
+++ b/src/nxt_conn_write.c
@@ -172,7 +172,7 @@ nxt_conn_io_sendbuf(nxt_task_t *task, nxt_sendbuf_t *sb)
         return 0;
     }
 
-    if (niov == 0 && nxt_buf_is_file(sb->buf)) {
+    if (niov == 0 && sb->buf && nxt_buf_is_file(sb->buf)) {
         return nxt_conn_io_sendfile(task, sb);
     }
 

And indeed, while it does fix the crash, it doesn't fix the underlying issue of short transfers...

 $ curl -v localhost:8080/ -o /dev/null
* processing: localhost:8080/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.2.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/octet-stream
< Content-Length: 268435456
< Server: Unit/1.32.0
< Date: Fri, 16 Feb 2024 13:11:05 GMT
< 
{ [16640 bytes data]
* transfer closed with 58720256 bytes remaining to read
 78  256M   78  200M    0     0  1684M      0 --:--:-- --:--:-- --:--:-- 1694M
* Closing connection
curl: (18) transfer closed with 58720256 bytes remaining to read

rustedsword commented on June 29, 2024

@ac000 Well, it is more like this:

diff --git a/src/nxt_conn_write.c b/src/nxt_conn_write.c
index 714a3e15..4ccafdf1 100644
--- a/src/nxt_conn_write.c
+++ b/src/nxt_conn_write.c
@@ -172,6 +172,10 @@ nxt_conn_io_sendbuf(nxt_task_t *task, nxt_sendbuf_t *sb)
         return 0;
     }
 
+    if (niov == 0 && sb->buf == NULL) {
+        return 0;
+    }
+
     if (niov == 0 && nxt_buf_is_file(sb->buf)) {
         return nxt_conn_io_sendfile(task, sb);
     }


Otherwise, when niov == 0 and sb->buf is NULL, you will still fall through to nxt_conn_io_writev().

ac000 commented on June 29, 2024