Sometimes the tail end of a container's output appears to be truncated.
This is with an agent that has fix #124 applied (specifically, commit 827351a).
I noticed the problem when running https://github.com/clearcontainers/tests/blob/master/metrics/time/launch_times.sh#L101, specifically the line that looks for the kernel dmesg text "Freeing unused kernel memory". With kata, after a few iterations (4 in my most recent test), the script fails to find that line.
Running dmesg by hand inside a kata container repeatedly, I can see that every now and then (roughly every 4 or 5 runs) the container's output appears to be truncated. For example, the tail of a good run looks like:
$ docker run --rm -ti --runtime=kata-runtime ubuntu bash
# dmesg
... <repeat as necessary>...
[ 1.076596] tsc: Refined TSC clocksource calibration: 1799.973 MHz
[ 1.076618] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x19f2102e78d, max_idle_ns: 440795264358 ns
[ 27.164537] random: crng init done
and the tail of a bad run looks like:
[ 0.180340] Write protecting the kernel read-only data: 10240k
[ 0.180745] Freeing unused kernel memory: 2024K
[ 0.181211] Freeing unused kernel memory: 220K
[ 0.186141] systemd[1]: systemd 234 running in system mode. (+PAM -AUDIT -SELINUX +IMA -APPARMOR -SMACK
As a sanity check I did this in the container:
for i in $(seq 1 20); do
dmesg | wc
done
and that looks fine, which suggests the problem lies in the stdout transport to the host rather than in dmesg itself.
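To put a number on how often the output comes back short, a loop like the following could be run on the host. This is a minimal sketch: count_output_lines is a hypothetical helper (not part of the tests repo) that runs a command N times and tallies the distinct output line counts, so a run that loses its tail should show up as an extra, smaller count.

```shell
# Hypothetical helper: count_output_lines RUNS CMD...
# Runs CMD RUNS times and prints "<times seen> <line count>" for each
# distinct number of output lines observed.
count_output_lines() {
  local runs=$1
  shift
  local i=0
  while [ "$i" -lt "$runs" ]; do
    # awk's END block prints the total number of lines read (0 if none).
    "$@" | awk 'END { print NR }'
    i=$((i + 1))
  done | sort -n | uniq -c
}
```

For example, count_output_lines 20 docker run --rm --runtime=kata-runtime ubuntu dmesg should print a single row if every run returned the full log. Note that this drops the -ti flags used above because the output is piped, so it is an assumption that the truncation also reproduces without a TTY.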
One more data point. When my test script failed and I dumped the dmesg text it did receive to see why, it looked as if some of the dmesg output had been 're-ordered'. I see output like:
^[[32m[ 0.200145] ^[[0m^[[33mrandom^[[0m: systemd: uninitialized urandom read (16 bytes read)
^[[32m[ 0.200172] ^[[0m^[[33mrandom^[[0m: systemd: uninitialized urandom read (16 bytes read)
^[[32m[ 0.200190] ^[[0m^[[33mrandom^[[0m: systemd: uninitialized urandom read (16 bytes read)
^[[32m[ 0.216145] ^[[0m^[[33msystemd-journald[118]^[[0m: Received request to flush runtime journal from PID 1
^[[32m[ 0.881863] ^[[0m^[[33mpci 0000:00:03.0^[[0m: PCI bridge to [b after 0 usecs
^[[32m[ 0.177057] ^[[0mcalling xfrm6_mode_tunnel_init+0x0/0x17 @ 1
^[[32m[ 0.177059] ^[[0minitcall xfrm6_mode_tunnel_init+0x0/0x17 returned 0 after 0 usecs
(Sorry about the ANSI noise.) Note that the timestamps are out of order, e.g. 0.881863 is followed by 0.177057. This may or may not be related; perhaps something to check once we've found the truncation issue?
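To spot this kind of mis-ordering mechanically rather than by eye, captured output could be piped through a filter like the one below. It is a sketch under stated assumptions: check_monotonic is a hypothetical helper, it assumes the "[ seconds.micros]" prefix format shown above (with or without dmesg's colour escapes), and it assumes that within a single boot the dmesg timestamps should normally be non-decreasing.

```shell
# Hypothetical helper: check_monotonic
# Reads dmesg-style lines on stdin, strips ANSI colour escapes, and reports
# every line whose [ seconds ] timestamp is lower than the previous one.
# Exits 1 if any out-of-order line was found, 0 otherwise.
check_monotonic() {
  local esc
  esc=$(printf '\033')   # literal ESC byte, shown as ^[ in the output above
  sed "s/${esc}\[[0-9;]*m//g" | awk '
    match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
      # Strip the surrounding brackets; +0 converts " 0.177057" to a number.
      ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
      if (seen && ts < prev) {
        printf "out of order at line %d: %s\n", NR, $0
        bad = 1
      }
      prev = ts
      seen = 1
    }
    END { exit bad }'
}
```

For example, docker run --rm --runtime=kata-runtime ubuntu dmesg | check_monotonic would flag the 0.881863 to 0.177057 jump above, while a clean, complete log should produce no output.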