cloudius-systems / osv Goto Github PK

OSv, a new operating system for the cloud.

Home Page: osv.io

License: Other

C++ 36.02% Assembly 0.62% C 55.92% Makefile 1.01% Python 2.64% Shell 0.94% Java 1.03% Lua 0.36% Roff 0.97% Perl 0.47%

osv's Introduction

OSv was originally designed and implemented by Cloudius Systems (now ScyllaDB) however currently, it is being maintained and enhanced by a small community of volunteers. If you are into systems programming or want to learn and help us improve OSv, then please contact us on OSv Google Group forum or feel free to pick up any good issues for newcomers. For details on how to format and send patches, please read this wiki (we do accept pull requests as well).

OSv

OSv is an open-source versatile modular unikernel designed to run single unmodified Linux application securely as microVM on top of a hypervisor, when compared to traditional operating systems which were designed for a vast range of physical machines. Built from the ground up for effortless deployment and management of microservices and serverless apps, with superior performance.

OSv has been designed to run unmodified x86-64 and aarch64 Linux binaries as is, which effectively makes it a Linux binary compatible unikernel (for more details about Linux ABI compatibility please read this doc). In particular, OSv can run many managed language runtimes including JVM, Python, Node.JS, Ruby, Erlang, and applications built on top of those runtimes. It can also run applications written in languages compiling directly to native machine code like C, C++, Golang and Rust as well as native images produced by GraalVM and WebAssembly/Wasmer.

OSv can boot as fast as ~5 ms on Firecracker using as low as 11 MB of memory. OSv can run on many hypervisors including QEMU/KVM, Firecracker, Cloud Hypervisor, Xen, VMWare, VirtualBox and Hyperkit as well as open clouds like AWS EC2, GCE and OpenStack.

For more information about OSv, see the main wiki page and http://osv.io/.

Building and Running Apps on OSv

To run an application on OSv, one needs to build an image by fusing the OSv kernel, and the application files together. This, at a high level, can be achieved in two ways, either:

by using the shell script located at ./scripts/build that builds the kernel from sources and fuses it with application files, or
by using the capstan tool that uses pre-built kernel and combines it with application files to produce a final image.

If you intend to try to run your app on OSv with the least effort possible, you should pursue the capstan route. For introduction please read this crash course. For more details about capstan please read this more detailed documentation. Pre-built OSv kernel files (osv-loader.qemu) can be automatically downloaded by capstan from the OSv regular releases page or manually from the nightly releases repo.

If you are comfortable with make and GCC toolchain and want to try the latest OSv code, then you should read this part of the readme to guide you how to set up your development environment and build OSv kernel and application images.

Releases

We aim to release OSv 2-3 times a year. You can find the latest one on github along with several published artifacts including kernel and some modules.

In addition, we have set up Travis-based CI/CD pipeline where each commit to the master and ipv6 branches triggers full build of the latest kernel and publishes some artifacts to the nightly releases repo. Each commit also triggers the publishing of new Docker "build toolchain" images to the Docker hub.

Design

A good bit of the design of OSv is pretty well explained in the Components of OSv wiki page. You can find even more information in the original USENIX paper and its presentation.

In addition, you can find a lot of good information about the design of specific OSv components on the main wiki page and http://osv.io/ and http://blog.osv.io/. Unfortunately, some of that information may be outdated (especially on http://osv.io/), so it is always best to ask on the mailing list if in doubt.

Component Diagram

In the diagram below, you can see the major components of OSv across the logical layers. Starting with libc at the top, which is greatly based on musl, the core layer in the middle comprises ELF dynamic linker, VFS, networking stack, thread scheduler, page cache, RCU, and memory management components. Then finally down, the layer is composed of the clock, block, and networking device drivers that allow OSv to interact with hypervisors like VMware and VirtualBox or the ones based on KVM and XEN.

Metrics and Performance

There are no official up-to date performance metrics comparing OSv to other unikernels or Linux. In general, OSv lags behind Linux in disk-I/O-intensive workloads partially due to coarse-grained locking in VFS around read/write operations as described in this issue. In network-I/O-intensive workloads, OSv should fare better (or at least used to as Linux has advanced a lot since) as shown with performance tests of Redis and Memcached. You can find some old "numbers" on the main wiki, http://osv.io/benchmarks, and some papers listed at the bottom of this readme.

So OSv is probably not best suited to run MySQL or ElasticSearch, but should deliver pretty solid performance for general stateless applications like microservices or serverless (at least as some papers show).

Kernel Size

At this moment (as of December 2022) the size of the universal OSv kernel (loader.elf artifact) built with all symbols hidden is around 3.6 MB. The size of the kernel linked with the full libstdc++.so.6 library and ZFS filesystem library included is 6.8 MB. Please read the Modularization wiki to better understand how kernel can be built and further reduced in size and customized to run on a specific hypervisor or a specific app.

The size of OSv kernel may be considered quite large compared to other unikernels. However, bear in mind that OSv kernel (being unikernel) provides subset of the functionality of the following Linux libraries (see their approximate size on Linux host):

libresolv.so.2 (100 K)
libc.so.6 (2 MB)
libm.so.6 (1.4 MB)
ld-linux-x86-64.so.2 (184 K)
libpthread.so.0 (156 K)
libdl.so.2 (20 K)
librt.so.1 (40 K)
libstdc++.so.6 (2 MB)
libaio.so.1 (16 K)
libxenstore.so.3.0 (32 K)
libcrypt.so.1 (44 K)

Boot Time

OSv, with Read-Only FS and networking off, can boot as fast as ~5 ms on Firecracker and even faster around ~3 ms on QEMU with the microvm machine. However, in general, the boot time will depend on many factors like hypervisor including settings of individual para-virtual devices, filesystem (ZFS, ROFS, RAMFS, or Virtio-FS), and some boot parameters. Please note that by default OSv images get built with ZFS filesystem.

For example, the boot time of ZFS image on Firecracker is ~40 ms, and regular QEMU ~200 ms these days. Also, newer versions of QEMU (>=4.0) are typically faster to boot. Booting on QEMU in PVH/HVM mode (aka direct kernel boot, enabled by -k option of run.py) should always be faster as OSv is directly invoked in 64-bit long mode. Please see this Wiki for a brief review of the boot methods OSv supports.

Finally, some boot parameters passed to the kernel may affect the boot time:

--console serial - this disables VGA console that is slow to initialize and can shave off 60-70 ms on QEMU
--nopci - this disables enumeration of PCI devices especially if we know none are present (QEMU with microvm or Firecracker) and can shave off 10-20 ms
--redirect=/tmp/out - writing to the console can impact the performance quite severely (30-40%) if application logs a lot, so redirecting standard output and error to a file might speed up performance quite a lot

You can always see boot time breakdown by adding --bootchart parameter:

./scripts/run.py -e '--bootchart /hello'
OSv v0.57.0-6-gb442a218
eth0: 192.168.122.15
	disk read (real mode): 58.62ms, (+58.62ms)
	uncompress lzloader.elf: 77.20ms, (+18.58ms)
	TLS initialization: 77.96ms, (+0.76ms)
	.init functions: 79.75ms, (+1.79ms)
	SMP launched: 80.11ms, (+0.36ms)
	VFS initialized: 81.62ms, (+1.52ms)
	Network initialized: 81.78ms, (+0.15ms)
	pvpanic done: 81.91ms, (+0.14ms)
	pci enumerated: 93.89ms, (+11.98ms)
	drivers probe: 93.89ms, (+0.00ms)
	drivers loaded: 174.80ms, (+80.91ms)
	ROFS mounted: 176.88ms, (+2.08ms)
	Total time: 178.01ms, (+1.13ms)
Cmdline: /hello
Hello from C code

Memory Utilization

OSv needs at least 11 M of memory to run a hello world app. Even though it is a third of what it was 4 years ago, it is still quite a lot compared to other unikernels. The applications spawning many threads may take advantage of building the kernel with the option conf_lazy_stack=1 to further reduce memory utilization (please see the comments of this patch to understand this feature better).

We are planning to further lower this number by adding self-tuning logic to L1/L2 memory pools.

Testing

OSv comes with around 140 unit tests that get executed upon every commit and run on ScyllaDB servers. There are also a number of extra tests located under tests/ sub-tree that are not automated at this point.

You can run unit tests in a number of ways:

./scripts/build check                  # Create ZFS test image and run all tests on QEMU

./scripts/build check fs=rofs          # Create ROFS test image and run all tests on QEMU

./scripts/build image=tests && \       # Create ZFS test image and run all tests on Firecracker
./scripts/test.py -p firecracker

./scripts/build image=tests && \       # Create ZFS test image and run all tests on QEMU
./scripts/test.py -p qemu_microvm      # with microvm machine

In addition, there is an Automated Testing Framework that can be used to run around 30 real apps, some of them under stress using ab or wrk tools. The intention is to catch any regressions that may be missed by unit tests.

Finally, one can use Docker files to test OSv on different Linux distributions.

Setting up Development Environment

OSv can only be built on a 64-bit x86 and ARM Linux distribution. Please note that this means the "x86_64" or "amd64" version for 64-bit x86 and "aarch64" or "arm64" version for ARM respectively.

To build the OSv kernel you need a physical or virtual machine with Linux distribution on it and GCC toolchain and all necessary packages and libraries OSv build process depends on. The fastest way to set it up is to use the Docker files that OSv comes with. You can use them to build your own Docker image and then start it in order to build OSv kernel or run an app on OSv inside of it. Please note that the main docker file depends on pre-built base Docker images for Ubuntu or Fedora that get published to DockerHub upon every commit. This should speed up building the final images as all necessary packages are installed as part of the base images.

Alternatively, you can manually clone the OSv repo and use setup.py to install all required packages and libraries, as long as it supports your Linux distribution, and you have both git and python 3 installed on your machine:

git clone https://github.com/cloudius-systems/osv.git
cd osv && git submodule update --init --recursive
./scripts/setup.py

The setup.py recognizes and installs packages for a number of Linux distributions including Fedora, Ubuntu, Debian, LinuxMint and RedHat ones (Scientific Linux, NauLinux, CentOS Linux, Red Hat Enterprise Linux, Oracle Linux). Please note that we actively maintain and test only Ubuntu and Fedora, so your mileage with other distributions may vary. The support of CentOS 7 has also been recently added and tested so it should work as well. The setup.py is used by Docker files internally to achieve the same result.

IDEs

If you like working in IDEs, we recommend either Eclipse CDT which can be setup as described in this wiki page or CLion from JetBrains which can be set to work with OSv makefile using so-called compilation DB as described in this guide.

Building OSv Kernel and Creating Images

Building OSv is as easy as using the shell script ./scripts/build that orchestrates the build process by delegating to the main makefile to build the kernel and by using a number of Python scripts like ./scripts/module.py to build application and fuse it together with the kernel into a final image placed at ./build/release/usr.img (or ./build/$(arch)/usr.img in general). Please note that building an application does not necessarily mean building from sources as in many cases the application binaries would be located on and copied from the Linux build machine using the shell script ./scripts/manifest_from_host.sh (see this Wiki page for details).

The shell script build can be used as the examples below illustrate:

# Create a default image that comes with a command line and REST API server
./scripts/build

# Create an image with native-example app
./scripts/build -j4 fs=rofs image=native-example

# Create an image with spring boot app with Java 10 JRE
./scripts/build JAVA_VERSION=10 image=openjdk-zulu-9-and-above,spring-boot-example

 # Create an image with 'ls' executable taken from the host
./scripts/manifest_from_host.sh -w ls && ./scripts/build --append-manifest

# Create a test image and run all tests in it
./scripts/build check

# Clean the build tree
./scripts/build clean

Command nproc will calculate the number of jobs/threads for make and ./scripts/build automatically. Alternatively, the environment variable MAKEFLAGS can be exported as follows:

export MAKEFLAGS=-j$(nproc)

In that case, make and scripts/build do not need the parameter -j.

For details on how to use the build script, please run ./scripts/build --help.

The ./scripts/build creates the image build/last/usr.img in qcow2 format. To convert this image to other formats, use the ./scripts/convert tool, which can convert an image to the vmdk, vdi or raw formats. For example:

./scripts/convert raw

Aarch64

By default, the OSv kernel gets built for the native host architecture (x86_64 or aarch64), but it is also possible to cross-compile kernel and modules on Intel machine for ARM by adding arch parameter like so:

./scripts/build arch=aarch64

At this point cross-compiling the aarch64 version of OSv is only supported on Fedora, Ubuntu, and CentOS 7, and relevant aarch64 gcc and libraries' binaries can be downloaded using the ./scripts/download_aarch64_packages.py script. OSv can also be built natively on Ubuntu on ARM hardware like Raspberry PI 4, Odroid N2+, or RockPro64.

Please note that as of the latest 0.57.0 release, the ARM part of OSv has been greatly improved and tested and is pretty much on par with the x86_64 port in terms of the functionality. In addition, all unit tests and many advanced apps like Java, golang, nginx, python, iperf3, etc can successfully run on QEMU and Firecraker on Raspberry PI 4 and Odroid N2+ with KVM acceleration enabled.

For more information about the aarch64 port please read this Wiki page.

Filesystems

At the end of the boot process, the OSv dynamic linker loads an application ELF and any related libraries from the filesystem on a disk that is part of the image. By default, the images built by ./scripts/build contain a disk formatted with the ZFS filesystem, which you can read more about here. ZFS is a great read-write file system and may be a perfect fit if you want to run MySQL on OSv. However, it may be an overkill if you want to run stateless apps in which case you may consider Read-Only FS. Finally, you can also have OSv read the application binary from RAMFS, in which case the filesystem gets embedded as part of the kernel ELF. You can specify which filesystem to build the image disk with by setting the parameter fs of ./scripts/build to one of the three values -zfs, rofs, or ramfs.

In addition, one can mount NFS filesystem, which had been recently transformed to be a shared library pluggable as a module, and newly implemented and improved Virtio-FS filesystem. The Virtio-FS mounts can be set up by adding proper entry /etc/fstab or by passing a boot parameter as explained in this Wiki. In addition, very recently OSv has been enhanced to be able to boot from Virtio-FS filesystem directly.

Moreover, we have added support for the ext2/3/4 filesystem, in the form of a shared pluggable module libext. One can add the libext module to an image and have OSv mount the ext filesystem from a separate disk like so (for more detailed examples please read here):

./scripts/build fs=rofs image=libext,native-example

./scripts/run.py --execute='--mount-fs=ext,/dev/vblk1,/data /hello' --second-disk-image ./ext4.img

Finally, the ZFS support has been also greatly improved as of the 0.57 release and there are many methods and setups to build and run ZFS images with OSv. For details please read the ZFS section of the Filesystems wiki.

Running OSv

Running an OSv image, built by scripts/build, is as easy as:

./scripts/run.py

By default, the run.py runs OSv under KVM, with 4 vCPUs and 2 GB of memory. You can control these and tens of others ones by passing relevant parameters to the run.py. For details, on how to use the script, please run ./scripts/run.py --help.

The run.py can run an OSv image on QEMU/KVM, Xen, and VMware. If running under KVM you can terminate by hitting Ctrl+A X.

Alternatively, you can use ./scripts/firecracker.py to run OSv on Firecracker. This script automatically downloads firecracker binary if missing, and accepts several parameters like the number of vCPUs, and memory named exactly like run.py does. You can learn more about running OSv on Firecracker from this wiki.

Please note that to run OSv with the best performance on Linux under QEMU or Firecracker you need KVM enabled (this is only possible on physical Linux machines, EC2 "bare metal" (i3) instances, or VMs that support nested virtualization with KVM on). The easiest way to verify if KVM is enabled is to check if /dev/kvm is present, and your user account can read from and write to it. Adding your user to the kvm group may be necessary like so:

usermod -aG kvm <user name>

For more information about building and running JVM, Node.JS, Python, and other managed runtimes as well as Rust, Golang, or C/C++ apps on OSv, please read this wiki page. For more information about various example apps you can build and run on OSv, please read the osv-apps repo README.

Application Types and Launch Modes

Regarding how applications are launched on OSv, they all fall into two categories - dynamically linked and statically linked executables. The dynamically linked executables can be launched by the OSv built-into-kernel dynamic linker or the Linux dynamic linker ld*.so. The statically linked executables are bootstrapped but OSv dynamic linker but then interact via system calls with OSv kernel. For more details please watch the 1st half of this presentation or read slides 2-7.

Dynamically Linked Executables

The dynamically linked executables require the dynamic linker (built-in or Linux one) to bootstrap the main application ELF file, load the libraries it depends on, resolve symbols and eventually call the main function.

Via Built-in Dynamic Linker and `libc`

The built-in dynamic linker plays the role of the program interpreter that performs similar steps as on Linux, but instead of loading the libraries it depends on from filesystem, it resolves the undefined symbols by pointing them to the implementations of those in OSv built-in libc. The OSv linker supports both Shared Libraries and Dynamically Linked Executables that are either position dependent or non-position dependent.

./scripts/build image=native-example
./scripts/run.py -e '/hello'

The benefit is that programs interact with the OSv kernel using the fast local function calls without the overhead of SYSCALL/SVC instruction. On the negative side, the Linux-compatibility is a moving target because GLIBc keeps adding new functions, and OSv needs to keep implementing them.

Via Linux Dynamic Linker `ld*.so` and `glibc`

Similarly to the built-in dynamic linker, OSv can also launch dynamically linked executables via the Linux dynamic linker ld*.so. The Linux dynamic linker ld*.so is bootstrapped the exact same way as a statically linked executable (see below) and then it orchestrates loading and execution of the specified dynamically linked executables. Just like with statically linked executable, the application interacts with OSv kernel via system calls.

dl=linux ./scripts/manifest_from_host.sh /bin/ls && ./scripts/build image=empty --append-manifest
./scripts/run.py -e '/lib64/ld-linux-x86-64.so.2 /hello'

Statically Linked Executables

The statically linked executables interact with OSv kernel by directly making system calls and reading from pseudo filesystems like procfs and sysfs like in Linux.

In this mode, the Linux-compatibility is should be improved. But compared to the dynamically linked executables that call local functions, the statically linked ones suffer from the ~110 ns system call overhead mainly paid to save and restore the state of regular registers and FPU. Having said that, most Linux applications have been written with the understanding that system calls are expensive and avoid them if possible so neither statically linked executables are affected negatively nor the dynamically linked ones launched via built-in dynamic linker benefit in any significant way.

For more information about OSv implementet syscalls please read this wiki.

Networking

By default, the run.py starts OSv with user networking/SLIRP on. To start OSv with more performant external networking, you need to enable -n and -v options like so:

sudo ./scripts/run.py -nv

The -v is for KVM's vhost that provides better performance and its setup requires tap device thus we use sudo.

Alternatively, one can run OSv as a non-privileged used with a tap device like so:

./scripts/create_tap_device.sh natted qemu_tap0 172.18.0.1 #You can pick a different address but then update all IPs below

./scripts/run.py -n -t qemu_tap0 \
  --execute='--ip=eth0,172.18.0.2,255.255.255.252 --defaultgw=172.18.0.1 --nameserver=172.18.0.1 /hello'

By default, OSv spawns a dhcpd-like thread that automatically configures virtual NICs. A static configuration can be done within OSv by configuring networking like so:

ifconfig virtio-net0 192.168.122.100 netmask 255.255.255.0 up
route add default gw 192.168.122.1

To enable networking on Firecracker, you have to explicitly enable -n option to firecracker.py.

Finally, please note that the master branch of OSv only implements IPV4 subset of the networking stack. If you need IPV6, please build from ipv6 branch or use IPV6 kernel published to nightly releases repo.

Debugging, Monitoring, Profiling OSv

OSv can be debugged with gdb; for more details please read this wiki
OSv kernel and application can be traced and profiled; for more details please read this wiki
OSv comes with the admin/monitoring REST API server; for more details please read this and that wiki page. There is also lighter monitoring REST API module that is effectively a read-only subset of the former one.

FAQ and Contact

If you want to learn more about OSv or ask questions, please contact us on OSv Google Group forum. You can also follow us on Twitter.

Papers and Articles about OSv

List of somewhat newer articles about OSv found on the Web:

FOSDEM Presentations

You can find some older articles and presentations at http://osv.io/resources and http://blog.osv.io/.

osv's People

Contributors

Stargazers

Watchers

Forkers

joshuamckenty nola-radar sashalevin hannibalhuang asias vipmike007 ashbert joewong ryanmt cduclos pkhuong husttsq vdt raphaelsc cdoru cellborn tgrabiec dk-dev gruninger bingbuidea jnider liuy zhangxu znode tracymacding squallssck simbasailor dschatzberg kingctan securitypotato alexcmd ankon giastfader haneefmubarak syuu1228 carenas glockenmeier fweisbec spclops raymond-xu nyh kevsmith avikivity tony-tu pieter-was alonha glommer xiaominzhang spindeadlock shanwei zabrane amitkumaar gblanchard l1x skuanr vyommani thejonwong posix4e gleb-cloudius rata hongnod wuzhy dmarti mcgrimm hw-claudio pradeepkumars prasad-joshi kstyrc 5kg orumin jinhy queer1 e74b254a966b1c sxhao xtfeng bhansaliakhil praveenjharbade pdziepak elcallio namhyung andy071001 sebgod janikokkonen huawei-erc respu kazu-zamasu justasabc zikaletrange jazeltq changliwei deanzhang elazarl tinti queniao otokunaga2 mirzawaqasahmed gordongong emaxerrno almeydifer kinwin-ustc

osv's Issues

Alternative to wake_with()

Right now, code like condvar_wake_one/all use wake_with() to allow waking a thread which might exit as soon as we wake it.

Avi doesn't like wake_with() for two reasons: It is unorthodox API (so users need to know to use it, when and how), and it slows down every wake() with two atomic fetch-add.

Why does condvar_wake_one/all() need to use wake_with at all?
It already does
m.lock();
waiter = wr->t;
wr->t = nullptr;
waiter->wake();
m.unlock();

If the waiter would regain the lock when waking up, then we would know the thread cannot exit before wake_one/all releases this lock, and the problem would be solved. But we didn't regain the lock (except in the case of timeout), because we feared that the waiter will often go back to sleep immediately after waking, to wait for the lock which the waker is still holding.

The idea (suggested by Avi) is, instead of finding wake_with() as a replacement to the above solution - just make the above solution efficient.

As explained above, the waiter will regain the lock after waiting. The waker will do:
m.lock();
preempt_disable();
waiter->wr->t;
wr->t = nullptr;
waiter->wake();
m.unlock();
preempt_enable();

If the waiter thread is on the same CPU as the waker, this is efficient - the waiter will not start to run (and try to regain the lock) until the preempt_enable() after we release the lock. This is more efficient than the wake_with() case because now we have a thread-local increment (preempt_disable()) instead of the atomic increment used by wake_with().

If the waiter thread is on a different CPU, the preempt_disable() won't help, and it can start running before we finish m.unlock(). In the usual case, it will take a short time until m.unlock() completes, so all we need to do is to add to mutex_lock() the ability to do a bit of spinning before going to sleep. We want this feature anyway. This will be a performance improvement over wake_with() only if in the typical case, m.unlock() will finish more quickly than the waiter thread starting to run so we'll have no spinning iterations at all. We'll need to verify that this is actually the case - otherwise all of this won't be an improvement at all.

OSv freezes when exception is thrown out of main

OSv freezes when running the following program:

  int main(int argc, char const *argv[])
  {
    throw 1;
  }

  (gdb) where
  #0  0x000000000046de20 in uw_frame_state_for ()
  #1  0x000000000046fa93 in _Unwind_RaiseException ()
  #2  0x000000000039e062 in __cxa_throw ()
  #3  0x0000100000000780 in ?? ()
  #4  0x00002000001fff20 in ?? ()
  #5  0x0000000000203024 in run_main (prog=<optimized out>, args=<optimized out>)
      at ../../loader.cc:214

Result on Linux:
terminate called after throwing an instance of 'int'
Aborted (core dumped)

thread dashboard may freeze when trying to interrupt it

There is a bug in crash 1.2.7 which causes that interrupting a command like 'thread top' may fail with ~9% probability and all subsequent attempts to interrupt it will be silently ignored.

As it turns out this is fixed in crash upstream master. It is a one-liner which can be easily applied to our fork once it's ready.

See crashub/crash@5cfc460

Handle xen migration

The xen drivers have stubs that are not implemented (some are just not tested) for suspend / resume. However, suspend / resume in xen driver's context also refer to the migration of a xen guest to another machine. The paravirtual drivers aid that process and will gracefully shutdown and later resume their functioning. Usually the process is very close to connect / disconnect, but some important differences are expected.

We need to implement and test those routines to make sure xen guests are migrateable.

Java crash

We have a crash in Java, reproduced by the following simple procedure:

Run the Shrew tiny web server, with the command line:
sudo scripts/run.py -d -n -m2G -e "java.so -jar /java/cli.jar java com.cloudius.cli.util.Shrew fg"
In another window on the host I run the simple curl loop:
while :; do; curl http://192.168.122.100:8080/; done

Java crashes in less than a second, after about a dozen requests.

By hbreak'ing in crash_handler(), we get the following stack trace (unfortunately, no debugging information):

#2  <signal handler called>
#3  0x0000100000ed0681 in Monitor::wait(bool, long, bool) ()
#4  0x0000100001064bcc in VMThread::execute(VM_Operation*) ()
#5  0x0000100000a790a6 in BiasedLocking::revoke_and_rebias(Handle, bool, Thread*) ()
#6  0x0000100000fe5924 in ObjectLocker::ObjectLocker(Handle, Thread*, bool) ()
#7  0x000010000101e125 in JavaThread::exit(bool, JavaThread::ExitType) ()
#8  0x000010000101f062 in JavaThread::thread_main_inner() ()
#9  0x0000100000f020c2 in java_start(Thread*) ()
#10 0x00000000003f365d in pthread_private::pthread::pthread(void* (*)(void*), void*, sigset_t, pthread_private::thread_attr const*)::{lambda()#1}::operator()() const () at ../../libc/pthread.cc:59

So Java crashes while shutting down a thread, in JavaThread::exit(), I still am not sure exactly where or why.
Note that this happens also with the complete() fix (don't wake the joiner while complete() is still running), so it's not a manifestation of that bug, but a new bug.

Memcached

We want to allow deploying a memcached "virtual appliance" on OSV.
We should start by seeing that the standard memcached runs on OSV, and measuring its performance (e.g., using the memslap benchmark). It will be nice if we can get better performance on SMP VMs than Linux (both taken with out-of-the-box configuration).

But we can go beyond this, an write a new version of memcached (perhaps even from scratch) which uses OSV features directly for zero copy, etc., and hopefully improve performance significantly.

Proposal: lock-free alternative to condvar

Condvar uses locks (mutexes).

Our condvar implementation could be changed to not use an internal mutex (condvar->m) and rather use some lock-free queue data structure, but this will gain little, because we are still left with the user mutex. I.e., to properly use a condvar one should use the idiom:

waiter:
     mtx.lock();
     if(!condition) {    // or even "while" instead of "if"
        cv.wait(mtx);
     }
     mtx.unlock();

waker:
    mtx.lock();
    condition=true;
    mtx.unlock;
    cv.wake_all();

Note how the waker needs to use a mutex, even if the wake_all() implementation doesn't internally, and if we have concurrent wakers some of them may go to sleep because of this "contention" - even if nobody is even waiting on the condition variable.

To design a truely lock-free alternative to convar, we also need to avoid the user mutex, so we can't use the traditional condvar API and need to use a slightly different one.

The reason that the user mutex was needed was to make the test of the condition, and the wait (in the waiter thread) one critical section - so that we don't lose a wakeup that happens after we tested the condition.
Instead of using a mutex for this purpose, we can do something similar to our wait_until() implementation, in some way rememebering that this thread is about to check the condition, so that a wakeup in this point will avoid a wait, even if the condition evaluated to false.

Basically, this will result in an API similar to our existing wait_until/wake(), just that wait_until() will register in a given object (call this perhaps a waitqueue?) and the target of a "wake" will not be a thread, but rather a waitqueue. For example:

waiter:
         waitq.wait_until([&] {return condition});

waker:
         condition = true;
         waitq.wake_all();

Note how no mutex is needed when setting condition = true (it should be an atomic variable, though, and set with memory_order_relaxed) - wait_until will correctly handle the case where it tests condition, it is still false, but before it starts waiting both the condition=true and the wake_all() come.

P.S. For simplicity, we can implement a first version of this new API using an internal mutex (just like condvar now has, and so do Linux's wait queues) and not a lock-free data structure. It won't be entirely lock-free, but at least the user will not have to use an additional mutex!

Improve scheduler's wakeup performance

For N cpus, the scheduler uses N^2 wakeup queues, where each CPU tells another CPU which thread it wants to wake.

We can improve performance of these wakeup queues in two ways - 1. replace the queue implementation but a faster one, and 2. avoid unnecessary IPIs

Some more details:

The current implementation uses a linked list. This implementation allows multiple producers, a feature we don't need here. Rather, we can use a ring-based single-producer single-consumer queue (see lockfree::ring). We measured the ring implementation to be 3 times faster than the linked list one. Note that the ring is limited in size, so if it gets full we need a fallback - which should probably be the existing linked list code.
Only send an IPI when adding an item to an empty queue.
Can we avoid IPIs in more ways?

Build fails with Java 1.8 (EA)

Mostly for reference: when JAVA_HOME is set in the environment, and points to a Java 8 version, the build fails. make V=1 reports

$ make V=1
ant -Dmode=release -Dout=/home/andreas/project/osv/build/release/tests/bench -e -f tests/bench/build.xml 
Buildfile: /home/andreas/project/osv/tests/bench/build.xml

init:

compile:

BUILD FAILED
/home/andreas/project/osv/tests/bench/build.xml:20: Class not found: javac1.8

Total time: 0 seconds
make: *** [all] Error 1

$ java -version
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b109)
Java HotSpot(TM) 64-Bit Server VM (build 25.0-b51, mixed mode)

The build seems to pass this point (still building right now) when having JAVA_HOME points to Oracle's JDK 1.7.0_40.

Allow SSH connectivity to the CLI

Try using the pure Java, Apache Mina SSH server.

Mono support?

Is there any user stories dedicated to utilizing Mono on this platform? This stuff sounds interesting... but not being a Java developer, it may be lost on me.

distribute xen interrupts

All xen interrupts are arriving at CPU0. We should try to spread them more in an intelligent fashion. AFAIU, xen does not yet support multiqueue (See http://wiki.xenproject.org/wiki/Xen_Development_Projects#Multiqueue_support_for_Xen_netback.2Fnetfront_in_Linux_kernel), and Amazon certainly does not. So we won't take the interrupts at the same time. Still, we should be prepared to take them anywhere.

Zero-length vmas

In gdb's "osv mmap" I see output like this:

(gdb) osv mmap
0xffffc0003fff1008 0x0000000000000000 0x0000000000000000 [0 kB]
0xffffc0003ff63180 0x0000100000000000 0x0000100000005000 [20 kB]
0xffffc00037083a00 0x0000100000005000 0x0000100000005000 [0 kB]
0xffffc0003ff63700 0x0000100000204000 0x0000100000206000 [8 kB]
0xffffc00037083a40 0x0000100000206000 0x0000100000206000 [0 kB]
0xffffc0003ff5d6c0 0x0000200000000000 0x0000200000100000 [1024 kB]
0xffffc000386a1180 0x0000200000100000 0x0000200000200000 [1024 kB]
0xffffc0003fff0008 0x0000800000000000 0x0000800000000000 [0 kB]

All these 0-length vmas shouldn't be there. This is probably benign, but it would be a good idea to get rid of them anyway.

By the way, "osv mmap" should probably not show the vma address (who cares?), and for file-mapped vmas, it would be nice to see the file and offset.

Need a /dev/urandom

We're missing random support. Applications do try to open it (netperf).
We'll need an entropy from the drivers/kvmclock too

netserver on OSv doesn't work in TCP_MAERTS mode with test duration limited by time

Netserver relies on alarm() to set test completion timer and this part is disabled by osv.patch.
As a workaround test duration may be limited by amount of data sent instead of amount of time passed. Then netserver is working.

Hook the JVM dtrace to OSv shell

For GC events, we could try to hook our tracepoints to the dtrace
probes in Hotspot JVM:

http://docs.oracle.com/javase/7/docs/technotes/guides/vm/dtrace.html

Behavior of Ctrl+C is not intuitive when OSv is run with interactive shell

Currently when you hit Ctrl+C the qemu terminates and the OS is brought down. If I start OSv with intent to use its command line interface this behavior is not intuitive as I expect that when I'm in a shell hitting ^C will clear the command line.

A solution to this could be to:

start qemu with -nographic (unless in "daemon" mode), key strokes like Ctrl+C will be passed to the guest
change the CLI to react on Ctrl+C by clearing the command line
add a command to shut down the OS (convenience)

Downside of this approach would be that one would not be able to kill OSv from console using Ctrl+C.

I'm interested in your opinions on that one.

When crash CLI is started inside OSv it uses older version of jline than it was compiled with

Even though the "crash" module depends on jline-2.10, the library is masked in runtime by older version (jline-2.7) which comes first in classpath. The older version comes from web-1.0.jar, which gets it from jruby-core-1.7.jar.

A workaround is to change the default command line to have crash in front of web.jar like this:

java.so -cp /usr/mgmt/crash-1.0.jar:/usr/mgmt/web-1.0.jar org.jruby.JarBootstrapMain app prod

However I think it would be better if we could fix it in the gradle build somehow. I'm looking for advice from someone who knows it better.

I didn't find any serious issue related to this (yet), but we are lacking jline 2.10 features and it is potentially dangerous to have an older jar before newer. It also leads to surprises when debugging code.

Loops during boot running either on qemu or kvm

Built per the instructions and started using scripts/run.py with no arguments. OS is debian wheezy, amd64. zfs-fuse is running

Get this repeated on the terminal:

acpi 0 apic 0
acpi 1 apic 1
acpi 2 apic 2
acpi 3 apic 3
APIC base fee00000

connecting to the VNC console shows a "booting from hard disk" loop.

Reduce virtio and sglist allocations

Currently we use a virtio_net_req object for the tx of each packet.
We can reduce it's allocation by an array of such entries.
In addition, the sglist implementation uses std:list which allocs on the
head even thought the object itself lives on the stack. It's possible
to change add_buf to get a single buffer address each time and after several
of them change the index of the ring. This way there is no need for sglist within virtio-net

One loose end in FPU saving

Our FPU saving on context switch is almost correct: The FPU is correctly not saved on context switch when !preempt (i.e., the context switch was called by the program's function call, such as wait_until() or yield()), because the x86 calling conventions specifies that the FPU state is caller saved, so if the caller of that function wanted the FPU state to be saved, it needed to do it itself.

However, it seems that the FCW and MXCSR configuration registers, unlike the others, are callee-saved (or at least partially callee-saved) and we need to save/restore them in any case, not just !preempt, and we don't do this at the moment. It doesn't seem to cause any bugs I can notice now, but it will probably bite us sometime in the future.

We probably need to use functions like this: (just use inline functions, not macros...)

define stmxcsr(addr) asm volatile("stmxcsr %0" : "=m" (*(addr)))

define ldmxcsr(addr) asm volatile("ldmxcsr %0" : : "m" (*(addr)))

define fnstcw(addr) asm volatile("fnstcw %0" : "=m" (*(addr)))

define fldcw(addr) asm volatile("fldcw %0" : : "m" (*(addr)))

And in arch-cpu.hh, !preempt case, do something like
save:
u32 x = (u32)p->_fpu.s;
stmxcsr(x);
fnstcw(x+1);
restore:
u32 x = (u32)p->_fpu.s;
ldmxcsr(x);
fldcw(x+1);

Implement DHCP

Support DHCP

Porting of the Android permissive DHCP code
Make modifications to the netport as needed

condvar_wake_* optimization for no waiters

Suggestion made by Avi:

Currently condvar_wake_one()/all() take the internal mutex before checking if there's anybody to wake. This is inefficient in the interesting case where wake() is called many times on a frequent event that often nobody is waiting on.

wake() should check if the list is empty without the lock, and only take the lock when it isn't empty. Of course, this should be done correctly, with the correct memory barriers as needed.

Add "wait morphing" to improve condvar wake performance

The existing codvar_wake_one()/all() code wakes up one/all of the threads sleeping on the mutex, and they all wake up and try to take the user mutex associated with the condvar.

This creates two performance problems:

The "thundering herd" problem - When there are many threads waiting on the condvar and they are all woken with condvar_wake_all(), all these threads start to run (requiring a lot of work on scheduling, IPIs, etc.) but all of them except one will immediately go back to sleep waiting to take the user mutex. See a description of this problem and how it was solved in Linux in https://lwn.net/Articles/32746/.
The "locked wakeup" problem - Often, condvar_wake is done with the user mutex locked (see a good discussion in http://www.domaigne.com/blog/computing/condvars-signal-with-mutex-locked-or-not/). Most of the BSD code we use does this, for example. When condvar_wake() wakes the thread, if it wakes up quickly enough, it can try to take the mutex before the waker released it, and need to go back to sleep. This problem is especially noticable when both threads run on a single CPU, and the scheduler decides to context-switch to the wakee when it is woken.

Both problems can be solved by a technique called wait morphing: when the condvar has an associated mutex, condvar_wake_*() does not wake up the threads, but rather move them from waiting on the condition variable, to waiting on the mutex. Each of these threads will later be woken (separately) when the mutex becomes available for it, and when it wakes up it doesn't need to take the mutex, because it was already taken for it.

It is not clear that this issue actually has a lot of performance impact on any of our use cases (once the scheduler bugs are eliminated), but it seems there's popular demand for solving it.

monitor exception bug

When running the web interface through a browser, once someone click on the
monitor option we get the following exception (both in the web browser and on the console):

[/]% Java::JavaxManagement::InstanceNotFoundException - java.lang:type=MemoryPool,name=PS Perm Gen:
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095)
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1375)
com.sun.jmx.mbeanserver.JmxMBeanServer.getMBeanInfo(JmxMBeanServer.java:920)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:455)
org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:316)
org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:61)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:202)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:346)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:204)
org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:161)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:178)
org.jruby.RubyMethod.call(RubyMethod.java:118)
org.jruby.RubyMethod$INVOKER$i$call.call(RubyMethod$INVOKER$i$call.gen)
org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrNBlock.call(JavaMethod.java:277)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.IfNode.interpret(IfNode.java:116)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:268)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:218)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:236)
org.jruby.ast.FCallThreeArgNode.interpret(FCallThreeArgNode.java:40)
org.jruby.ast.DAsgnNode.interpret(DAsgnNode.java:110)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Block.yield(Block.java:142)
org.jruby.RubyArray.eachCommon(RubyArray.java:1610)
org.jruby.RubyArray.each(RubyArray.java:1617)
org.jruby.RubyArray$INVOKER$i$0$0$each.call(RubyArray$INVOKER$i$0$0$each.gen)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
org.jruby.ast.CallNoArgBlockNode.interpret(CallNoArgBlockNode.java:64)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:202)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.IfNode.interpret(IfNode.java:110)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.DNode.appendToString(DNode.java:70)
org.jruby.ast.DNode.buildDynamicString(DNode.java:88)
org.jruby.ast.DNode.interpret(DNode.java:36)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.ReturnNode.interpret(ReturnNode.java:92)
org.jruby.ast.IfNode.interpret(IfNode.java:116)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:202)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:346)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:204)
org.jruby.ast.CallTwoArgNode.interpret(CallTwoArgNode.java:59)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
org.jruby.runtime.Block.call(Block.java:101)
org.jruby.RubyProc.call(RubyProc.java:274)
org.jruby.internal.runtime.methods.ProcMethod.call(ProcMethod.java:64)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:201)
org.jruby.RubyMethod.call(RubyMethod.java:118)
org.jruby.RubyMethod$INVOKER$i$call.call(RubyMethod$INVOKER$i$call.gen)
org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroOrNBlock.call(JavaMethod.java:277)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
org.jruby.runtime.Block.call(Block.java:101)
org.jruby.RubyProc.call(RubyProc.java:274)
org.jruby.RubyProc.call19(RubyProc.java:255)
org.jruby.RubyProc$INVOKER$i$0$0$call19.call(RubyProc$INVOKER$i$0$0$call19.gen)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:217)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:213)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
org.jruby.ast.CallSpecialArgNode.interpret(CallSpecialArgNode.java:69)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Interpreted19Block.yieldSpecific(Interpreted19Block.java:130)
org.jruby.runtime.Block.yieldSpecific(Block.java:111)
org.jruby.ast.ZYieldNode.interpret(ZYieldNode.java:25)
org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:161)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:178)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:316)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:145)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
org.jruby.ast.FCallNoArgBlockNode.interpret(FCallNoArgBlockNode.java:32)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:177)
org.jruby.runtime.Interpreted19Block.yieldSpecific(Interpreted19Block.java:140)
org.jruby.runtime.Block.yieldSpecific(Block.java:129)
org.jruby.ast.YieldTwoNode.interpret(YieldTwoNode.java:31)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Block.yield(Block.java:142)
org.jruby.RubyContinuation.enter(RubyContinuation.java:107)
org.jruby.RubyKernel.rbCatch19Common(RubyKernel.java:1261)
org.jruby.RubyKernel.rbCatch19(RubyKernel.java:1254)
org.jruby.RubyKernel$INVOKER$s$rbCatch19.call(RubyKernel$INVOKER$s$rbCatch19.gen)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:336)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:179)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)
org.jruby.ast.FCallOneArgBlockNode.interpret(FCallOneArgBlockNode.java:34)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:290)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:226)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:245)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:256)
org.jruby.ast.FCallThreeArgBlockNode.interpret(FCallThreeArgBlockNode.java:36)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Block.yield(Block.java:142)
org.jruby.RubyArray.eachCommon(RubyArray.java:1610)
org.jruby.RubyArray.each(RubyArray.java:1617)
org.jruby.RubyArray$INVOKER$i$0$0$each.call(RubyArray$INVOKER$i$0$0$each.gen)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
org.jruby.ast.CallNoArgBlockNode.interpret(CallNoArgBlockNode.java:64)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.IfNode.interpret(IfNode.java:116)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:139)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:170)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
org.jruby.ast.FCallNoArgNode.interpret(FCallNoArgNode.java:31)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Interpreted19Block.yieldSpecific(Interpreted19Block.java:130)
org.jruby.runtime.Block.yieldSpecific(Block.java:111)
org.jruby.ast.ZYieldNode.interpret(ZYieldNode.java:25)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Block.yield(Block.java:142)
org.jruby.RubyContinuation.enter(RubyContinuation.java:107)
org.jruby.RubyKernel.rbCatch19Common(RubyKernel.java:1261)
org.jruby.RubyKernel.rbCatch19(RubyKernel.java:1254)
org.jruby.RubyKernel$INVOKER$s$rbCatch19.call(RubyKernel$INVOKER$s$rbCatch19.gen)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)
org.jruby.ast.FCallOneArgBlockNode.interpret(FCallOneArgBlockNode.java:34)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:161)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:178)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:316)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:145)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
org.jruby.ast.FCallNoArgBlockNode.interpret(FCallNoArgBlockNode.java:32)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.RescueNode.executeBody(RescueNode.java:222)
org.jruby.ast.RescueNode.interpret(RescueNode.java:117)
org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:139)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:170)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:306)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:136)
org.jruby.ast.FCallNoArgNode.interpret(FCallNoArgNode.java:31)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Interpreted19Block.yieldSpecific(Interpreted19Block.java:130)
org.jruby.runtime.Block.yieldSpecific(Block.java:111)
org.jruby.ast.ZYieldNode.interpret(ZYieldNode.java:25)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Block.yield(Block.java:142)
org.jruby.RubyContinuation.enter(RubyContinuation.java:107)
org.jruby.RubyKernel.rbCatch19Common(RubyKernel.java:1261)
org.jruby.RubyKernel.rbCatch19(RubyKernel.java:1254)
org.jruby.RubyKernel$INVOKER$s$rbCatch19.call(RubyKernel$INVOKER$s$rbCatch19.gen)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:177)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:188)
org.jruby.ast.FCallOneArgBlockNode.interpret(FCallOneArgBlockNode.java:34)
org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:161)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:178)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:316)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:145)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
org.jruby.ast.FCallNoArgBlockNode.interpret(FCallNoArgBlockNode.java:32)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.MultipleAsgn19Node.interpret(MultipleAsgn19Node.java:104)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.MultipleAsgn19Node.interpret(MultipleAsgn19Node.java:104)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.OrNode.interpret(OrNode.java:100)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.OrNode.interpret(OrNode.java:100)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.MultipleAsgn19Node.interpret(MultipleAsgn19Node.java:104)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.MultipleAsgn19Node.interpret(MultipleAsgn19Node.java:104)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.RescueNode.executeBody(RescueNode.java:222)
org.jruby.ast.RescueNode.interpret(RescueNode.java:117)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.ArrayNode.interpretPrimitive(ArrayNode.java:94)
org.jruby.ast.ArrayNode.interpret(ArrayNode.java:84)
org.jruby.ast.MultipleAsgn19Node.interpret(MultipleAsgn19Node.java:104)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:168)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:157)
org.jruby.runtime.Interpreted19Block.yieldSpecific(Interpreted19Block.java:130)
org.jruby.runtime.Block.yieldSpecific(Block.java:111)
org.jruby.ast.ZYieldNode.interpret(ZYieldNode.java:25)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.IfNode.interpret(IfNode.java:118)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:161)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:178)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:316)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:145)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
org.jruby.ast.FCallNoArgBlockNode.interpret(FCallNoArgBlockNode.java:32)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:182)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:186)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:326)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:170)
org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
org.jruby.ast.MultipleAsgn19Node.interpret(MultipleAsgn19Node.java:104)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.RescueNode.executeBody(RescueNode.java:222)
org.jruby.ast.RescueNode.interpret(RescueNode.java:117)
org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:202)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
org.jruby.ast.CaseNode.interpret(CaseNode.java:121)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.RescueNode.executeBody(RescueNode.java:222)
org.jruby.ast.RescueNode.interpret(RescueNode.java:117)
org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:202)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
org.jruby.ast.FCallTwoArgNode.interpret(FCallTwoArgNode.java:38)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.IfNode.interpret(IfNode.java:116)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.RescueNode.interpret(RescueNode.java:160)
org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
org.jruby.runtime.Block.call(Block.java:101)
org.jruby.RubyProc.call(RubyProc.java:274)
org.jruby.RubyProc.call19(RubyProc.java:255)
org.jruby.RubyProc$INVOKER$i$0$0$call19.call(RubyProc$INVOKER$i$0$0$call19.gen)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:217)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:213)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
org.jruby.ast.CallSpecialArgNode.interpret(CallSpecialArgNode.java:69)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.evaluator.ASTInterpreter.INTERPRET_BLOCK(ASTInterpreter.java:112)
org.jruby.runtime.Interpreted19Block.evalBlockBody(Interpreted19Block.java:206)
org.jruby.runtime.Interpreted19Block.yield(Interpreted19Block.java:194)
org.jruby.runtime.Interpreted19Block.call(Interpreted19Block.java:125)
org.jruby.runtime.Block.call(Block.java:101)
org.jruby.RubyProc.call(RubyProc.java:274)
org.jruby.RubyProc.call(RubyProc.java:215)
org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:100)
java.lang.Thread.run(Thread.java:724)

virtio-net: tx_gc_thread is being awaken too many times

tx_gc_thread is waken x30 times more than the receiver thread.
There are x14 more tx interrupts than rx interrupts

To see the relevant tracepoints, you can start osv using the following command:

$ sudo ./scripts/run.py -n -e "--trace=msi* --trace=virtio_net_?x_wake java.so -jar /java/cli.jar" -c2 -m1G

(gdb) set pagination off
(gdb) set logging on
(gdb) osv trace
(gdb) set logging off

And count either tx_wake or rx_wake in gdb.txt

virtio related crash on boot

Starting osv in debug mode, may crash 1 out of 15 times with the message

qemu-system-x86_64: virtio: trying to map MMIO memory

complete output:

$ sudo ./scripts/run.py -n -d -e "--trace=condvar* java.so -jar /java/cli.jar" -c4 -H -m4G
Loader Copyright 2013 Cloudius Systems
locale works

acpi 0 apic 0
acpi 1 apic 1
acpi 2 apic 2
acpi 3 apic 3
VFS: mounting ramfs at /
VFS: mounting devfs at /dev
ACPI: RSDP 0xfd8a0 00014 (v00 BOCHS )ACPI: RSDT 0xdfffe380 00034 (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001)ACPI: FACP 0xdfffff80 00074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001)ACPI: DSDT 0xdfffe3c0 011A9 (v01 BXPC BXDSDT 00000001 INTL 20100528)ACPI: FACS 0xdfffff40 00040ACPI: SSDT 0xdffff6e0 00858 (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)ACPI: APIC 0xdffff5b0 00090 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001)ACPI: HPET 0xdffff570 00038 (v01 BOCHS BXPCHPET 00000001 BXPC 00000001)RAM disk at 0x3f23e000 (4096K bytes)
Initializing network stack...
Done!
eth0: ethernet address: 52:54:0:12:34:56
VFS: mounting zfs at /usr
zfs: mounting osv/usr from device /dev/vblk1
using unsupported TASKQ_THREADS_CPU_PCT flag
using unsupported TASKQ_THREADS_CPU_PCT flag
using unsupported TASKQ_THREADS_CPU_PCT flag
using unsupported TASKQ_THREADS_CPU_PCT flag
DKIOCFLUSHWRITECACHE ignored
DKIOCFLUSHWRITECACHE ignored
qemu-system-x86_64: virtio: trying to map MMIO memory

throughput decrease when using more vcpus

A locking issue that cause a lot of context switching, probably due to the single thread design of callouts dispatching, or assumptions of the freebsd TCP stack.

Investigation of the real cause

Reduce number of sections in loader.elf

Right now, our loader.elf has more than 3000 (!) ELF sections (see bjdump -h build/release/loader.elf). All these sections subdivisions waste space (about half a megabyte), and the subdivision of the "text" sections causes "perf kvm"'s output to be unreadable.

The solution is to fix arch/x64/loader.ld
The most important part is to merge the .text.* sections. A patch for this would be (Avi already sent a similar patch):

-    .text : { *(.text) *(.text.hot) *(.text.unlikely) *(.text.fixup) } :text
+    .text : {
+        *(.text.hot .text.hot.*)
+        *(.text.unlikely .text.unlikely.* .text.*_unlikely)
+        *(.text.fixup)
+        *(.text.startup .text.startup.*)
+        *(.text .text._*)
+    } :text

There are also hundreds of data sections, the following patch correctly merges them:

-    .data : { *(.data) } :text
+/*    .data : { *(.data) } :text */
+    .data.rel.ro : {
+        *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*)
+        *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*)
+    } : text
+    .data : { *(.data .data.* .gnu.linkonce.d.*) } :text

What remain are several hundred .gcc_except_table.* sections. An attempt to merge them with:

+    .gcc_except_table : { *(.gcc_except_table) *(.gcc_except_table.*) } : text

Miserably fails - the generated code hangs when using timers, I still have no idea why (most benchmarks fail, try for example tst-pipe.so).
Interestingly, if in the above line I replace "text" by "dynamic", I get a linker warning, but everything works (the sections are merged, and the resulting code appears to work). have no idea why.

execve(2) support

Implementing basic execve(2) support would allow simpler init process management than specifying the command line in build.mk, and make it easier to replace the java.so init process with other, custom init binaries. Is this planned?

"osv pagetable walk" doesn't work

(gdb) osv pagetable walk
Python Exception <class 'gdb.error'> Argument required (expression to compute).: 
Error occurred in Python command: Argument required (expression to compute).

Improve xen tracing infrastructure

We have no tracepoints in the xen code that came from BSD. Some of it is very important, like irq entries and buffer allocation. Semantically speaking, a subset of those traces are common to virtio. Where possible, we should aim at having tracepoints with the same name, that users can call regardless of the underlying hypervisor.

dlopen() and program::add_object() needs reference counting

According to Linux's dlopen(3) manual page, "If the same library is loaded again with dlopen(), the same file handle is returned. The dl library maintains reference counts for library handles, so a dynamic library is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it."

Our dlopen() uses our program::add_object(), and neither have this necessary reference counting on the handles they return.

I am writing now a convenience function osv::run() to run a shared-object's main (similar to elf-loader.cc), and this reference counting is important because without it, we cannot run the same object more than once concurrent with itself - without the reference count, the first run to finish will unload the object, and the still-running instances will crash.

NOTE: This reference count is completely unrelated to the issue of concurrent use of dlopen()/dlclose()/dl_iterate_phdr(), which is also something we need to fix (I have a patch for that, but it's too long to fit in the margin).

Add some spinning to mutex_lock()

When very short sections of code (e.g., set a couple of variables) are protected by mutex_lock()/mutex_unlock(), when the lock is busy it is often more efficient, instead of doing a context switch, just try locking again - the other thread will often finish already.

Some implementation ideas:

lockfree::mutex::lock() can, after increasing count, spin for a while while count is still > 1, and if during this short spin every time we get to count == 1, we try to take the lock (by taking the "handoff" left by the unlock(), which knows we're trying to get the lock because we increased count). And don't forget the "pause" instruction during that spin :-)
Probably a better alternative to the above is, instead of checking count==1 as above, check the handoff directly (see example in try_lock()) - without considering count. The reason is that two threads spinning in lock() will leave count==2 and neither of them will ever see count==1, but a handoff will occur, because unlock() sees count=2 and nobody on the queue.
A third alternative is not to use the handoff mechanism at all. I.e., like try_lock()'s code, don't increment count and only loop on a CAS trying to grab the lock when nobody has it. I'm not sure if this alternative is better or worse than the previous one. The loop is slower, but we don't force unlock() to use the handoff mechanism. This alternative also changes the way that lock() works in the uncontended case - previously it did fetch_add, and now we start it with compare_exchange (I don't know which is more efficient).
Don't forget NOT to spin on single CPU machines - it can't help :-)

An open question is for how long to spin before resorting to sleeping? Does the amount of spinning need to depend on number of processors, the lock's previous history, other heuristics, or be explicitly configured at compile-time by the programmer for each use? Some thoughts on this issue which were raised in a long email discussion we had:

My original thinking was that the programmer knows when critical sections are short, so he can use a different method, say lock_expect_quick(nspins) or lock_expect_quick(nanoseconds) in that case. The time of spin should be determined by the critical section's length, by the time it takes to do a context switch, or both (?).
Avi suggested to the above in a more OO manner, adding more types that keep mutex's original lock()/unlock() method names so that lock_guard<> and friends will continue to work. E.g., spinning_mutex. Both types of mutexes can share code using a template, e.g.,

typedef basic_mutex<contention_strategy_spin<50>> spinning_mutex;
typedef basic_mutex<contention_strategy_wait> io_mutex;
typedef basic_mutex<contention_strategy_adaptive> mutex;

Glauber suggested to look at the queue's length to see if other threads are waiting already, and if so, don't spin as we don't have much chance (assuming the spin length we chose is the amount of time it takes for one critical section to finish - not more than one). Getting the queue length is not easy, but we do have the "count" we can use (which, however, includes in addition to the queue length also the number of concurrent lock()s which didn't yet manage to put themselves on the queue), and we also know if the queue is empty.
Dor and Guy thought that spinning should be adaptive, i.e., consider how much time we spun in previous lock attempts and whether they were successful. E.g., if 100 spins were usually enough, don't spin more than that even if our maximum spin time (perhaps related to the context switch cost) is 1000 spins (i.e. previous experience showed that if after 100 spins we didn't get the lock, it won't help to wait for 1000 spins).
Dor and Guy thought that the spin time should be related to the number of vcpus - no spinning on one cpu (of course), spin for time x on 2 cpus, spin for x_1.5 or x_log2(4) on 4 vcpus, etc. Glauber and I thought the spin time should not depend on the number of CPUs.

Optimize atomic operations on UP (single-VCPU)

The run time of mutex_lock() and mutex_unlock() is dominated by a single instruction, "lock xadd", which is generated by std::atomic::fetch_add().

On a single VCPU, the "lock" prefix isn't needed. Because the host is SMP it cannot ignore this prefix, but when the guest has a single VCPU, we know this prefix is not necessary. If we drop the "lock" prefix and use the ordinary increment instruction, the mutex becomes much faster - an uncontended lock/unlock pair drops from 22ns to just 9ns. When mutexes are heavily used (e.g., in memcached they take as much as 20% of the run time), this can bring a noticable improvement.

What we should do is to remember where in the code we have the "lock" prefix (the single byte 0xf0), and when booting on a single vcpu, replace them by "nop" (0x90). Linux also has such a mechanism (see asm/alternative.h) - "LOCK_PREFIX" generates the "lock" instruction but also saves in a ".smp_locks" section the address of this lock, and any time the number of cpus grows beyond 1 or shrinks to 1, the code iterates over these locates and changes them to 0x90 or 0xf0.

Doing the above is easy if we implemented our own "fetch_add" and "compare_exchange" operation. However, currently we use C++11's std::atomic and it will be a shame to lose its advantages (like working on any processors, not just x86). Perhaps there's a solution, though: uses the GCC builtins __atomic_fetch_add and friends (see http://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html). So if we re-implement those, it can be enough. I tried to redefine this function and got some strange compilation errors, but maybe by re-"#define"-ing it before including , or some other ugly trick, we can force our own implementation.

A different approach we can consider (though it will probably be more complex) is to remove the lock prefix from all code in a certain function or section. This will be hard and risky, though - we need to understand where instructions begin and end, and what is code and what is not code. It will be safer if we can limit this transformation to single functions (such as lockfree_mutex_lock()) which are known not to be problematic in this regard.

Improve xen network performance

Early tests showed us far behind Linux. With the new xen interrupt handler we are ahead, but there is no theoretical reason why we can't beat Linux the same way KVM does. We need to investigate the performance bottlenecks and fix them.

join() hang

In one run of tst-mutex, the test hung at the end when joining the 20 test threads.

Looking at "osv info threads", we see
36 (0xffffc0003383e000) cpu1 terminating ?? at arch/x64/entry.S:101 vruntime 1372016020739989192
37 (0xffffc00033829000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016020739953420
38 (0xffffc00033813000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016022969736399
39 (0xffffc000337fe000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024256845711
40 (0xffffc000337e9000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016023734879695
41 (0xffffc000337d3000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024605295666
42 (0xffffc000337be000) cpu2 terminated ?? at arch/x64/entry.S:101 vruntime 1372016023144445721
43 (0xffffc000337a9000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024014862832
44 (0xffffc00033793000) cpu2 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024783200136
45 (0xffffc0003377e000) cpu2 terminated ?? at arch/x64/entry.S:101 vruntime 1372016023507271733
46 (0xffffc00033769000) cpu2 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024155354410
47 (0xffffc00033753000) cpu0 terminated ?? at arch/x64/entry.S:101 vruntime 1372016025922601336
48 (0xffffc0003373e000) cpu2 terminated ?? at arch/x64/entry.S:101 vruntime 1372016022280810107
49 (0xffffc00033729000) cpu0 terminated ?? at arch/x64/entry.S:101 vruntime 1372016023182387074
50 (0xffffc00033713000) cpu0 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024034689208
51 (0xffffc000336fe000) cpu0 terminated ?? at arch/x64/entry.S:101 vruntime 1372016025329941850
52 (0xffffc000336e9000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016023366949297
53 (0xffffc000336d3000) cpu0 terminated ?? at arch/x64/entry.S:101 vruntime 1372016023013167672
54 (0xffffc000336be000) cpu1 terminated ?? at arch/x64/entry.S:101 vruntime 1372016024129763748

So it seems that 1 thread was successfully joined, but 19 weren't: The problem is that the 2nd thread remained in "terminating" state, so join() for it never finished; The other 18 successfully made it to the terminated state, but the test only joins them after joining the second thread, so it's not surprising that they don't get deleted and rather remain in "terminated" state.

The problem is why thread 36 remained in "terminating" state.
A bit more analysis:

The only place in the code we set "terminating" state is in thread::complete(), which then also sets _cpu->terminating_thread = this. We cannot run one and not the other because of the preempt_disable().
[TODO: assert that _cpu->terminating_thread == nullptr before; but how can it be not null?]

So we set _cpu->terminating_thread != null, but at the time of the hang, in thread 36, _cpu->terminating_thread==nullptr.
If _cpu hasn't changed (can it? I don't think we migrate non-queued threads), this means that we got to the only place in the code that resets terminating_thread it back to zero:
if (p->_cpu->terminating_thread) {
p->_cpu->terminating_thread->unref();
p->_cpu->terminating_thread = nullptr;
}
Which means unref() must have run, but done nothing. Printing the threads ref_count we indeed see 1 which means that either unref() wasn't run in the above code (how can that be possible?) or something else is keeping the thread's reference count up - but that shouldn't happen as all the other threads participating in our mutex wakeups are all done (terminated).

I need to tracepoints to understand what's happening, if I can reproduce this bug.

make fails

Hello
In my first attempt to build the osv on a Ubuntu 13.04, I get:

ANT tests/bench
Traceback (most recent call last):
File "scripts/silentant.py", line 14, in
stderr = subprocess.PIPE)
File "/usr/lib/python2.7/subprocess.py", line 711, in init
errread, errwrite)
File "/usr/lib/python2.7/subprocess.py", line 1308, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
make: *** [all] Error 1

add getprotobyname from musl

Needed by netperf, should't be hard

Solve the thread completion/destruction problem without cost on every context switch

In commit 96ee424 I solved a long-standing bug we had, where a terminating thread, in complete(), would wake the thread join()ing it (_joiner), but in rare cases, this joiner thread would delete the completing thread's stack quickly, before complete() returned, causing a crash.

My solution in this commit included an extra variable we need to check after every context switch - to see if the thread we just left terminated, and we need to wake its joiner (on the new thread's stack). This new if() on every context switch - even when no thread ever terminates - is a waste - not a big one, but still a waste. It would be nice if we had a solution without such an extra if().

In a long email brainstorming with Guy we came up with two alternative solutions. Both are more complex and require more coding, but we should consider one of them in the future:

Solution 1 - migration (my favorite, but Avi and Guy are not enthusiastic about it)

The idea behind this solution is that join() can only run in parallel to complete() if they are running on two different CPUs. If we know that both are on the same CPU, and we disable preemption, then join() will not run (and destroy the terminating thread) until the next schedule() switches the threads. To get them both on the same CPU, all we need to do is to migrate the current thread to the joiner's CPU.

So with this solution, complete() will call unref() directly (no need to set a flag instead), and unref will look like this:

void thread::unref()
{
    if (_ref_counter.fetch_add(-1) == 1) {
        if (_joiner) {
            if (thread::current() == this) {
                // Waking the _joiner thread might cause it to quickly delete
                // this thread, including the stack we're running on, so we
                // need to make sure that _joiner can't run while doing this.
                // We ensure this by migrating to the same CPU as _joiner, and
                // then preempt_disable is enough to ensure _joiner is not
                // running (note we assume here migration of a thread out of
                // this CPU only happens in load_balancer() and this doesn't
                // run when preemption is disabled).
                preempt_disable(); // already disabled in complete(), but doesn't hurt
                while (_joiner->tcpu() != tcpu()) {
                    migrate(_joiner->tcpu());
                }
                _joiner->wake_with([&] { _status.store(status::terminated); });
                // at this point, _joiner cannot have run yet (it's on the
                // same CPU as us and preemption is disabled), so our thread
                // and its stack still exists.
                schedule();
            }

            _joiner->wake_with([&] { _status.store(status::terminated); });
        } else {
            _status.store(status::terminated);
        }
    }
}

The remaining challenge is, of course, to implement migrate(). We can't trivially migrate the running thread from itself, because we'll have the potential migrate() running on the original CPU in parallel with itself on the target CPU. Instead, we need to add to load_balancer() support to send the load_balancer on the current CPU the command to move this thread to a different CPU. The current CPU can then yield() hoping load_balancer gets to run, but perhaps it will be more efficient to set a new status ("migrating") so that the thread being migrated doesn't run again until actually migrated ("waiting" may not good enough because a spurious wake can cause it to stop waiting).

While adding migrate() is a complication, I think it will be a generally somewhat-useful feature. Avi also needed it once, to allow a thread to pin itself to a specific CPU.

Solution 2 - temporary stack (Guy's favorite)

This solution also has complete() calling unref() directly, but when unref wants to set the thread's status and wake the joiner, it can't do it on the thread's stack, as we know, so the idea of this solution is to switch to a temporary stack. Each CPU will have a small stack (even one page may be enough) used just in unref() when it decides to delete the thread, it switches to this stack (via an asm() instruction).

There's a complication here, that it's not enough to call wake() on the new stack. wake() also checks the current thread's state, checks preempt_counter (a thread-local variable) and even needs the current thread just to get the current CPU. All of this cannot work when the thread structure, its stack and its TLS, are all deleted in parallel with this code.
So we need a new version of wake() which can take all this information (current CPU, etc.) as parameters. unref() will need to save these things in registers (and force the compiler to leave them in registers), change the stack, and then call wake with these parameters.

Cannot interrupt crash commands from the main console

When crash command is started from the main terminal (as opposed to the ssh connection) it cannot be interrupted.

Sample command:
[%] thread top

This is because OSv's console driver does not send SIGINT on which the crash shell depends for interrupting its commands (it installs a signal handler). The ssh connection is using different mechanism so interrupting works there.

Possible solutions:

Send SIGINT to the process. This would bring down the OS when there is no CLI. This is actually what some people suggested as expected behavior in #49 .
Change the crash shell so that it reacts on Ctrl+C

I'm leaning towards option 1, although it may have bigger consequences than option 2 Opinions?

libm is missing the musl copyright notice

According to the description of commit eda9e0a, the files come from musl. The musl source includes a copyright notice that is not included in your libm!

Make Java InetAddress.getHostName() work

In Shrew.java, I used

    connection.getInetAddress().getHostName()

(commented out in the current code)
And it just waits for 5 seconds and times out, returning the IP address string.

Rather, it needs to either fail (returning the IP address) immediately, or contact a real DNS server. What does it actually do in those 5 seconds - contact a DNS server? which? How do we configure it? And do we actually have the proper resolver code in our distribution?

mmap and filesystem atime, mtime, and ctime

Investigate OSv mmap wrt. filesystem atime, mtime, and ctime.

In particular, look into the issue pointed out by Dor:

http://lwn.net/Articles/564120/

Improved poll and efficient epoll

Right now poll() has a rather unusual design favoring the performance of immediately-returning poll (we poll_scan() before trying to install the poll request on the descriptors). We also do not efficiently support epoll, and epoll_wait() just calls poll() which is terrible for performance.

Instead of the current poll.c code, let's design our own osv::poll object, inspired by epoll. It will have an add() method to install itself on one file structure, remove() to uninstall, and wait() to wait on any of these files (or return immediately). We will wait using a "waker" (our familar wait_until+threadid idiom) - I believe we don't need condvar (we only ever have one waiter in a poll request) and certainly not msleep, and the "waker" idiom also takes care of wake-before-wait races without needing another variable or a mutex to protect it.

osv::poll will also have a static function osv::poll::wake(file *fp) to wake any polls sleeping on this fp (and also set their revents and add them to a list of ready files). The "extern C" function poll_wake() will just call osv::poll::wake.

The epoll code will wrap osv::poll. The poll() code will temporarily create an osv::poll (just like we have a poll_request in the current code), add() to it the given fds, call wait(), and finally prepare the poll results.

Investigate EC2 network disconnects

Dor reported that connection at amazon is suddenly dropped. If one tries to reconnect later it works again. I see the same behavior with an Ubuntu guest, but the time-to-disconnect there is much larger, which indicates a timeout related problem, which does not seem to be consistent with what Dor is seeing (disconnects are very quick to appear). Need to investigate that.

select() emulation doesn't handle POLLHUP/POLLERR/POLLPRI correctly

POLLHUP should set the file descriptor's bit in rfds, POLLERR should set it in wfds and rfds (but not xfds), POLLPRI should set it in xfds. As to POLLRDHUP, I guess it should set wfds since writes would always fail with EPIPE and thus not block... but Linux just ignores in its select() code which is a wrapper for poll() like OSv's.

Seriously... What an awesome project, and the C++ code is the nicest I've ever seen! Keep it up!

Filesystem hard links

OSv does not support hard links that are needed by Apache Cassandra for snapshotting, for example.

Christoph Hellwig already did vnode and dentry separation that's in master and the remaining TODO is:

wire up the vnode hash again, this time with a uin64_t as index
use the znode id as index for zfs, a counter for ramfs/devfs
write the actual link syscall and vnop (should be trivial)
lots of testing

virtio tx coalescing

Instead of kick the hypervisor for every tx invocation (can be more than one packet, depending on the tcp stack), ask the scheduler for a range timer between 0 - 200nsec.
If there is an armed timer in that range, the thread doing the kick will get invoked.
If there is no such armed timer, it will be invoked immediately since the timer arm is too expensive in virt.

This is low priority since at the moment there aren't much tx kicks and the stack structure will change as Van Jacobson design gets implemented.

Integrate VisualVM support for JVM

Out of the box support for it:
http://visualvm.java.net/