
Comments (6)

jasonwhite commented on May 5, 2024

Thanks for submitting this issue! The bug in the issue title should be fixed as of b6147fc, but I don't actually see this bug as the culprit in your log output. I am able to reproduce the Go hello-world failure, but I get a different error:

fatal error: sigaction failed

We don't have much Go code internally, so we neglected to test this. The Go runtime seems to be calling sigaction early during startup, and we're not handling it properly.

We're looking into it :)

from hermit.

rrnewton commented on May 5, 2024

OK, actually that was a one-line fix. Once D41615308 (d8f506394736b3b200785874be672d7fae8861fa) lands and gets synced, hello world will at least work:

$ hermit run  hello_world_go_main
Hello world!

There may be other lurking issues with Go. The other languages we've worked with (Rust, C++) use libc, whereas Go manages its own interactions with the kernel via raw syscalls, so it may do some things differently than libc.

Also, @MarcoPolo, one open issue we're curious about is how hermit run --chaos will work with language-level threads like goroutines. I don't know if (1) there's a runtime mode that forces every goroutine to get its own OS thread (that would be ideal), or (2) there's enough random work stealing that controlling the runtime RNG is enough to get good coverage of possible schedules.
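
As a rough illustration of option (1): this is not a runtime-wide mode, but a program can opt in per goroutine with runtime.LockOSThread, which wires the calling goroutine to its current OS thread so that no other goroutine runs on that thread until it is unlocked. The sketch below assumes that is close enough to "one OS thread per goroutine" to be useful; whether it gives --chaos good schedule coverage has not been tested here. The helper names runPinned and countPinned are made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// runPinned (hypothetical helper) runs every task in its own goroutine and
// locks that goroutine to an OS thread for the duration of the task.
func runPinned(tasks []func()) {
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task func()) {
			defer wg.Done()
			// While locked, no other goroutine is scheduled onto this thread.
			runtime.LockOSThread()
			defer runtime.UnlockOSThread()
			task()
		}(task)
	}
	wg.Wait()
}

// countPinned (hypothetical helper) runs n trivial pinned tasks and reports
// how many of them actually executed.
func countPinned(n int) int {
	var mu sync.Mutex
	ran := 0
	tasks := make([]func(), n)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			ran++
			mu.Unlock()
		}
	}
	runPinned(tasks)
	return ran
}

func main() {
	fmt.Println("pinned tasks completed:", countPinned(16))
}
```

The trade-off is that every goroutine has to opt in explicitly, so this only helps for code you control, not for goroutines started inside libraries.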


MarcoPolo commented on May 5, 2024

Ah, a red herring! Thanks! Looking forward to playing with hermit.


rrnewton commented on May 5, 2024

Here's an strace -Cf of the Go hello world:

strace: Process 2026443 attached
strace: Process 2026444 attached
strace: Process 2026445 attached
strace: Process 2026446 attached
strace: Process 2026447 attached
Hello world!
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 27.90    0.001892         157        12         1 futex
 24.45    0.001658          97        17           nanosleep
  9.17    0.000622           5       114           rt_sigaction
  7.37    0.000500          15        33           rt_sigprocmask
  6.47    0.000439          12        36           mmap
  6.31    0.000428          85         5           clone3
  5.03    0.000341          34        10           munmap
  3.52    0.000239          13        18           mprotect
  2.99    0.000203         203         1           execve
  1.27    0.000086           7        12           sigaltstack
  0.97    0.000066          11         6           set_robust_list
  0.93    0.000063           5        12        10 openat
  0.58    0.000039           6         6           gettid
  0.56    0.000038          38         1           tgkill
  0.55    0.000037          37         1           rt_sigreturn
  0.37    0.000025           8         3           brk
  0.29    0.000020           6         3           fcntl
  0.29    0.000020           2        10         9 newfstatat
  0.16    0.000011           5         2           read
  0.16    0.000011           5         2           close
  0.15    0.000010          10         1           getrandom
  0.13    0.000009           9         1         1 access
  0.12    0.000008           8         1           sched_getaffinity
  0.07    0.000005           5         1           getpid
  0.07    0.000005           5         1           prlimit64
  0.04    0.000003           3         1           arch_prctl
  0.04    0.000003           3         1           set_tid_address
  0.00    0.000000           0         1           write
  0.00    0.000000           0         4           pread64
------ ----------- ----------- --------- --------- ------------------
100.00    0.006781          21       316        21 total

It doesn't seem all that bad, and we are already (attempting to) handle rt_sigaction, but Go must be exercising it in a way we don't properly support. In particular, it makes a LOT of sigaction calls (114 in this run), seemingly to cover its bases on all potential signals.

Some of these are quick fixes -- we would need to knock down the first couple of issues to get a better idea. At some point, I'd like to post a guide for "how to debug/fix compatibility issues" in case anyone else wants to give it a try before we get to it.


rrnewton commented on May 5, 2024

In particular, about 30 rt_sigaction calls in, we return EINVAL, which the Go runtime doesn't like:

$ hermit --log=info run  hello_world_go_main
...
2022-11-30T14:25:21.255027Z  INFO detcore: DETLOG [syscall][detcore, dtid 3] finish syscall #109: rt_sigaction(16, NULL, 0x7fffffffd110, 8) = Err(Errno(EINVAL))
2022-11-30T14:25:21.255268Z  INFO detcore: DETLOG [syscall][detcore, dtid 3] inbound syscall: rt_sigaction(16, NULL, 0x7fffffffd408, 8) = ?
2022-11-30T14:25:21.255295Z  INFO detcore: DETLOG [syscall][detcore, dtid 3] finish syscall #110: rt_sigaction(16, NULL, 0x7fffffffd408, 8) = Err(Errno(EINVAL))

Those same calls under strace:

rt_sigaction(SIGSTKFLT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGSTKFLT, {sa_handler=0x2ed0e0, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f100ca445a0}, NULL, 8) = 0

Signal 16 is SIGSTKFLT, "Stack fault on coprocessor (unused)". We took the liberty of using this (unused) signal for our own internal purposes. It seems that Go is paranoid and wants to register signal handlers for every possible signal, even unused ones.

I think it's pretty safe for us to turn this into a no-op and simply never deliver any signals of that kind to the guest. An easy fix.


MarcoPolo commented on May 5, 2024

> Also, @MarcoPolo, one thing that's kind of an open issue that we're curious about is how hermit run --chaos will work with language level threads like go routines. I don't know if (1) there's a runtime mode that forces every go routine to get its own OS thread (that would be ideal), or (2) if there's enough random work stealing that controlling the runtime RNG is enough to get good coverage of possible schedules.

I think with Go at least there may be enough random work stealing that you could explore some of the possible schedules. A very quick experiment running:

package main

import (
        "fmt"
        "sync"
        "time"
)

func execute(id int) {
        fmt.Printf("start id: %d\n", id)
        time.Sleep(time.Second)
        fmt.Printf("end id: %d\n", id)
}

func main() {
        fmt.Println("Started")
        wg := sync.WaitGroup{}
        for i := 0; i < 16; i++ {
                wg.Add(1)
                go func(id int) {
                        execute(id)
                        wg.Done()
                }(i)
        }
        wg.Wait()
        fmt.Println("Finished")
}

with hermit run -e="GOMAXPROCS=2" --chaos --seed=<seed> main | shasum (I made a quick patch to Go to work around the original issue) shows that we do indeed get different executions for different values of the seed. There's some work stealing happening in the Go scheduler that could explain this. I'd also bet you'd get more coverage with more than 256 goroutines per thread, since beyond that point they get queued onto a global queue.
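
A variant of the experiment that goes past the 256-goroutine mark could look like the sketch below. Instead of hashing stdout with shasum, it records the order in which goroutines reach a critical section and prints a SHA-256 fingerprint of that order, so two seeds can be compared by comparing one line. The count of 1024 is an arbitrary choice above 256, and completionOrder and fingerprint are names made up for this example; I have not run this under hermit.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"runtime"
	"sync"
)

// completionOrder starts n goroutines and returns the order in which they
// entered the critical section.
func completionOrder(n int) []int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		order = make([]int, 0, n)
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			runtime.Gosched() // yield once so the scheduler can interleave goroutines
			mu.Lock()
			order = append(order, id)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return order
}

// fingerprint hashes an order so that two runs can be compared at a glance.
func fingerprint(order []int) string {
	h := sha256.New()
	for _, id := range order {
		fmt.Fprintf(h, "%d,", id)
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	// 1024 is an arbitrary count chosen to exceed 256 goroutines per thread
	// when GOMAXPROCS=2.
	fmt.Println(fingerprint(completionOrder(1024)))
}
```

If the fingerprint changes across seeds but stays the same when a seed is repeated, that would suggest the chaos scheduler is both exploring and reproducing schedules.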

