wingo / fibers

Concurrent ML-like concurrency for Guile

License: GNU Lesser General Public License v3.0

Makefile 1.85% M4 22.25% Shell 0.63% C 6.69% Scheme 68.58%

fibers's People

Contributors

aconchillo, amirouche, attila-lendvai, civodul, codemac, craigmaloney, cwebber, d4ryus, eduvcjy0ue0cy0zo3bs7xyo3b, emixa-d, habush, hugonikanor, levenson, mdjurfeldt, reepca, vyp, wingo

fibers's Issues

How to run a fiber without waiting for its completion?

Is there any way to run a fiber (that is, have it run its thunk) without resorting to something like call-with-new-thread? Perhaps something internal to the library?

What I want to achieve is to run a fiber without the overhead of creating a new thread, trusting that the fiber will report its result later on, for example through a channel it was given at creation. Having to use call-with-new-thread or similar defeats the purpose of lightweight fibers when the number of fibers is not known at compile time and fibers are spawned dynamically, depending on the data the program receives as input.

I want to be able to say: "OK, new work came in, just start a fiber and let it report the result of the work later, when it has finished. Let's go on with the other stuff we need to take care of." However, run-fibers returns when the fibers in its thunk finish, so that is not the way to go. Using multiple schedulers does not work either, as they would not know about each other and their #:parallelism limits are independent, resulting in more parallelism than wanted.

I guess you could say that I am looking for a way to start fibers asynchronously and collect their results.
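For illustration only, here is a minimal sketch of the pattern being asked about, under the assumption that all of the work can live inside a single run-fibers call: each piece of input gets its own fiber via spawn-fiber, and each fiber reports back over a shared channel. The procedure name square-all and the squaring "work" are hypothetical placeholders, not part of the library.

```scheme
(use-modules (fibers)
             (fibers channels))

;; Hypothetical helper: spawn one fiber per input, collect the results.
(define (square-all inputs)
  (run-fibers
   (lambda ()
     (let ((results (make-channel)))
       ;; Start one lightweight fiber per input; none of these calls
       ;; waits for the fiber to finish.
       (for-each
        (lambda (x)
          (spawn-fiber
           (lambda ()
             (put-message results (* x x)))))
        inputs)
       ;; Collect exactly as many results as fibers were started.
       ;; The order in which they arrive is not guaranteed.
       (let loop ((remaining (length inputs))
                  (acc '()))
         (if (zero? remaining)
             acc
             (loop (- remaining 1)
                   (cons (get-message results) acc))))))))

(define squares (sort (square-all '(1 2 3 4)) <))
```

The sketch relies on run-fibers returning the value of its init thunk, and it sorts the collected list because the arrival order of the messages depends on scheduling.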

Could we move guile to github?

Hi @wingo

I see that you are pretty passionate about Guile, and I feel that it is a truly awesome language. Recently, though, I've run into some licensing issues regarding another project I'm interested in, remacs.

Would it be possible to convince the Guile maintainers to move the repo to GitLab, if not GitHub? I see that GNU Emacs is now on GitHub, so perhaps there is still something we can do to make GNU Guile more open to public contribution.

P.S. Feel free to close the issue as not really related :)

Memory leak when choice-operation'ing on multiple channels

Not sure why this is, but give it a try:

(define (send-a-lot ntimes)
  (run-fibers
   (lambda ()
     (define da-channel (make-channel))
     (define da-channel2 (make-channel))
     (define done (make-condition))
     ;; receiver
     (spawn-fiber
      (lambda ()
        (while #t
          (perform-operation
           (choice-operation
            (get-operation da-channel)
            ;; testing for memory growth
            (get-operation da-channel2))))))
     ;; sender
     (spawn-fiber
      (lambda ()
        (let lp ((n 0))
          (unless (= n ntimes)
            (put-message da-channel n)
            (lp (1+ n))))
        (signal-condition! done)))
     (wait done))))

;;; now run it
(send-a-lot 500000)

If you watch in htop or etc, you'll see memory start to grow.

Here's the funny thing: now comment out the second `get-operation'. The memory growth goes away!

Why is this? I dunno, but it's the source of a "leak" I was experiencing in 8sync-on-fibers... took a while to track down.

run-fibers with #:drain? #t does not wait for dependent threads

When trying to emulate a thread pool for blocking operations (CPU-intensive work or an embedded database like wiredtiger), I hit a bug with #:drain? #t where run-fibers returns before the thread created with call-with-new-thread has had time to finish its blocking call in suspend, in operations.scm:

(define (suspend)
  ;; Two cases. If there is a current fiber, then we suspend the
  ;; current fiber and arrange to restart it when the operation
  ;; succeeds. Otherwise we block the current thread until the
  ;; operation succeeds, to allow for communication between fibers
  ;; and foreign threads.
  (if (current-scheduler)
      ((suspend-current-task
        (lambda (sched k)
          (define (resume thunk)
            (schedule-task sched (lambda () (k thunk))))
          (block sched resume))))
      (let ((k #f)
            (thread (current-thread))
            (mutex (make-mutex))
            (condvar (make-condition-variable)))
        (define (resume thunk)
          (cond
           ((eq? (current-thread) thread)
            (set! k thunk))
           (else
            (call-with-blocked-asyncs
             (lambda ()
               (lock-mutex mutex)
               (set! k thunk)
               (signal-condition-variable condvar)
               (unlock-mutex mutex))))))
        (lock-mutex mutex)
        (block #f resume)
        (let lp ()
          (cond
           (k
            (unlock-mutex mutex)
            (k))
           (else
            (wait-condition-variable condvar mutex)
            (lp)))))))

Since the thread holds no reference to the "parent" scheduler, it cannot notify that scheduler that a thread is waiting or blocking on an operation rendezvous.

When the thread is created inside the fiber instead, the program hits issue #21.

Here is a test program:

(import (fibers))
(import (fibers channels))

(import (ice-9 threads))


(define mutex (make-mutex))
(define channel (make-channel))

(let loop ((index (- (current-processor-count) 7)))
  (unless (zero? index)
    (call-with-new-thread
     (lambda ()
       (let continue ((message (get-message channel)))
         (let ((thunk (car message))
               (return (cdr message)))
           (let ((out (thunk)))
             (put-message return out)))
         (continue (get-message channel))))
     pk)
    (loop (- index 1))))

(define (fib n)
  (cond
    ((= n 0) 0)
    ((= n 1) 1)
    (else (+ (fib (- n 1)) (fib (- n 2))))))

(define (exec thunk)
  (let ((return (make-channel)))
    (put-message channel (cons thunk return))
    (pk 'getting)
    (get-message return)))

(define (compute)
  (pk 'out (exec (lambda () (fib (expt 2 5))))))

(define (main)
  (let loop ((index 1))
    (unless (zero? index)
      (spawn-fiber (lambda () (compute)))
      (loop (- index 1))))
  (pk 'main-end)
  #;(sleep 10))


(run-fibers main #:parallelism 1 #:drain? #t #:hz 0)
(pk 'program-end)

Uncomment (sleep 10) to see the program complete.

debugging a race in the web server

I was running some benchmarks on the fibers web server and found that channels sometimes spin infinitely.

With #:parallel? set to #t and 2 concurrent requests, it finishes 10-15 requests and then stops responding.
I can see one thread spinning infinitely in scm_atomic_compare_and_swap_scm.

When I set #:parallel? to #f, or use more than 2 concurrent requests, it finishes all the requests
except one or two and then spins infinitely.

Could it be that for some reason all put-operations have finished and there is one get-operation left, or vice versa? How would you debug that?

Here's the GDB backtrace:

btw, GDB's guile extension is not working with guile 2.1.5 because of the changes in libguile/ports.c.
But you probably knew that already ;-)

#0  scm_atomic_compare_and_swap_scm (loc=0x5555557b95d8, expected=0x7ffff18faba0, desired=0x304) at ../libguile/atomics-internal.h:74
#1  0x00007ffff7aab337 in scm_i_async_pop (t=0x5555557b9540) at async.c:125
#2  0x00007ffff7b60d0d in vm_regular_engine (thread=0x5555557b9540, vp=0x555555c6dbd0, registers=0x7ffff18fba30, resume=1) at vm-engine.c:3887
#3  0x00007ffff7b7258a in scm_call_n (proc=0x555555c98ca0, argv=0x0, nargs=0) at vm.c:1250
#4  0x00007ffff7abed02 in scm_call_0 (proc=0x555555c98ca0) at eval.c:475
#5  0x00007ffff7aab920 in scm_call_with_unblocked_asyncs (proc=0x555555c98ca0) at async.c:400
#6  0x00007ffff7ad767c in scm_apply_subr (sp=0x7ffff10fdf70, nslots=2) at gsubr.c:305
#7  0x00007ffff7b56041 in vm_regular_engine (thread=0x5555557b9540, vp=0x555555c6dbd0, registers=0x7ffff18fca00, resume=0) at vm-engine.c:778
#8  0x00007ffff7b7258a in scm_call_n (proc=0x555555c82e40, argv=0x0, nargs=0) at vm.c:1250
#9  0x00007ffff7abed02 in scm_call_0 (proc=0x555555c82e40) at eval.c:475
#10 0x00007ffff7b4eeda in really_launch (d=0x555555c94b60) at threads.c:783
#11 0x00007ffff7ab809c in c_body (d=0x7ffff18fde00) at continuations.c:425
#12 0x00007ffff7b514a8 in apply_catch_closure (clo=0x555555c98dc0, args=0x304) at throw.c:307
#13 0x00007ffff7b285f4 in apply_1 (smob=0x555555c98dc0, a=0x304) at smob.c:141
#14 0x00007ffff7ad76a4 in scm_apply_subr (sp=0x7ffff10fdfc0, nslots=3) at gsubr.c:307
#15 0x00007ffff7b56041 in vm_regular_engine (thread=0x5555557b9540, vp=0x555555c6dbd0, registers=0x7ffff18fda90, resume=0) at vm-engine.c:778
#16 0x00007ffff7b7258a in scm_call_n (proc=0x555555c98dc0, argv=0x0, nargs=0) at vm.c:1250
#17 0x00007ffff7abed02 in scm_call_0 (proc=0x555555c98dc0) at eval.c:475
#18 0x00007ffff7b50f08 in catch (tag=0x404, thunk=0x555555c98dc0, handler=0x555555c98da0, pre_unwind_handler=0x555555c98d80) at throw.c:138
#19 0x00007ffff7b51352 in scm_catch_with_pre_unwind_handler (key=0x404, thunk=0x555555c98dc0, handler=0x555555c98da0, pre_unwind_handler=0x555555c98d80) at throw.c:252
#20 0x00007ffff7b51580 in scm_c_catch (tag=0x404, body=0x7ffff7ab8074 , body_data=0x7ffff18fde00, handler=0x7ffff7ab80ae , handler_data=0x7ffff18fde00, 
    pre_unwind_handler=0x7ffff7ab8108 , pre_unwind_handler_data=0x55555583f040) at throw.c:375
#21 0x00007ffff7ab7eeb in scm_i_with_continuation_barrier (body=0x7ffff7ab8074 , body_data=0x7ffff18fde00, handler=0x7ffff7ab80ae , handler_data=0x7ffff18fde00, 
    pre_unwind_handler=0x7ffff7ab8108 , pre_unwind_handler_data=0x55555583f040) at continuations.c:363
#22 0x00007ffff7ab81b4 in scm_c_with_continuation_barrier (func=0x7ffff7b4ee70 , data=0x555555c94b60) at continuations.c:459
#23 0x00007ffff7b4eb81 in with_guile (base=0x7ffff18fde70, data=0x7ffff18fdec0) at threads.c:650
#24 0x00007ffff7233952 in GC_call_with_stack_base () from /usr/lib/x86_64-linux-gnu/libgc.so.1
#25 0x00007ffff7b4ec6a in scm_i_with_guile (func=0x7ffff7b4ee70 , data=0x555555c94b60, dynamic_state=0x555555c77470) at threads.c:693
#26 0x00007ffff7b4ef26 in launch_thread (d=0x555555c94b60) at threads.c:792
#27 0x00007ffff78576ca in start_thread (arg=0x7ffff18fe700) at pthread_create.c:333
#28 0x00007ffff75910af in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

where (not sure how useful this is...)

(gdb) print expected
$2 = (SCM *) 0x7ffff18faba0
(gdb) x 0x7ffff18faba0
0x7ffff18faba0:	0x55c0e8c0

wiki spellfix

At 2.5 Conditions:
Change: (use-modules (fibers contitions)) <<-- conDitions

autoreconf failed on macOS Mojave

autoreconf -vif
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: running: glibtoolize --copy --force
glibtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
glibtoolize: copying file 'build-aux/ltmain.sh'
glibtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
glibtoolize: copying file 'm4/libtool.m4'
glibtoolize: copying file 'm4/ltoptions.m4'
glibtoolize: copying file 'm4/ltsugar.m4'
glibtoolize: copying file 'm4/ltversion.m4'
glibtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/local/Cellar/autoconf/2.69/bin/autoconf --force
configure:13082: error: possibly undefined macro: AC_LIB_LINKFLAGS_FROM_LIBS
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
autoreconf: /usr/local/Cellar/autoconf/2.69/bin/autoconf failed with exit status: 1

OS version: macOS Mojave (10.14.1)
Guile version: 2.2.4 (installed by homebrew)

configure.ac has two operating systems hard-coded, posix-clocks-foo.scm related?

Trying to build/test #53 on NetBSD, I got an error:

gmake: *** No rule to make target 'fibers/posix-clocks-.scm', needed by 'fibers/posix-clocks.scm'.  Stop.

because of Makefile's

fibers/posix-clocks.scm: Makefile fibers/posix-clocks-$(PLATFORM).scm
        ln -sf $(abs_top_srcdir)/fibers/posix-clocks-$(PLATFORM).scm $(abs_top_builddir)/fibers/posix-clocks.scm

I then found in configure.ac:

# Detect the target system                                                                                                                                                    
case "$host_os" in
    linux*)
        build_linux=yes
        PLATFORM=linux
        ;;
    darwin*)
        build_darwin=yes
        PLATFORM=darwin
        ;;
esac

I don't understand why this exists; I'm guessing that despite posix-clocks-foo starting with posix, it might use beyond-posix features? I would hope that we can end up with a build that works on pretty much any modern posixy system without per-platform code, but it looks like this is ffi glue code and needs to know types (which are left unspecified by posix).

mistake in spawn-fiber documentation

I found a mistake in the documentation of the spawn-fiber procedure:

The fiber will be added to the current scheduler, which is usually what you want. It's also possible to spawn the fiber on a specific scheduler, which is useful to ensure that the fiber runs on a different kernel thread. In that case, pass the #:scheduler keyword argument.

scheduler is an optional argument, not a keyword argument. If used as keyword argument, one gets the message:

warning: possibly wrong number of arguments to `spawn-fiber'

while one does not get any error when using it as an optional argument. This is also reflected in the heading for the spawn-fiber documentation:

Function: spawn-fiber thunk [scheduler=(require-current-scheduler)] [#:parallel?=#f]
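To make the distinction concrete, here is a small sketch in plain Guile that does not use Fibers at all. The procedure spawn-like is a hypothetical stand-in that only mirrors the argument list shown in the heading above (one required argument, one optional positional argument, one keyword argument); it is not the real spawn-fiber, and the symbols passed as "schedulers" are placeholders.

```scheme
;; Hypothetical stand-in, NOT the real spawn-fiber: it mirrors only the
;; documented argument list, and returns what it received.
(define* (spawn-like thunk
                     #:optional (scheduler 'current-scheduler)
                     #:key (parallel? #f))
  (list scheduler parallel?))

;; The scheduler is passed positionally, as an optional argument.
(define positional
  (spawn-like (lambda () #t) 'my-scheduler))

;; #:parallel? is a genuine keyword argument; the scheduler keeps
;; its default value here.
(define keyword-only
  (spawn-like (lambda () #t) #:parallel? #t))

;; Both together: optional argument first, keyword argument after it.
(define both
  (spawn-like (lambda () #t) 'my-scheduler #:parallel? #t))
```

With a definition of this shape there is no #:scheduler keyword to pass, which is consistent with the warning quoted above.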

Cannot print backtraces within a fiber

At Spritely we've been experiencing a strange issue with fibers >= 1.1.0 where we cannot print backtraces for errors that we catch within a fiber. Below is a reproducer program. This program should catch an error and display a backtrace:

(use-modules (fibers)
             (fibers conditions)
             (ice-9 control))

(run-fibers
 (lambda ()
   (spawn-fiber
    (lambda ()
      (call/ec
       (lambda (abort)
         (with-exception-handler
             (lambda (e)
               (setenv "COLUMNS" "72")
               (display-backtrace (make-stack #t)
                                  (current-error-port))
               (abort))
           (lambda ()
             (error "oops")))))))
   (wait (make-condition))))

However, instead it hangs after printing a partial backtrace:

In fibers.scm:
   163:13 13 (run-fibers _ #:hz _ #:scheduler _ #:parallelism _ # _ # …)
     84:3 12 (%run-fibers _ _ _ _)
In fibers/interrupts.scm:
     69:4 11 (with-interrupts/thread-cputime _ _ _ _)
In fibers/scheduler.scm:
   314:26 10

The CPU running the program then goes to 100% usage. It always hangs on the fibers/scheduler.scm stack frame.

Interestingly, reverting commit 84addfb resolves the issue.

Bonus: You might be wondering why (setenv "COLUMNS" "72") is in the code above. It's because otherwise we encounter what seems to be a bug in Guile:

Uncaught exception in task:
(skipping a bunch of frames...)
In system/repl/debug.scm:
    72:40  1 (_)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure string->number: Wrong type argument in position 1 (expecting string): #f

This is a weird one!

Lots of unmerged pull reqs!?

Why are there so many unmerged pull reqs? Is this project unmaintained? Deprecated in favor of something else?

I've got something else that has a dependency on fibers, and since this project has the smell of bit-rot and broken-ness, I'm thinking perhaps the fibers should be ripped out and replaced by something that works? What's the recommendation going forward into the future?

autoreconf -vif fails complaining about possibly undefined macro: AC_LIB_LINKFLAGS_FROM_LIBS

autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
aclocal: warning: couldn't open directory 'm4': No such file or directory
autoreconf: configure.ac: tracing
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /nix/store/wd3fssfnj4ywzwwcym7fjbxywhbfr8cc-autoconf-2.69/bin/autoconf --force
configure:13082: error: possibly undefined macro: AC_LIB_LINKFLAGS_FROM_LIBS
      If this token and others are legitimate, please use m4_pattern_allow.
      See the Autoconf documentation.
autoreconf: /nix/store/wd3fssfnj4ywzwwcym7fjbxywhbfr8cc-autoconf-2.69/bin/autoconf failed with exit status: 1

So autoconf doesn't have gnulib, which is where AC_LIB_LINKFLAGS_FROM_LIBS is defined. I'm not sure what to do now.

I know that for example other projects have bootstrap scripts that allow setting a --gnulib-srcdir, probably generated from a bootstrap.conf file somehow?

https://github.com/fontforge/fontforge/blob/20d67b0c43db2815d676af23d74317b9fd17dc1f/bootstrap#L4573-L4578

Benchmark summary

Is there a way to summarize the results produced by the benchmarks?

thread-local fluids don't work like they're supposed to

Consider the following code, used with Fibers 1.0.0 as packaged in Guix System:

(use-modules (fibers)
	     (fibers conditions))

(define some-fluid (make-thread-local-fluid 1234567))

(format #t "Main thread initial value: ~A~%" (fluid-ref some-fluid))

(fluid-set! some-fluid 1337)

(join-thread
 (call-with-new-thread
  (lambda ()
    (format #t "Guile thread value: ~A~%" (fluid-ref some-fluid)))))

(run-fibers
 (lambda ()
   (let ((condition (make-condition)))
     (spawn-fiber (lambda ()
		    (format #t "Fiber 1 value: ~A~%" (fluid-ref some-fluid))
		    (fluid-set! some-fluid 42)
		    (signal-condition! condition)))
     (spawn-fiber (lambda ()
		    (wait condition)
		    (format #t "Fiber 2 value: ~A~%" (fluid-ref some-fluid))))))
 #:drain? #t)

(format #t "Main thread final value: ~A~%" (fluid-ref some-fluid))

It produces this output:

Main thread initial value: #f
Guile thread value: #f
Fiber 1 value: 1337
Fiber 2 value: 42
Main thread final value: 42

There are a number of things going wrong here.

  1. The default value of some-fluid, set when it is created, seems to be ignored in favor of #f. This is a bug in guile I've reported as bug 36915.
  2. In the fiber, the value of some-fluid is inherited. This is the opposite of what might be expected to happen based on section 6.22.2 of the Guile manual, but is in line with section 2.1 of the Fibers manual, where it says "The fiber will inherit the fluid-value associations (the dynamic state) in place when 'spawn-fiber' is called". It would be nice to have the ability to use thread-local variables in fibers, though.
  3. When the value of some-fluid is set from within the fiber, the change takes effect outside of that fiber, which is the opposite of what might be expected to happen based on section 2.1 of the Fibers manual. There it says: "Any 'fluid-set!' or parameter set within the fiber will not affect fluid or parameter bindings outside the fiber".

Installation steps not working

I just tried to install the fibers library, but something is not working correctly. After installing I cannot use the module: (use-modules (fibers)). If I try, I get the following error:

scheme@(guile-user)> (use-modules (fibers))
While compiling expression:
no code for module (fibers)

Here are my outputs of the single commands from the installation instructions:

$ ./configure --prefix=/opt/guile
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for library containing strerror... none required
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking whether gcc understands -c and -o together... (cached) yes
checking dependency style of gcc... (cached) gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking how to print strings... printf
checking for a sed that does not truncate output... /bin/sed
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking how to convert x86_64-pc-linux-gnu file names to x86_64-pc-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-pc-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /bin/dd
checking how to truncate binary pipes... /bin/dd bs=4096 count=1
checking for mt... mt
checking if mt is a manifest tool... no
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... no
checking sys/epoll.h usability... yes
checking sys/epoll.h presence... yes
checking for sys/epoll.h... yes
checking for epoll_create... yes
checking for epoll_create1... yes
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
configure: checking for guile 3.0
configure: checking for guile 2.2
configure: found guile 2.2
checking for ld used by gcc... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking for shared library run path origin... done
checking for GUILE... yes
checking for guile-2.2... no
checking for guile2.2... no
checking for guile-2... no
checking for guile2... no
checking for guile... /usr/local/bin/guile
checking for Guile version >= 2.2... 2.2.3
checking for guild... /usr/local/bin/guild
checking for guile-config... /usr/local/bin/guile-config
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating env
config.status: creating Makefile
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
config.status: executing libtool commands

Then

$ make
mkdir -p fibers
sed -e "s|@extlibdir\@|/opt/guile/lib/guile/2.2/extensions|" \
    ./fibers/config.scm.in > fibers/config.scm
make  all-am
make[1]: Entering directory '/home/user/development/GuileScheme/fibers'
/bin/bash ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.    -I. -Wall -Werror  -pthread -I/usr/local/include/guile/2.2 -g -O2 -MT epoll_la-epoll.lo -MD -MP -MF .deps/epoll_la-epoll.Tpo -c -o epoll_la-epoll.lo `test -f 'epoll.c' || echo './'`epoll.c
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I. -Wall -Werror -pthread -I/usr/local/include/guile/2.2 -g -O2 -MT epoll_la-epoll.lo -MD -MP -MF .deps/epoll_la-epoll.Tpo -c epoll.c  -fPIC -DPIC -o .libs/epoll_la-epoll.o
mv -f .deps/epoll_la-epoll.Tpo .deps/epoll_la-epoll.Plo
/bin/bash ./libtool  --tag=CC   --mode=link gcc -I. -Wall -Werror  -pthread -I/usr/local/include/guile/2.2 -g -O2 -export-dynamic -module  -o epoll.la -rpath /opt/guile/lib/guile/2.2/extensions epoll_la-epoll.lo -L/usr/local/lib -lguile-2.2 -lgc  
libtool: link: gcc -shared  -fPIC -DPIC  .libs/epoll_la-epoll.o   -L/usr/local/lib /usr/local/lib/libguile-2.2.so -L/usr/lib/x86_64-linux-gnu -lffi -lunistring -lgmp /usr/lib/x86_64-linux-gnu/libltdl.so -ldl -lcrypt -lm -lgc  -pthread -g -O2   -pthread -Wl,-soname -Wl,epoll.so.0 -o .libs/epoll.so.0.0.0
libtool: link: (cd ".libs" && rm -f "epoll.so.0" && ln -s "epoll.so.0.0.0" "epoll.so.0")
libtool: link: (cd ".libs" && rm -f "epoll.so" && ln -s "epoll.so.0.0.0" "epoll.so")
libtool: link: ( cd ".libs" && rm -f "epoll.la" && ln -s "../epoll.la" "epoll.la" )
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers.go" "fibers.scm"
wrote `fibers.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/channels.go" "fibers/channels.scm"
wrote `fibers/channels.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/conditions.go" "fibers/conditions.scm"
wrote `fibers/conditions.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/config.go" "fibers/config.scm"
wrote `fibers/config.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/counter.go" "fibers/counter.scm"
wrote `fibers/counter.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/deque.go" "fibers/deque.scm"
wrote `fibers/deque.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/epoll.go" "fibers/epoll.scm"
wrote `fibers/epoll.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/interrupts.go" "fibers/interrupts.scm"
wrote `fibers/interrupts.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/nameset.go" "fibers/nameset.scm"
wrote `fibers/nameset.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/operations.go" "fibers/operations.scm"
wrote `fibers/operations.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/posix-clocks.go" "fibers/posix-clocks.scm"
wrote `fibers/posix-clocks.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/psq.go" "fibers/psq.scm"
wrote `fibers/psq.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/scheduler.go" "fibers/scheduler.scm"
wrote `fibers/scheduler.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/stack.go" "fibers/stack.scm"
wrote `fibers/stack.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/repl.go" "fibers/repl.scm"
wrote `fibers/repl.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/timers.go" "fibers/timers.scm"
wrote `fibers/timers.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "fibers/web/server.go" "fibers/web/server.scm"
wrote `fibers/web/server.go'
./env /usr/local/bin/guild compile -Wunbound-variable -Warity-mismatch -Wformat -o "web/server/fibers.go" "web/server/fibers.scm"
wrote `web/server/fibers.go'
make[1]: Leaving directory '/home/user/development/GuileScheme/fibers'

Then

$ sudo make install
make  install-am
make[1]: Entering directory '/home/user/development/GuileScheme/fibers'
make[2]: Entering directory '/home/user/development/GuileScheme/fibers'
make[2]: Nothing to be done for 'install-exec-am'.
 /bin/mkdir -p '/opt/guile/lib/guile/2.2/extensions'
 /bin/bash ./libtool   --mode=install /usr/bin/install -c   epoll.la '/opt/guile/lib/guile/2.2/extensions'
libtool: install: /usr/bin/install -c .libs/epoll.so.0.0.0 /opt/guile/lib/guile/2.2/extensions/epoll.so.0.0.0
libtool: install: (cd /opt/guile/lib/guile/2.2/extensions && { ln -s -f epoll.so.0.0.0 epoll.so.0 || { rm -f epoll.so.0 && ln -s epoll.so.0.0.0 epoll.so.0; }; })
libtool: install: (cd /opt/guile/lib/guile/2.2/extensions && { ln -s -f epoll.so.0.0.0 epoll.so || { rm -f epoll.so && ln -s epoll.so.0.0.0 epoll.so; }; })
libtool: install: /usr/bin/install -c .libs/epoll.lai /opt/guile/lib/guile/2.2/extensions/epoll.la
libtool: finish: PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/sbin" ldconfig -n /opt/guile/lib/guile/2.2/extensions
----------------------------------------------------------------------
Libraries have been installed in:
   /opt/guile/lib/guile/2.2/extensions

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
 /bin/mkdir -p '/opt/guile/share/info'
 /usr/bin/install -c -m 644 ./fibers.info '/opt/guile/share/info'
 install-info --info-dir='/opt/guile/share/info' '/opt/guile/share/info/fibers.info'
 /bin/mkdir -p '/opt/guile/share/guile/site/2.2'
 /bin/mkdir -p '/opt/guile/share/guile/site/2.2/fibers/web'
 /usr/bin/install -c -m 644  fibers/web/server.scm '/opt/guile/share/guile/site/2.2/fibers/web'
 /bin/mkdir -p '/opt/guile/share/guile/site/2.2/web/server'
 /usr/bin/install -c -m 644  web/server/fibers.scm '/opt/guile/share/guile/site/2.2/web/server'
 /usr/bin/install -c -m 644  fibers.scm '/opt/guile/share/guile/site/2.2/.'
 /bin/mkdir -p '/opt/guile/share/guile/site/2.2/fibers'
 /usr/bin/install -c -m 644  fibers/channels.scm fibers/conditions.scm fibers/config.scm fibers/counter.scm fibers/deque.scm fibers/epoll.scm fibers/interrupts.scm fibers/nameset.scm fibers/operations.scm fibers/posix-clocks.scm fibers/psq.scm fibers/scheduler.scm fibers/stack.scm fibers/repl.scm fibers/timers.scm '/opt/guile/share/guile/site/2.2/fibers'
 /bin/mkdir -p '/opt/guile/lib/guile/2.2/ccache'
 /bin/mkdir -p '/opt/guile/lib/guile/2.2/ccache/fibers/web'
 /usr/bin/install -c -m 644  fibers/web/server.go '/opt/guile/lib/guile/2.2/ccache/fibers/web'
 /bin/mkdir -p '/opt/guile/lib/guile/2.2/ccache/web/server'
 /usr/bin/install -c -m 644  web/server/fibers.go '/opt/guile/lib/guile/2.2/ccache/web/server'
 /usr/bin/install -c -m 644  fibers.go '/opt/guile/lib/guile/2.2/ccache/.'
 /bin/mkdir -p '/opt/guile/lib/guile/2.2/ccache/fibers'
 /usr/bin/install -c -m 644  fibers/channels.go fibers/conditions.go fibers/config.go fibers/counter.go fibers/deque.go fibers/epoll.go fibers/interrupts.go fibers/nameset.go fibers/operations.go fibers/posix-clocks.go fibers/psq.go fibers/scheduler.go fibers/stack.go fibers/repl.go fibers/timers.go '/opt/guile/lib/guile/2.2/ccache/fibers'
make[2]: Leaving directory '/home/user/development/GuileScheme/fibers'
make[1]: Leaving directory '/home/user/development/GuileScheme/fibers'

What might seem unusual is that it installs the library into /opt/guile/lib/guile/2.2/extensions, while I am using Guile 2.2.3. I am not sure whether there should be another subdirectory in /opt/guile/lib/guile/ for 2.2.3.

However, ./env guile works. There I can do (use-modules (fibers)).

The operating system is Ubuntu 17.04.
Guile was built very recently, maybe 3 or 4 days ago.

Porting guile-websocket to fibers

Hi!

I am trying to run guile-websocket with fibers, but I failed. I pushed the code online:

  git clone https://github.com/amirouche/guile-websocket

It seems to me the code blocks when trying to read something on the socket. To reproduce,
configure the project using the autotools and then run test.scm with the following
command:

  ./pre-inst-env guile test.scm

and open test.html in a browser. This should print a "connected" message in the web
console and "hello there". But it doesn't. If you refresh the page a few times (7 times
on my side), the server part completely hangs and doesn't accept new connections.

Any help will be greatly appreciated.

Example from the manual doesn't work as documented

Guile version: 3.0.4
Fibers version: 1.0.0

     For example:
          (run-fibers (lambda () 1))
          => 1

          (run-fibers
           (lambda ()
             (spawn-fiber (lambda () (display "hey!\n")))))
          -| hey!

If I paste this into the REPL, it doesn't print "hey!". Adding #:drain? #t fixes the problem.
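The workaround from this report can be written out as below. This is a minimal sketch, assuming Fibers is installed and on Guile's load path; greet-from-fiber is a name made up for illustration. It is the manual's second example with the #:drain? #t keyword added, so that run-fibers waits for the spawned fiber before returning.

```scheme
(use-modules (fibers))

;; The manual's second example, plus #:drain? #t as suggested in
;; this report.  `greet-from-fiber' is a hypothetical helper name.
(define (greet-from-fiber)
  (run-fibers
   (lambda ()
     (spawn-fiber (lambda () (display "hey!\n"))))
   #:drain? #t))

(greet-from-fiber)
```

Without the keyword, run-fibers may return as soon as the initial thunk returns, which would explain why the spawned fiber never gets to print.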

README.md doesn't document dependencies

Inspired by spritely goblins, I'm trying to build fibers on NetBSD in a pkgsrc context. The README says that the only dependency is guile, but I find that, to build from git, I also need:

  • autoconf
  • automake
  • libtool
  • gnulib (apparently), for AC_LIB_LINKFLAGS_FROM_LIBS

This issue is just about documenting what is required, to build from git, and to build tarballs. (Perhaps it is correct for building make dist-created tarballs.)

High CPU usage on system time change

Since Guix upgraded to guile-fibers 1.3.1, shepherd hangs shortly after boot on systems without an RTC. I believe the problem comes from using get-internal-real-time in the guile-fibers timer wheel implementation. After NTP corrects the system time, this function returns a much larger value, and the CPU load (for one core) goes to 100%.

Profiling suggests the process spends the CPU time in timer-wheel-advance!, so I imagine it is trying to tick through a five-year time diff. I tried increasing the system time manually by N days, which causes shepherd to be unresponsive (e.g. to herd status) for about N×5 seconds. I observed similar behavior with the example from the guile-fibers README.

Replacing all instances of (get-internal-real-time) with (clock-gettime 1) in guile-fibers, and reconfiguring the system with the patched package, fixes this problem. I think using a monotonic clock makes sense, but there is probably a cleaner / more portable way to do it.
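To illustrate why a clock jump could be so costly, here is a toy model in plain Guile. It is not the Fibers timer wheel: advance-tick-by-tick, catch-up-steps, and the resolution of 1000 ticks per second are all assumptions made up for this sketch. It only shows that a structure which advances one tick at a time does work proportional to the size of the jump.

```scheme
;; Toy model only; none of these names come from Fibers.
(define ticks-per-second 1000)          ; hypothetical resolution

(define (advance-tick-by-tick current-tick target-tick)
  "Step from CURRENT-TICK to TARGET-TICK one tick at a time and
return the number of steps taken."
  (let loop ((tick current-tick) (steps 0))
    (if (>= tick target-tick)
        steps
        (loop (+ tick 1) (+ steps 1)))))

(define (catch-up-steps seconds-jumped)
  "Steps needed to catch up after the clock jumps SECONDS-JUMPED ahead."
  (advance-tick-by-tick 0 (* seconds-jumped ticks-per-second)))

;; One minute of clock jump already costs 60000 steps in this model;
;; the cost grows linearly with the size of the jump.
(catch-up-steps 60)
```

A monotonic clock sidesteps the problem because it is not affected by NTP corrections to the wall-clock time, so the wheel never sees such a jump.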

Thanks!

Sending lambda as message does not work

I have the following simplified code adapted from the tutorial:

(use-modules (fibers)
             (fibers channels)
             (ice-9 match))

(define (server in out)
  (let lopo ()
    (display (simple-format #f "~s" ((get-message in))))
    (put-message out 'pong!)
    (lopo)))

(define (client in out)
  (put-message out (λ () 'ping!))
  (pk 'client-received (get-message in)))

(run-fibers
 (lambda ()
   (let ([c2s (make-channel)]
         [s2c (make-channel)])
     (spawn-fiber (lambda () (server c2s s2c)))
     (client s2c c2s))))

However, when I run it, I get:

Uncaught exception in task:
In fibers.scm:
    149:8  2 (_)
In /home/user/development/GuileScheme/fibers-tutorial/fibers-tutorial.scm:
    13:32  1 (server #<<channel> getq: #<atomic-box 5566f1ed9240 value: (())> getq-gc-c…> …)
In unknown file:
           0 (_)
ERROR: Wrong type to apply: ping!

I tried several versions of calling and not calling the received lambda, but it seems that I cannot send a lambda?
That would mean, though, that the procedures run in fibers need to be known, or in some way defined, before any messages are received (so only data could be received). I would like to be able to simply send a thunk to a fiber for it to evaluate.
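For comparison, here is a minimal sketch of sending a thunk over a channel and calling it exactly once on the receiving side. It assumes Fibers is installed; run-thunk-in-fiber is a name invented for this illustration. Procedures are ordinary values in Scheme, so a channel can carry them like any other message. The error message "Wrong type to apply: ping!" means that the symbol ping!, rather than a procedure, ended up in operator position somewhere.

```scheme
(use-modules (fibers)
             (fibers channels))

;; `run-thunk-in-fiber' is a hypothetical helper, not part of Fibers.
(define (run-thunk-in-fiber thunk)
  "Send THUNK to a freshly spawned fiber and return the value it computes."
  (run-fibers
   (lambda ()
     (let ((requests (make-channel))
           (replies  (make-channel)))
       (spawn-fiber
        (lambda ()
          ;; Receive a procedure and call it exactly once.
          (let ((received (get-message requests)))
            (put-message replies (received)))))
       (put-message requests thunk)
       ;; The initial thunk's value becomes the value of run-fibers.
       (get-message replies)))))

(run-thunk-in-fiber (lambda () 'ping!))
```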

Fibers can wait forever on a file descriptor once it has been closed

Consider this code where one fiber blocks in accept on a socket and the other fiber closes said socket:

(use-modules (fibers)
             (fibers conditions))

(define %socket-file-name
  "/tmp/test.sock")

(define (open-server-socket file-name)
  (let ((sock    (socket PF_UNIX
                         (logior SOCK_STREAM SOCK_NONBLOCK SOCK_CLOEXEC)
                         0))
        (address (make-socket-address AF_UNIX file-name)))
    (bind sock address)
    (listen sock 10)
    sock))

(define (run)
  (false-if-exception (delete-file %socket-file-name))
  (let ((socket (open-server-socket %socket-file-name))
        (ready (make-condition)))
    (spawn-fiber
     (lambda ()
       (pk 'accepting)
       (signal-condition! ready)
       ;; Here's the problem: 'accept' never returns. ⤵
       (pk 'accepted (false-if-exception
                      (accept socket (logior SOCK_NONBLOCK SOCK_CLOEXEC))))
       (pk 'done)))

    (wait ready)
    (pk 'closing! socket)
    (shutdown socket 0)
    (close-port socket)
    (pk 'closed! socket)
    (sleep 2)
    (pk 'exiting)))

(setvbuf (current-output-port) 'line)
(setvbuf (current-error-port) 'line)
(run-fibers
 run
 #:drain? #t
 #:parallelism 1
 #:hz 0)

With the epoll backend on GNU/Linux, the accept call never returns: that fiber is never woken up. The strace output looks like this:

write(1, ";;; (accepting)\n", 16)       = 16 <0.000029>
accept4(12, 0x7ffeb5d00fc0, [112], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable) <0.000057>
epoll_ctl(11, EPOLL_CTL_MOD, 12, {events=EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, data={u32=12, u64=12}}) = -1 ENOENT (No such file or directory) <0.000020>
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000082>
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000025>
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/share/locale/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000022>
openat(AT_FDCWD, "/gnu/store/5h2w4qi9hk1qzzgi1w83220ydslinr4s-glibc-2.33/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000022>
epoll_ctl(11, EPOLL_CTL_ADD, 12, {events=EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, data={u32=12, u64=12}}) = 0 <0.000031>
epoll_wait(11, [], 8, 0)                = 0 <0.000021>
write(1, "\n", 1)                       = 1 <0.000037>
ioctl(12, TCGETS, 0x7ffeb5d00d70)       = -1 ENOTTY (Inappropriate ioctl for device) <0.000024>
write(1, ";;; (closing! #<input-output: socket 12>)\n", 42) = 42 <0.000031>
close(12)                               = 0 <0.000040>
write(1, "\n", 1)                       = 1 <0.000037>
write(1, ";;; (closed! #<closed: file 7f8b306063f0>)\n", 43) = 43 <0.000025>
write(2, "WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'\n", 79) = 79 <0.000034>
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0 <0.000017>
epoll_wait(11, [], 8, 1999)             = 0 <2.000530>
epoll_wait(11, [], 8, 0)                = 0 <0.000073>
write(1, "\n", 1)                       = 1 <0.000108>
write(1, ";;; (exiting)\n", 14)         = 14 <0.000053>
epoll_wait(11, [], 8, 0)                = 0 <0.000040>

The key here is that epoll_wait never shows file descriptor 12 as ready. This is consistent with epoll(7), which says that closing a file descriptor causes it to be removed from the interest list.

So epoll is behaving as advertised, but the fact that a fiber is left behind forever is a problem. We could let the application take care of the problem (don't close a file descriptor when there are suspended fibers waiting on it), but that doesn't sound nice.

To address it in Fibers proper, it seems like we would need to register file descriptor finalizers that would wake up suspended fibers.

Thoughts?

Infinite loops on uncaught exceptions

In switching guile-goblins from Fibers 1.0 to 1.10, I noticed that when exceptions occurred, schedulers suddenly stopped working but also my CPU would shoot up to 100%. That's strange, that shouldn't happen, what gives?

I did a git-bisect against fibers to try to figure out where this started. The issue is introduced in 84addfb. I haven't checked for sure, but I'm pretty certain that what's happening is that where the scheduler previously was catching the exception in the individual task but the scheduler would continue looping, now the scheduler is breaking and not continuing on an individual error. Thus, it isn't feeding anything to the individual fibers, which start spin locking madly.
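The behavior described as the earlier one (an exception is caught per task and the loop keeps going) can be sketched in plain Guile. This is a toy run loop, not the Fibers scheduler; run-tasks is a name made up for this illustration.

```scheme
;; Toy run loop, not Fibers code: each task is a thunk, and a failing
;; task is reported without stopping the loop.
(define (run-tasks tasks)
  "Run each thunk in TASKS in order; return (COMPLETED . FAILED) counts."
  (let loop ((tasks tasks) (completed 0) (failed 0))
    (if (null? tasks)
        (cons completed failed)
        (let ((ok? (catch #t
                     (lambda () ((car tasks)) #t)
                     (lambda (key . args)
                       ;; Report the failure and carry on.
                       (format (current-error-port)
                               "task failed: ~a ~s\n" key args)
                       #f))))
          (if ok?
              (loop (cdr tasks) (+ completed 1) failed)
              (loop (cdr tasks) completed (+ failed 1)))))))

(run-tasks (list (lambda () 'fine)
                 (lambda () (throw 'oops 1))
                 (lambda () 'also-fine)))
```

The key point is that catch wraps a single task rather than the whole loop, so one failure cannot keep the remaining tasks from being run.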

Guix build guile-fibers test fails: Too many heap sections.

Howdy. I'm a beginner trying to get GUIX system up and running. I get a core dump when running guile-fibers tests on aarch64.

      In ice-9/boot-9.scm:
        1685:16  0 (raise-exception _ #:continuable? _)
      ice-9/boot-9.scm:1685:16: In procedure raise-exception:
      In procedure set-current-dynamic-state: Wrong type argument in position 1: #<dynamic-state 115f30630>
      Too many heap sections: Increase MAXHINCR or MAX_HEAP_SECTS
      /gnu/store/lzf5zg0diw2bhh2qji4bl2v46wd8pylc-bash-minimal-5.1.8/bin/bash: line 6:  2902 Aborted                 (core dumped) top_srcdir="/tmp/guix-build-guile-fibers-1.1.0-0.c25dcb9.drv-0/source" ./env /gnu/store/ilhx4q1yyyflqigai0wk7f677yrzpffl-guile-3.0.7/bin/guile -s ${dir}$tst
      FAIL: tests/channels.scm

Things I tried:

Rerunning with modified GC_MAXIMUM_HEAP_SIZE and GC_INITIAL_HEAP_SIZE, and following the
advice from the mailing list (https://lists.gnu.org/archive/html/bug-guix/2019-10/msg00281.html)
to increase the number of file descriptors. These only lengthened the time it took
for the tests to fail. So my working theory is that (current-processor-count) in
tests/channels.scm returns 80 cores (instead of my VM's share of 4). All other
tests pass.

(assert-run-fibers-terminates (pingpong (current-processor-count) 1000))

The VM I am using has 4 cores of the 80 on https://solutions.amperecomputing.com/systems/altra/kraken/other/kraken-comhpc-1s
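One way to check that theory is to print what Guile itself reports on the affected machine. The sketch below uses two procedures from core Guile: current-processor-count (processors available to the current process) and total-processor-count (processors in the machine). The helper name processor-counts is made up for this illustration.

```scheme
;; `processor-counts' is a hypothetical helper; the two procedures it
;; calls are part of core Guile.
(define (processor-counts)
  (list (cons 'current (current-processor-count))
        (cons 'total   (total-processor-count))))

(for-each (lambda (entry)
            (format #t "~a: ~a\n" (car entry) (cdr entry)))
          (processor-counts))
```

If the "current" figure is far above 4 inside the build environment, that would support the theory that the pingpong test spawns many more fibers than the VM can sustain.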

I understand the reputation Oracle has! I'm simply a student playing around on
the free VM allowance. Please don't shoot the messenger.

debian@master-instance:~$ lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       ARM
Model:                           1
Model name:                      Neoverse-N1
Stepping:                        r3p1
BogoMIPS:                        50.00
NUMA node0 CPU(s):               0-3
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; CSV2, BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs

The test suite takes roughly 1700 seconds to fail. Happy to provide any other details.

Have a good one!

EPOLLHUP events are improperly handled

Hello!

When reading from a non-blocking pipe, epoll eventually returns EPOLLHUP. In Fibers 1.0.0, this leads to a loop where schedule-task-when-fd-active is called repeatedly, which leads to this (PID 3718 is the parent process reading from the pipe):

3718  read(27, "(((#:eval-id . 1)\n  (#:nix-name . \"random\")\n  (#:system . \"x86_64-linux\")\n  (#:duration . 0.878188)\n  (#:job-name \"foo0\")\n  (#:derivation\n   .\n   \"/gnu/store/cika9pnr12mnhrqr8mhsix9g37cc9hp0-random.drv\")\n  (#:license (name . \"GPLv3+\"))\n  (#:description \"dummy job\")\n  (#:long-description \"really dummy job\"))\n ((#:eval-id . 1)\n  (#:nix-name . \"r"..., 4096) = 3392
3718  read(27,  <unfinished ...>
3729  exit_group(0 <unfinished ...>
3718  <... read resumed> 0x19ff020, 4096) = -1 EAGAIN (Resource temporarily unavailable)
[...]
3718  epoll_wait(11, [{EPOLLHUP, {u32=27, u64=27}}], 8, 8547) = 1
3718  epoll_ctl(11, EPOLL_CTL_MOD, 27, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, {u32=27, u64=27}}) = 0
3718  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
3718  epoll_wait(11, [{EPOLLHUP, {u32=27, u64=27}}], 8, 8547) = 1
3718  epoll_ctl(11, EPOLL_CTL_MOD, 27, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, {u32=27, u64=27}}) = 0
3718  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
3718  epoll_wait(11, [{EPOLLHUP, {u32=27, u64=27}}], 8, 8547) = 1
3718  epoll_ctl(11, EPOLL_CTL_MOD, 27, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, {u32=27, u64=27}}) = 0
3718  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
3718  epoll_wait(11, [{EPOLLHUP, {u32=27, u64=27}}], 8, 8547) = 1
3718  epoll_ctl(11, EPOLL_CTL_MOD, 27, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, {u32=27, u64=27}}) = 0
3718  rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
3718  epoll_wait(11, [{EPOLLHUP, {u32=27, u64=27}}], 8, 8547) = 1
3718  epoll_ctl(11, EPOLL_CTL_MOD, 27, {EPOLLIN|EPOLLRDHUP|EPOLLONESHOT, {u32=27, u64=27}}) = 0

The epoll_wait/epoll_ctl sequence repeats endlessly.

On Fibers 1.0.0 the patch below fixes the problem for me:

diff --git a/fibers/internal.scm b/fibers/internal.scm
index 79b1011..255ec0b 100644
--- a/fibers/internal.scm
+++ b/fibers/internal.scm
@@ -230,14 +230,14 @@ thread."
      ;; deactivated our entry in the epoll set.
      (set-car! sources #f)
      (set-cdr! sources '())
-     (unless (zero? (logand revents EPOLLERR))
+     (unless (zero? (logand revents (logior EPOLLHUP EPOLLERR)))
        (hashv-remove! (scheduler-sources sched) fd))
      ;; Now resume or re-enqueue fibers, as appropriate.
      (let lp ((waiters waiters))
        (match waiters
          (() #f)
          (((events . resume) . waiters)
-          (if (zero? (logand revents (logior events EPOLLERR)))
+          (if (zero? (logand revents (logior events (logior EPOLLHUP EPOLLERR))))
               ;; Re-enqueue.
               (add-fd-event-waiter sched fd events resume)
               ;; Resume.

Not sure how this translates on current master.
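The effect of the patch on the resume decision can be checked in isolation. The sketch below is plain Guile and does not use Fibers: the constants are the Linux values from <sys/epoll.h>, hard-coded here as an assumption so the example runs anywhere, and resume-old? and resume-patched? are names made up to mirror the two versions of the test in the diff.

```scheme
;; Linux values from <sys/epoll.h>, hard-coded for this sketch.
(define EPOLLIN  #x001)
(define EPOLLERR #x008)
(define EPOLLHUP #x010)

;; Hypothetical helpers mirroring the condition before and after the patch.
(define (resume-old? revents events)
  "Before the patch: resume only on a requested event or EPOLLERR."
  (not (zero? (logand revents (logior events EPOLLERR)))))

(define (resume-patched? revents events)
  "After the patch: EPOLLHUP also resumes the waiter."
  (not (zero? (logand revents (logior events EPOLLHUP EPOLLERR)))))

;; A waiter interested in EPOLLIN that receives only EPOLLHUP:
(resume-old? EPOLLHUP EPOLLIN)      ; #f, so the waiter is re-enqueued
(resume-patched? EPOLLHUP EPOLLIN)  ; #t, so the waiter is resumed
```

With the old condition, a waiter that only ever sees EPOLLHUP is re-enqueued each time, which matches the repeating epoll_wait/epoll_ctl pattern in the trace.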

autoreconf -vif fails complaining about build-aux/config.rpath not existing

; GUILE=/bin/guile-2.2 ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: running: /usr/bin/autoheader --force
autoreconf: running: automake --add-missing --copy --force-missing
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([^ \t=:+{}]+)}/ at /bin/automake line 3936.
configure.ac:32: installing 'build-aux/compile'
configure.ac:49: error: required file 'build-aux/config.rpath' not found
configure.ac:26: installing 'build-aux/missing'
Makefile.am: installing 'build-aux/depcomp'
autoreconf: automake failed with exit status: 1

So, let's force it

; touch build-aux/config.rpath
; GUILE=/bin/guile-2.2 ./autogen.sh
autoreconf: Entering directory `.'
autoreconf: configure.ac: not using Gettext
autoreconf: running: aclocal --force -I m4
autoreconf: configure.ac: tracing
autoreconf: running: libtoolize --copy --force
libtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'build-aux'.
libtoolize: copying file 'build-aux/ltmain.sh'
libtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
libtoolize: copying file 'm4/libtool.m4'
libtoolize: copying file 'm4/ltoptions.m4'
libtoolize: copying file 'm4/ltsugar.m4'
libtoolize: copying file 'm4/ltversion.m4'
libtoolize: copying file 'm4/lt~obsolete.m4'
autoreconf: running: /usr/bin/autoconf --force
autoreconf: running: /usr/bin/autoheader --force
autoreconf: running: automake --add-missing --copy --force-missing
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([^ \t=:+{}]+)}/ at /bin/automake line 3936.
configure.ac:32: installing 'build-aux/compile'
configure.ac:26: installing 'build-aux/missing'
Makefile.am: installing 'build-aux/depcomp'
autoreconf: Leaving directory `.'
; 

Everything after that works with my GUILE=/bin/guile-2.2 hack. It may have to do with which versions of autotools I'm using:

; autoconf --version
autoconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David J. MacKenzie and Akim Demaille.
; automake --version
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\${ <-- HERE ([^ \t=:+{}]+)}/ at /bin/automake line 3936.
automake (GNU automake) 1.15
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Tom Tromey <[email protected]>
       and Alexandre Duret-Lutz <[email protected]>.
; autoreconf --version
autoreconf (GNU Autoconf) 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+/Autoconf: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>, <http://gnu.org/licenses/exceptions.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by David J. MacKenzie and Akim Demaille.

Resource (fd) leak in run-fibers

Only the main scheduler is destroyed when run-fibers returns. So the epoll file descriptor and the two wake-pipe file descriptors for each remote scheduler stay open until garbage-collected (which until recently still wouldn't close them due to a guile bug with revealed-count).

To see this behavior in practice, call run-fibers in a loop a few thousand times (with #:parallelism > 1). Guile will abort when it tries creating a new thread but creating the sleep pipe for them returns EMFILE.

We should destroy the peer schedulers as well when we return from run-fibers. AFAIK this should be as simple as changing

(match affinities
  ((affinity . affinities)
   (dynamic-wind
     (lambda ()
       (start-auxiliary-threads scheduler hz finished? affinities))
     (lambda ()
       (%run-fibers scheduler hz finished? affinity))
     (lambda ()
       (stop-auxiliary-threads scheduler)))))
(destroy-scheduler scheduler)
(apply values (atomic-box-ref ret))

to

(match affinities
  ((affinity . affinities)
   (dynamic-wind
     (lambda ()
       (start-auxiliary-threads scheduler hz finished? affinities))
     (lambda ()
       (%run-fibers scheduler hz finished? affinity))
     (lambda ()
       (stop-auxiliary-threads scheduler)))))
(for-each destroy-scheduler (scheduler-remote-peers scheduler))
(destroy-scheduler scheduler)
(apply values (atomic-box-ref ret))

It should be safe to destroy them at that point because stop-auxiliary-threads has ensured that they aren't running.
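The reproduction described above ("call run-fibers in a loop a few thousand times") might look like the sketch below. It assumes Fibers is installed; run-fibers-repeatedly is a name made up for this illustration, and the count needed to hit EMFILE depends on the process's file descriptor limit.

```scheme
(use-modules (fibers))

;; Hypothetical helper for reproducing the leak described above.
(define (run-fibers-repeatedly count)
  "Call run-fibers COUNT times with two schedulers each; return COUNT."
  (let loop ((i 0))
    (if (< i count)
        (begin
          ;; #:parallelism 2 means one main and one remote scheduler.
          (run-fibers (lambda () #t) #:parallelism 2)
          (loop (+ i 1)))
        i)))
```

On an affected version, something like (run-fibers-repeatedly 5000) would be expected to abort once the descriptors run out; watching the process's open files during the loop (for example under /proc/PID/fd on Linux) shows the growth before that point.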

In procedure pipe: Too many open files

After monkeying with fibers for 5-10 minutes, I get the error:

In fibers.scm:
WARNING: (guile-user)(guile-user)   113:22  5 :
:  
In fibers/scheduler.scm:
imported mm   105:14  4 (make-scheduler #:parallelism _ #:prompt-tag _)
oduu
In fibers/epoll.scm:
le (fibers) oo    94:31  3 (epoll-create #:close-on-exec? _ #:maxevents _)
verrides      53:14  2 (make-wake-pipe)
cc
In unknown file:
           1 (pipe)
ore binn
In ice-9/boot-9.scm:
ding `sleep''  1669:16  0 (raise-exception _ #:continuable? _)

ice-9/boot-9.scm:1669:16: In procedure raise-exception:
In procedure pipe: Too many open files

I've gotten this several times now, each time after a clean restart. This is using git clone of today's git.

I do not have any simple test case to reproduce this: my fibers code is very simple, but it's working within a large & complex framework of other code. (specifically, inside of https://github.com/opencog/learn/) (you won't find fibers code in there, because I'm still experimenting with it.)

(I'm trying to see if I can use fibers to parallelize a work-farm (a local async procedure call): I've got a subroutine that is slow but is called many times. I cannot use par-for-each, because that subroutine is called from several different places under quite different circumstances. So fibers seemed ideal for this. Of course I could just create a thread-pool, too.)
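A work-farm of this kind can be sketched with one fiber per job and a shared results channel. This is only an illustration under the assumption that Fibers is installed; fiber-map is a made-up name, not a Fibers procedure. Note that it enters run-fibers once for the whole batch rather than once per job; the backtrace above fails inside make-scheduler, so creating schedulers less often may be worth trying.

```scheme
(use-modules (fibers)
             (fibers channels))

;; `fiber-map' is a hypothetical helper, not part of Fibers.
(define (fiber-map proc items)
  "Apply PROC to each element of ITEMS in its own fiber and return the
results in the original order."
  (run-fibers
   (lambda ()
     (let ((results (make-channel))
           (count   (length items)))
       ;; One fiber per job; each reports (INDEX . VALUE).
       (for-each (lambda (item index)
                   (spawn-fiber
                    (lambda ()
                      (put-message results (cons index (proc item))))))
                 items (iota count))
       ;; Collect COUNT replies, then restore the input order.
       (let loop ((remaining count) (acc '()))
         (if (zero? remaining)
             (map cdr (sort acc (lambda (a b) (< (car a) (car b)))))
             (loop (- remaining 1)
                   (cons (get-message results) acc))))))))

(fiber-map (lambda (x) (* x x)) '(1 2 3 4))
```

If PROC raises an exception in one of the fibers, the collecting loop would wait forever for the missing reply, so real code would want to catch errors inside each job and send them back as values.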

Implementing thread-operation

I was working on guile-nrepl (an asynchronous network REPL for Guile) and stumbled upon an issue: we can't do [interruptible] eval inside a fiber because it blocks the whole thread until evaluation is complete. So we need to do eval in a separate thread, and thus we need to interact with these threads from fibers somehow.

I hacked together an implementation of thread-operation and it works great, but because of my lack of knowledge of fibers internals the implementation consumes unreasonable amounts of memory and probably contains some other flaws. Any ideas on a proper implementation of this operation?

(use-modules (fibers))
(use-modules (fibers channels))
(use-modules (fibers operations))
(use-modules (fibers timers))
(use-modules (fibers conditions))
(use-modules (fibers scheduler))
(use-modules (ice-9 threads))
(use-modules (ice-9 match))
(use-modules (ice-9 atomic))

(define (thread-operation th)
  "Make an operation that will succeed when the thread is
exited.  The operation will succeed with the value returned by thread."
  (define (wrap-fn)
    (join-thread th))
  (define (try-fn) (and (thread-exited? th) values))
  (define (block-fn flag sched resume)
    (define (commit)
       (match (atomic-box-compare-and-swap! flag 'W 'S)
         ('W (resume values))
         ('C (commit))
         ('S #f)))
    (when sched
      (schedule-task
       sched
       (lambda ()
         (perform-operation (thread-operation th))
         (commit)))))
  (make-base-operation wrap-fn try-fn block-fn))

(define (async-program)
  (let ((th (call-with-new-thread (lambda () (sleep 7) 'thread-value)))
        (ch (make-channel))
        (cnd (make-condition)))
    (spawn-fiber
     (lambda ()
       (put-message
        ch
        (perform-operation
         (choice-operation
          (wrap-operation
           (wait-operation cnd)
           (lambda ()
             (format #t "condition signal recieved\n")
             'recieved-cnd-signal))
          (wrap-operation
           (thread-operation th)
           (lambda (. v)
             (format #t "thread finished: ~a\n" v)
             'finished-long-operation)))))))
     (spawn-fiber
      (lambda ()
        (sleep 5)
        (format #t "sending signal\n")
        (signal-condition! cnd)))
     (get-message ch)))

(format #t "return value: ~a\n"
        (run-fibers async-program #:drain? #t))
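An alternative that avoids a custom base operation is to let the worker thread signal a condition when it finishes, and to build the operation from wait-operation and wrap-operation. This is only a sketch: thunk-in-thread-operation is a made-up name, it creates the thread itself instead of accepting an existing one, and it assumes that signal-condition! may be called from a kernel thread that is not running fibers.

```scheme
(use-modules (fibers)
             (fibers conditions)
             (fibers operations)
             (ice-9 atomic)
             (ice-9 threads))

;; Hypothetical helper, not part of Fibers.
(define (thunk-in-thread-operation thunk)
  "Run THUNK in a new kernel thread and return an operation that
succeeds with THUNK's value once the thread has finished."
  (let ((done   (make-condition))
        (result (make-atomic-box #f)))
    (call-with-new-thread
     (lambda ()
       ;; Store the value first, then wake up any waiting fibers.
       (atomic-box-set! result (thunk))
       (signal-condition! done)))
    (wrap-operation (wait-operation done)
                    (lambda () (atomic-box-ref result)))))

(run-fibers
 (lambda ()
   (perform-operation
    (thunk-in-thread-operation (lambda () (* 6 7))))))
```

The resulting operation can be combined with others through choice-operation, as in the program above. If THUNK raises an exception, the condition is never signalled, so production code would need to catch errors in the thread and signal in that case too.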

Unknown meta command: fibers

The documentation mentions 2.6 REPL Commands -- REPL Command: fibers [sched] but when I run it I get

scheme@(run)> ,fibers
Unknown meta command: fibers

FYI, I also have this:

scheme@(run)> ,scheds                    
No schedulers.

But that is OK, as expected because the scheduler has finished.

FYI, this is built from a git clone of today's git repo.

'get-message' continuation fired more than once?

On aarch64-linux-gnu, everything looks as though the continuation of a get-message call might occasionally be called more than once. Consider this code:

(use-modules (fibers)
             (fibers channels)
             (srfi srfi-1)
             (ice-9 match))

(define %max 100000)

(define (receiver channel)
  (lambda ()
    (let loop ((previous 0))
      (when (< previous %max)
        (let ((n (get-message channel)))
          (unless (= n (+ previous 1))
            (error "boooh!" n))
          (loop n))))))

(define (sender channel)
  (lambda ()
    (let loop ((n 1))
      (when (<= n %max)
        (put-message channel n)
        (loop (+ n 1))))))

(define (run)
  (let ((channel (make-channel)))
    (spawn-fiber (receiver channel))
    (spawn-fiber (sender channel))
    #t))

(run-fibers run #:parallelism 1 #:hz 0 #:drain? #t)
(pk 'done)

It works fine on x86_64-linux-gnu AFAICS, but on AArch64 I quickly hit this:

$ guile ../fibers-channel.scm
Uncaught exception in task:
In fibers.scm:
    150:8  2 (_)
In /home/ludo/src/shepherd/../fibers-channel.scm:
    14:12  1 (_)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
boooh! 2241
Uncaught exception in task:
In fibers.scm:
    150:8  2 (_)
In /home/ludo/src/shepherd/../fibers-channel.scm:
    14:12  1 (_)
In ice-9/boot-9.scm:
  1685:16  0 (raise-exception _ #:continuable? _)
ice-9/boot-9.scm:1685:16: In procedure raise-exception:
boooh! 2240

;;; (done)

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.

(This is Guile 3.0.9 and Fibers 1.1.1.)

#:drain? #t does not wait until all fibers have completed.

No threads, no operations, only run-fibers with #:drain? #t and spawn-fiber.
FWIW, the bug doesn't manifest when disabling preemption (#:hz 0). Anyway, here is a test case waiting to succeed:

;;;; Copyright (C) 2021 Maxime Devos
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
;;;; License as published by the Free Software Foundation; either
;;;; version 3 of the License, or (at your option) any later version.
;;;;
;;;; This library is distributed in the hope that it will be useful,
;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;;;; Lesser General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU Lesser General Public
;;;; License along with this library; if not, write to the Free Software
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

(use-modules (ice-9 control)
             (fibers)
             (srfi srfi-43)
             (srfi srfi-64)
             (rnrs bytevectors))

(define do-nothing-but-dont-tell-guile
  (eval '(lambda _ (values)) (current-module)))

(define N_ITERATIONS 20000)
;; (* 2 (current-processor-count)) on the system
;; the test was written on (and failing), in case
;; that matters.
(define N_THREAD 8)

(define (thread thread-index)
  "Do things, make garbage and return #t."
  (pk 'fiber-start thread-index)
  (do ((i 0 (+ 1 i)))
      ((>= i N_ITERATIONS))
    (do-nothing-but-dont-tell-guile thread-index i)
    ;; Curiously, if I remove the following expression,
    ;; then the bug consistently does not manifest. Some connection
    ;; to garbage management perhaps?
    ;;
    ;; Using (cons thread-index i) also manifests the bug,
    ;; but (subjectively) less frequently.
    (do-nothing-but-dont-tell-guile
     (make-bytevector 600)))
  ;; If the test fails and you look at the output,
  ;; notice that not all (fiber-start INDEX) are matched
  ;; by a (fiber-end INDEX).
  (pk 'fiber-end thread-index)
  #t)

(define (check-all-done per-thread)
  (define (bad? a)
    (not (eq? a #t)))
  (define bad (vector-index bad? per-thread))
  (if bad
      (begin (pk 'bad!! bad (vector-ref per-thread bad))
             #f)
      #t))

(test-assert "all fibers are xrun to completion"
  (let ((thread-indices (iota N_THREAD))
        ;; When a thread has completed, an element
        ;; of this vector is set to #t.
        (thread-done? (make-vector (+ 1 N_THREAD))))
    (run-fibers
     (lambda ()
       (define (run! thread-index)
         (spawn-fiber
          (lambda ()
            (vector-set! thread-done? thread-index
                         (thread thread-index)))))
       (for-each run! thread-indices))
     #:drain? #t
     ;; No need
     #:install-suspendable-ports? #f
     #:hz 700)
    (vector-set! thread-done? N_THREAD #t)
    (check-all-done thread-done?)))

Pinging @amirouche since someone on IRC suggested so.
Your N_THREAD and N_ITERATIONS may vary. Also, this test case occasionally succeeds.

fibers does not build without epoll

Building fibers 1.1.1 on NetBSD 9 fails because it tries to include sys/epoll.h. This has several problems:

  • epoll.h is not specified by POSIX (and the Linux man page says that), and thus the build system should not try to include it without an autoconf test
  • there doesn't seem to be an implementation using POSIX-specified facilities that is used when epoll is not present
  • the README doesn't mention this dependency -- the default expectation is that software will build on more or less "any reasonable system complying with POSIX". (The system I'm trying to build on does have POSIX-compliant poll and also kqueue.)

Guile deprecation warning due to bit-count

here:

(cond-expand

the issue is that the reference to bit-count is not eliminated completely at expand time, it's only guarded by a runtime if.

not the end of the world, but due to this Shepherd services in Guix print this enormous warning, multiple times:

Some deprecated features have been used.  Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information.  Set it to "no" to suppress
this message.

Exceptions in initial fiber cause hang

Consider the following:

(run-fibers (lambda () (throw 'foo)))

In both current master and 1.0.0 this causes a hang: run-fibers neither returns nor does any work. In current master the hang is effectively uninterruptible: because of the catch loop in run-scheduler, ^C does nothing unless you manage to press it again before catch is re-entered. The cause is that the ret atomic box is never written when the init thunk throws an exception, so finished? always returns #f. An exception from init can currently also leave the main scheduler never woken up, because that spawn-fiber invocation is skipped.

Since we're already ensuring that the return value of the initial thunk gets passed along to the caller, we should probably ensure that any exceptions do as well. We could do this by wrapping the result such that an exception with arguments args becomes (list 'err args) and a regular exit with return values vals becomes (list 'ok vals). We then match on the first symbol when deciding at the end of run-fibers whether to return or re-throw.

The code below fixes the hang-on-exception behavior and also ensures that the main scheduler gets woken up even if an exception is thrown. It also re-throws the exception in the caller's context.

(define* (run-fibers #:optional (init #f)
                     #:key (hz 100) (scheduler #f)
                     (parallelism (current-processor-count))
                     (cpus (getaffinity 0))
                     (install-suspendable-ports? #t)
                     (drain? #f))
  (when install-suspendable-ports? (install-suspendable-ports!))
  (cond
   (scheduler
    (let ((finished? (lambda () #f)))
      (when init (spawn-fiber init scheduler))
      (%run-fibers scheduler hz finished? cpus)))
   (else
    (let* ((scheduler (make-scheduler #:parallelism parallelism))
           (ret (make-atomic-box #f))
           (finished? (lambda ()
                        (and (atomic-box-ref ret)
                             (or (not drain?)
                                 (not (scheduler-work-pending? scheduler))))))
           (affinities (compute-affinities cpus parallelism)))
      (unless init
        (error "run-fibers requires initial fiber thunk when creating sched"))
      (spawn-fiber (lambda ()
                     (dynamic-wind
                       (const #t)
                       (lambda ()
                         (catch #t
                           (lambda ()
                             (call-with-values init
                               (lambda vals (atomic-box-set! ret
                                                             (list 'ok vals)))))
                           (lambda args
                             (atomic-box-set! ret (list 'err args))
                             (apply throw args))))
                       ;; Could be that this fiber was migrated away.
                       ;; Make sure to wake up the main scheduler.
                       (lambda ()
                         (spawn-fiber (lambda () #t) scheduler))))
                   scheduler)
      (match affinities
        ((affinity . affinities)
         (dynamic-wind
           (lambda ()
             (start-auxiliary-threads scheduler hz finished? affinities))
           (lambda ()
             (%run-fibers scheduler hz finished? affinity))
           (lambda ()
             (stop-auxiliary-threads scheduler)))))
      (destroy-scheduler scheduler)
      (match (atomic-box-ref ret)
        (('ok vals)
         (apply values vals))
        (('err args)
         (apply throw args)))))))
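
The ok/err tagging described above can also be tried on its own, outside the scheduler. This is a minimal sketch in plain Guile, not the Fibers implementation; capture-outcome and replay-outcome are hypothetical names introduced only for illustration.

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: run THUNK and tag what happened.
(define (capture-outcome thunk)
  (catch #t
    (lambda ()
      (call-with-values thunk
        (lambda vals (list 'ok vals))))
    ;; ARGS is the throw key followed by the throw arguments.
    (lambda args
      (list 'err args))))

;; Hypothetical helper: return the captured values, or re-throw the
;; captured exception in the caller's context.
(define (replay-outcome outcome)
  (match outcome
    (('ok vals) (apply values vals))
    (('err args) (apply throw args))))

(capture-outcome (lambda () (values 1 2)))     ; => (ok (1 2))
(capture-outcome (lambda () (throw 'foo 42)))  ; => (err (foo 42))
```

Because the outcome is an ordinary list, it can be stored in an atomic box by one thread and replayed by another, which is the role ret plays in the proposed run-fibers.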

Possibly wrong usage of atomic-box-set!

https://github.com/wingo/fibers/blob/master/fibers/scheduler.scm#L272

I see the pattern (atomic-box-set! box (+ (atomic-box-ref ...) ...)) here, but IIUC it is a race condition in a multithreaded context: another thread may update the box after the atomic-box-ref and before the atomic-box-set!.

I think this implementation is correct:

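(define (update-count-box box)
  (let retry ((old-box-val (atomic-box-ref box)))
    (let* ((new-val (logand (1+ old-box-val) #xffffFFFF))
           ;; Returns the value that was in the box before the attempt.
           (cur-box-val (atomic-box-compare-and-swap! box old-box-val new-val)))
      ;; The swap succeeded only if the box still held the value we read.
      (if (eq? cur-box-val old-box-val)
          new-val
          (retry cur-box-val)))))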
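
To see the difference a compare-and-swap loop makes, here is a small stress sketch in plain Guile using only (ice-9 atomic) and (ice-9 threads). atomic-box-update! is a hypothetical helper, not something Fibers or Guile provides, and the thread and iteration counts are arbitrary.

```scheme
(use-modules (ice-9 atomic)
             (ice-9 threads))

;; Hypothetical helper: apply PROC to the box contents atomically,
;; retrying whenever another thread changed the box in between.
(define (atomic-box-update! box proc)
  (let retry ((old (atomic-box-ref box)))
    (let* ((new (proc old))
           (seen (atomic-box-compare-and-swap! box old new)))
      (if (eq? seen old)
          new
          (retry seen)))))

(define counter (make-atomic-box 0))

;; Four threads each perform 10000 increments.
(define workers
  (map (lambda (_)
         (call-with-new-thread
          (lambda ()
            (let loop ((i 0))
              (when (< i 10000)
                (atomic-box-update! counter 1+)
                (loop (+ i 1)))))))
       (iota 4)))

(for-each join-thread workers)
(atomic-box-ref counter)  ; => 40000
```

The comparison is done with eq?, so this relies on the stored values being fixnums; larger integers would need a different representation.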

Fibers not running in parallel

I have some code in which fibers do not perform their work in parallel for some reason:

(define pool-initializer
  (lambda* (#:key (parallelism (current-processor-count)))
    (let ([channel-receive (make-channel)]
          [scheduler (make-scheduler #:parallelism parallelism)])

      (call-with-new-thread
       (lambda ()
         (run-fibers
          (lambda ()
            (let loop ([index parallelism])
              (unless (zero? index)
                (spawn-fiber (lambda () (worker index channel-receive)))
                (loop (- index 1)))))
          #:scheduler scheduler)))

      (call-with-new-thread
       (lambda ()
         (work-distributor channel-receive)))
      channel-receive)))

This procedure initializes a pool of fibers. The number of fibers depends on the parallelism keyword argument. Later on these fibers receive some work on a channel and return their result on a channel.

My understanding is that fibers of the same scheduler should be able to run in parallel if the scheduler's parallelism keyword argument is > 1. However, this does not happen: when I give the fibers work, they run sequentially, one after the other.

Why do they not run in parallel?

Is it because the scheduler is run inside a call-with-new-thread? Does this limit the parallelism to 1? I need to run it in call-with-new-thread, because I need to return the channel-receive, which will be input to the procedure, which gives work to a thing I called work-distributor. If I cannot run a scheduler in call-with-new-thread, how can I run a scheduler without blocking until all the fibers have completed their work?

So far I have only guesses, why my code does not do computation in parallel.

For a complete example of sequentially running fibers, I will paste my whole code below. It is actually based on @amirouche's thread pool for his babelia project (I can no longer find it on GitHub and do not know where to look for it), except that I am trying to use fibers instead of threads to perform the work, hoping to benefit from work stealing and work sharing and from fibers being more lightweight than threads.

The code:

(define-module (fibers-pool))


(use-modules
 ;; FIFO queue, not functional, using mutation
 ;; https://www.gnu.org/software/guile/manual/html_node/Queues.html
 (ice-9 q)
 (ice-9 match)
 (ice-9 threads)
 (rnrs exceptions)
 (rnrs conditions)
 ;; fibers internals are needed for creating schedulers without running anything
 ;; in them immediately
 (fibers)
 (fibers channels)
 (fibers internal))


(define displayln
  (lambda (msg)
    (display msg)
    (newline)))


(define work-distributor
  (lambda (channel-receive)
    (let loop ([work-queue (make-q)]
               [worker-channels-queue (make-q)])
      (displayln "[WORK-DISTRIBUTOR]: work-distributor is listening for messages")

      (display "[WORK-DISTRIBUTOR]: number of ready workers in queue: ")
      (displayln (q-length worker-channels-queue))

      (display "[WORK-DISTRIBUTOR]: number of works in queue: ")
      (displayln (q-length work-queue))

      (match (pk 'work-distributor-received-msg (get-message channel-receive))
        [('worker-ready . channel-worker)
         (displayln "[WORK-DISTRIBUTOR]: work-distributor received ready worker channel")
         ;; If there is no work for the ready worker, enqueue the worker,
         ;; otherwise give it work.
         (cond
          [(q-empty? work-queue)
           (enq! worker-channels-queue channel-worker)]
          [else
           (let ([some-work (deq! work-queue)])
             (put-message channel-worker (cons 'work some-work)))])
         (loop work-queue worker-channels-queue)]
        [('work . work)
         ;; ~work~ is always a pair of a thunk to be run and a return channel,
         ;; on which the result shall be put.

         ;; If there is no worker ready, enqueue the work, otherwise distribute
         ;; the work to a ready worker.
         (cond
          [(q-empty? worker-channels-queue)
           (enq! work-queue work)]
          [else
           (let ([channel-worker (deq! worker-channels-queue)])
             (put-message channel-worker (cons 'work work)))])
         (loop work-queue worker-channels-queue)]
        ;; On any other message raise a condition.
        [other
         (raise
          (condition
           (make-error)
           (make-message-condition "work-distributor received unrecognized message")
           (make-irritants-condition (list other))))]))))


(define worker
  (lambda (worker-index channel-receive)
    (let ([channel-worker (make-channel)])
      (displayln "[WORKER]: before worker message loop")
      (let loop ()
        ;; Report as ready. Give my own channel to the work-distributor to let
        ;; it send me work.
        (put-message channel-receive
                     (cons 'worker-ready
                           channel-worker))
        ;; Get messages sent to me by the distributor on my own channel.
        (match (pk 'worker-got-msg (get-message channel-worker))
          ;; If I receive work, do the work and return it on the channel-return.
          [('work . (thunk . channel-return))
           ;; Put the result on the return channel, so that anyone who has a
           ;; binding of the return channel can access the result.
           (put-message channel-return (thunk))
           (loop)]
          ;; On any other message raise a condition.
          [other
           (raise
            (condition
             (make-error)
             (make-message-condition "worker received unrecognized message")
             (make-irritants-condition (list other))))])))))


(define pool-initializer
  (lambda* (#:key (parallelism (current-processor-count)))
    (let ([channel-receive (make-channel)]
          [scheduler (make-scheduler #:parallelism parallelism)])
      ;; start as many workers as are desired

      ;; TODO: PROBLEM: ~run-fibers~ blocks. So we need a new thread to run the
      ;; fibers in a non-blocking way. LOOKUP: How to start fibers without
      ;; waiting for them to finish?
      (call-with-new-thread
       (lambda ()
         (run-fibers
          (lambda ()
            (let loop ([index parallelism])
              (unless (zero? index)
                (display "[POOL INIT THREAD]: will spawn fiber ") (displayln index)
                (spawn-fiber (lambda () (worker index channel-receive)))
                ;; We do not need to spawn new fibers in the same scheduler later. The
                ;; fibers should stay alive for the whole duration the program is
                ;; running.
                (displayln "[POOL INIT THREAD]: fiber spawned")
                (loop (- index 1)))))
          #:scheduler scheduler)
         (displayln "[POOL INIT]: pool init thread returning")))
      (displayln "[POOL INIT]: will start work-distributor")
      (call-with-new-thread
       (lambda ()
         (work-distributor channel-receive)))
      ;; Return the channel for receiving work, so that the outside context can
      ;; make use of it when calling ~publish~ to publish work.
      channel-receive)))


(define publish
  (lambda (work-as-thunk channel-receive)
    ;; The result of the computation can be taken from ~channel-return~.
    (let ([channel-return (make-channel)])
      ;; Put work tagged as work on the receive channel of the work-distributor.
      (let ([work-message (cons 'work (cons work-as-thunk channel-return))])
        (display
         (simple-format
          #f "[PUBLISHER]: will publish the following work: ~a\n"
          work-message))
        (put-message channel-receive work-message))

      (displayln "[PUBLISHER]: work published")
      ;; Return the ~channel-return~, so that the outside context can get
      ;; results from it.
      channel-return)))


(define busy-work
  (lambda ()
    (let loop ([i 0])
      (cond
       [(< i 5e8) (loop (+ i 1))]
       [else i]))))


;; Try it!
(define c-rec (pool-initializer #:parallelism 2))
(define c-ret-2 (publish (lambda () (busy-work)) c-rec))
(define c-ret-1 (publish (lambda () (busy-work)) c-rec))
(get-message c-ret-2)
(get-message c-ret-1)

On my machine, this runs in sequence, rather than in parallel.
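
One possible explanation, judging only from the run-fibers definition quoted in the exception issue above: when #:scheduler is supplied, that code calls %run-fibers directly and never reaches start-auxiliary-threads, so only the calling thread would run fibers. The following sketch works under that assumption and lets run-fibers create the scheduler itself via #:parallelism. start-worker-pool, serve and submit-work are hypothetical names, and whether the workers actually end up on different cores is not guaranteed by this sketch.

```scheme
(use-modules (fibers)
             (fibers channels)
             (ice-9 match)
             (ice-9 threads))

;; Worker loop: take (thunk . reply-channel) pairs and answer each one.
(define (serve work-channel)
  (let loop ()
    (match (get-message work-channel)
      ((thunk . reply-channel)
       (put-message reply-channel (thunk))))
    (loop)))

;; Hypothetical helper: run a pool of WORKER-COUNT fibers in a separate
;; thread and return the channel on which the pool accepts work.
(define* (start-worker-pool #:key (worker-count (current-processor-count)))
  (let ((work-channel (make-channel)))
    (call-with-new-thread
     (lambda ()
       (run-fibers
        (lambda ()
          ;; Spawn all workers but one; the initial fiber serves as the
          ;; last worker, which also keeps run-fibers from returning.
          (let loop ((remaining (- worker-count 1)))
            (when (> remaining 0)
              (spawn-fiber (lambda () (serve work-channel))
                           #:parallel? #t)
              (loop (- remaining 1))))
          (serve work-channel))
        #:parallelism worker-count)))
    work-channel))

;; Hypothetical helper: hand THUNK to the pool and return the channel
;; on which its result will arrive.
(define (submit-work work-channel thunk)
  (let ((reply-channel (make-channel)))
    (put-message work-channel (cons thunk reply-channel))
    reply-channel))
```

Usage would mirror the original: (define pool (start-worker-pool #:worker-count 2)), then (get-message (submit-work pool (lambda () (busy-work)))). Note that submit-work blocks the calling thread until some worker is free to accept the message.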

`race-until` test runs forever (1.2.0 + affinity patch, NetBSD 9 amd64)

I'm building 1.2.0 plus the patch to not use affinity on most platforms (thanks @aconchillo), under pkgsrc.

The test that prints assert run-fibers on (race-until 100) terminates: runs indefinitely. I have done a build and make check 4 or 5 times and the indefinite running (with CPU usage - I can hear the fan spin up) happens reliably.

I'm sure there is a bug, maybe in fibers, maybe in guile, maybe in the OS, but I have no idea where. This issue, however, is that the test should always exit, failing if need be, perhaps simply by having a maximum loop count.

The tests prior to this pass, with very reasonable runtimes, almost all less than 20 ms.

On GNU/Hurd, 'time_units_per_microsec' is zero

Hello,

On GNU/Hurd, time_units_per_microsec is zero, leading to SIGFPE early on when running a Fibers 1.3.1 program:

Core was generated by `/gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/bin/guile --no-auto-comp'.
Program terminated with signal SIGFPE, Arithmetic exception.

warning: Unexpected size of section `.reg2/130' in core file.
#0  0x025f7b9d in __udivmoddi4 (rp=0x0, d=0, n=0) at ../../../gcc-11.3.0/libgcc/libgcc2.c:1026
1026    ../../../gcc-11.3.0/libgcc/libgcc2.c: No such file or directory.
[Current thread is 1 (process 130)]
(gdb) bt
#0  0x025f7b9d in __udivmoddi4 (rp=0x0, d=0, n=0) at ../../../gcc-11.3.0/libgcc/libgcc2.c:1026
#1  __udivdi3 (n=0, d=0) at ../../../gcc-11.3.0/libgcc/libgcc2.c:1300
#2  0x025f73e6 in run_event_loop (p=0x103e724) at extensions/libevent.c:218
#3  0x025f7523 in scm_primitive_event_loop (lst=0x37e2768, wakefd=0x42, wokefd=0x3e, timeout=0x2) at extensions/libevent.c:270
#4  0x0108e451 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#5  0x010f9785 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#6  0x01108536 in scm_call_n () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#7  0x01079e0b in scm_primitive_eval () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#8  0x010a5064 in scm_primitive_load () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#9  0x0108e410 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#10 0x010f9785 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#11 0x01108536 in scm_call_n () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#12 0x01079e0b in scm_primitive_eval () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#13 0x0107fcc3 in scm_eval () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#14 0x010d7a33 in scm_shell () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#15 0x0109108d in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#16 0x0107804e in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#17 0x0108e410 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#18 0x010f9785 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#19 0x01108536 in scm_call_n () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#20 0x01079a60 in scm_call_2 () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#21 0x010f4d4b in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#22 0x011270c3 in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#23 0x010efa9b in scm_c_catch () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#24 0x0107a7cd in scm_c_with_continuation_barrier () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#25 0x010f4a1b in ?? () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#26 0x011b4fb4 in GC_call_with_stack_base () from /gnu/store/qqgp6fd8xq55dc4gknvgk6d8wjvxxgn3-libgc-8.2.2/lib/libgc.so.1
#27 0x010ef94a in scm_with_guile () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#28 0x01098840 in scm_boot_guile () from /gnu/store/mljlxj4gc8rii7891sza3vda7y06fyf8-guile-3.0.9/lib/libguile-3.0.so.1
#29 0x0804911a in ?? ()
#30 0x0156e290 in __libc_start_call_main (argv=0x103ede4, argc=8, main=0x80490c0) at ../sysdeps/generic/libc_start_call_main.h:23
#31 __libc_start_main_impl (main=0x80490c0, argc=8, argv=0x103ede4, init=0x0, fini=0x0, rtld_fini=0x6d90 <_dl_fini>, stack_end=0x103eddc)
    at ../csu/libc-start.c:392
#32 0x080491a8 in ?? ()
(gdb) frame 2
#2  0x025f73e6 in run_event_loop (p=0x103e724) at extensions/libevent.c:218
218     extensions/libevent.c: No such file or directory.
(gdb) p time_units_per_microsec 
$1 = 0
(gdb) p scm_c_time_units_per_second 
$2 = 1000

So I guess we should change:

microsec = data->timeout / time_units_per_microsec;

to:

microsec = (time_units_per_microsec == 0) ? 0 : (data->timeout / time_units_per_microsec);

Thoughts?
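
The gdb values are consistent with an integer-division explanation: if time_units_per_microsec is derived as scm_c_time_units_per_second / 1000000 (an assumption, since the derivation is not shown here), then 1000 / 1000000 truncates to 0. The arithmetic is sketched below in Scheme rather than C, together with a hypothetical conversion, timeout->microseconds, that multiplies before dividing so that a coarse clock never leads to a zero divisor.

```scheme
;; Value reported by gdb for scm_c_time_units_per_second on GNU/Hurd.
(define units-per-second 1000)

;; Integer division truncates: fewer than one time unit per microsecond.
(quotient units-per-second 1000000)  ; => 0

;; Hypothetical alternative: scale the timeout up first, then divide by
;; the number of time units per second.
(define (timeout->microseconds timeout units-per-second)
  (quotient (* timeout 1000000) units-per-second))

(timeout->microseconds 2 1000)           ; => 2000
(timeout->microseconds 5000 1000000000)  ; => 5
```

Unlike the guard proposed above, which maps every timeout to 0 microseconds on such a platform, this keeps the requested duration. Whether that is the right behaviour for libevent.c is a separate question.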

Memory leak on context switches

As discussed in the context of the Shepherd, Fibers 1.1.1 leaks memory on each context switch (!). The simplest reproducer (with Guile 3.0.8 or even 2.2.7) is this:

(use-modules (fibers)
             (fibers channels)
             ((fibers scheduler) #:select (yield-current-task))
             (ice-9 rdelim)
             (statprof))

(run-fibers
 (lambda ()
   (define channel
     (make-channel))
   (define leave-channel
     (make-channel))

   (spawn-fiber
    (lambda ()
      (sleep 10)
      (put-message leave-channel 'leave)))
   (spawn-fiber
    (lambda ()
      (let loop ()
        (yield-current-task)
        ;; (sleep 10)
        (loop))))
   (spawn-fiber
    (lambda ()
      (let loop ()
        (pk 'heap-size (assoc-ref (gc-stats) 'heap-size))
        (sleep 2)
        (loop))))
   (get-message leave-channel))
 ;; #:drain? #t
 #:parallelism 1                                  ;don't create POSIX threads
 #:hz 0)

This leak is proportional to the number of context switches: replace (yield-current-task) with (sleep 1) or similar, and everything's fine.

fibers depends on gnulib

I have the impression from testing the poll/libevent branch that gnulib is needed for AC_LIB_LINKFLAGS_FROM_LIBS but perhaps nothing more, in which case it would be nice to avoid the dependency. Alternatively, in README where it says one needs gnulib, give a parenthetical remark about why.

Resource sharing / Asynchronous lock

In my search engine, the database works with contexts. I must limit the number of contexts, since the underlying database cannot open as many contexts as there can be concurrent fibers.

Right now the relevant code is the following:

(define (get-or-create-context! env)
  (with-mutex (env-mutex env)
    (let ((contexts (env-contexts env)))
      (if (null? contexts)
          ;; create a new context
          ;; XXX: the number of active contexts is unbounded
          (apply context-open (cons (env-connection env) (env-configs env)))
          ;; re-use an existing context
          (let ((context (car contexts)))
            (env-contexts! env (cdr contexts))
            context)))))

As you can see, the number of contexts is currently unbounded, so this will lead to a crash if there are too many concurrent fibers.

What I would like is for a fiber to request a context and, if none is available and the limit has been reached, suspend until one becomes available.

How can I achieve that?
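
One way this could be approached is a dedicated fiber that owns a fixed set of contexts and hands them out over channels, so that a requesting fiber simply suspends in get-message until a context is free. This is a sketch under stated assumptions, not an established Fibers idiom: spawn-context-pool and call-with-context are hypothetical names, the contexts are assumed to be opened up front (for example with context-open), and spawn-context-pool must be called from inside run-fibers.

```scheme
(use-modules (fibers)
             (fibers channels)
             (fibers operations))

;; Hypothetical helper: start a fiber that owns CONTEXTS and return a
;; pair (acquire-channel . release-channel).
(define (spawn-context-pool contexts)
  (let ((acquire (make-channel))
        (release (make-channel)))
    (spawn-fiber
     (lambda ()
       (let loop ((free contexts))
         (loop
          (if (null? free)
              ;; Nothing to hand out: only a release can make progress.
              (list (get-message release))
              ;; Otherwise either hand out a context or take one back,
              ;; whichever partner shows up first.
              (perform-operation
               (choice-operation
                (wrap-operation (put-operation acquire (car free))
                                (lambda () (cdr free)))
                (wrap-operation (get-operation release)
                                (lambda (context) (cons context free))))))))))
    (cons acquire release)))

;; Hypothetical helper: borrow a context for the duration of PROC.
;; If PROC throws, the context is not returned; real code would need
;; to handle that case.
(define (call-with-context pool proc)
  (let ((context (get-message (car pool))))
    (call-with-values (lambda () (proc context))
      (lambda results
        (put-message (cdr pool) context)
        (apply values results)))))
```

The sketch deliberately avoids dynamic-wind for the release step, since a fiber that suspends inside proc would leave and re-enter the dynamic extent.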

PF_INET or PF_INET6 for AF_INET6?

When trying to use (run-server server-file-download-handler #:family AF_INET6 #:addr 1 #:port 8083) I get the error

ERROR: In procedure bind: Address family not supported by protocol

Is this due to creating the socket always with PF_INET instead of switching to PF_INET6 for AF_INET6?

(let ((sock (socket PF_INET SOCK_STREAM 0)))
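
If that is indeed the cause, the protocol family could be selected from the requested address family. A minimal sketch in plain Guile follows; address-family->protocol-family and open-stream-socket are hypothetical helpers, not part of Fibers or Guile.

```scheme
;; Hypothetical helper: map an address family to the matching
;; protocol family for socket.
(define (address-family->protocol-family family)
  (cond ((= family AF_INET) PF_INET)
        ((= family AF_INET6) PF_INET6)
        (else (error "unsupported address family" family))))

;; Hypothetical helper: create a stream socket suitable for binding
;; to an address of FAMILY.
(define (open-stream-socket family)
  (socket (address-family->protocol-family family) SOCK_STREAM 0))
```

With this, a server started with #:family AF_INET6 would create its socket with PF_INET6 rather than the hard-coded PF_INET.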

add link to manual to readme.md

the excellent online manual might be easier to find with a link added to the readme, maybe under contact info. on mobile i do not even see any link to the wiki in github's navigation.
