While the syntax parser is practically complete (and well designed too), the interpreter is functional but still needs quite some work. Currently pipelines with more than one step (as in, a regular statement) are ignored by the interpreter with a TODO. There are several reasons for this, but the main one is that there has been no clear implementation path for running pipeline steps in parallel.
There are several possible ways of parallelizing pipelines:
Pipes and fork
In this approach, you set up N-1 anonymous pipes, where N is the number of steps, fork N-1 times, and set the stdin/stdout of the child processes to use the anonymous pipes in order to form a chain. Then you poll the right-most step, or execute it directly if it is a shell function. This is the traditional approach taken by many shells, including popular ones like Bash and Z shell.
The advantage of this approach is that it unifies the pipeline implementation; you always pipe multiple processes together, whether they are external commands, or forks of the shell. The biggest downside is that the subshells are disconnected from the scope they came from and cannot (easily) communicate with the parent. This can cause lots of confusion when learning how to write scripts in these languages, and so I decided that this approach was unacceptable for Riptide. Take the following example code:
def count 0
ls | grep foo | each {
    set count (+ $count 1)
} > foo-files.txt
println "Found $count foo files"
While you might expect the above script to print out "Found 10 foo files" if there are 10 matching files in the current directory, you will instead always get "Found 0 foo files", because the each block has to operate on a clone of the count variable. The original count never gets updated.
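To make the problem concrete, here is a minimal Rust sketch of what happens to the variable. It does not fork anything: the fork is modeled as cloning a scope map, which is an assumption made purely for illustration. The names Scope, run_each_in_subshell, and demo_fork_scope are hypothetical and not part of Riptide.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a shell's variable scope.
type Scope = HashMap<String, i64>;

/// Models a forked subshell running the `each` block. The child gets a
/// private copy of the parent's scope (the way fork() copies process
/// memory), mutates that copy, and exits. Returns the child's final count.
fn run_each_in_subshell(parent: &Scope, lines: &[&str]) -> i64 {
    let mut child = parent.clone(); // the "fork": a private copy
    for _line in lines {
        // set count (+ $count 1), applied to the copy only
        *child.entry("count".to_string()).or_insert(0) += 1;
    }
    child["count"]
}

/// Returns (count as seen by the child, count as seen by the parent).
fn demo_fork_scope() -> (i64, i64) {
    let mut parent = Scope::new();
    parent.insert("count".to_string(), 0); // def count 0

    // Pretend this is what `ls | grep foo` produced.
    let matches = ["foo-a.txt", "foo-b.txt", "foo-c.txt"];

    let child_count = run_each_in_subshell(&parent, &matches);
    (child_count, parent["count"])
}
```

Calling demo_fork_scope() gives the child a count of 3 while the parent's count stays at 0, which is the same surprise as the script above.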
Multithreading
Another approach is to use background threads or a thread pool to run multiple steps in a pipeline in parallel. "Script" steps are run in threads, while external commands are run in real processes. We then use abstractions inside these threads to make them behave as if they were normal processes. The current variable scope is then shared between these threads behind a mutex, so that they can all mutate their scope normally. Interestingly, this seems to be the approach that Fish Shell takes.
This lets you pipeline multiple script blocks together and be able to have the expected "normal" scoping rules for each, but has several disadvantages:
- How many threads should you limit execution to? If you do not cap threads, then the singular shell process could consume quite a bit of memory. If you do cap threads, then the pipeline will not be fully concurrent.
- Creating threads just to block on I/O is wasteful: each thread reserves its own stack while spending most of its life idle.
- Threads are notorious for causing trouble when interacting with traditional shell concepts like signals and exec. You have to jump through a lot of hoops to avoid these gotchas.
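As a rough illustration of the threaded design, the sketch below runs two "script" steps on their own threads. It assumes a std::sync::mpsc channel as a stand-in for the pipe and an Arc<Mutex<...>> map as the shared scope. All names are hypothetical, and this is not how Fish or any other shell actually implements it.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a shell's variable scope.
type Scope = HashMap<String, i64>;

/// Runs a two-step pipeline on two threads and returns the final count.
fn demo_threaded_pipeline() -> i64 {
    // The scope is shared between all steps behind a mutex.
    let scope = Arc::new(Mutex::new(Scope::new()));
    scope.lock().unwrap().insert("count".to_string(), 0);

    // The channel plays the role of the pipe between the two steps.
    let (tx, rx) = mpsc::channel::<String>();

    // Left-hand step: emit only the names containing "foo".
    let producer = thread::spawn(move || {
        for name in ["foo-a.txt", "bar.txt", "foo-b.txt"] {
            if name.contains("foo") {
                tx.send(name.to_string()).unwrap();
            }
        }
        // `tx` is dropped here, which closes the "pipe".
    });

    // Right-hand step: the `each` block, mutating the shared scope.
    let consumer_scope = Arc::clone(&scope);
    let consumer = thread::spawn(move || {
        for _line in rx {
            let mut vars = consumer_scope.lock().unwrap();
            *vars.get_mut("count").unwrap() += 1;
        }
    });

    producer.join().unwrap();
    consumer.join().unwrap();

    let count = *scope.lock().unwrap().get("count").unwrap();
    count
}
```

Here the original count does get updated, but only because every access goes through the lock, and each step costs a whole OS thread that mostly sits blocked.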
Asynchronous tasks
This is the approach I propose we use in Riptide (see #11). We turn every potentially parallel bit of code into an asynchronous task, and then execute all tasks on the main thread using a single-threaded executor. This has many benefits:
- No additional threads are needed, so we don't have to worry about how they interact with the rest of the shell.
- Since most of the time pipelines are just waiting on I/O, this is a problem that async is well suited to solving.
- All steps in a pipeline can be executed concurrently (though not in parallel) on the main thread. We will switch between executing tasks naturally as bytes flow between the steps in the pipeline. External commands can also be awaited very easily.
This will require some refactoring, since the interpreter must essentially be asynchronous and be able to "yield" in the middle of a script and be resumed later. Typically this would be incredibly difficult to do using a tree-walk interpreter, but Rust's async/await makes this almost trivial by generating the complex state machine for us.
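The idea can be sketched with nothing but the Rust standard library. This is a toy under stated assumptions, not the proposed Riptide implementation: Pipe is an in-memory queue rather than a real file descriptor, the executor simply re-polls every task in a loop instead of waiting on I/O readiness, and all names (Pipe, YieldNow, run_all, and so on) are hypothetical.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// In-memory stand-in for an anonymous pipe between two steps.
#[derive(Default)]
struct Pipe {
    lines: VecDeque<String>,
    closed: bool,
}

/// Future that hands control back to the executor exactly once.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

/// Waker that does nothing; the toy executor re-polls every task anyway.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Left-hand step: writes matching names to the pipe, yielding after each.
async fn producer(pipe: Rc<RefCell<Pipe>>) {
    for name in ["foo-a.txt", "bar.txt", "foo-b.txt"] {
        if name.contains("foo") {
            pipe.borrow_mut().lines.push_back(name.to_string());
        }
        YieldNow(false).await;
    }
    pipe.borrow_mut().closed = true;
}

/// Right-hand step: like the `each` block, bumps count for every line.
async fn consumer(pipe: Rc<RefCell<Pipe>>, count: Rc<Cell<u32>>) {
    loop {
        // Release the borrow before any await point.
        let (next, closed) = {
            let mut p = pipe.borrow_mut();
            (p.lines.pop_front(), p.closed)
        };
        match next {
            Some(_line) => count.set(count.get() + 1),
            None if closed => break,
            None => YieldNow(false).await,
        }
    }
}

/// Polls all tasks round-robin on the current thread until they finish.
fn run_all(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        tasks.retain_mut(|task| task.as_mut().poll(&mut cx).is_pending());
    }
}

/// Runs both steps concurrently on one thread and returns the final count.
fn demo_async_pipeline() -> u32 {
    let pipe = Rc::new(RefCell::new(Pipe::default()));
    let count = Rc::new(Cell::new(0));
    let mut tasks: Vec<Task> = Vec::new();
    tasks.push(Box::pin(producer(pipe.clone())));
    tasks.push(Box::pin(consumer(pipe, count.clone())));
    run_all(tasks);
    count.get()
}
```

Note that count lives in a plain Rc<Cell<u32>> with no mutex. Since every step runs on the same thread, the steps can share the scope directly, and the compiler turns each async fn into the resumable state machine that a tree-walk interpreter would otherwise have to build by hand.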
We could also accomplish the same thing by rewriting the interpreter into a JIT VM, but that would be quite a bit more work and would increase the complexity of the interpreter by quite a lot. I'd rather take tree walking as far as it can possibly go in order to keep the implementation simple, and only reconsider if it is an actual bottleneck.