sagebind / riptide

The Riptide Programming Language: Shell scripting redesigned.

Home Page: https://riptide.sh

License: MIT License

Rust 99.83% Makefile 0.17%

riptide scripting-language programming-language language interpreter shell

riptide's Introduction

Riptide

Riptide is a powerful scripting language and interpreter that seeks to accomplish two goals:

Provide a powerful, stream-oriented scripting language where the UNIX philosophy is first-class.
Provide an interactive and programmable console interface for using your system.

Status

Riptide is currently in rough development, so expect many things to not work correctly. I work on Riptide in my free time only occasionally, so it may take a while before it becomes ready to be used. I'm committed to finishing it one day though, so maybe check back in a year or so?

Documentation

Guide: For beginners
Reference: In-depth language overview

License

The Riptide project is licensed under the MIT license. See the LICENSE file for details.

riptide's Issues

Windows support via ConPTY

I don't want to compromise Riptide's asynchronous design by introducing dirty hacks in order to support Windows, but it would be cool for Riptide to work as a native Windows shell, as there are very few shells for it.

The old Win32 APIs for console applications do not allow asynchronous operation, but the new ConPTY API in Windows 10 does. We could build on top of that in order to have first-class Windows support.

Implement pipelines

While the syntax parser is practically complete (and well designed too), the interpreter is functional but still needs quite some work. Currently pipelines with more than one step (as in, a regular statement) are ignored by the interpreter with a TODO. There are several reasons for this, but the main reason is that there has been no clear implementation path on how to run pipeline steps in parallel.

There are several possible ways of parallelizing pipelines:

Pipes and fork

In this approach, you set up N-1 anonymous pipes, where N is the number of steps, fork N-1 times, and set the stdin/stdout of the child processes to use the anonymous pipes in order to form a chain. Then you poll the right-most step, or execute it directly if it is a shell function. This is the traditional approach taken by many shells, including popular ones like Bash and Z shell.
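As a rough illustration of the pipe-chaining part of this approach, here is a minimal Rust sketch that connects external commands stdout-to-stdin using only `std::process`. The `run_pipeline` helper is hypothetical and is not part of Riptide; it assumes every step is an external command with a non-empty argv, and it does not cover the case where a step is a forked copy of the shell.

```rust
use std::io::{self, Read};
use std::process::{Child, ChildStdout, Command, Stdio};

/// Run `steps` as a pipeline and return whatever the last step writes to stdout.
/// Each step is an argv vector: program name first, then its arguments.
fn run_pipeline(steps: &[Vec<&str>]) -> io::Result<String> {
    let mut children: Vec<Child> = Vec::new();
    // Read end of the pipe that feeds the next step, if any.
    let mut prev_stdout: Option<ChildStdout> = None;

    for argv in steps {
        let mut cmd = Command::new(argv[0]);
        cmd.args(&argv[1..]);
        // Every step after the first reads from the previous step's stdout.
        if let Some(out) = prev_stdout.take() {
            cmd.stdin(Stdio::from(out));
        }
        // Every step writes into a fresh anonymous pipe.
        cmd.stdout(Stdio::piped());
        let mut child = cmd.spawn()?;
        prev_stdout = child.stdout.take();
        children.push(child);
    }

    // Drain the right-most step, then reap every child process.
    let mut output = String::new();
    if let Some(mut out) = prev_stdout {
        out.read_to_string(&mut output)?;
    }
    for mut child in children {
        child.wait()?;
    }
    Ok(output)
}
```

Calling `run_pipeline(&[vec!["ls"], vec!["grep", "foo"]])` would behave roughly like `ls | grep foo`, with the final output captured instead of printed.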

The advantage of this approach is that it unifies the pipeline implementation; you always pipe multiple processes together, whether they are external commands, or forks of the shell. The biggest downside is that the subshells are disconnected from the scope they came from and cannot (easily) communicate with the parent. This can cause lots of confusion when learning how to write scripts in these languages, and so I decided that this approach was unacceptable for Riptide. Take the following example code:

def count 0

ls | grep foo | each {
    set count (+ $count 1)
} > foo-files.txt

println "Found $count foo files"

While you might expect the above script to print out Found 10 foo files if there are 10 matching files in the current directory, instead you will always get Found 0 foo files, because the each block has to operate on a clone of the count variable. The original count never gets updated.
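The same effect can be modeled in a few lines of Rust. This is only an analogy, not how any shell is implemented: the `Scope` type and the `run_in_subshell` helper are made up for illustration, and cloning a `HashMap` stands in for what `fork` does to the shell's variables.

```rust
use std::collections::HashMap;

type Scope = HashMap<String, i64>;

/// Models a forked subshell: the block runs against a *copy* of the parent's scope.
fn run_in_subshell(parent: &Scope, block: impl FnOnce(&mut Scope)) {
    let mut child_scope = parent.clone(); // the child gets its own copy
    block(&mut child_scope);
    // `child_scope` is dropped here, like the memory of an exited child process.
}

/// Counts matching files the way the forked `each` block would.
fn count_matches_forked(files: &[&str]) -> i64 {
    let mut scope = Scope::new();
    scope.insert("count".to_string(), 0);
    run_in_subshell(&scope, |s| {
        for _ in files.iter().filter(|f| f.contains("foo")) {
            *s.get_mut("count").unwrap() += 1; // updates the copy only
        }
    });
    scope["count"] // the parent's value was never touched
}

/// Counts matching files with the block running in the parent's own scope.
fn count_matches_in_process(files: &[&str]) -> i64 {
    let mut scope = Scope::new();
    scope.insert("count".to_string(), 0);
    for _ in files.iter().filter(|f| f.contains("foo")) {
        *scope.get_mut("count").unwrap() += 1;
    }
    scope["count"]
}
```

With two matching files, the forked version still reports 0, while the in-process version reports 2.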

Multithreading

Another approach is to use background threads or a thread pool to run multiple steps in a pipeline in parallel. "Script" steps are run in threads, while external commands are run in real processes. We then use abstractions inside these threads to make them behave as if they were normal processes. The current variable scope is then shared between these threads behind a mutex, so that they can all mutate their scope normally. Interestingly, this seems to be the approach that Fish Shell takes.

This lets you pipeline multiple script blocks together and be able to have the expected "normal" scoping rules for each, but has several disadvantages:

How many threads should you limit execution to? If you do not cap threads, then the single shell process could consume quite a bit of memory. If you do cap threads, then the pipeline will not be fully concurrent.
Creating threads just to block on I/O is a bit wasteful in theory and is not an optimal design.
Threads are notorious for causing trouble when interacting with traditional shell concepts like signals and exec. You have to jump through a lot of hoops to avoid these gotchas.
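To make the shared-scope idea concrete, here is a small Rust sketch under simplifying assumptions: both pipeline steps are script steps run as threads, an `mpsc` channel stands in for the pipe between them, and the scope is a plain `HashMap` behind a mutex. The names `Scope` and `count_with_threads` are illustrative only.

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Variable scope shared by every step of the pipeline.
type Scope = Arc<Mutex<HashMap<String, i64>>>;

fn count_with_threads(files: Vec<String>) -> i64 {
    let scope: Scope = Arc::new(Mutex::new(HashMap::from([("count".to_string(), 0)])));
    // The channel stands in for the pipe between the two steps.
    let (tx, rx) = mpsc::channel::<String>();

    // Step 1: a `grep foo` equivalent running in its own thread.
    let producer = thread::spawn(move || {
        for f in files.into_iter().filter(|f| f.contains("foo")) {
            tx.send(f).unwrap();
        }
        // `tx` is dropped here, which closes the channel.
    });

    // Step 2: the `each` block, mutating the shared scope under the lock.
    let shared = Arc::clone(&scope);
    let consumer = thread::spawn(move || {
        for _line in rx {
            *shared.lock().unwrap().get_mut("count").unwrap() += 1;
        }
    });

    producer.join().unwrap();
    consumer.join().unwrap();

    // The parent sees the update because every step used the same scope.
    let count = *scope.lock().unwrap().get("count").unwrap();
    count
}
```

Unlike the forked version, the count is visible to the parent afterwards, at the cost of one thread per script step and a lock around every variable access.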

Asynchronous tasks

This is the approach I propose we use in Riptide (see #11). We turn every potentially parallel bit of code into an asynchronous task, and then execute all tasks on the main thread using a single-threaded executor. This has many benefits:

No additional threads are needed, so we don't have to worry about how they interact with the rest of the shell.
Since most of the time pipelines are just waiting on I/O, this is a problem that async is well suited to solving.
All steps in a pipeline can be executed concurrently (though not in parallel) on the main thread. We will switch between executing tasks naturally as bytes flow between the steps in the pipeline. External commands can also be awaited very easily.

This will require some refactoring, since the interpreter must essentially be asynchronous and be able to "yield" in the middle of a script and be resumed later. Typically this would be incredibly difficult to do using a tree-walk interpreter, but Rust's async/await makes this almost trivial by generating the complex state machine for us.
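A toy version of this idea fits in the standard library alone. The sketch below is not Riptide's executor: `run_all`, `YieldNow`, and `NoopWake` are hypothetical names, and the executor simply re-polls every pending task in round-robin order instead of waiting for I/O readiness. It does show the two properties that matter here: tasks interleave on one thread, and they can share state through `Rc<RefCell<..>>` without a mutex.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; this toy executor just re-polls pending tasks.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Future that returns `Pending` once, giving the other tasks a turn.
struct YieldNow(bool);
impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            Poll::Pending
        }
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Poll all tasks round-robin on the current thread until each has finished.
fn run_all(tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut queue: VecDeque<Task> = tasks.into();
    while let Some(mut task) = queue.pop_front() {
        if task.as_mut().poll(&mut cx).is_pending() {
            queue.push_back(task);
        }
    }
}

/// Two "pipeline steps" appending to one shared log, yielding after each line.
fn interleaved_log() -> Vec<String> {
    let log: Rc<RefCell<Vec<String>>> = Rc::new(RefCell::new(Vec::new()));
    let mut tasks: Vec<Task> = Vec::new();
    for name in ["producer", "consumer"] {
        let log = Rc::clone(&log);
        tasks.push(Box::pin(async move {
            for i in 0..2 {
                log.borrow_mut().push(format!("{} {}", name, i));
                YieldNow(false).await; // suspension point generated by async/await
            }
        }));
    }
    run_all(tasks);
    let result = log.borrow().clone();
    result
}
```

Each `.await` is a point where the compiler-generated state machine can suspend, which is the "yield in the middle of a script" behavior described above.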

We could also accomplish the same thing by rewriting the interpreter into a JIT VM, but that would be quite a bit more work and would increase the complexity of the interpreter by quite a lot. I'd rather take tree walking as far as it can possibly go in order to keep the implementation simple, and only reconsider if it is an actual bottleneck.

Convert certain builtins into language constructs

We gain very little by making everything into a builtin and we lose some degree of convenience. The following things should be made first-class language constructs:

require: Define some sort of import syntax
if
cond
while
def: This currently uses hacks to be implemented as a builtin.
set: This doesn't exist yet because it can't be implemented as a builtin.
catch
throw

Website redesign

At some point when Riptide becomes usable we will need a snazzy home page that explains what Riptide is and links to the guide and documentation.

More sophisticated garbage collector

"Shared" values such as tables and strings are currently using naive reference counting, but cycles for circular tables are not checked at all and are not cleaned up until the end of the program. This might be fine for short scripts, but could cause long-running interactive shells to have continually growing memory usage if cycles are created.
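The leak is easy to reproduce with Rust's own `Rc`. In this sketch `Table` is a simplified stand-in for a Riptide table, not the interpreter's actual type; a `Weak` pointer is used only as a probe to check whether the allocation is still alive after every handle has been dropped.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Simplified stand-in for a table whose fields can hold other tables.
struct Table {
    fields: RefCell<Vec<Rc<Table>>>,
}

fn new_table() -> Rc<Table> {
    Rc::new(Table { fields: RefCell::new(Vec::new()) })
}

/// Returns true if a table in a two-table cycle outlives all of its handles.
fn cycle_leaks() -> bool {
    let a = new_table();
    let b = new_table();
    a.fields.borrow_mut().push(Rc::clone(&b));
    b.fields.borrow_mut().push(Rc::clone(&a)); // a -> b -> a
    let probe: Weak<Table> = Rc::downgrade(&a);
    drop(a);
    drop(b);
    // Each table still holds a strong reference to the other, so neither is freed.
    probe.upgrade().is_some()
}

/// Same check without the back edge: plain reference counting cleans up.
fn chain_leaks() -> bool {
    let a = new_table();
    let b = new_table();
    a.fields.borrow_mut().push(Rc::clone(&b)); // a -> b only
    let probe: Weak<Table> = Rc::downgrade(&b);
    drop(b); // `a` still keeps `b` alive
    drop(a); // frees `a`, which then frees `b`
    probe.upgrade().is_some()
}
```

Here `cycle_leaks()` reports that the table is still allocated, while `chain_leaks()` reports that it was freed.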

Since preventing cycles in the first place is not an option (you're always allowed to assign a table as a field inside another table), we should consider using a more sophisticated garbage collection scheme to bust reference cycles once no longer in use. There are several options for doing this today:

Implement our own mark-and-sweep garbage collector.
Implement a cycle collector that works alongside the reference counting we already have (like Python).
Use an existing garbage collector library, such as
- bacon_rajan_cc
- broom
- gc
- gc-arena
- gcmodule
- moving_gc_arena
- rcgc

Strings can continue to use reference counting, or even something a little different (string interning) since they have different usage patterns and can't have cycles.

Editor extension architecture

As someone who used to be heavily involved in the development of projects like Oh My Fish! for many years, I've learned a lot about shells, package distribution, plugins, and themes, and various pitfalls that should be avoided. One area where we must be careful is the ways of customizing and augmenting the terminal editor (prompt) itself.

When a "plugin" is simply a bundle of scripts that can hook into just about anything it means that plugin authors can do whatever they want, which is nice from one perspective, but it also causes a slew of other problems for the shell itself:

Since plugins can hook into anything, they are almost always stateful, and adding and removing plugins can often require you to exit all your terminals and re-open them to get everything working properly.
Plugins that hook into the same functionality can sometimes accidentally conflict with each other and cause things to break.
Plugins that hook into things like changing directories or rendering a prompt do their work synchronously inside a callback, which can hamper user experience. It isn't uncommon for your shell to become painfully slow just by adding a few plugins and themes, especially on slower hardware.
Since plugins run in-line with the script code that powers the prompt, a rogue or broken plugin can render your shell completely unusable until the plugin is identified and removed.
"Plugins" become a bucket for basically any sort of script targeting your shell, which means that useful CLIs that could be standalone cross-shell tools instead become siloed and duplicated to take advantage of the ease of use of your plugin system.
Plugins must be implemented in script, and are almost impossible to test.

In an attempt to avoid some of these, here's what I'd like Riptide's extensibility to look like:

Themes: Make themes a totally separate concept from everything else. Themes should be simple, declarative JSON files that configure things such as colors and prompt characters, and nothing else. Switching themes becomes easy and always just works.
Extensions: A means of extending the behavior of the terminal editor (prompt, display, hooks, etc) and not designed for creating commands (commands should be standalone scripts).
- Extensions are run asynchronously and can't block user input.
- Basically everyone reimplements some sort of powerline-type prompt, so have it be a built-in shell feature. Extensions can then provide "prompt items" -- live strings that the shell will display along the prompt on an extension's behalf.

Implement terminal handling from scratch

Currently we are using termion, which has an OK API, but like almost every terminal-related library I can find, is inherently synchronous. This won't work for us, because interactive mode will be driven using a single-threaded event loop, and input events need to be read asynchronously.

Here's the plan:

Use the vte crate for the low-level parsing routines so that we don't have to implement everything ourselves.
Implement a higher-level asynchronous input event interface, which is used by the terminal editor.
Avoid stuff like terminfo, and focus on supporting the common modern terminals and emulators. If it turns out we need it this could be reconsidered, but generally terminals have standardized and it's not worth the extra effort in order to support old terminals.

Make interpreter execution asynchronous and futures-based

The interpreter should be refactored to be fully asynchronous, using Rust futures and executors under the hood. We gain several huge wins by doing this:

At their core, shell languages are merely tools for composing multiple chains of commands together asynchronously. Using tasks and futures to implement this will make the implementation more efficient, and also allow us to do new interesting things with the language that will put it squarely among the "next-generation" shells.
Instead of spawning new processes by default, background jobs can be executed inside a separate async task instead. This affords us much greater control, performance, and resource efficiency than using background processes.
We will finally have an implementation strategy for pipelines with minimal trade-offs. See #10 for details.

Since Riptide is still young, it is acceptable to switch to using nightly Rust for the interpreter to take advantage of any features we need to implement the async interpreter (most notably async/await).

Refactor variable lookups to be more expressive, individual ops

For example, this should be allowed:

echo (require lang)->VERSION

but currently only string literals are allowed in a variable path.

Segregate history entries by "session"

It would be useful to be able to page through the command history of just a single session, or through the history of all sessions.

To do this we have to figure out what a "session" is:

A random UUID generated whenever Riptide initializes. A meaningless value, but effective.
PID + timestamp: this should be unique enough that it will differ for all logical sessions, and has the advantage of being more meaningful.

To avoid duplicate storage, we could create a new table that is basically session_history and have command_history entries have a foreign key pointing to the originating session. We could move pid from the command entries to the session entry in this scenario assuming a session can't span multiple PIDs.
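As a rough sketch of that layout, the PID + timestamp option could look like the following in Rust. Everything here is an assumption made for illustration: the `Session` and `CommandEntry` structs, their field names, and the `<pid>-<unix seconds>` ID format are not taken from Riptide's actual history schema, and in-memory structs stand in for database rows.

```rust
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// One row of a hypothetical `session_history` table.
#[derive(Debug, Clone, PartialEq)]
struct Session {
    id: String, // "<pid>-<unix seconds>"
    pid: u32,
    started_at: u64,
}

/// One row of `command_history`; `session_id` plays the role of the foreign key.
#[derive(Debug, Clone, PartialEq)]
struct CommandEntry {
    session_id: String,
    command: String,
}

impl Session {
    fn new(pid: u32, started_at: u64) -> Self {
        Session { id: format!("{}-{}", pid, started_at), pid, started_at }
    }

    /// Build the session for the running process from its PID and the clock.
    fn for_current_process() -> Self {
        let now = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);
        Session::new(process::id(), now)
    }

    /// Create a history entry that points back at this session.
    fn record(&self, command: &str) -> CommandEntry {
        CommandEntry { session_id: self.id.clone(), command: command.to_string() }
    }
}

/// Page through the history of a single session only.
fn commands_in_session<'a>(history: &'a [CommandEntry], session: &Session) -> Vec<&'a str> {
    history
        .iter()
        .filter(|e| e.session_id == session.id)
        .map(|e| e.command.as_str())
        .collect()
}
```

Because the timestamp is part of the ID, two sessions that happen to reuse the same PID at different times still get distinct IDs.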

Are forks the same session, or a new one?

Use rustyline for shell

The rustyline library implements history, word completion, multiple lines, etc. I'm not sure if it would be a good idea to use it in a project that wants to become its own shell, but it would be a significant step up in features for riptide currently.

I'm willing to do some of the legwork here, but I want to know if there's a reason not to before I do.

[Security] Workflow ci.yml is using vulnerable action actions/checkout

The workflow ci.yml references the action actions/checkout using the reference v1. However, this reference is missing the commit a6747255bd19d7a757dbdda8c654a9f84db19839, which may contain a fix for a vulnerability.
The vulnerability fix missing from this action version could be related to:
(1) a CVE fix
(2) an upgrade of a vulnerable dependency
(3) a fix for a secret leak, among others.
Please consider updating the reference to the action.

Integrated regex support

It would be very handy if regex was built into the language syntax itself as an embedded language mode. This would give you syntax errors to catch early if you have a mistake in your regex! (Of course also allow creating a regex from a string.)

I imagine the syntax looking something like this:

# If matched, $results is a list of capture groups. If not, then nil
$results = "my string" =~ re#\w+#

The tricky part is that forward slashes are often used as delimiters, but those mean directory separators far more often in a shell.

Contextual I/O

Make stdin, stdout, and stderr into normal context variables and make a "stream handle" a built-in type. Then we can do something like:

let @stdout = (open -w "message.txt") {
    println "Hello world!"
    fclose @stdout
}

All sorts of fun and useful programming abilities open up that many shells can't handle because you can't hold a file descriptor in a variable. Of course, it can also mean that users can leak file handles...

Debug logger doesn't play nice with non-blocking stderr

The built-in logger currently does not play nice with standard error. Since we make standard I/O handles non-blocking while the Riptide runtime is in control, the logger will rightfully get an EAGAIN or equivalent error from time to time when trying to write a log and the buffer is full. We should update the logger to either:

Retry failed calls with some sort of delay. Presumably stderr will become ready shortly when this happens, and a retry might be good enough.
Make the logger a separate asynchronous task that uses async I/O to write logs, and have the log macros simply append to that task's current buffer.

The latter seems like the more correct solution, but might be trickier, since loggers are global resources.
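For comparison, the simpler retry option could be sketched as below. This is a hedged illustration rather than the logger's real code: `write_with_retry` and `FlakyWriter` are hypothetical names, the retry limit and delay are arbitrary parameters, and `FlakyWriter` only simulates a non-blocking stderr whose buffer is temporarily full.

```rust
use std::io::{self, ErrorKind, Write};
use std::thread;
use std::time::Duration;

/// Write all of `buf`, retrying after `delay` when the handle reports `WouldBlock`
/// (how Rust surfaces EAGAIN). Gives up after `max_retries` consecutive failures.
fn write_with_retry<W: Write>(
    out: &mut W,
    mut buf: &[u8],
    max_retries: u32,
    delay: Duration,
) -> io::Result<()> {
    let mut retries = 0;
    while !buf.is_empty() {
        match out.write(buf) {
            Ok(0) => return Err(ErrorKind::WriteZero.into()),
            Ok(n) => {
                buf = &buf[n..]; // keep only the part not yet written
                retries = 0;
            }
            Err(e) if e.kind() == ErrorKind::WouldBlock && retries < max_retries => {
                retries += 1;
                thread::sleep(delay); // note: this blocks the calling thread
            }
            Err(e) if e.kind() == ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Test double: fails with `WouldBlock` a fixed number of times, then accepts data.
struct FlakyWriter {
    fail_times: u32,
    written: Vec<u8>,
}

impl Write for FlakyWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.fail_times > 0 {
            self.fail_times -= 1;
            return Err(ErrorKind::WouldBlock.into());
        }
        self.written.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

The `thread::sleep` call is the weak point of this option: on a single-threaded runtime it stalls everything else while the logger waits, which is one more argument for the separate-task design.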

Big tracking issue for implementing standard functions

For Riptide to be useful at all, a minimal list of builtin functions and constructs must be provided in order to be able to program properly. Below is a list of functions that should be implemented before an alpha release:

=: Test equivalence.
|: Create a pipeline.
and: Logical and.
apply: Function call argument unpacking.
begin: Execute expressions in sequence.
builtin: Explicitly call a builtin.
capture: Capture stdout and return it.
cd: Change directory.
command: Explicitly call a command.
cond: Multiple conditional branching.
def: Define a global function or variable (binding).
let: Define a local, lexically scoped binding.
env: Get, set, or list environment variables.
exec: Replace current process.
exit: Exit program.
foreach: Iterate over a list.
help: Some sort of help system.
if: Conditional branching.
list: Create a list.
not: Negate a boolean.
nth: Return nth item in list.
or: Logical or.
print: Write to output.
pwd: Print current directory.
read: Read from input.
source: Evaluate a script file.

(update this list as needed)

[Security] Workflow publish-docs.yml is using vulnerable action actions/checkout

The workflow publish-docs.yml references the action actions/checkout using the reference v1. However, this reference is missing the commit a6747255bd19d7a757dbdda8c654a9f84db19839, which may contain a fix for a vulnerability.
The vulnerability fix missing from this action version could be related to:
(1) a CVE fix
(2) an upgrade of a vulnerable dependency
(3) a fix for a secret leak, among others.
Please consider updating the reference to the action.