myst-lang / myst

A structured, dynamic, general-purpose language.

Home Page: http://myst-lang.org

License: MIT License

Crystal 99.33% Makefile 0.67%
dynamic object-oriented language programming-language crystal myst-lang

myst's Introduction

Myst

A structured, dynamic, general-purpose language.

deftype List
  def contains(element)
    each(&fn
      ->(<element>) { break true }
      ->(_)         { false }
    end)
  end
end

[1, 2, 3].contains(2) #=> true

Some of the high-level features include:

  • Pattern-matching everywhere. Assignments, method parameters, rescue clauses, etc.
  • Multiple-clause functions. All functions can define multiple clauses to adapt functionality based on inputs.
  • Value interpolations. Interpolate any value anywhere (even in method parameters) with the <> syntax.
  • Soft typing. Optional type annotations help control functionality without cluttering your code with conditionals.
  • Raise anything. Any value can be raised as an Exception and pattern matched in a rescue block.

Installation

NOTE: Due to Crystal's current limitations with compiling on Windows, Myst only works on macOS and Linux systems.

The recommended method of installing Myst is with mtenv, the official version manager for the Myst language. It is available here and has installation instructions available in the README.

For now, you will need to have Crystal installed to be able to install Myst. See Crystal's installation instructions for how to get started. Myst currently runs on Crystal 0.27.0.

Once Crystal and mtenv are installed, installing Myst is as simple as running mtenv install:

# Make sure mtenv is properly set up
mtenv setup
# Install v0.6.2 of Myst
mtenv install v0.6.2
# Make it the active version
mtenv use v0.6.2

With that, myst should now be installed and ready to go!

Help with improving these installation instructions, making pre-built binaries, and/or managing releases would be greatly appreciated :)

Get Involved

If you have an idea for a new feature or find a bug in Myst, please file an issue for it! Using the language and finding bugs are the best ways to help Myst improve. Any and all help here is appreciated, even if that just means trying out the language for a day.

If you just want to get involved in the community, come hang out in our Discord server! We're a pretty small community, so there's plenty of room for anyone that would like to hang out, even if it has nothing to do with Myst!

When I can, I try to label issues with help wanted or good first issue. help wanted is for issues that I'd really like external input on, while good first issue is for issues that can be implemented without too much knowledge of how the lexer/parser/interpreter works. On these issues, I try to explain as much as possible about what the solution looks like, including files that will need editing and/or methods that need implementing/changing. I hope that helps!

If you'd like to tackle something, but don't know where to start, please let me know! I'd love to help you get involved, so feel free to ask in the discord server or message me directly (faulty#7958 on discord, or email also works) and I'll do my best to get you up and running.

The Basics

If you would like to contribute to Myst's development, just:

  1. Fork it (https://github.com/myst-lang/myst/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request (https://github.com/myst-lang/myst/pull/new)

Owning an issue

If you have a specific issue that you'd like to tackle, be sure to add a comment saying you're working on it so that everyone is aware! (Currently, GitHub doesn't allow for assigning issues to new contributors :/)

Also, "ownership" is not binding. It's just a way of saying "hey, I think I can work on this!". If you get stuck or need help moving forward, feel free to ask for help either on the issue itself, or in the discord server.

Most importantly, don't feel bad if you bite off more than you can chew. Issues can easily end up being far more complex than they appear at the start, especially on a project of this size. But don't give up! It's always hard to get started on an existing project, but I want to help and make it as easy as possible wherever I can!

myst's People

Contributors

atuley, bmulvihill, faultyserver, jens0512, jepasq, minirop, zkayser

myst's Issues

Closures

Currently, this does not work:

sum = 0
[1,2,3].each do |e|
  sum = sum + e
end

IO.puts(sum) #=> 0

The expected result (and actual result in Ruby) is 6, because sum should be a closured variable that can be referenced inside of the block.

Instead, sum is created as a new variable in the block's scope, leading to the result of 0, as the external sum variable has not been changed.

Implementing proper closures should resolve this.
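
As a rough illustration of the scoping rule this implies, here is a minimal Python sketch. The `Scope` class and its methods are hypothetical and are not the interpreter's actual classes. The idea is that assignment first looks for an existing binding in an enclosing scope and only creates a new local when none is found.

```python
class Scope:
    """Hypothetical lexical scope with a link to its enclosing scope."""

    def __init__(self, parent=None):
        self.values = {}
        self.parent = parent

    def find_owner(self, name):
        # Walk outward until a scope that already defines `name` is found.
        scope = self
        while scope is not None:
            if name in scope.values:
                return scope
            scope = scope.parent
        return None

    def assign(self, name, value):
        # Closure behaviour: reuse an enclosing binding when one exists,
        # otherwise create a new variable in the current scope.
        owner = self.find_owner(name) or self
        owner.values[name] = value

    def lookup(self, name):
        owner = self.find_owner(name)
        if owner is None:
            raise NameError(name)
        return owner.values[name]


outer = Scope()
outer.assign("sum", 0)
for e in [1, 2, 3]:
    block = Scope(parent=outer)  # each block call gets a child scope
    block.assign("sum", block.lookup("sum") + e)

result = outer.lookup("sum")
```

With this rule the block never creates its own `sum`, so the outer variable ends up holding the accumulated total.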

Type resolution fails within instances.

Consider the following code:

deftype Foo
  deftype Bar
  end

  defstatic foo
    %Bar{}
  end

  def foo
    b = %Bar{}
  end
end

static    = Foo.foo
instance  = %Foo{}.foo

The expected behavior for this should be that both static and instance are assigned as instances of Foo.Bar. However, the actual result is that static is assigned, but the assignment for instance raises an error, saying:

Uncaught Exception: No variable or method `Bar` for Foo
  from `Bar` at /Users/jon/Sites/myst-lang/myst/test.mt:10:10
  from `foo` at /Users/jon/Sites/myst-lang/myst/test.mt:15:22

What this implies is that, for TInstances, the interpreter is checking the instance scope of Foo for the type Bar, while it only actually exists in the static scope. To resolve this, lookups for type instantiation when the current self is an instance should also check the static type.
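
The fallback described above could look roughly like the following Python sketch. `TypeScope` and `resolve` are hypothetical names used only for illustration; the real interpreter's scope classes may be organised differently.

```python
class TypeScope:
    """Hypothetical type with separate static and instance scopes."""

    def __init__(self, name):
        self.name = name
        self.static_scope = {}
        self.instance_scope = {}


def resolve(name, current_self, is_instance):
    # For instances, check the instance scope first, then fall back to the
    # static scope of the same type, where nested types are defined.
    scopes = [current_self.static_scope]
    if is_instance:
        scopes.insert(0, current_self.instance_scope)
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise NameError(f"No variable or method `{name}` for {current_self.name}")


foo = TypeScope("Foo")
foo.static_scope["Bar"] = TypeScope("Bar")  # `deftype Bar` nested in Foo

from_static = resolve("Bar", foo, is_instance=False)
from_instance = resolve("Bar", foo, is_instance=True)
```

Both lookups return the same nested type, which is the behaviour the issue expects for `Foo.foo` and `%Foo{}.foo`.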

Properly encapsulate types within the Interpreter

As noted in the description of 5628e61, all of the Types for the value classes and native libraries are currently implemented as constants.

For most applications this is okay, as the Interpreter will only be instantiated once. However, this causes issues for the tests and hinders the ability to embed the interpreter in other applications. Additionally, it's simply a bad practice, breaking the encapsulation of the Interpreter class.

I really don't know how to go about fixing this. I've spent 3 days thinking about how to move things around to allow the types to be dynamically allocated, but I haven't found a way to make those dynamically-created types work with the .type that each Value type has. I think the solution will involve removing .type and doing type resolution somewhere else.

As an example of the problem, take a look at these two tests:

it_raises %q(
  deftype Map
    def to_s
      "a map"
    end
  end

  raise {}
),                          "a map"

it_raises %q(
  deftype Map
    def to_s
      "different text"
    end
  end

  raise {}
),                          "different text"

(it_raises is just a macro that runs the code and expects the second argument in the error output after interpreting).

The second test here will fail, because the previous definition of Map#to_s will still exist when the second test starts, and because clauses are appended to functors rather than prepended, that first definition will be matched when raise calls to_s, leading to incorrect output.

FWIW, this only affects the native types and modules, as they are the only types/modules that are maintained as constants rather than defined dynamically.

expect_raises rescues AssertionFailure

Consider this method in single_spec.mt:

    def expect_raises(&block)
      block()
      raise %AssertionFailure{@name, expected_error, "no error"}
    rescue
      # If an error was raised, the assertion passes.
    end

If we pass it a block that does not raise, the spec still passes, because the AssertionFailure raised inside the method is caught by the method's own rescue clause.
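
The problem and one possible fix can be shown with a small Python analogue. This is only a sketch of the control flow, not Myst code: `AssertionFailure` here is a stand-in class, and the fix simply moves the failing raise outside the handler.

```python
class AssertionFailure(Exception):
    """Stand-in for the spec library's failure type."""


def expect_raises_buggy(block):
    try:
        block()
        raise AssertionFailure("expected an error, got no error")
    except Exception:
        pass  # also swallows the AssertionFailure raised just above


def expect_raises_fixed(block):
    try:
        block()
    except Exception:
        return  # the block raised, so the assertion passes
    # Raised outside the handler, so nothing rescues it.
    raise AssertionFailure("expected an error, got no error")


def no_error():
    return 1


def raises_error():
    raise ValueError("woops")


expect_raises_buggy(no_error)  # passes silently, which is the bug
expect_raises_fixed(raises_error)  # passes, as intended

try:
    expect_raises_fixed(no_error)
    fixed_failed = False
except AssertionFailure:
    fixed_failed = True
```

In Myst the same effect might be achieved by matching only non-AssertionFailure values in the rescue clause, but that depends on details of the Spec library not shown here.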

Add `extend` for Modules within Types

Currently, Modules can be included into other Modules and Types using the include keyword. Including a Module into a Type will add the methods as instance methods. For example:

defmodule Foo
  def foo; :foo; end
end

deftype Bar
  include Foo
end

%Bar{}.foo #=> :foo

Including a Module into another Module just adds the methods to the Module, which can then be included elsewhere.

The way this works is through the ancestors list for Container types. When looking up definitions for Calls, the interpreter checks the current scope, then the scope of the receiver, and finally the scopes of each ancestor.

The complement to include is extend, and it should also be supported in Myst.

Instead of adding methods from a Module to the instance scope of a Type, extend adds them to the static scope. Because of this, extend does not really make sense for use in a Module (it would act exactly the same as include).

The above example using extend would look something like this:

defmodule Foo
  def foo; :foo; end
end

deftype Bar
  extend Foo
end

Bar.foo #=> :foo

Notice that foo is now a method on the type Bar, rather than the instances of that type.
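
One way to picture the difference is a type that keeps two ancestor lists, one per scope. The Python sketch below is an assumption about how this could be modelled; `Type`, `Module`, and `find_method` are illustrative names, not the interpreter's API.

```python
class Module:
    def __init__(self, methods):
        self.methods = methods


class Type:
    """Hypothetical type with separate ancestor lists per scope."""

    def __init__(self, name):
        self.name = name
        self.instance_ancestors = []  # filled by `include`
        self.static_ancestors = []    # filled by `extend`

    def include(self, module):
        self.instance_ancestors.insert(0, module)

    def extend(self, module):
        self.static_ancestors.insert(0, module)

    def find_method(self, name, static):
        ancestors = self.static_ancestors if static else self.instance_ancestors
        for module in ancestors:
            if name in module.methods:
                return module.methods[name]
        return None


foo = Module({"foo": lambda: "foo"})

included = Type("Bar")
included.include(foo)

extended = Type("Bar")
extended.extend(foo)
```

Here `included` only finds `foo` through instance lookup, while `extended` only finds it through static lookup.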

Allow `!` as a method name

This issue has been marked as a "Good First Issue"! If you'd like to tackle this issue as your first contribution to the project, be sure to read the Get Involved section of the README for some help with how to get started.

Something I realized while writing this comment about Negation and Not is that ! is not currently allowed as a method name.

The current workaround is to use some other name for the method that Not calls on an object (most likely not), but that's a workaround, not a real solution.

Implementing this change is as simple as adding the NOT token type to the list defined in Token::Type.overloadable_operators:

def self.overloadable_operators
  [ PLUS, MINUS, STAR, SLASH, MODULO, MATCH, LESS, LESSEQUAL,
    NOTEQUAL, EQUALEQUAL, GREATEREQUAL, GREATER]
end

Then, depending on the status of #35, the Not visitor will need to call this method instead of the workaround name of not.

Add magic constants: `__FILE__` and `__LINE__`

Magic constants are useful for many things. In particular, __FILE__ is great for making portable scripts, and __LINE__ can help debug tooling and reporting to end users.

A simple description of their behavior:

  • __FILE__ returns the absolute path to the file that it appears in. For example, in a file ~/test.mt, the constant might resolve to /home/user/test.mt. The path is always that of the file containing the constant and is not affected by require statements or the like.

  • __LINE__ returns an integer representing the line number of the current file that the constant appears on, starting from 1. For example, the constant in the following code will evaluate to 3.

1 + 2
3 + 4
__LINE__

The implementation of these constants can be easily done using the location property that all nodes already have. A new node type, MagicConstant can be introduced, that has a type property indicating what the constant is (probably :line for __LINE__, or :file for __FILE__).
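
A minimal Python sketch of that idea follows. The `Location` and `MagicConstant` shapes are assumptions made for illustration (using strings instead of symbols for the type), and `visit_magic_constant` is a hypothetical visitor.

```python
import os
from dataclasses import dataclass


@dataclass
class Location:
    file: str
    line: int


@dataclass
class MagicConstant:
    """Hypothetical node; `type` is "file" or "line"."""
    type: str
    location: Location


def visit_magic_constant(node):
    # Both constants are answered from the node's own location, so the
    # result does not depend on which file required the current one.
    if node.type == "file":
        return os.path.abspath(node.location.file)
    if node.type == "line":
        return node.location.line
    raise ValueError(f"unknown magic constant: {node.type}")


line_node = MagicConstant("line", Location("test.mt", 3))
file_node = MagicConstant("file", Location("test.mt", 3))
```

Because the value comes entirely from the node's location, no extra interpreter state is needed to evaluate either constant.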

Allow exception handlers on anonymous function clauses.

This issue has been marked as a "Good First Issue"! If you'd like to tackle this issue as your first contribution to the project, be sure to read the Get Involved section of the README for some help with how to get started.

For consistency with normal method definitions and blocks, anonymous function clauses using the do...end syntax should be allowed to define exception handlers:

func = fn
  ->(2) do
    raise "woops"
  rescue
    :rescued
  end
end

func(2) #=> :rescued

The semantics of this should already be in place in the interpreter. The only necessary change should be in the parser, and will likely just be an addition to parse_anonymous_function around this area to call parse_exception_handler:

if finish = accept(Token::Type::END)
  func.at_end(finish.location)
  break
end

For inspiration, you can look at how parse_optional_block implements the same thing:

if finish = accept(end_token)
  return block.at_end(finish.location)
else
  if end_token == Token::Type::END
    block.body = parse_exception_handler
  else
    block.body = parse_code_block(end_token)
  end
  finish = expect(end_token)
  return block.at_end(finish.location)
end

There's not too much more to this change. Be sure to add some parser specs around this new syntax. The exception handler specs for blocks should be fairly easy to adapt to use anonymous functions instead.

If you have any questions, feel free to ask in the #help channel in the discord server or message me directly. Good luck! :)

[RFC] Separate hard and soft pattern matching.

Summary

Split the current pattern matching syntax of =: into two distinct types: hard and soft. Hard pattern matching will keep the current behavior: raising a MatchError when the match fails, which provides hard guarantees on data matches. Soft pattern matching would not raise an error, though the semantics of what should happen are not entirely clear.

This would be a syntax distinction, using a different operator to distinguish hard from soft matching.

Motivation

While working on some Spec library improvements, I've been trying to use pattern matching for conditionals, and I keep running into the issue where I want to continue execution of the body, even after a failed match. Instead, the current matcher implementation raises an error immediately and will start panicking up the stack.

I could just use equality, but then I don't get the benefits of destructuring and complex matching.

Suggested change

I think the current matching syntax could be split into two variants. The current behavior would be considered "hard matching", where an error is raised if the match fails at any point. A new behavior would be added using a distinct operator for soft matching, where instead of an error, execution continues after returning some falsey value.

I would suggest that soft matching keep the current match operator, =:, and hard matching be given a visually louder operator, such as =!. =! in particular looks very similar to =:, so it seems like a good fit, and the bang already has the connotation of being more "dangerous" or "careless".

Since the semantics of hard matching won't be different from what they are currently, I won't cover them again here. Instead, here are some examples of what I think soft matching should do:

def match(arg)
  when true =: arg
    :matched
  else
    :no_match
  end
end

match(true)  #=> :matched
match(false) #=> :no_match

list = [1, 2, 3]
[1, 2] =: list #=> nil

[a, b, c] =: [1, 2] =: list #=> nil

Remaining Questions

The return value of a failed soft match is debatable. Does it make sense to return nil? To support the semantics shown above with usage in when, the result needs to be a falsey value, so either nil or false. Returning false directly feels wrong, as a successful match does not return true directly, and I feel like boolean results should always be boolean results (just true or false, with no other possibilities). nil, on the other hand, implies failure without a boolean connotation.

Should variables created by a match still be created if the match fails? In the last example above, a, b, and c would all be created by a successful match. However, the match before it will fail, so its match will always fail. Should these variables be created even in the case of a failed match? If so, what should their value be?

More importantly, if the match fails after the first element, a will have been created, so should it be destroyed? How could that be guaranteed, especially with non-locals as match variables (e.g. ivars).
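
One possible answer to the last two questions is to stage bindings and commit them only when the whole pattern matches. The Python sketch below illustrates that option under simplifying assumptions: patterns are plain values, lists, or a hypothetical `Var` placeholder, a failed soft match returns `None` in place of nil, and the scope is a plain dictionary. It does not address non-local targets such as ivars.

```python
class Var:
    """A pattern variable that binds whatever value it is matched against."""

    def __init__(self, name):
        self.name = name


def _match(pattern, value, bindings):
    if isinstance(pattern, Var):
        bindings[pattern.name] = value
        return True
    if isinstance(pattern, list):
        return (
            isinstance(value, list)
            and len(pattern) == len(value)
            and all(_match(p, v, bindings) for p, v in zip(pattern, value))
        )
    return pattern == value


def soft_match(pattern, value, scope):
    # Bindings are staged first and committed only when the whole pattern
    # matches, so a partial failure leaves `scope` untouched.
    staged = {}
    if not _match(pattern, value, staged):
        return None
    scope.update(staged)
    return value


scope = {}
failed = soft_match([Var("a"), 5], [1, 2], scope)
scope_after_failure = dict(scope)
succeeded = soft_match([Var("a"), Var("b")], [1, 2], scope)
```

In the failing case `a` is matched before the mismatch on the second element, yet it never reaches `scope`, so nothing has to be destroyed afterwards.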

Remove `yield` keyword in favor of always-explicit blocks

The yield keyword is often used/abused to avoid having to explicitly state that a function accepts a block parameter. The current example of blocks shows this very well:

myst/examples/blocks.mt

Lines 40 to 46 in e44c3f2

def pairs(element1, element2, element3)
  # `yield` acts exactly like any other function call, and all parameter syntax
  # rules can be used.
  yield(element1, element2)
  yield(element1, element3)
  yield(element2, element3)
end

In this example, the pairs function implicitly accepts a block parameter that gets yielded to throughout the function's body. However, looking only at the function clause, it's impossible to know that this function accepts a block at all, nor that not providing a block would cause an error.

Because blocks are literally functors (not potentially wrapped Procs, as in Ruby and others), their call syntax when specified explicitly would appear as a normal function call, so the code above could be re-written as:

def pairs(element1, element2, element3, &block)
  # Calling the block is exactly like any other function call.
  block(element1, element2)
  block(element1, element3)
  block(element2, element3)
end

I propose to remove the implicit block style and always require blocks to be specified explicitly.

Some reasoning:

  • More explicit. There will never be a question of whether a function accepts a block (or more critically, requires a block). Especially with the ability to have multiple clauses for a function, being explicit about every parameter is extremely useful. Making it a requirement will help avoid unintended behavior.
  • Simpler syntax. Users will never have to choose a style to use or question what "best practice" would be. Additionally, the yield keyword is removed and freed up for end-user usage.
  • No impact on internal usage. As the example shows, the only real change to the function is adding the parameter in the function head. Use of the block within the function body simply exchanges yield with whatever the block is named as. This takes advantage of how blocks are passed directly as functors.

Only raise `RuntimeError`s from the Interpreter

This issue has been marked as a "Good First Issue"! If you'd like to tackle this issue as your first contribution to the project, be sure to read the Get Involved section of the README for some help with how to get started.

With the upgrade to Crystal 0.24.1 (see #53), expect_raises now requires an exception type to be specified by the callers. Out of haste to get 0.3.0 out, I've just put the general Exception class in for most cases where no argument was given. However, this is not a good practice, and the expected exceptions should be better restricted to more relevant types.

I think this improvement has two components: First, find all instances of expect_raises(Exception) in the spec suite and change them to more appropriate types (e.g., ParseError for errors raised by the parser, SyntaxError for errors from the Lexer, etc.). I don't think there are many of these.

Most instances probably won't be resolvable by the above fix, so the second part of this improvement would be to find all instances of raise in the Interpreter source code and turn them into appropriate RuntimeErrors. Not only does this help improve the spec, it also means that users can catch these errors in their programs, which is a good goal.

Adding a __raise_runtime_error helper method to util.cr also seems like a good idea to help mitigate the risk of this issue coming up again in the future.

Here's an example of how one of these improvements could be made. Line 18 here raises a native exception when re-assigning the value of a Constant:

when Const
  if current_scope.has_key?(target.name)
    raise "Re-assignment to constant value #{target.name}."
  else
    current_scope.assign(target.name, value)
  end

Instead of a native exception being raised, I think this should raise a RuntimeError with the same message. Assuming the __raise_runtime_error method from above is implemented, it might look something like this:

# ...
when Const 
  if current_scope.has_key?(target.name) 
    __raise_runtime_error("Re-assignment to constant value #{target.name}.")
  else 
    current_scope.assign(target.name, value) 
  end
# ...

Then, in the specs, there's this test:

it "does not allow re-assignment to constants" do
  error = expect_raises do
    parse_and_interpret %q(
      THING = 1
      THING = 2
    )
  end

that should change to expect a RuntimeError instead of a normal Exception.

Hopefully that's enough direction to get started. As always, feel free to ask any questions either here or in the discord server. I don't think this issue requires too much knowledge of the interpreter, but it's hard for me to tell.

String Interpolations

String interpolations are a great syntactic feature for cleaning up the stringification of complex expressions. The most common syntax for interpolations seems to be the "#{}" construct within a string literal. For example, in Ruby, Crystal, and Elixir:

"hello, #{1 + 2 + 3}"
#=> "hello, 6"

There are some alternatives out there, though.

In JavaScript (ES6), the constructs are known as template literals:

`hello, ${1 + 2 + 3}`

This is pretty terrible, honestly, because it's not a normal string. So, when adding an interpolation to an existing string, the user would have to additionally change the quoting punctuation for it to work. Bleh.

C# has an interesting difference, where the interpolations support formatting a-la printf format strings:

Console.WriteLine($"Name = {name}, hours = {hours:hh}");

This is interesting, but again, the leading $ throws me off a bit. Also, the potential confusion of the expression being interpolated and the formatting then applied to it isn't worth the small advantage it provides.

All of these constructs share the same underlying semantics, though. The string literal is split at every point of interpolation (e.g., instance of #{}). Each expression is then evaluated and has string coercion (e.g. to_s in Ruby) performed on it. Finally, all of the parts are combined again to get the resulting string. Further details of the semantics differ between languages, but the basics are fundamentally the same. A code example of the re-write would look like this:

s = "hello, #{name}! You're #{age * 12} months old."
# would be re-written as:
s = "hello, " + (name).to_s + "! You're " + (age * 12).to_s + " months old."

I like the #{} syntax with double quotes the most, both for consistency with normal strings, as well as consistency with the languages that Myst most closely resembles (Ruby, Crystal, Elixir).
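
The split, evaluate, and join semantics described above can be sketched in Python as follows. This is only an illustration: the regular expression does not handle nested braces or a `}` inside the expression, and Python's `eval` and `str` stand in for the interpreter's expression evaluation and `to_s`.

```python
import re

INTERPOLATION = re.compile(r"#\{(.*?)\}")


def split_literal(source):
    # re.split with one capture group alternates literal text and
    # expression source: [text, expr, text, expr, ..., text].
    pieces = INTERPOLATION.split(source)
    return [
        ("expr" if index % 2 else "text", piece)
        for index, piece in enumerate(pieces)
        if piece or index % 2
    ]


def interpolate(source, variables):
    result = []
    for kind, piece in split_literal(source):
        if kind == "expr":
            # eval stands in for the interpreter evaluating the expression;
            # str() plays the role of to_s.
            result.append(str(eval(piece, {"__builtins__": {}}, variables)))
        else:
            result.append(piece)
    return "".join(result)


parts = split_literal("hello, #{name}! You're #{age * 12} months old.")
text = interpolate(
    "hello, #{name}! You're #{age * 12} months old.",
    {"name": "Myst", "age": 2},
)
```

A real implementation would more likely do the splitting in the lexer or parser, so that each interpolated expression becomes an ordinary node in the syntax tree.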

[RFC] Classes and type definitions

Right now, there is no way in the language to create a new data type, nor is there a way to add methods to native types (e.g., List).

In the previous iteration of the interpreter, the native types were treated as Modules in the Kernel that could be re-opened to add methods. This was okay, but meant that there wasn't a distinction between a module of methods and a module for a type. In fact, much like in Python, it was possible to do things like List.each([1,2,3]) do ... end as a replacement for [1,2,3].each do ... end. I didn't particularly like this, but it was a side-effect of how the semantics of function lookup and receivers were implemented.

In this new iteration, before simply copying over the old behavior, I'd like to rethink how type classes and type definitions can be implemented.

The default would be to essentially copy Ruby's implementation, where all types are Classes and new types are created with the class keyword:

class Array
  # This would re-open the native Array class.
end

class NewType
  # This would define a new type called NewType.
end

# By default, classes have a `.new` method that creates a new instance of the type.
list  = Array.new
thing = NewType.new

In Ruby, classes are actually just modules with some added default behavior: the default implementations of self.new, etc., and the ability to actually allocate memory. This, to me, is one of the more difficult parts of Ruby for new programmers (especially those coming from static languages) to understand: that classes are mutable objects with no real special meaning (but actually with a little special meaning, then with metaclasses and all that other fun stuff).

The class implementation makes sense in Ruby as part of the rest of the language's design and values. However, I don't think it fits particularly well with what I see as the design and values of Myst: having minimal implicit behavior while retaining terseness and clarity.

I'd like this thread to suggest new and different ways of defining types (either from a syntactic or semantic angle) that can better meet that design goal. At this point, there is no bad suggestion; any and all input is appreciated. I have a few ideas that I'll throw out here soon. Voting with 👍 and 👎 is also appreciated :)

Add bytecode generator

It's hard to write bytecode samples by hand, especially because the bytecode is not word-aligned, and instructions aren't always the same length, so most editors will not display it in a nicely-editable way. Also, writing bytecode by hand is hard in general.

With that, there should be a tool for generating a bytecode file from something higher-level. Since the functionality for extracting instructions from bytecode already exists, it shouldn't be too difficult to add support for the reverse, simply converting instructions into their binary form and writing them in sequence to a file.

A sample usage of this tool initially might look like this:

File.open("test.mtc") do |io|
  io << Instruction::Push.new(64)
  io << Instruction::Push.new(52)
  io << Instruction::Write.new
end
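
To make the "convert instructions to binary and write them in sequence" idea concrete, here is a small Python sketch. The opcode values and the operand width are assumptions chosen for illustration; they are not taken from Myst's actual bytecode format.

```python
import io
import struct


class Push:
    OPCODE = 0x01  # assumed opcode value

    def __init__(self, value):
        self.value = value

    def to_bytes(self):
        # One opcode byte followed by an 8-byte little-endian operand.
        return struct.pack("<Bq", self.OPCODE, self.value)


class Write:
    OPCODE = 0x02  # assumed opcode value

    def to_bytes(self):
        # No operand, so this instruction is a single byte long.
        return struct.pack("<B", self.OPCODE)


def generate(instructions, stream):
    for instruction in instructions:
        stream.write(instruction.to_bytes())


buffer = io.BytesIO()
generate([Push(64), Push(52), Write()], buffer)
bytecode = buffer.getvalue()
```

The two instruction kinds have different lengths (9 bytes and 1 byte here), which is the kind of unaligned layout that makes hand-editing awkward.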

`self` is not always restored after exiting Calls

When a function calls raise, it uses Crystal's native exception handling to implement the rescue/ensure behavior and to pop up the callstack.

The problem with this is that the value of self is not restored to the proper state when a rescue is encountered.

For example, in the Spec library, run looks like this:

def run(&block)
  block()
  IO.puts(".")
rescue failure
  IO.puts(failure)
  exit(1)
end

Here, block can be any arbitrary function, and can manipulate the self stack. Something like this, for example:

run do
  [1, 2, 3].each do |e|
    raise "woops"
  end
end

Here, the block given to run is pushing the List [1, 2, 3] onto the self stack, but then the block for each is calling raise.

So, the self stack has the main module and the List on it. When raise happens, execution immediately starts panicking up the (native) callstack until the rescue in visit(ExceptionHandler) is found. If a rescue clause from the ExceptionHandler matches the exception, panicking stops and execution is allowed to continue like normal from that point.

The issue is that in the panicking up the callstack, the entries from the self stack weren't removed. So, in the above example, execution is continuing in the rescue block in run, but the List that was on the self stack is still there. Call lookup then continues like normal, but the IO module will fail to be looked up, because the interpreter will be looking on the List instance scope, rather than the main module, where it is actually defined.

In practice, this error ends up being exposed as the following, which is hard to understand:

Uncaught Exception: No variable or method `IO` for SingleSpec
  from `IO` at /Users/jon/Sites/myst-lang/myst/stdlib/spec/single_spec.mt:14:7
  from `==` at /Users/jon/Sites/myst-lang/myst/spec/myst/enumerable_spec.mt:25:22
  from `assert` at /Users/jon/Sites/myst-lang/myst/spec/myst/enumerable_spec.mt:25:5
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:15
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec/single_spec.mt:11:7
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:10
  from `it` at /Users/jon/Sites/myst-lang/myst/spec/myst/enumerable_spec.mt:24:3
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:41:5
  from `describe` at /Users/jon/Sites/myst-lang/myst/spec/myst/enumerable_spec.mt:19:1

I don't entirely know what the solution for this will look like. But I imagine it will involve saving the value of self at the beginning of visit(ExceptionHandler), then popping from the self stack until that value is on top again when an error is rescued.
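
A rough Python model of that approach is shown below. It is a sketch under assumptions: the `Interpreter` class, its `self_stack` list, and the method names are hypothetical, and it records the stack depth on entry rather than the value of self, which avoids ambiguity if the same value appears on the stack more than once.

```python
class MystError(Exception):
    """Stand-in for an exception raised by Myst code."""


class Interpreter:
    def __init__(self):
        self.self_stack = ["main"]

    def call_with_receiver(self, receiver, body):
        # Mirrors the current behaviour: the pop is skipped when body raises.
        self.self_stack.append(receiver)
        result = body()
        self.self_stack.pop()
        return result

    def visit_exception_handler(self, body, rescue):
        saved_depth = len(self.self_stack)
        try:
            return body()
        except MystError as error:
            # Drop every entry pushed since the handler was entered.
            del self.self_stack[saved_depth:]
            return rescue(error)


interp = Interpreter()


def raising_block():
    raise MystError("woops")


def body():
    return interp.call_with_receiver([1, 2, 3], raising_block)


def rescue(error):
    return (str(error), list(interp.self_stack))


message, stack_in_rescue = interp.visit_exception_handler(body, rescue)
```

When the rescue body runs, the List pushed by the failed call is gone and only the main module remains on the stack, so a lookup of `IO` would start from the right place.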

Thanks @atuley and @zkayser for listening to me ramble and helping find the cause of this :)

Consider adding type contracting (protocols/interfaces/abstract types)

In dynamic, interpreted languages, the ability to define explicit contracts about the interfaces that values expose is often lost. Instead, these are traded for abstract definitions and/or implicit contracts that are often visually simpler, but less obvious when the contract is extended or re-implemented by another class/module.

The primary loss in these languages is the ability to enforce those contract requirements before they are encountered at runtime (if they ever are).

Another great side-effect of being able to enforce those contracts is predictable failure. That is, if a function requires that a contract be satisfied, then calls to that function with non-conforming arguments will fail before any of the function's body is executed. When the contract is not enforceable, it is common for functions to begin execution and fail midway through, resulting in a greater (often inferred, not necessary) dependence on exceptions and exception handling.

I think having Myst adopt a contracting system would be beneficial and worthwhile, and doesn't have to compromise readability or flexibility in the language. The roadmap for Myst already suggests explicit typing will be idiomatic in most cases, so the ability to specify more abstract behaviors follows suit to me.

Examples of languages that implement and enforce contracting systems:

  • Elixir has Protocols and Behaviours which are distinct, but both provide guarantees of working interfaces at a contract-like level.
  • Java has Interfaces which are almost identical to the contracts explained here, but lack some functionality (and are extremely verbose).
  • C++ has virtual functions that can be added to superclasses as a simplistic version of a contract.
  • Crystal has abstract types that are similar to virtual functions in C++.

Some considerations:

  1. What does the syntax look like? I'm somewhat partial to something like a use or implements directive at a module/class level for defining the contracts that the type should satisfy.
  2. How is this affected by re-openable classes? When is the contract enforced? After parsing finishes, or some time during interpretation (function call time?).
  3. Is anything more than module inheritance necessary? In most cases, contracts can be implicitly set up by defining and including modules with the contract definition into subtypes (see Ruby). This is effective, but still has most of the drawbacks of implicitness as mentioned earlier.

Simple Spec library

Before a lot of work goes into the standard library, it'd be nice to have an in-language Spec library that can be used to test its functionality.

All of the current specs are written in Crystal, and while that's fine and fast to run, it's really verbose for tests that need a lot of setup, and all of that setup has to be repeated for each test, since they all get new instances of the interpreter.

The basic functionality needed is:

  • define a test with a name.
  • make assertions on the truthiness of an expression.
  • output assertion failures as failed specs, including the name of the failing test.
  • return a non-zero exit status when a test fails.

Further features are all nice-to-haves.

The Spec library should be available as part of the standard library, but not included as part of the prelude (since production code shouldn't need it most of the time).
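The four requirements above are small enough to sketch in a few lines. The following is a Ruby illustration, not a proposed Myst API: `TinySpec`, `it`, `assert`, and `run` are hypothetical names, and the runner returns the exit status instead of exiting so that it stays easy to test.

```ruby
require "stringio"

# Hypothetical minimal spec runner covering the four requirements above.
class TinySpec
  AssertionFailed = Class.new(StandardError)

  def initialize
    @tests = []
  end

  # Define a test with a name.
  def it(name, &body)
    @tests << [name, body]
  end

  # Make an assertion on the truthiness of an expression.
  def assert(value)
    raise AssertionFailed unless value
  end

  # Run every test, report failures by name, and return an exit status:
  # 0 when everything passed, 1 when at least one test failed.
  def run(out = $stdout)
    failed = @tests.reject do |_name, body|
      begin
        instance_exec(&body)
        true
      rescue AssertionFailed
        false
      end
    end
    failed.each { |name, _body| out.puts("FAILED: #{name}") }
    failed.empty? ? 0 : 1
  end
end

spec = TinySpec.new
spec.it("adds integers") { assert(1 + 1 == 2) }
spec.it("compares lists") { assert([1, 2] == [2, 1]) }

report = StringIO.new
status = spec.run(report)
puts report.string
# A real runner would finish with `exit(status)`.
```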

Full installation instructions for Linux

The instructions are mostly complete, but not that great, and still leave a lot of the exact details for the reader to figure out. It'd be a lot better to have complete, exact instructions for installation.

Here are the steps that I ran to install v0.1.0 on an Ubuntu system:

# Pull a stable source
wget https://github.com/myst-lang/myst/archive/v0.1.0.tar.gz
# Extract it
tar xvf v0.1.0.tar.gz
cd myst-0.1.0
# Build the executable
shards build
# Link it to somewhere in $PATH
sudo ln -s /path/to/bin/myst /usr/local/bin/myst
# Run a test file
echo "[1,2,3].each{ |e| IO.puts(e) }" > test.mt
myst test.mt

Some notes on what should be expanded:

  • Explain what sources are available for download. (Link to the latest release. Maybe also the current master source?)
  • Explain various linking strategies (ln -s to /usr/local/bin, or add the directory to $PATH)
  • Explain that stdlib needs to be available one directory up from the executable.

Support `Negation` and `Not` nodes.

Negation and Not are probably the most common unary operations. The parser is already set up and tested to parse unary operations properly, but there is no support in the interpreter for them. For example, running the simple program 1 + -1 will yield a "Compiler bug:" error for the - on -1, saying that Negation nodes are not yet supported. A similar error is raised for a program like !true.

Adding interpreter support for Negations and Nots will just require adding visit methods for them. Ideally, the methods for both of these nodes, as well as all future unary operations, will live under src/myst/interpreter/nodes/unary_ops.cr, to complement the existing binary_ops.cr at the same level.

To actually implement these methods, the interpreter should evaluate the value property of the node, then call the appropriate unary method on it and push the result to the stack. There is no defined standard for what this method should look like, but since Myst allows multi-clause definitions, I think using ! for Not and - for Negation is sufficient, even if it slightly degrades performance.

To do the call to the appropriate method, there isn't a nice convenience method available from the interpreter. However, there is NativeLib.call_func_by_name, which provides the same functionality for NativeLib methods:

def call_func_by_name(itr, receiver : Value, name : String, args : Array(Value))
  func = itr.__scopeof(receiver)[name].as(TFunctor)
  Invocation.new(itr, func, receiver, args, nil).invoke
end

Just copy-pasting these two lines and replacing name should be sufficient for calling ! or - on the value. Alternatively, if you can come up with a nicer way for doing these calls, that would be great :)
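The evaluate, call, push sequence described above can be sketched with a toy interpreter. This is a Ruby illustration only: the node structs and `ToyInterpreter` are stand-ins rather than the real Myst classes, and Ruby's `public_send` plays the role that the function lookup and Invocation play in the Crystal code.

```ruby
# Toy AST nodes; the names mirror the issue but the classes are illustrative.
Literal  = Struct.new(:value)
Negation = Struct.new(:value)
Not      = Struct.new(:value)

class ToyInterpreter
  attr_reader :stack

  def initialize
    @stack = []
  end

  def visit(node)
    case node
    when Literal  then @stack.push(node.value)
    when Negation then visit_unary(node, :-@) # Ruby's name for unary minus
    when Not      then visit_unary(node, :!)
    else raise ArgumentError, "unsupported node: #{node.class}"
    end
  end

  private

  # Evaluate the operand, call the unary method on it, push the result.
  def visit_unary(node, method_name)
    visit(node.value)
    operand = @stack.pop
    @stack.push(operand.public_send(method_name))
  end
end

itr = ToyInterpreter.new
itr.visit(Negation.new(Literal.new(1)))
itr.visit(Not.new(Literal.new(true)))
p itr.stack
```

Both operators share one helper, so adding a future unary operation only needs one more `when` branch.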

Also, be sure to add some specs for these new features! There are already specs for parsing them, but specs for their interpretation should live in spec/interpreter/nodes/unary_ops_spec.cr. If you need some inspiration for some specs to include, you can look at the parser specs, though these are a little hard to read, since they're inside of a macro.

Hopefully that's a sufficient explanation to get started, but feel free to ask any questions if you have them :) Also, if you want to tackle this, be sure to add a comment or a 🎉 reaction so everyone can be aware!

Implement the Splat operator

Myst currently supports Splat Collectors in parameter definitions and List patterns. The inverse of the Splat collector is the simple Splat, where values are extracted from a single List into multiple individual values, rather than the other way around. The most common use case for this is in function calls. With the Splat operator supported, the following should work:

def func(a, b, c)
  :matched
end

args = [1, 2, 3]
func(*args) #=> :matched

This is a trivial example, but it shows the semantics of the Splat operation. Another use case is when creating new Lists, where a List can be inserted flatly into another List:

list1 = [1, 2, 3]
list2 = [4, 5, 6]
list3 = [*list1, *list2]
list3 #=> [1, 2, 3, 4, 5, 6]

I think that Splat should be implemented as an overridable method so that custom types can work natively with the semantics of Splat shown above. For example:

deftype Foo
  def splat
    [1, 2, 3]
  end
end

f = %Foo{}
some_method(*f) # Equivalent to `some_method(1, 2, 3)`

In my mind, this matches the existing behavior of the other unary operators (! and -) and the rationale of the Enumerable module, where types can implement all of the methods from Enumerable by simply defining a single each method.

Another benefit of the overridable method is that Splat can be used in places other than Call arguments or List literals:

f = %Foo{}
result = *f #=> The semantics of this are entirely overridable by the user.

For the implementation, the parser already understands how to parse Splat operations on values, so the only work that needs to be done here is in the interpreter and nativelib. In particular:

  • List should define a unary splat method (see #44 for why * won't work yet). It will just return self.
  • When an argument in a Call is a Splat node:
    • call the splat method on the object.
    • assert that the return value of the method is a List.
    • iterate the List, adding each element to the args array for the Call.
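The argument-expansion steps above can be sketched as follows. This is a Ruby illustration under stated assumptions: `Splat` and `expand_args` are hypothetical names, Ruby's Array stands in for Myst's List, and Array is reopened only to mirror the idea that List defines a `splat` method returning itself.

```ruby
# Marks an argument that was written as `*value` at the call site.
Splat = Struct.new(:target)

# Stand-in for Myst's List: `splat` on a list just returns the list itself.
class Array
  def splat
    self
  end
end

# A custom type that opts in to splatting.
class Foo
  def splat
    [1, 2, 3]
  end
end

# Build the final argument list for a call, expanding Splat nodes in place.
def expand_args(args)
  args.each_with_object([]) do |arg, expanded|
    if arg.is_a?(Splat)
      values = arg.target.splat
      raise TypeError, "splat must return a List" unless values.is_a?(Array)
      expanded.concat(values)
    else
      expanded << arg
    end
  end
end

p expand_args([0, Splat.new(Foo.new), Splat.new([4, 5])])
```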

Flesh out initial syntax specification.

The SYNTAX.md document currently outlines most of the core syntax in Myst, but it doesn't do much in terms of explanation, and lacks a few things (conditionals as a suffix, etc.).

Classes/modules will probably not make it into this first version.

Other things that need clarification:

  • Function call syntax. There are very few examples of calling a function with/without arguments.
  • Default argument values vs. guard clauses in function heads. The current syntax has some ambiguity.
  • Key interpolation for maps.
  • Multiple variadic decomposition. e.g. [1, *_, b] vs [1, *_, 2, *_, 3]. This may not be supported, at least variadically. Single variadic decomposition will definitely be supported, though.

More may be added here as the syntax gets locked down.

Allow methods to match on block argument structures.

In Ruby, blocks are special-cased with how their parameters are matched. Unlike any other method in Ruby, blocks can be given more or fewer arguments than they are defined to accept, and the interpreter will be okay with it:

[1, 2, 3].each{ |a, b, c| puts a }
[1, 2, 3].each{ puts "hi" }

This is really more like JavaScript, and while I've definitely used/abused this feature before, I don't particularly like its implicit nature from a design perspective.

I would prefer allowing methods to match the structure of the block argument they are given, and define different clauses accordingly. Crystal supports this in a way that's fairly clean, but more verbose and restrictive than I think I would like:

def transform_int(start : Int32, &block : Int32 -> Int32)
  result = yield start
  result * 2
end

transform_int(3) { |x| x + 2 } #=> 10
transform_int(3) { |x| "foo" } # Error: expected block to return Int32, not String

In #57, I've already started describing method signatures using a more terse version of this syntax. Expanded into an actual method definition, it might look something like this:

def map(&block(e))
  [1, 2, 3].each do |e|
    block(e)
  end
end

def map(&block(e, i))
  [1, 2, 3].each_with_index do |e, i|
    block(e, i)
  end
end

This would define two different clauses that each expect a different block structure, one accepting an index parameter, and the other not. Usage-wise, this wouldn't look any different from Ruby:

map{ |e| IO.puts(e) }
map{ |e, i| IO.puts(e, i) }

Maybe this is unnecessary or unhelpful, but I've been thinking about how to more explicitly implement the variadic matching of Ruby blocks, and this is the best I've come up with.
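For comparison, the closest thing available in Ruby today is a runtime check on `Proc#arity` inside a single method. The sketch below uses a hypothetical `map_items` helper; it is not clause-based matching, but it shows the behaviour the two proposed clauses would select between.

```ruby
# Choose behaviour from the number of parameters the block declares.
def map_items(items, &block)
  case block.arity
  when 1 then items.map { |e| block.call(e) }
  when 2 then items.each_with_index.map { |e, i| block.call(e, i) }
  else raise ArgumentError, "expected a block taking 1 or 2 parameters"
  end
end

p map_items([10, 20]) { |e| e + 1 }
p map_items([10, 20]) { |e, i| e + i }
```

With clause matching on block structure, the `case` disappears and each branch becomes its own definition.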

Add block parameters as variables in local function scopes

Currently, when a function accepts a block parameter, it is purposefully not added as a variable in the local function scope. This was done in an effort to "be consistent" with treating block parameters like local functions, but I think it's more confusing than it needs to be.

For example, right now, capturing a block parameter within a function looks something like this:

def capture(&block)
  @captured_block = &block
end

While I like that the capture is explicit in this, it's easy to forget that it's needed and instead end up calling the block:

def capture(&block)
  # block would end up being called here.
  @captured_block = block
end

Especially when coming from Ruby, this would be confusing. Since Ruby doesn't support directly calling Procs (it uses block.call), the block would be captured, but in Myst it would be called.

I think a good compromise for this problem is to add the block parameter as a local variable in the function scope. This would mean that both of the above examples have the same semantics: the block is captured, not run.

Running the block would then require adding parentheses (by forcing the Var into a Call, see #48 for details), which is also more explicit, without adding much overhead. I find myself adding parentheses anyway (even with no arguments) to make it clear that the block is being called, so this doesn't feel like a burden on the user:

def capture(&block)
  @capture1      = &block
  @capture2      =  block
  @block_result  =  block()
end

Date/time literals

Something that I think a lot of big languages today are missing is Date literals. ISO 8601 provides an unambiguous syntax for specifying dates, though the individual components of the date could be construed as binary operations on integers, or map entries with integer values.

To ensure there is no ambiguity, date literals could be prefixed with a D (or DT explicitly for datetimes):

new_year = D2017-01-01
now = DT2017-06-26 4:15:30Z-05:00

Some disambiguation between optional timezone specifiers and map entry definitions is probably needed, though key interpolation would be used anyway, since a DateTime is not a symbol.
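As a rough illustration of how the prefix removes the ambiguity, here is a Ruby sketch of recognising the date-only form. `DATE_LITERAL` and `parse_date_literal` are hypothetical names, and the sketch deliberately covers only the `D` prefix with a calendar date; the datetime form and its timezone handling are left open, as they are above.

```ruby
require "date"

# Hypothetical lexer rule: a `D` prefix followed by an ISO 8601 calendar date.
DATE_LITERAL = /\AD(\d{4}-\d{2}-\d{2})\z/

# Return a Date for a well-formed literal, or nil if the text is not one.
def parse_date_literal(source)
  match = DATE_LITERAL.match(source)
  match && Date.iso8601(match[1])
end

p parse_date_literal("D2017-01-01")
p parse_date_literal("2017-01-01")
```

Without the prefix, the second input is left for the normal expression parser, where it reads as integer subtraction.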

Mandatory vs. optional parentheses for function calls

I'm a bit torn over whether or not to require parentheses to designate function calls.

Background and similar languages

  • Ruby has always treated parentheses as optional.
  • Crystal only requires parentheses for definitions of functions that take parameters.
  • Elixir changed in 1.4 to require parentheses for zero-argument calls, but not for multi-argument calls.

Myst is not really a message-passing language. As such, there's a bit of a distinction between a member access and a function call. Right now, a member function call looks like two separate nodes: an AccessExpression for the object.member part, and a FunctionCall for the member() part.

This complicates things a bit. FunctionCall has to know about object as the receiver in the object.member() structure. As a separate node, it can't really know that without some funky stack stuff and guessing from the interpreter/hints from the function being called.

I dislike the inconsistency that Elixir had, where parentheses were optional for zero-argument calls, and I still dislike that they are not required for calls with arguments. It's better, but still confusing to read if you are unfamiliar with the context.

I currently see two options, neither of which is really ideal to me.

Option 1

Always require parentheses for function calls.

This is the current behavior. In this case, those two node instances can be merged to a single MemberFunctionCall that has the entire context of receiver.member(). This also still allows for chaining like something.split().sum().

In cases like that, however, I specifically appreciate the ability to omit parentheses (something.split.sum looks a lot better in my eyes).

Option 2

Make parentheses fully optional.

Doing so would cause member accesses that result in functors to always be evaluated immediately. This means obtaining a reference to a member function would require additional syntax (currently ref = object.member_fun will assign ref as a bound functor that can later be called as ref(args...). This doesn't work if the functor would be immediately evaluated).

Something I really like from C/C++ is the ability to plainly obtain function references by writing their name with no call notation, and this option would give that up. It does, however, make something.split.sum immediately work.


I would really like a 3rd option that compromises between the two. I might be okay with more syntax to obtain references. I am not okay with requiring bare parentheses everywhere.

Referencing the current self with `self`.

self is already allowed by the parser and parsed into an appropriate Self node in the AST.

Implementing this feature would simply involve adding a visit(node : Self) to the interpreter that just pushes the current value of self (available through the current_self method) onto the stack and then returns.

There are a number of good use cases for self that should be tested as part of this. Tests for these would be added under a new file, spec/interpreter/nodes/self_spec.cr.

  • self as the return value of a method should return the instance the method operates on:
deftype Foo
  def it
    self
  end
end

%Foo{}.it #=> this instance of Foo
  • self as the receiver of a method should act as if the self wasn't there (e.g., the method call acts as it would without the self receiver).
deftype Foo
  def foo; :hi; end
  def foo_proxy
    self.foo
  end
end

%Foo{}.foo_proxy #=> :hi
  • self as an argument to a method should evaluate to the self at the call site, not the receiver of the method.
deftype Foo
  def put_self
    IO.puts(self)
  end

  def to_s
    "hello"
  end
end

%Foo{}.put_self #=> "hello"

Add version info and `-v` flag to CLI

Since there have only been 2 real releases of Myst and few enough people that everyone has upgraded already, there hasn't been much of a need for version information in Myst.

But, it would still be nice, especially as the language grows, to see a VERSION constant somewhere that also gets exposed through the CLI.

I also think the CLI version output should include the versions of Crystal and LLVM and the target platform that the executable was built with. Maybe this information would be hidden behind a --vv flag for "verbose" mode.

Warn on usage of an Underscore.

From the documentation on the Underscore node in the AST:

An underscore-prefixed identifier. Underscores are specifically intended to be used as ignored values (values where an assignment is needed to be semantically correct, but where the value is not used).

To "enforce" this, any reference to an Underscore should give a warning explaining that the Underscore should be renamed as a Var instead for better clarity.

As an example, the following:

_a = 10
b = _a

should give a warning along the lines of:

Reference to an Underscore value, `_a`.
Underscores indicate that a variable should not be referenced.
If this reference is intentional, consider removing the leading `_`.
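A small Ruby sketch of producing that message is shown here. It assumes the interpreter can supply a list of referenced names; `underscore_warnings` is a hypothetical helper, not part of Myst.

```ruby
# Build the proposed warning for every referenced name that starts with `_`.
def underscore_warnings(referenced_names)
  referenced_names.select { |name| name.start_with?("_") }.map do |name|
    [
      "Reference to an Underscore value, `#{name}`.",
      "Underscores indicate that a variable should not be referenced.",
      "If this reference is intentional, consider removing the leading `_`."
    ].join("\n")
  end
end

puts underscore_warnings(["_a", "b"])
```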

Allow multiple clauses for native methods

I don't have time to flesh out this issue (I also don't yet have a solution), but I noticed there is currently no way to define multiple clauses for a native method. Using NativeLib.method multiple times with the same name would cause a compile error, because the parameter definitions are all the same, so there is a collision.

Like I said, I don't really have a solution, but this should be addressed to help with #35 and partly #43.

Upgrade to Crystal 0.24.1

Crystal 0.24.0 is currently a pre-release, though in actuality the content of the release has been static for a while.

I think it'd be a good idea to get ahead of the curve and upgrade the Crystal version before the next minor release of Myst. There are a few features that I'm really looking forward to, but mainly better stacktraces, including locations on macOS and this bugfix for Constant manipulation inside a static method (14a79e0 has more details) :)

As for an upgrade path, I know there are a few TODOs in the codebase that reference Crystal 0.24, so those should be evaluated with this (though they won't necessarily be resolved by it). Also, expect_raises in the specs will now require an exception type to be given. In most cases, these should already be given as ParseError or SyntaxError, but may need to be added.

A successful resolution to this issue will be compiling with --release on Crystal 0.24.0.

Standardize error structures.

One of the "marketed features" of Myst is that any value can be raised as an exception. This is great for user-land code, letting raise act like a more powerful break, but can be a nightmare for consistency and extendability in libraries and frameworks.

I see a few aspects to this:

  • Add core error types to the native library. This would include NoClauseError, MatchError, ParseError, etc.
  • Add standard library error types where appropriate. This would include things like FileNotFound, DivideByZero, etc. (these are examples. Actual errors will probably differ).
  • Refactor existing code to use these types (see #98 for a start).

Additionally, an important part of rescuing errors is knowing where the error came from. With normal values as exceptions, this won't always be possible (e.g., integers don't have a callstack attached to them). For this, I have two ideas:

  • Add an optional second parameter for rescue that captures cause information separately. This would be similar to the result of the caller method that Ruby provides. This feels like it would need a distinguishing syntax to avoid confusion with passing multiple values:

    def foo
    rescue error, cause
      # `cause` includes the callstack of the error and location info.
    end
    
  • Assume a standard method name for getting callstack info in a rescue. This is implicit and I don't particularly like it, but avoids the confusion from above:

    def foo
    rescue error
      cause.callstack.size
      # `cause` is basically a magic constant like `__LINE__`.
    end
    

I'm not super excited by either of those, but I'm leaning towards the first, at least as a temporary implementation.

Allow operators as method names

The parser and native library already implement +, -, *, / as Calls to methods on objects. This should also be allowed in definitions within the language. For example:

deftype Foo
  def a; @a; end

  def initialize(a)
    @a = a
  end

  def +(other : Foo)
    # Return a new Foo so that the result still responds to `a`.
    %Foo{@a + other.a}
  end
end

f1 = %Foo{1}
f2 = %Foo{2}

f3 = f1 + f2
f3.a #=> 3

This is a contrived example, for sure, but shows how the syntax can be used. This is particularly helpful for things like DateTimes or numeric replacements (e.g., BigInteger).

The only change needed for this should be allowing the operators as method names. The parser should already handle the infix calls properly.

The operators that should be supported are: +, -, *, /, %, <, <=, ==, !=, >=, >, [], []=, and =: (see #11). Handling unary operators (-, !, *) can come later, since the interpreter doesn't currently support these anyway.

Raise an error when rebinding an already-bound function parameter

Take this example:

def equal(a, a)
  IO.puts("arguments are equal")
end

def equal(_, _)
  IO.puts("arguments were not equal")
end

equal(2, 3)

The intended behavior of equal is to test and output the equality of the two arguments given to it. However, the actual behavior will always match the first clause, because a will simply be re-bound to the value of the last parameter it is matched with.

The proper resolution to this is to use value interpolation in the parameter:

def equal(a, <a>)

This is preferable to a condition in the clause body itself for clarity, efficiency, and flexibility (because the match will fail if the arguments aren't equal, clause lookup will continue and attempt the next clause. Using a condition in the body would not fall through in this way).

This differs from Elixir, which will implicitly remember which parameters have already been bound and convert future references to them into interpolations. This "implicit magic" is something I want to specifically avoid, hence the dedicated interpolation syntax.

As such, I think that re-use of a parameter name anywhere in the same function head should raise an error when it is encountered (parse-time would be even better). This will explicitly avoid bugs arising from ambiguous or mistyped function heads.
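The check itself is simple. Below is a Ruby sketch, assuming the parser can hand over the list of names bound in a function head (interpolations such as `<a>` are not bindings and would not appear in that list). `rebound_params` is a hypothetical name, and treating underscore-prefixed names as exempt is an assumption based on the `equal(_, _)` clause above.

```ruby
# Return the parameter names that appear more than once in a function head.
# Assumption: underscore-prefixed names are ignored values and may repeat.
def rebound_params(param_names)
  param_names
    .reject { |name| name.start_with?("_") }
    .tally
    .select { |_name, count| count > 1 }
    .keys
end

p rebound_params(%w[a a])
p rebound_params(%w[_ _])
p rebound_params(%w[a b])
```

A non-empty result is the condition under which the error would be raised.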

Make `Value` a base type, rather than the only type.

Currently, Value is a single container for all types, acting like an untyped union, which makes adding new types difficult, and slows down performance for simple operations like addition, subtraction, etc.

A better way to do this would be to implement Value as an abstract base class, and then have types like Numeric, Array, String, etc. extend that abstract class. This should avoid having to do type switching for the left-hand-side argument of binary operations, and allows each type to specify its valid operations.

Implement function capturing with `&`

As described in #42, treating functions as first-class citizens requires more than just having them as values. With anonymous functions, it's easily possible to create a function and store its value in a variable. Manipulating that variable, though, is difficult, because any reference to the variable would execute the function.

Additionally, there's no way to get a reference to a function after the point of definition for the same reason: attempting to reference the function will end up calling it instead.

#42 proposed the & prefix as the way to capture references to functions instead of calling them. An example with a normal function might look like this:

def foo(a, b)
  a + b
end

bar = &foo
bar(1, 2) #=> 3

This syntax would also be used to pass functions as the block parameter for another Call. For example:

def foo(&block)
  block(1, 2)
end

def bar(a, b)
  a + b
end

foo(&bar) #=> 3

One consideration to make is how a function could be passed as something other than the block argument. With the above, this wouldn't be possible. A workaround could be to capture the function into a variable in a separate statement, then pass that variable to the function, but this could use some more thought.
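For comparison, Ruby separates the two roles: `method(:name)` captures a reference, and a leading `&` is only needed when that reference is passed in the block position. The sketch below uses hypothetical helper names (`add`, `call_block`, `call_value`) to show the same function passed both ways, which is the case the consideration above is concerned with.

```ruby
def add(a, b)
  a + b
end

# Takes the function as the block argument.
def call_block(&block)
  block.call(1, 2)
end

# Takes the function as an ordinary positional argument.
def call_value(func)
  func.call(1, 2)
end

captured = method(:add) # a reference; `add` is not called here

p call_block(&captured)
p call_value(captured)
```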

Support Call syntax on any expression

Right now, only identifiers can accept parentheses suffixes to create Calls. Using any other expression will cause a ParseError saying that the parentheses are unexpected:

@thing = fn
  ->(a) { IO.puts(a) }
end

@thing()
ParseError at /Users/jon/Sites/myst-lang/myst/test.mt:5:7
  Expected one of SEMI,NEWLINE,EOF but got LPAREN

I think that any expression should be able to be converted into a Call by adding parentheses afterwards. Some other examples I would expect to work are:

get_func()(1, 2, 3) # Where `get_func` returns a functor
(get_func || @default)(1, 2, 3)
(func = &capture)()

Local scope overrides not maintained after raise/rescue

It looks like #65 did not fully resolve the issue with restoring the scope stack after a raise.

I haven't found a minimal code sample for this issue yet, but this is (roughly) the code I have now:

@specs = [] # List of SingleSpec instances
failures = []
@specs.each do |spec|
  when result = spec.run
    IO.print(".")
  else
    failures += [spec]
    IO.print("F")
  end
end

There's more beneath this code to support it, but the only important part is that spec.run will raise and rescue errors when there is a spec failure.

Running this code with a SingleSpec that raises an error yields this output:

Uncaught Exception: No variable or method `to_s` for SingleSpec
  from `to_s` at /Users/jon/Sites/myst-lang/myst/stdlib/enumerable.mt:24:23
  from `+` at /Users/jon/Sites/myst-lang/myst/stdlib/enumerable.mt:24:15
  from `each` at /Users/jon/Sites/myst-lang/myst/stdlib/enumerable.mt:22:5
  from `join` at /Users/jon/Sites/myst-lang/myst/stdlib/list.mt:9:11
  from `+` at /Users/jon/Sites/myst-lang/myst/stdlib/list.mt:9:5
  from `failures` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:40:9
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec/single_spec.mt:20:17
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:37:26
  from `each` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:36:12
  from `run` at /Users/jon/Sites/myst-lang/myst/spec/myst/spec.mt:19:18

This took me a while to properly track. Surprisingly, the stacktrace is accurate, but is reached by some interpreter internals, not by any user-end code - something that should also be addressed at some point.

The gist of the error is that after running a failed Spec, the next each iteration will not have the local scope override for current_scope, which will cause it to resolve to the scope of @specs (the receiver of the each call, thus the value of self).

A mock implementation of to_s on SingleSpec showed a "No variable or method spec for [...]" error, meaning that the value of self during the next iteration is the list being iterated, so that's correct. Attempting to reference any local variables from the closure's parent (such as failures in the code above) raises the same error.

My best guess at what's causing this error is that the local scope overrides for blocks are not being kept after a raise or after some nested call.
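If that guess is right, one way to keep the stack consistent is to pair every push with exactly one pop, even when an error passes through. The Ruby sketch below illustrates the idea; `ScopeStack` and `with_override` are hypothetical names and not the interpreter's actual structure.

```ruby
# Hypothetical scope stack; `with_override` pairs every push with a pop.
class ScopeStack
  def initialize
    @overrides = []
  end

  def current
    @overrides.last
  end

  # Push an override for the duration of the block. The `ensure` clause pops
  # exactly one entry even when the block raises, so entries pushed by
  # enclosing calls are left in place.
  def with_override(scope)
    @overrides.push(scope)
    yield
  ensure
    @overrides.pop
  end
end

scopes = ScopeStack.new
seen = []

scopes.with_override(:each_block) do
  2.times do
    begin
      scopes.with_override(:spec_run) { raise "spec failed" }
    rescue RuntimeError
      # The failure is handled; the enclosing override must still be active.
    end
    seen << scopes.current
  end
end

p seen
```

Both iterations see the enclosing block's override, which is the behaviour missing from the failing `each` loop above.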

Standard library entries to match Getting Started guide

This issue has been marked as a "Good First Issue"! If you'd like to tackle this issue as your first contribution to the project, be sure to read the Get Involved section of the README for some help with how to get started.

This is also a very large issue with many parts. It is meant to be tackled piece-by-piece. A contribution that implements even one of these methods will gladly be accepted.

For the past few weeks, I've been working on a Getting Started guide for Myst. In that guide, I make some references to standard library functions that don't actually exist yet. The goal of this issue is to list all of the functions that are referenced that do not yet have an implementation so that they can be added one by one. Since the guide is not finished, entries may be added to this issue in the future.

Additionally, there are some other functions that I think would be good additions for the next release.

Some of these functions will be part of the native library (written in Crystal), while others will be part of the standard library (written in Myst). Each entry here has a small description of what it should do. For the most part, these follow the versions from Ruby, so feel free to look there for inspiration.

Native library

  • Map#+(other : Map)

    Add two Maps together into a new Map. If a key exists in both Maps, the value from the second map should be used.

  • List#-(other : List)

    Return a new List with only the entries from the first List that do not exist in the second.

  • Integer#<(other)

    Return true if the integer is numerically less than other. Return false otherwise.

  • Integer#<=(other)

    Return true if the integer is numerically less than or equal to other. Return false otherwise.

  • Integer#>=(other)

    Return true if the integer is numerically greater than or equal to other. Return false otherwise.

  • Integer#>(other)

    Return true if the integer is numerically greater than other. Return false otherwise.

  • Float#<(other)

    See Integer#<.

  • Float#<=(other)

    See Integer#<=.

  • Float#>=(other)

    See Integer#>=.

  • Float#>(other)

    See Integer#>.

  • List#<(other : List)

    Return true if the List is a proper subset of other. That is, if every element of the List appears in other, but other also contains at least one more value. Return false otherwise.

  • List#<=(other : List)

    Same as List#<(other), but also return true if there are no extra elements in other.

  • Map#<(other : Map)

    Return true if the Map is a proper subset of other. That is, if every key of the Map appears in other, but other also contains at least one more key. Return false otherwise.

    The values of the Map are not important. Only the keys determine the subset.

  • Map#<=(other : Map)

    Same as Map#<(other), but also return true if there are no extra elements in other.

  • IO.print(string : String)

    Print the given string to the output of the Interpreter, with no formatting or conversions done. The argument should be expected to already be a String.

  • List#size

    Return the number of elements in the List as an Integer.

  • Map#size

    Return the number of elements in the Map as an Integer.
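To pin down the less obvious semantics in this list, here is a Ruby sketch of Map#+, List#-, List#<= and List#<. The function names are hypothetical, and Ruby's Hash and Array stand in for Myst's Map and List; the real versions would be written in Crystal against the native library.

```ruby
# Map#+ : the second map wins when both maps contain a key.
def map_add(first, second)
  first.merge(second)
end

# List#- : keep entries of the first list that do not appear in the second.
def list_subtract(first, second)
  first.reject { |element| second.include?(element) }
end

# List#<= : every element of `list` appears in `other`.
def list_subset?(list, other)
  list.all? { |element| other.include?(element) }
end

# List#< : a subset where `other` also holds at least one extra value.
def list_proper_subset?(list, other)
  list_subset?(list, other) && other.any? { |element| !list.include?(element) }
end

p map_add({ a: 1, b: 2 }, { b: 3 })
p list_subtract([1, 2, 3], [2])
p list_proper_subset?([1, 2], [1, 2, 3])
p list_proper_subset?([1, 2], [2, 1])
```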

Standard library

  • String#size

    Return the number of characters in the String as an Integer.

  • String#empty?

    Return true if the String contains 0 characters. Return false otherwise.

  • List#empty?

    Return true if the List contains 0 elements. Return false otherwise.

  • Map#empty?

    Return true if the Map contains 0 elements. Return false otherwise.

  • Enumerable#all?(&block)

    Pass each element of the enumerable to block, returning true if all elements in the enumerable cause block to return a truthy value.

  • Enumerable#any?(&block)

    Pass each element of the enumerable to block, returning true if any element in the enumerable causes block to return a truthy value.

  • Enumerable#find(default=nil, &block)

    Iterate the enumerable, passing each element to block. The return value should be the first element for which the block is truthy. If no elements cause a truthy return value, return the default value instead (which itself should default to nil).

  • Enumerable#min

    Iterate the enumerable, finding the element with the lowest value as determined by <.

  • Enumerable#max

    Iterate the enumerable, finding the element with the highest value as determined by >.

  • Enumerable#select(&block)

    Return a List containing all the elements of the enumerable that cause block to return a truthy value.

  • Enumerable#sort

    Sort the elements of the enumerable with the ordering determined by the <= method for each element. To start with, the sort can be something simple like insertion sort, eventually replaced by a proper hybrid quick sort.

  • Enumerable#size

    Return the number of elements in the enumerable, determined by incrementing a counter for each element yielded by #each.

  • Enumerable#to_list

    Return a List containing all the elements of the enumerable.

  • Enumerable#reduce(&block(acc, elem))

    For every element in the enumerable, call block with the result of the previous call and the current element as arguments. The first element is used as the initial value of the accumulator; it does not get a separate call to the block.

  • Enumerable#reduce(initial, &block(acc, elem))

    Same as Enumerable#reduce(&block(acc, elem)), but also specifying an initial value to use for the accumulator. In this version, the first element will get its own call to the block.

  • Int#times(&block)

    Call block as many times as the value of this integer. For example, 3.times(...) would call block 3 times.
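Several of the standard library entries above can be built on nothing but `each`. The Ruby sketch below shows `find`, both forms of `reduce`, and `size`; `MiniEnumerable` and `Countdown` are hypothetical names, and returning nil from `reduce` on an empty enumerable with no initial value is an assumption, since the descriptions above do not cover that case.

```ruby
# Hypothetical module: everything is derived from a single `each` method.
module MiniEnumerable
  NO_INITIAL = Object.new.freeze

  # First element for which the block is truthy, else `default`.
  def find(default = nil)
    each { |element| return element if yield(element) }
    default
  end

  # Without an initial value the first element seeds the accumulator and is
  # not passed to the block on its own. Returns nil for an empty enumerable
  # when no initial value is given (an assumption).
  def reduce(initial = NO_INITIAL)
    acc = initial
    each do |element|
      acc = acc.equal?(NO_INITIAL) ? element : yield(acc, element)
    end
    acc.equal?(NO_INITIAL) ? nil : acc
  end

  # Count elements by incrementing a counter for each value yielded by `each`.
  def size
    count = 0
    each { |_element| count += 1 }
    count
  end
end

class Countdown
  include MiniEnumerable

  def initialize(from)
    @from = from
  end

  def each
    @from.downto(1) { |n| yield n }
  end
end

three = Countdown.new(3)
p three.reduce { |acc, n| acc + n }
p three.reduce(10) { |acc, n| acc + n }
p three.find { |n| n < 3 }
p three.find(:none) { |n| n > 5 }
p three.size
```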

Implementation

Obviously, there are a lot of things to add. I don't expect that all of these would be added in a single PR. Tackling them one at a time is fine by me.

Adding functions to the native library can be done in the src/myst/interpreter/native_lib folder. All of the types listed above should already exist there. Use NativeLib.method to write the implementation for a function, then call NativeLib.def_instance_method or NativeLib.def_method to add it to the appropriate module. There are plenty of examples in the code already that should help you out.

The standard library exists in the stdlib folder at the top level. Look at the existing entries (specifically, Enumerable) to see how new functions can be added.

Also, please try to add a descriptive comment to each method describing the arguments that it accepts, the values that might be returned, and what the method does. The Enumerable module has some good examples of these comments.

If you'd like to pick up one or more of these functions, please comment below saying which function you would like to implement so that others know they are taken.

As always, if you have any other questions, feel free to ask them here or let me know directly so I can help out :) Good luck!

Regex literals

Regex literals are an important part of modern scripting languages. Having a concise syntax for instantiating patterns and performing matches is important.

For actually performing matches, I think using the existing pattern-matching syntax would be nice:

/(?<identifier>[a-zA-Z][a-zA-Z0-9]+)/ =: "   matchedvalue "
puts identifier #=> matchedvalue

This syntax would only work with named subgroups, but seems incredibly clear and concise in terms of extracting values. Compare this to the Ruby equivalent:

matches = /(?<identifier>[a-zA-Z][a-zA-Z0-9]+)/.match("   matchedvalue ")
puts matches["identifier"] #=> matchedvalue

Crystal has direct support for PCRE regexes, so the actual matching aspect should be simple to implement.

This needs more thought for unnamed captures, interpolation, and other edge cases, but these are my initial thoughts.
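One way to think about the proposed match is as a function from a pattern and a value to a set of bindings. The sketch below is Ruby, not Myst, and regex_bindings is a hypothetical helper name; it only shows the shape of the data a regex pattern match could hand to the existing pattern-matching machinery.

```ruby
# Hypothetical helper: returns the bindings a regex pattern match would
# introduce, or nil when the pattern does not match.
def regex_bindings(pattern, value)
  match = pattern.match(value)
  return nil if match.nil?

  # MatchData#named_captures maps each group name to its matched text.
  match.named_captures
end

bindings = regex_bindings(/(?<identifier>[a-zA-Z][a-zA-Z0-9]+)/, "   matchedvalue ")
puts bindings["identifier"]
```

A pattern with only unnamed groups produces an empty map here, which is exactly the unnamed-capture question left open above.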

Block closures do not capture the value of `self`

When using blocks, the expected behavior is for the scope of the block to refer to the lexical scope in which it was defined. This is how things like the following can work:

sum = 0
[1, 2, 3].each{ |e| sum += e }
sum #=> 6

However, because blocks do not currently capture the value of self, the same code could not be written using instance variables:

@sum = 0
[1, 2, 3].each{ |e| @sum += e } #=> will probably raise an error
@sum #=> 0

I believe this is a bug that should be fixed to work similar to Ruby, where the block captures the value of self as well:

# in Ruby
@sum = 0
[1, 2, 3].each{ |e| @sum += e }
@sum #=> 6

Implement anonymous functions

Anonymous functions are useful for quickly defining behaviors without creating entries in the local namespace. Most commonly, they are passed as arguments (normally as the block argument) to other functions to define additional, custom behavior. Anonymous functions are also critical for really treating functions as first-class citizens.

Every language that I can think of that treats functions as first-class citizens has a way to define anonymous functions:

  • Ruby has the lambda keyword, and both Ruby and Crystal have the newer "stabby lambda" syntax:
add2 = ->(num) { num + 2 }

add2.call(3) #=> 5
  • Elixir uses fn, or the very-terse &():
add2 = fn(num) -> num + 2 end
# or 
add2 = &(&1 + 2)

add2.(3) #=> 5

Even Java introduced anonymous functions in Java SE 8:

// ... lots of boilerplate
Adder add2 = num -> num + 2;

add2.apply(3) //=> 5

My issue with a lot of these implementations is that the resulting function is really a function wrapper (Proc in Ruby/Crystal, Function object in Elixir, some mess of interfaces in Java), which means calling the stored function requires special methods like .call(), .(), or .apply(). That's all fine and grand, but feels like a concern that the language should be able to handle internally, rather than relying on the user to implement.

That feeling is shown by how Myst handles captured block parameters - the block is simply created as a function in the local scope, so it can be called like any other function. For example:

def foo(&block)
  block(1, 2)
end

foo{ |a, b| a + b } #=> 3

The only real issue with this is that manipulating the function or storing it for later use becomes more difficult. Simply giving the name will treat it as a Call that gets evaluated. Modifying the above:

def foo(&block)
  @user_block = block
end

foo{ |a, b| a + b } #=> FunctionMatchFailure for `block`

This will cause an error that there is no matching clause for the call block, because it is trying to call the block given to foo, but did not provide the expected arguments.

A naive solution for this is to support creating references to functions using & as in the method parameter:

def foo(&block)
  @user_block = &block
  :success
end

foo{ |a, b| a + b } #=> :success

This would then work, because a reference to block is being stored in @user_block, rather than attempting a Call to it. A potential issue with this is that every time the function is meant to be passed, rather than called, it needs the & prefix. I can't say how good or bad that will be without trying it. Maybe it's not a problem at all.

Finally (yes, this has run on for a while), the reference capture syntax should also be allowed as a way of converting a function to a block argument for a Call:

def foo(&block)
  block(1, 2)
end

def add(a, b); a + b; end

foo(&add) #=> 3

I think implementing this behavior can be put off until after anonymous functions have been implemented.
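For comparison, Ruby already separates "call" from "reference" through its Method objects, at the cost of the wrapper calls discussed earlier. The sketch below is plain Ruby, not Myst; Registry and its methods are hypothetical names, and only method and the & prefix are Ruby built-ins.

```ruby
# Hypothetical example class; only `method` and `&` are Ruby built-ins.
class Registry
  attr_reader :user_block

  # Store the block for later instead of calling it.
  def remember(&block)
    @user_block = block
    :success
  end

  # Call whatever block is given with fixed arguments.
  def apply(&block)
    block.call(1, 2)
  end

  def add(a, b)
    a + b
  end

  # `method(:add)` references the method without calling it, and `&` converts
  # that reference into the block argument of `apply`.
  def apply_add
    apply(&method(:add))
  end
end

registry = Registry.new
registry.remember { |a, b| a + b }
puts registry.user_block.call(1, 2)
puts registry.apply_add
```

The proposed &block and &add forms would give Myst the same two capabilities without the explicit .call on the stored value.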


Anyway, that was a bit of a tangent. Back to creating anonymous functions, I can't decide whether I prefer the -> from Ruby or fn from Elixir. I know that I don't like the lambda keyword or the &(), but even then, I could probably be convinced. The only reason I have against them is that the parameter syntax isn't there or is different, which feels arbitrarily less clear than it could be.

Whatever the decision from this is, it shouldn't be considered "final". I'm sure some other opinion or suggestion will come up later down the road, and I'd like to be open to those changes, rather than assuming this is the best it can get.

Ada-like custom primitive types

I was browsing around for ideas on how to mitigate some float precision errors (e.g., 0.1 + 0.2 evaluating to 0.30000000000000004 rather than 0.3) and came across a book on Ada 95, which features the ability to define custom primitive types: https://books.google.com/books?id=X_VlpfGoQRgC&pg=PA391&lpg=PA391&dq=better+language+level+float&source=bl&ots=H8iob6ObPv&sig=rjDCo2qjiBFuim5NKKMsySofCHQ&hl=en&sa=X&ved=0ahUKEwj9iob2jcvWAhWHQSYKHfO-A4QQ6AEITDAH#v=onepage&q=better%20language%20level%20float&f=false

The tl;dr from there is that Ada 95 allows programmers to write something like:

type Inches is digits 4 range 0.00..100.00;

which would then let the programmer write something like "6 Inches".

The part that I'm interested in here is allowing for well-defined handling of Float precision errors and rounding. For example, a programmer could write something that says "only the first two digits after a decimal place are significant, and use banker's rounding".

I don't have a syntax in mind, but I figure this could be a useful tool in allowing programmers to reason about the problem they are solving, rather than about how IEEE Floats work.
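To make the idea concrete without committing to any syntax, here is a minimal sketch in Ruby (not Myst) of a bounded type that keeps two digits after the decimal point and uses banker's rounding. The Inches class and its constants are hypothetical; the sketch leans on Ruby's exact Rational arithmetic so that no binary Float is involved.

```ruby
# Hypothetical sketch of a bounded, fixed-precision numeric type.
class Inches
  DIGITS = 2          # significant digits after the decimal point
  BOUNDS = (0..100)   # permitted range, inclusive

  attr_reader :value

  # Accepts a decimal string so that no binary Float is involved.
  def initialize(text)
    exact = Rational(text)
    raise RangeError, "#{text} is outside #{BOUNDS}" unless BOUNDS.cover?(exact)

    # half: :even selects banker's rounding for exact ties.
    @value = exact.round(DIGITS, half: :even)
  end

  def to_s
    format("%.#{DIGITS}f", value)
  end
end
```

Under banker's rounding, Inches.new("2.345") and Inches.new("2.335") both round to 2.34, since ties go to the even neighbor.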

Class property layouts for pattern matching

Structs in Elixir are useful for structuring data (duh) to make passing around complex values simpler to deal with. Structs are quite literally maps with some extra information. This makes them extremely easy to use in pattern matching situations:

def some_func(%Ecto.Changeset{valid?: true, changes: %{password: pass}}) do
  # ...
end

In the above, the argument to the function must be an Ecto.Changeset struct, the valid? property of that struct must be true and password must have been changed (the new value of which is captured as pass). This is a very clear and concise syntax that makes Elixir a joy to work with.

I'd like to see something similar in Myst, but obviously it won't be the same. Myst is object-oriented, Elixir is functional. Where Elixir uses Structs, Myst would use Classes. Classes can define properties and methods, which are often mixed (e.g., in Ruby, accessors are proxies for variables, which are private by default).

That said, I think Myst could have a very similar syntax for matching class objects by defining explicit structures. A requirement for this to really be effective would be that pattern matching with a class object should require no extra syntax for usage (e.g., just having a to_map method isn't effective).

Here's a proposal:

class Car
  @color : String
  @axle_count : Integer
  @door_count : Integer

  struct {
    color: @color,
    axles: @axle_count,
    doors: @door_count,
    weight: weight
  }

  def initialize(color="gray", axles=2, doors=2)
    @color = color
    @axle_count = axles
    @door_count = doors
  end

  def weight
    @axle_count * @door_count * 1000
  end
end

# Return true if the given car is a 4-door (sedan)
def is_sedan(Car{doors: 4}); true; end
def is_sedan(_); false; end

sedan = Car.new(doors: 4)
other_car = Car.new(doors: 4, axles: 2)

# This pattern match will succeed, because the comparison is not based
# on object identity or even full equality. Instead, it works similar to maps
# where only those properties defined in the struct are matched.
<sedan> =: other_car
# The above is equivalent to creating a literal pattern inline:
Car{doors: 4} =: other_car

This example is obviously contrived, but I could see this being a powerful tool both for library and application development.

I'm also not completely on board with using struct as the keyword, as it seems a little dishonest or unintuitive. pattern also seems incorrect because it's a common identifier (particularly when dealing with regexes).

More information will hopefully follow.
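As a point of reference, Ruby's case/in pattern matching covers similar ground by asking the object for a hash of matchable properties. The sketch below is Ruby, not Myst; the Car class mirrors the proposal above, and its deconstruct_keys method plays the role of the proposed struct layout. The sedan? helper is a hypothetical name.

```ruby
# Hypothetical Car class; `deconstruct_keys` is the hook Ruby's `case/in`
# uses to ask an object for its matchable properties.
class Car
  def initialize(color: "gray", axles: 2, doors: 2)
    @color = color
    @axle_count = axles
    @door_count = doors
  end

  def weight
    @axle_count * @door_count * 1000
  end

  # Plays the role of the proposed `struct { ... }` layout.
  def deconstruct_keys(_keys)
    { color: @color, axles: @axle_count, doors: @door_count, weight: weight }
  end
end

# Return true if the given value is a 4-door Car.
def sedan?(value)
  case value
  in Car(doors: 4) then true
  else false
  end
end
```

Only the listed keys take part in the match, so two cars with different colors still match the same doors: 4 pattern, which is the behavior the proposal is after.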

Method calls on integer literals are parsed as floats.

Writing a method call on an integer literal is currently parsed as an attempt at a float literal, which then fails because the character following the POINT token is not numeric and so is not valid in a float.

For example:

10.to_s()

The lexer starts consuming the integer value 10 and then sees the point character. It continues consuming the numeric as a float and encounters the t. Because t is not numeric and the point was not followed by a digit, the parse fails.

The solution would be to do more lookahead while lexing to determine whether the point character actually denotes a float value (it is followed by another numeric character) or should be considered its own lexeme.
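The lookahead can be sketched in a few lines. The following is a minimal Ruby illustration, not the Myst lexer; the lex function and the token names are hypothetical. The key point is that the float rule only fires when a digit follows the point, so everything else falls back to an integer followed by a separate point token.

```ruby
# Hypothetical mini-lexer: token names are illustrative, not Myst's own.
def lex(source)
  tokens = []
  pos = 0
  while pos < source.length
    rest = source[pos..-1]
    if (m = rest.match(/\A\d+\.\d+/))
      # Lookahead succeeded: the point is followed by a digit, so it is a float.
      tokens << [:float, m[0]]
    elsif (m = rest.match(/\A\d+/))
      # No digit after the point (or no point at all): stop at the integer.
      tokens << [:integer, m[0]]
    elsif (m = rest.match(/\A\./))
      tokens << [:point, m[0]]
    elsif (m = rest.match(/\A[A-Za-z_]\w*/))
      tokens << [:ident, m[0]]
    elsif (m = rest.match(/\A[()]/))
      tokens << [m[0] == "(" ? :lparen : :rparen, m[0]]
    elsif (m = rest.match(/\A\s+/))
      # Whitespace produces no token.
    else
      raise ArgumentError, "unexpected character #{rest[0].inspect}"
    end
    pos += m[0].length
  end
  tokens
end
```

Here lex("10.to_s()") yields an integer, a point, an identifier, and the two parentheses, while lex("10.5") still yields a single float token.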

Force Calls for Var references that are given arguments.

This issue has been marked as a "Good First Issue"! If you'd like to tackle this issue as your first contribution to the project, be sure to read the Get Involved section of the README for some help with how to get started.

Consider this code:

foo = fn
  ->(1) { 2 }
  ->(2) { 4 }
end

foo(2)

This code will fail to parse, because foo is considered a local variable. As such, the parser will generate a Var node, and then get confused by the LPAREN token that follows it.

I think that in this case, the parser should force the Var to become a Call with no receiver. As far as I can tell, this is unambiguous, as the opening parenthesis eliminates the possibility for infix Calls, etc.

With that, the expectation of the above code would be successful parsing and a result 4 after interpreting.

Everything in the interpreter is already set up to handle the fact that foo is a callable functor. To be sure, try the same code above with the last line changed to self.foo(2). In this case, foo is forced to be a Call (because it has an explicit receiver of self), so it runs as expected, calling the function and returning 4.


Since this is just a Parser change, the implementation is fairly straightforward. The distinction between Var and Call is made in Parser#parse_var_or_call. The exact code is here:

if receiver.nil?
  if name.starts_with?('_')
    return Underscore.new(name).at(start.location)
  end
  if is_local_var?(name)
    return Var.new(name).at(start.location)
  end
end
call = Call.new(receiver, name).at(start.location)
skip_space
if accept(Token::Type::LPAREN)

The change to make is to avoid returning early if the next significant token (not whitespace, but potentially a newline) is an LPAREN, which would indicate a forced Call.

I don't necessarily know the best way to go about this change, but here's my take:

  • Move the skip_space from line 725 to above the if receiver.nil?. This will skip all of the unimportant tokens in the stream, leaving current_token as the next significant token.
  • Add current_token.type != Token::Type::LPAREN to the if receiver.nil? condition. If either of these conditions is not met, the Var should be forced to be a Call instead, so the combined condition should be falsey.
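The resulting decision can be modelled in isolation. The sketch below is Ruby rather than the Crystal in parser.cr, and classify_reference with its keyword arguments is a hypothetical stand-in: symbols replace the real node and token types, and next_token is assumed to be the first significant token after the name.

```ruby
# Hypothetical model of the decision in parse_var_or_call. Symbols stand in
# for AST node types and token types; this is not the Crystal implementation.
def classify_reference(name, receiver:, locals:, next_token:)
  # An explicit receiver or a following LPAREN forces a Call.
  if receiver.nil? && next_token != :lparen
    return :underscore if name.start_with?("_")
    return :var if locals.include?(name)
  end
  :call
end
```

With foo in the local scope, a following LPAREN gives :call, while any other next token still gives :var. Cases like these are the kind that the specs in parser_spec.cr would need to cover.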

My hesitation with this implementation comes from a comment earlier in parser.cr that describes a condition that currently holds for every method in the parser:

Each method will consume exactly the number of tokens required to build its node.

Moving the skip_space to an early position, before knowing that an LPAREN will be present, violates this contract, because in the case that the next token is not an LPAREN, the method will have advanced through more tokens in the stream than have been used to build nodes for the tree. This is probably fine, and a solution that doesn't violate this contract will almost definitely be far more complex, but feel free to try out different ideas and see if you can make it work :)

The final part (or first, depending on your TDD preferences) to implementing this feature will be adding some specs in parser_spec.cr to test cases that either should or should not be forced into Calls. This section seems like the best place to add them.

I'm sure I haven't done a great job of explaining things, so please let me know if some part of this needs clarification :) The #help channel on our Discord server is a great place to get quick feedback and have a faster conversation, or you can add a comment here for a more long-winded reply.

Good luck :)

Rename the language?

This is more of a scratchpad for ideas than anything else.

I generally like the name Myst, but I'm not attached to it in any way. I have also been feeling like it doesn't particularly match the language's look and feel. That's obviously subjective and hard to explain; it's mainly a gut feeling that's grown over time.

I also don't have any other ideas for names, but here are a few guidelines for a great name (at least in my eyes):

  • The domain <new-name>-lang.org should be publicly available for purchase. -lang is essentially a standard nowadays, and I don't want to miss out on that just because of a new name. Ideally the domain isn't parked, either (though parking any suggestions made here for later transfer might be a good idea).
  • There should be a corresponding name for people who use the language. Python users are Pythonistas, which is a little odd, but the correlation is obvious, and better than Ruby users being called Rubyers. Elixir users are Elixirists, Rust users are Rustaceans, etc. Something similar to that would be great to have (side note: Myst users being called Mysters seems kind of bad...).
  • The name should reflect something about the language. This one is a bit out there, but the name Crystal matches its relation to Ruby: it's a more-structured, "clearer" (e.g., faster) language, but still in the same "family", so to speak. For Myst, I wouldn't want to rename it to "2-ton brick", because that doesn't match the feel of the language to me (and it's just a bad name). I see Myst as more of a lightweight language, but with structure that can be introduced where it is helpful. Kind of like a non-Newtonian fluid, if you will.
  • No more than 2-3 syllables. One syllable is a little too short to me. Go, Rust, Lisp: they're kind of easy to miss when speaking quickly. Two-syllable names have a little more impact: Python, Ruby, Crystal, Erlang, Prolog. Three-syllable names are okay, but start feeling a little too long sometimes: Elixir, C++, JavaScript (need I say more?). Four is just too many.

I'll probably add to this list of guidelines over time, but those are essentially the criteria that I've been judging names by so far.

Note: This is not a decision that will be taken lightly. Renaming a language is already a big deal, particularly once it gains some semblance of popularity. Renaming a language more than once could mean cascading changes and weeks or months of transition time. I can't imagine actually changing the name of the language before a v0.5.0 or so, or after a v1.0. A name change after v1.0 would need substantial reasoning to do so.

Operational Assignments

Operational assignments are common shorthands for "dual expressions", where an operation is performed, and then the result is re-assigned to the same left hand value used in the operation. For example:

a += 1
# is equivalent to
a = a + 1

For non-logical operations, this is essentially a simple syntax rewrite that takes any expression in the form a {{op}}= b and rewrites it as a = a {{op}} b. This works for all of the arithmetic and comparative operators.

However, ||= and &&= are generally handled differently, and their semantics are surprisingly complex. I won't bother going into those semantics here, as it could easily take up multiple articles on its own.

In any case, the parser is already capable of parsing these statements, and has the corresponding OpAssign node to represent them. However, the interpreter does not actually support visiting these nodes, and will instead raise an error when encountered.

I would recommend a two-step approach to this, unless the second step turns out to be simpler than expected.

  1. Implement OpAssign for the arithmetic and comparative operators, but raise an UnsupportedError or similar when encountering ||= or &&=.

  2. Go back and flesh out ||= and &&= properly once assignment methods are implemented (rewrites of obj.method = value to be equivalent to obj.method=(value), where method= is a method name).
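The first step amounts to a small tree rewrite. Here is a minimal sketch in Ruby, not the Crystal interpreter: the Assign, BinaryOp, and OpAssign structs are hypothetical stand-ins for the real AST nodes, and rewrite_op_assign is an illustrative name.

```ruby
# Hypothetical node types; the real AST lives in the Myst parser.
Assign   = Struct.new(:target, :value)
BinaryOp = Struct.new(:left, :operator, :right)
OpAssign = Struct.new(:target, :operator, :value)

class UnsupportedError < StandardError; end

# Rewrite `a op= b` into `a = a op b`, deferring the logical forms.
def rewrite_op_assign(node)
  if ["||=", "&&="].include?(node.operator)
    raise UnsupportedError, "#{node.operator} is not supported yet"
  end

  binary_operator = node.operator.chomp("=") # "+=" becomes "+"
  Assign.new(node.target, BinaryOp.new(node.target, binary_operator, node.value))
end
```

An interpreter could either visit the rewritten Assign node directly or perform the same two steps inline; the sketch only pins down the equivalence.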

Callstack not always being cleared on function exit

This issue has been marked as a "Good First Issue"! If you'd like to tackle this issue as your first contribution to the project, be sure to read the Get Involved section of the README for some help with how to get started.

While working on the Spec library, I encountered an interesting error:

Uncaught Exception: No variable or method `start` for Kernel
  from `start` at /Users/jon/Sites/myst-lang/myst/spec/myst/spec.mt:22:53
  from `-` at /Users/jon/Sites/myst-lang/myst/spec/myst/spec.mt:22:46
  from `puts` at /Users/jon/Sites/myst-lang/myst/spec/myst/spec.mt:22:4
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:41:5
  from `describe` at /Users/jon/Sites/myst-lang/myst/spec/myst/time_spec.mt:62:1
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:10
  from `it` at /Users/jon/Sites/myst-lang/myst/spec/myst/integer_spec.mt:61:3
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:15
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec/single_spec.mt:11:7
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:10
  from `it` at /Users/jon/Sites/myst-lang/myst/spec/myst/integer_spec.mt:56:3
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:15
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec/single_spec.mt:11:7
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:10
  from `it` at /Users/jon/Sites/myst-lang/myst/spec/myst/integer_spec.mt:51:3
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:10
  from `it` at /Users/jon/Sites/myst-lang/myst/spec/myst/integer_spec.mt:47:3
  from `run` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:28:10
  from `it` at /Users/jon/Sites/myst-lang/myst/spec/myst/integer_spec.mt:43:3
  from `block` at /Users/jon/Sites/myst-lang/myst/stdlib/spec.mt:41:5
  from `describe` at /Users/jon/Sites/myst-lang/myst/spec/myst/integer_spec.mt:3:1

At a glance, it looks fairly simple, but looking at where the error came from shows an issue: the line of code causing the error is at the top-level scope. The error comes from the last line here with IO.puts:

start = Time.now

# TODO: add Dir globbing to automatically detect and require all `*_spec.mt`
# files under this directory.
require "./enumerable_spec.mt"
require "./integer_spec.mt"
require "./list_spec.mt"
require "./map_spec.mt"
require "./string_spec.mt"
require "./unary_ops/not_spec.mt"
require "./type_spec.mt"
require "./time_spec.mt"

finish = Time.now

# The only way to reach this point is if all of the Specs passed. Any failures
# will immediately exit the program, so reaching here implies success.
IO.puts("\nAll in-language specs passed in <(finish-start)> seconds.")

Interestingly, this actually shows two errors with the callstack management:

  1. A function being called is pushed onto the stack before its arguments are evaluated. This is why puts is the third entry in the list, when the error is actually coming from its argument.
  2. When a function exits, it may not be removed from the callstack.

A similar issue with the selfstack was addressed previously (see #65), but this seems mostly unrelated.
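Both problems have a fairly standard shape of fix: evaluate arguments before pushing the frame, and pop the frame on every exit path. The sketch below is Ruby, not the Myst interpreter, and the Callstack class and invoke method are hypothetical names; whether the real fix looks like this depends on how the interpreter unwinds.

```ruby
# Hypothetical model of an interpreter callstack; names are illustrative.
class Callstack
  attr_reader :frames

  def initialize
    @frames = []
  end

  # `arg_thunks` are lambdas standing in for unevaluated argument expressions.
  def invoke(name, arg_thunks)
    # Evaluate arguments first, so an error in an argument is not attributed
    # to a function that has not started running yet.
    args = arg_thunks.map(&:call)

    @frames.push(name)
    begin
      yield(*args)
    ensure
      # Runs on normal return and when an error propagates.
      @frames.pop
    end
  end
end
```

The ensure clause is what keeps frames from lingering after an early exit such as a raise or a break out of a block.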

Assignment, query, and bang methods

Ruby's semantics of allowing =, ?, and ! on methods is quite handy. = allows for attribute setter methods to be written neatly, without any verbose method call syntax (e.g., obj.set_thing(value)). ? is nice for indicating either a boolean or nilable result, and ! for indicating destructive or exceptional behavior.

To implement this:

  • the lexer needs to understand that =, ?, and ! are all valid extensions for identifiers (whether these should be allowed on Constants is a potential discussion point).
  • the parser needs to allow these components of identifiers both at definition and call sites.
  • the parser needs to specifically rewrite calls on the left side of simple assigns into calls with the added =, regardless of spacing.

The last point is the most difficult, semantically speaking. Directly adopting Ruby's semantics seems more than adequate: for = to be rewritten into a Call with the = appended to the name, the left-hand side must have an explicit receiver, which can be self to represent operating on the current value of self.

An interesting extension would be allowing these Calls with receivers to be used in patterns and to perform the same rewrites. For example:

[a.a, a.b] = [:hello, :world]

Here, instead of setting local variables, the bindings would be re-written as a.a = :hello and a.b = :world, which would similarly be interpreted as assignment method calls. I think that gets into a bigger discussion about how patterns can be compiled to be more efficient as well, but this could be a realistic (though maybe not desirable) feature in the future.
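For reference, this is how the three suffixes and the assignment rewrite behave in Ruby itself. The Thing class is a hypothetical example; the rewrite of thing.name = value into a call to name= is standard Ruby behavior.

```ruby
# Hypothetical class showing Ruby's own conventions for the three suffixes.
class Thing
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Assignment method: `thing.name = value` is sugar for `thing.name=(value)`.
  def name=(value)
    @name = value.to_s
  end

  # Query method: returns a boolean.
  def named?
    !@name.empty?
  end

  # Bang method: mutates the receiver in place.
  def clear!
    @name = ""
    self
  end
end

thing = Thing.new("a")
thing.name = :hello                 # rewritten into a call to `name=`
thing.public_send(:name=, "world")  # the same call, spelled out
```

Note that the explicit receiver matters: inside the class, a bare name = value would create a local variable instead, which is why the rewrite requires a receiver such as self.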
