masak / alma

ALgoloid with MAcros -- a language with Algol-family syntax where macros take center stage

License: Artistic License 2.0

Shell 0.02% Perl 0.17% D 0.55% Raku 99.26%

perl6 macros dsl grammar parsing slang operator experimental macro

alma's Introduction

Alma

Alma is a small language created as a testbed for Raku macros. Its goal as a language is to inform the implementation of macros in Raku. It does this by being a faster-moving code base that is easier to iterate on toward good solutions.

Rakudo already contains a rudimentary implementation of macros, but at this point the most mature macro implementation for Raku is embodied in Alma.

Alma was previously known as "007", in reference to the "Q" data structure which represents program fragments.

Get it

If you're just planning to be an Alma end user, zef is the recommended way to install Alma:

zef install alma

(If you want to install from source, see the documentation.)

Run it

Now this should work:

$ alma -e='say("OH HAI")'
OH HAI

$ alma examples/format.alma
abracadabra
foo{1}bar

Status

Alma is currently in development.

The explicit goal is to reach some level of feature completeness for macros in Alma, and then to backport that solution to Rakudo.

Useful links

Documentation (🔧 under construction 🔧)
examples/ directory
The Roadmap outlines short- and long-term goals of the Alma project

To learn more about macros:

To learn more about Alma:

Double oh seven blog post
Has it been three years? blog post
This README.md used to contain a pastiche of the cold open in Casino Royale (2006), which was entertaining for some and confusing for others

alma's People

Contributors

Stargazers

Watchers

Forkers

mouq jaffa4 vendethiel zoffixznet tadzik alphapapa pmurias stethd dan-simon lucasbuchala wade1990 eiselekd qhpaptyj9hj0rqwmow3ujuo3gdp3kyzkmxsh4gk lizmat

alma's Issues

Add Q node semantics test coverage

Creating and using the Q nodes works but is severely undertested. Need to go through all the current Q types and provide test coverage for the following for each Q type:

Creating it from scratch
- With the right type, or a few different ones if there's variation
- With the wrong type, making sure it fails
- With types that could be seen as wrong, maybe because they're not syntactically valid
Fetching it from code
- Digging into it a bit
- Changing some stuff (we don't have this yet, but we should)

Make a single source of truth out of registered ops in the parser and the built-in operator functions

For every operator in the parser, there should also be a Val::Sub::Builtin in the setting. Moreover, it should be an OAOO (Once And Only Once) thing: a single source of truth.

The parser has access to the runtime (because of BEGIN and the need to access static things), but not the other way around (because there's no EVAL). So maybe we should inject all the operators from the parser into the runtime?
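Here is a minimal sketch of the OAOO idea. It is written in Python rather than Raku, and `OpSpec`, `OPERATORS`, `register_infix`, `parser_precedence_table` and `setting_builtins` are hypothetical names. Each operator is declared exactly once, together with its implementation. The parser's precedence table and the runtime's built-ins are both derived from that one registry. The registry lives on the parser side and is handed to the runtime, which matches the direction of access described above.

```python
from typing import Callable, Dict, NamedTuple


class OpSpec(NamedTuple):
    """One registered operator: the single source of truth (hypothetical shape)."""
    name: str                    # e.g. "infix:<+>"
    precedence: int              # larger binds tighter
    assoc: str                   # "left" or "right"
    impl: Callable[..., object]  # the built-in function backing the operator


OPERATORS: Dict[str, OpSpec] = {}


def register_infix(symbol: str, precedence: int, assoc: str = "left"):
    """Decorator: declare an infix operator once, together with its implementation."""
    def wrap(fn):
        name = f"infix:<{symbol}>"
        OPERATORS[name] = OpSpec(name, precedence, assoc, fn)
        return fn
    return wrap


@register_infix("+", precedence=10)
def add(a, b):
    return a + b


@register_infix("~", precedence=10)
def concat(a, b):
    return str(a) + str(b)


@register_infix("*", precedence=20)
def mul(a, b):
    return a * b


def parser_precedence_table():
    """The parser's view: operator names with precedence and associativity only."""
    return {spec.name: (spec.precedence, spec.assoc) for spec in OPERATORS.values()}


def setting_builtins():
    """The runtime's view: the same names, bound to their implementations."""
    return {spec.name: spec.impl for spec in OPERATORS.values()}
```

Both views are computed from `OPERATORS`. As a result, an operator cannot exist in the parser without a matching built-in, or the other way around.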

Consider multipart operators

The prototypical case of this is ?? !!, so let's start there.

What if we could define this operator in 007 like this:

macro infix:<?? !!>(cond, then_val, else_val) {
    # elided implementation; it's interesting but not relevant here
}

The main point is that, instead of the usual two parameters, an infix of this kind would have three.

(The only reason it has to be a macro is to get thunkish semantics on then_val and else_val. We might still end up with a neater solution for that which doesn't need to involve outright macros. We'll see.)

And this would generalize not just to two-part infix operators, but to N-part infix operators. An N-part infix operator would require N+1 parameters. (We might as well have a compile-time check for this. That check is entirely orthogonal to the rest of this issue, and probably a good idea anyway.)

From what I can see, allowing expressions inside of an infix operator like that has the following consequences:

The other shoe might never drop. If the operator is defined like infix:<?? !!> and the code looks like 2 ?? 3;, then the compiler needs to flag an error that the ?? !! operator wasn't completed. (Prior art in Perl 6: the error message "Found ?? but no !!". That error is hard-coded for the ?? !! operator, though, since Perl 6 has no general mechanism like the one described here.)
Lower-precedence operators must be forbidden in the in-between expressions. It's OK for a little expression tree to form between the ?? and the !!, but the operators used in that expression tree must be tighter than ?? !! itself, for the simple reason that if they were looser, then ?? !! couldn't be reduced before them.
Associativity still exists, but is prejudiced towards the first and last expressions. "Left-associative" means that the operator makes subtrees in the first subexpression, and so A ?? B !! C ?? D !! E would mean (A ?? B !! C) ?? D !! E. "Right-associative" means that the operator makes subtrees in the last subexpression, and so A ?? B !! C ?? D !! E would mean A ?? B !! (C ?? D !! E). (As indeed it does in Perl 6, since ?? !! associates to the right.) There is no mechanism to make the operator associate through the middle subexpression, for the simple reason that that doesn't solve the problem of evaluation order.
This setup could enable an interesting and natural kind of multi dispatch.

Let me be more specific about the last point. Say we wanted to implement a ?? operator, with the semantics that A ?? B worked like A ?? B !! None. The conservative thing to do would be to say that we can't have ?? and ?? !! in the same environment, since they have the same first part. But the interesting thing to do would be to allow both, and simply keep both options open as we parse.

(Note: it feels to me like this would work, and not bend the expression parser beyond the breaking point. Some of the points above about error messages get seriously muddled up (just like with multi subs vs only subs), but that looks merely hard to me, not insurmountable. In the worst case I'm wrong, and the resulting expression parser would have to do exponential backtracking or something. In which case we should revert to being conservative rather than interesting.)

The interplay between this idea and is parsed is not clear to me. This may or may not be a problem, depending on how much you like is parsed (or whether you have an alternative up your sleeve). The closest I can come to an actual suggestion here is that is parsed needs to be free-form enough while also respecting the "fixed" bits of the operator. Or maybe that's not true at all, and is parsed is in total control and the parser ignores everything except the first part of the <?? !!> specification.

Multipart prefixes and postfixes? I don't see why not, but I also suspect they would be less in demand than their infix counterparts. I haven't been able to come up with a use case for either. (Edit: actually, postcircumfixes are an excellent use case, and even have the multi-part syntax in Perl 6, although the number of parts is constrained to 2.)
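The sketch below makes the multipart-infix consequences listed earlier concrete. It is a toy precedence-climbing parser in Python, standing in for the real Raku grammar. The operator table, the whitespace tokenizer and the names `INFIX`, `parse_expr` and `parse` are hypothetical. The parser does three things:

- It reports an unfinished `?? !!` with a Perl 6-style message.
- It accepts only operators tighter than `?? !!` in the middle operand.
- It lets associativity act through the last operand only.

It does not attempt the idea of keeping both `??` and `?? !!` open during parsing.

```python
# Hypothetical operator table: first part -> (precedence, associativity, all parts).
# Larger precedence binds tighter. "?? !!" is a two-part (three-operand) infix.
INFIX = {
    "+": (2, "left", ["+"]),
    "*": (3, "left", ["*"]),
    "??": (1, "right", ["??", "!!"]),
}
# Every token that belongs to some operator, so it is never mistaken for a term.
PARTS = {part for _, _, parts in INFIX.values() for part in parts}


def parse_term(tokens, pos):
    if pos >= len(tokens) or tokens[pos] in PARTS:
        raise SyntaxError(f"Expected a term at token {pos}")
    return tokens[pos], pos + 1


def parse_expr(tokens, pos, min_prec):
    """Precedence climbing, extended so that an N-part infix takes N+1 operands."""
    lhs, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in INFIX:
        prec, assoc, parts = INFIX[tokens[pos]]
        if prec < min_prec:
            break
        pos += 1
        operands = [lhs]
        # In-between operands may only contain operators tighter than this one.
        for part in parts[1:]:
            middle, pos = parse_expr(tokens, pos, prec + 1)
            if pos >= len(tokens) or tokens[pos] != part:
                raise SyntaxError(f"Found {parts[0]} but no {part}")
            operands.append(middle)
            pos += 1
        # Associativity acts through the last operand: a right-associative
        # operator lets it absorb another operator of the same precedence.
        rhs, pos = parse_expr(tokens, pos, prec if assoc == "right" else prec + 1)
        operands.append(rhs)
        lhs = (" ".join(parts), *operands)
    return lhs, pos


def parse(source):
    tokens = source.split()
    tree, pos = parse_expr(tokens, 0, 0)
    if pos != len(tokens):
        raise SyntaxError(f"Unexpected {tokens[pos]!r}")
    return tree
```

With this table, `parse("A ?? B !! C ?? D !! E")` nests to the right, as in Perl 6. `parse("2 ?? 3")` fails with "Found ?? but no !!".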

Get rid of blocks as values

They aren't very useful since they can't return stuff. (And I don't plan on making them do so.)

We still need a Q node representing blocks in the various places they can occur. But we can rename Q::Literal::Block to Q::Block and abolish its use as a literal.

Implement class declarations

Proposed syntax:

class Q::Statement::My {
    my ident;
    my assignment = None;

    sub Str {
        if assignment == None {
            return "My" ~ children(ident);
        }
        return "My" ~ children(ident, assignment);
    }

    sub run(runtime) {
        if assignment == None {
            return;
        }
        assignment.eval(runtime);
    }
}

(Here's the Perl 6 class for comparison:)

role Q::Statement::My does Q::Statement {
    has $.ident;
    has $.assignment;
    method new($ident, $assignment = Empty) { self.bless(:$ident, :$assignment) }
    method Str { "My" ~ children($.ident, |$.assignment) }

    method run($runtime) {
        return
            unless $.assignment;
        $.assignment.eval($runtime);
    }
}

A couple of things of note:

Much like Python, we re-use lexically declared variables and subs to create attributes and methods.
Unlike Python, there's no __init__ method. You're not expected to need one.
Variables with an assignment on them are optional. Variables that are not assigned are mandatory.
Perhaps most crucially, there is no instance context. There's no self, there's no this. The variables declared with my in the class block happen to be lexically available in the methods, but that's all.
As a consequence, there's no private state. You can't even emulate it by having the methods close over a lexical defined outside of the class. (Or you could, but it would be "static", not per instance.)
~~There's no class inheritance.~~
Just as with arrays and object literals, everything's immutable from construction-time onwards. If you want to mutate something, you call setProp (or butWith or whatever we end up calling it).
Anything other than my or sub inside the class block is conservatively disallowed.
- Any "ordinary statements" are disallowed because they don't further the function of a class declaration. Use a BEGIN block instead. On similar grounds, BEGIN blocks are out.
- constant declarations might be OK, though. They'd be a kind of once-only variables, valid throughout the class. If we wanted to be fancy, we could even allow MyClass.someConstant to resolve. But I doubt we'd want to be that fancy.
- I nearly wanted to allow other class declarations inside the class body. Felt it could help to establish name spaces. (And then we'd probably switch from Q::Statement::My to Q.Statement.My.) Eventually decided against it because it feels like it's confusing together two very different things: the closed declaration of a class and its attributes/methods, and the open-ended declaration of namespaces.
- Finally, macros would have been really really cool to allow as methods. Unfortunately, macros run long before the late-bound method dispatch happens. Even if we had types and knew what class the macro was run on, we still wouldn't know what instance it was called on.

The syntax for creating an object mirrors object literals quite a lot.

my declaration = Q::Statement::My {
    ident: Q::Identifier {
        name: "foo"
    },
    assignment: None
};

(In this case, we could've skipped passing assignment, as it's already optional.)

We could have used the new keyword in the construction syntax, but I don't feel that it adds anything.

As a nice retrofit, writing Object { ... } also works, and means the same as an ordinary object literal. Unlike user-defined classes, the Object type doesn't check for required or superfluous properties.
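Below is a rough Python model of the proposed class semantics. It is a sketch under assumptions: `AlmaClass`, `AlmaInstance`, `get_prop` and `set_prop` are hypothetical names, and the text itself leaves the setProp/butWith naming open. The model follows these rules:

- A `my` without an assignment becomes a required property.
- A `my` with an assignment becomes an optional property with a default.
- Construction rejects missing and superfluous properties.
- "Changing" a property derives a new instance instead of mutating the old one.

```python
from types import MappingProxyType


class AlmaClass:
    """Hypothetical runtime model of a proposed class declaration.

    `my ident;` becomes a required property; `my assignment = None;` becomes an
    optional property with that default.
    """

    def __init__(self, name, required, optional):
        self.name = name
        self.required = tuple(required)
        self.optional = dict(optional)

    def __call__(self, **props):
        """Construct an instance, rejecting missing and superfluous properties."""
        for prop in self.required:
            if prop not in props:
                raise TypeError(f"{self.name}: missing required property {prop!r}")
        allowed = set(self.required) | set(self.optional)
        for prop in props:
            if prop not in allowed:
                raise TypeError(f"{self.name}: superfluous property {prop!r}")
        return AlmaInstance(self, {**self.optional, **props})


class AlmaInstance:
    """An immutable instance. User code never sees a `self` or `this`."""

    __slots__ = ("_cls", "_props")

    def __init__(self, cls, props):
        object.__setattr__(self, "_cls", cls)
        object.__setattr__(self, "_props", MappingProxyType(dict(props)))

    def __setattr__(self, key, value):
        raise AttributeError("instances are immutable; use set_prop to derive a new one")

    def get_prop(self, key):
        return self._props[key]

    def set_prop(self, key, value):
        """Return a new instance with one property changed (re-running the checks)."""
        return self._cls(**{**self._props, key: value})


# Mirrors the Q::Statement::My example above.
Identifier = AlmaClass("Q::Identifier", required=["name"], optional={})
My = AlmaClass("Q::Statement::My", required=["ident"], optional={"assignment": None})
declaration = My(ident=Identifier(name="foo"))
```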

Tuple classes

After we get types, I'd very much like for us to get a "tuple class" syntax, where you don't name your attributes, but only specify their types as a tuple:

class Q::Statement::Sub (Q::Identifier, Q::Parameters, Q::Block) { ... }

And the corresponding constructor syntax would look like a function call:

my function = Q::Statement::Sub(
    Q::Identifier("fib"),
    Q::Parameters(Q::Identifier("n")),
    Q::Block(Q::Statements(...))
);

Internally, the attributes of a tuple class could have 0-based indices instead of names, and so you could still index them with getProp et al. This type of class would really shine if we had ADTs and case matching, the issue which see.
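Here is a Python sketch of tuple classes under simplifying assumptions. `TupleClass`, `TupleInstance` and `conforms` are hypothetical names. Q::Parameters takes exactly one identifier here, and Q::Block wraps a plain list instead of a Q::Statements node. The constructor checks arity and the type of each positional argument, and attributes are read back by 0-based index.

```python
class TupleClass:
    """Hypothetical model of `class Name (T0, T1, ...) { ... }`: positional, typed attributes."""

    def __init__(self, name, *types):
        self.name = name
        self.types = types

    def __call__(self, *args):
        """Constructor call syntax: check the arity and the type of each argument."""
        if len(args) != len(self.types):
            raise TypeError(f"{self.name} expects {len(self.types)} argument(s), got {len(args)}")
        for index, (arg, expected) in enumerate(zip(args, self.types)):
            if not conforms(arg, expected):
                raise TypeError(f"{self.name}: argument {index} has the wrong type")
        return TupleInstance(self, args)


class TupleInstance:
    def __init__(self, cls, values):
        self.cls = cls
        self.values = tuple(values)

    def get_prop(self, index):
        """Attributes have 0-based indices instead of names."""
        return self.values[index]


def conforms(value, expected):
    """A value conforms to a tuple class if it was built by it; otherwise use isinstance."""
    if isinstance(expected, TupleClass):
        return isinstance(value, TupleInstance) and value.cls is expected
    return isinstance(value, expected)


# Simplified, hypothetical Q types mirroring the Q::Statement::Sub example above.
Identifier = TupleClass("Q::Identifier", str)
Parameters = TupleClass("Q::Parameters", Identifier)
Block = TupleClass("Q::Block", list)
Sub = TupleClass("Q::Statement::Sub", Identifier, Parameters, Block)

function = Sub(Identifier("fib"), Parameters(Identifier("n")), Block([]))
```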

Meta-object protocol

If you ask type(Q::Statement::My), you get back the answer Type. (Much like Python.) Here's a rough attempt to get Type to "close the loop", meta-object-wise, using the conjectural/future type syntax.

class Type {
    my name: Str;
    my tuplish: Int;    # since we don't have Bool
    my properties: Array<PropSpec>;
}

class PropSpec {
    my name: Str | Int;
    my type: Type;
}

Refactor the test suite

The test suite is currently structured into these files:

t/
  semantics/
    types.t (AAA)
    operators.t (6 x A, 5 x E)
    blocks.t (6 x A)
    variables.t (A)
    if-statement.t (AA)
    subroutines.t (8 x A)
    return.t (6 x A)
    for-loop.t (AAAA)
    while-loop.t (A)
    fibonacci.t (A)
    begin-blocks.t (A, 3 x P, 7 x O)
    type.t (6 x A)
    builtins.t (21 x A)
    constants.t (OOO)
    macros.t (A)
  syntax/
    elements.t (24 x T)
    corner-cases.t (9 x T, 15 x F)
    expr.t (9 x T)
    macros.t (OO)
    quasi.t (OO)
    errors.t (F)
    begin-time.t (O)
    custom-ops.t [still in a branch] (T, 2 x F, 5 x O)

The letters in parentheses mark individual test calls:

A  AST input, expect (runtime) stdout
E  AST input, expect (runtime) error
F  program text input, expect (compile-time) error
O  program text input, expect (compile-time & runtime) stdout
P  program text input, expect (compile-time) stdout
T  program text input, expect AST

The project started out with just A tests (in the semantics/ directory) and T tests (in syntax/). Later tests added E and F tests, which... fine.

Then, BEGIN blocks necessitated the P test, and then, the parser got access to the runtime, and then we got static lexpads and values surviving from compile time to runtime, and so we needed O tests (and could, in retrospect, perhaps have done without the P kind)...

BEGIN blocks deliberately flout the compile/runtime subdivision. So do operators, constants and (especially) macros. I think in recognition of this, we should retire the original directory structure and not distinguish so heavily between syntax and semantics.

Dividing things by feature still makes sense to me. So I mostly like the test file names in the semantics/ directory, and suggest reorganizing most tests to be under them. Actually, perhaps put them in a features/ subdirectory instead, and then put fibonacci.t (and later, man or boy) under some other aptly named subdirectory.

It's nice when a project outgrows its original assumptions.

The truthy multi doesn't cover all our values

We have

multi truthy(Val::None) is export { False }
multi truthy(Val::Int $i) { ?$i.value }
multi truthy(Val::Str $s) { ?$s.value }
multi truthy(Val::Array $a) { ?$a.elements }

But we also have Val::Block, Val::Sub, and Val::Macro (all of which are always truthy). We then use truthy from (currently) three places in the 007 code:

if statements
while loops
the grep built-in

All of these would fail to dispatch if they got one of the three types of value we don't cover.

We should

make sure that this is actually the case
write one or more tests for the expected behavior
fix it

Possibly the fix is as simple as adding a Val candidate.
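Here is one way the fix might look, sketched with Python's `functools.singledispatch` as a stand-in for Raku multis. The class names (`ValSub`, `ValMacro` and so on) are hypothetical mirrors of the Val:: types. A candidate for the base `Val` type catches every value that has no more specific candidate. Blocks, subs and macros are therefore truthy instead of failing to dispatch. In the Raku code this presumably amounts to one more candidate along the lines of `multi truthy(Val) { True }`.

```python
from dataclasses import dataclass, field
from functools import singledispatch


class Val:
    """Base class of all runtime values (mirrors Val:: in the Raku code)."""


class ValNone(Val):
    pass


@dataclass
class ValInt(Val):
    value: int


@dataclass
class ValStr(Val):
    value: str


@dataclass
class ValArray(Val):
    elements: list = field(default_factory=list)


class ValBlock(Val):
    pass


class ValSub(Val):
    pass


class ValMacro(Val):
    pass


@singledispatch
def truthy(v):
    raise TypeError(f"not a Val: {v!r}")


@truthy.register
def _(v: Val):
    # Fallback candidate: blocks, subs, macros (and any future Val) are truthy.
    return True


@truthy.register
def _(v: ValNone):
    return False


@truthy.register
def _(v: ValInt):
    return v.value != 0


@truthy.register
def _(v: ValStr):
    return v.value != ""


@truthy.register
def _(v: ValArray):
    return len(v.elements) > 0
```

The test step the issue asks for amounts to calling `truthy` on one value of every Val type, including the three that were previously uncovered.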

Make it possible to define PRE and POST phasers

This is a curious one, but one I think we should tackle. Very simple macros generate some code and then dump it at the site where the macro was called. A phaser macro typically wouldn't; it needs to make sure its code runs at a particular time.

I can think of two approaches to this:

Without co-opting the compiler. The macro would have to queue up a request to participate in the construction of the surrounding sub. (So I guess there'd have to be a mechanism for that, or at least a mechanism to override standard sub construction.) As the sub is created, the passed statementlist gets extended with PRE blocks at the beginning and POST blocks at the end.
Co-opting the compiler. Same, but instead of forcibly inserting blocks in a statementlist, the Q::Sub has hooks for "before sub body" and "after sub body". In fact, it would be interesting to classify all the existing Perl 6 phasers in this way: what program element do they hook onto, and what's the relationship ("before", "after", etc)?

In both cases, the injected blocks need to be wrapped with the implicit boilerplate of "...make sure the value we get back is truthy, otherwise die with an error message". How this relates to issue 7 and whether blocks should be allowed to return values, I'm not sure. Need to think about that a bit more.

It should be possible to nest PRE and POST phasers inside each other, so that they can share locally defined lexicals.

In fact, PRE and POST probably shouldn't be restricted to just subs; it should probably work in any type of block. That way, one could put them on loops, too. Or on the entire program.

Bonus points for making the error message include the text of the precondition that fails. Like Rakudo does:

$ perl6 -e 'for 1..10 { PRE { $_ < 5 }; say $_ }'
1
2
3
4
Precondition '{ $_ < 5 }' failed

(In fact, we can do better in two ways there. Firstly, in the case of a conjunction &&, we could single out the conjunctive term that failed. Rakudo currently does not do this:

$ perl6 -e 'for 1..10 { PRE { $_ < 5 && $_ > 0 }; say $_ }'
1
2
3
4
Precondition '{ $_ < 5 && $_ > 0 }' failed

(Could say Precondition '$_ < 5' failed instead.)

Secondly, we could show the values of all the variables involved in the failing term. That'd be the next question anyone debugging the code would have, so it'd be a great help. All this should be an excellent showcase for what macros can do: splitting an expression into conjunctive terms, and finding the variables.)
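Here is a sketch of both improvements, using Python's `ast` module as a stand-in for a Qtree and Python's `and` as a stand-in for `&&`. `check_precondition` and `PreconditionFailed` are hypothetical names. The precondition is split into conjunctive terms, which are evaluated left to right. The first failing term is reported along with the values of the variables it mentions. A macro would do the same splitting at compile time on the Q nodes rather than at run time on source text.

```python
import ast


class PreconditionFailed(Exception):
    pass


def conjunctive_terms(expr):
    """Flatten (possibly nested) `and` expressions into their individual terms."""
    if isinstance(expr, ast.BoolOp) and isinstance(expr.op, ast.And):
        terms = []
        for value in expr.values:
            terms.extend(conjunctive_terms(value))
        return terms
    return [expr]


def variables_in(term):
    """Variable names mentioned in a term, without duplicates."""
    names = []
    for node in ast.walk(term):
        if isinstance(node, ast.Name) and node.id not in names:
            names.append(node.id)
    return names


def check_precondition(source, env):
    """Evaluate a precondition term by term; report the first failing term and its variables."""
    tree = ast.parse(source, mode="eval")
    for term in conjunctive_terms(tree.body):
        code = compile(ast.Expression(body=term), "<precondition>", "eval")
        if not eval(code, {}, dict(env)):
            message = f"Precondition '{ast.unparse(term)}' failed"
            shown = [f"{name} = {env[name]!r}" for name in variables_in(term) if name in env]
            if shown:
                message += " (" + ", ".join(shown) + ")"
            raise PreconditionFailed(message)
```

For example, `check_precondition("n < 5 and n > 0", {"n": 7})` raises with the message `Precondition 'n < 5' failed (n = 7)` instead of quoting the whole condition. `ast.unparse` needs Python 3.9 or later.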

Implement method call syntax

#25 talks about method call syntax (which is actually just a property lookup plus a function call, much like Python and JavaScript).

There's nothing preventing us from implementing those ahead of time, without having object literals. If we're going that path anyway, then we could do the property lookup/method call as a standalone thing.

Why would that matter? Because the setting currently contains a bunch of accessors/destructors for the Qtree nodes, that would be much more intuitive as methods on the Qtree nodes themselves:

value params stmts expr lhs rhs pos args ident assign block

Thus, you could do this:

my q = quasi { say("Three plus four is " ~ (3 + 4)) };
say(q.stmtlist()[0].expr().arglist()[0].rhs().lhs().value()); # 3

Instead of (as you currently have to):

say(value(lhs(rhs(arglist(expr(stmtlist(q)[0]))[0])))); # 3

This would have the further advantage of uncluttering the setting a bit more.

Eliminate the double parameters/arguments/statements layer in Q

This is stupid:

my q_block = Q::Block(...);
statements(statements(q_block))[1];

my q_sub = Q::Statement::Sub(...);
params(params(q_sub))[2];

my q_call = Q::Postfix::Call(...);
args(args(q_call))[3];

Or, if we decide to go with method call syntax, it would keep being stupid:

q_block.statements().statements()[1];
q_sub.params().params()[2];
q_call.args().args()[3];

There's simply no good reason for the repetition. Notice that there's an explanation, but not a good reason.

The explanation is that the first call gives you the Q::Statements object: the class that holds the array that holds the individual Q::Statement objects. The second call gives you the array.

The long-term solution to this would be array types. But the short-term solution is simply for the setting functions to unpack two levels right away, giving you the array directly and ignoring the intermediate holder object. Yay! Free yays all around! Yays are on me!

That's on the 007 side. This is silly on the Perl 6 side too, though. We should make all three of those classes positional and have them handle all the relevant Array methods. We can afford that level of convenience. :)
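Here is a Python sketch of the "make the holder positional" idea, with hypothetical class names `QStatements` and `QBlock`. By subclassing `collections.abc.Sequence` and defining only `__getitem__` and `__len__`, the holder gets iteration, `in`, `index` and `count` for free. The extra `.statements` layer then disappears from call sites.

```python
from collections.abc import Sequence


class QStatements(Sequence):
    """Hypothetical holder node that behaves like the array it wraps."""

    def __init__(self, statements):
        self._statements = list(statements)

    def __getitem__(self, index):
        return self._statements[index]

    def __len__(self):
        return len(self._statements)


class QBlock:
    def __init__(self, statements):
        self.statements = QStatements(statements)


q_block = QBlock(["stmt0", "stmt1", "stmt2"])
second = q_block.statements[1]  # no more q_block.statements.statements[1]
```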

Implement type checking, orthogonally to everything else

It would be way cool if we could add type checking as basically an optional module on top of 007 core. It would be both a "battle test" of the macro primitives we provide (and provide clues as to whether they're powerful enough), and a prestigious thing to pull off.

I'd be totally fine with us faking much of it first, including macros, modules, and module imports/exports, as long as we were converging on having it be basically an add-on thing, expressible in 007 itself.

Syntax

The syntax extensions are the easy part. We'd want to extend:

my and constant declarations: my count: Int = 0;
Parameters (including to if and loop blocks): sub add(lhs: Int, rhs: Int) { ... } and -> s: Str { ... }
sub and macro declarations (return types): sub add(lhs, rhs): Int { ... }
the identifier in catch blocks (see #65)

Basically, all the places in the language where an identifier signifies the introduction of that identifier into some scope, we'd want to also introduce a : <type> parse.

I'm totally fine with cluing the core parser in on the places where we'd want to introduce a typing. From the list above, that looks like about five places. We just place a rule there (or something) that's empty in the standard 007 grammar. If that's not enough, then I guess we'll have to re-implement enough of Perl 6 grammars inside of 007 to make that possible.

By the way, it would always be optional to add a colon and a type, and leaving a type out would need to mean the same thing as before we introduced types. (If someone still wants to put something there, they can write : Any, which means "do no type checking on this thing".)

Array is a generic type, and just writing it like that would mean Array<Any>. But you can also write Array<Int> or even Array<Array<Array<None>>>.

Function types have a -> in them to separate the types of the parameters from the return type, like Int -> Str. If you have several parameters (or zero), you need to surround them with parentheses: (Int, Int, Int) -> Str. You cannot have several things on the right-hand side of the arrow, though. (By the way, TypeScript uses => for the arrow, but we're going with Python's proposal for type annotations, and it uses ->.)

For objects, the type looks like this: { name: Str, age: Int, greet: () -> Str }.

Semantics

The numeric operators, like infix:<+> et al, would only be allowed on Int literals and variables. (Any would work too; see above. This fact will be assumed from now on in this text.)
The string operators, like infix:<~> would only be allowed on Str things.
If the compiler can infer the type of a sub by just bubbling up information from its constituent parts, it will do so. Thus you might actually get your subroutine typed by the compiler. (If this clashes with a type annotation you made yourself, you get a compile-time error.)
Similarly, if you initialize a variable as you declare it, the compiler will deduce the type for you.
However, both variable declarations without assignments and parameters are typed as Any if no type is provided. No guessing is being done from later handling of the variable or parameter.
Assignment is only allowed if the types match.
When a call is made to something annotated with a return type, the call expression is assumed to have that return type.
Indexing is only allowed on Array. The type of the thing we get back is the generic type parameter T of Array<T>.
Typed parameters in if, for and while pointy blocks are checked as early as possible against their expression. If something is obviously impossible, that's a failure at compile time. Some things will have to wait until run time, unfortunately. (For example, my a: Any = [...]; for a -> e: Str { ... }.)
Control flow within a sub is analyzed enough that we make sure we always return a type we declared we would.
We are quite likely to need/want to do type guards in practice.
TypeScript does something called contextual typing which we might also start caring about once we run out of simple things to do.
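Here is a deliberately tiny Python sketch of a few of the rules above. It covers numeric and string operators, Any as the opt-out, deducing a variable's type from its initializer, and rejecting mismatched assignment. The tuple-based expression format and the names `infer`, `check_my` and `TypeCheckError` are hypothetical. A real version would walk Qtree nodes and, ideally, be written in 007 itself.

```python
INT, STR, ANY = "Int", "Str", "Any"

# Hypothetical operator signatures: operator -> (operand type, result type).
OPERATOR_TYPES = {"+": (INT, INT), "*": (INT, INT), "~": (STR, STR)}


class TypeCheckError(Exception):
    pass


def compatible(actual, expected):
    """Any on either side switches checking off for that value."""
    return ANY in (actual, expected) or actual == expected


def infer(expr, env):
    """Bubble types up from the leaves.

    expr is ("int", n), ("str", s), ("var", name) or ("infix", op, lhs, rhs).
    Variables missing from env stand in for untyped declarations and count as Any.
    """
    kind = expr[0]
    if kind == "int":
        return INT
    if kind == "str":
        return STR
    if kind == "var":
        return env.get(expr[1], ANY)
    if kind == "infix":
        _, op, lhs, rhs = expr
        operand, result = OPERATOR_TYPES[op]
        for side in (lhs, rhs):
            actual = infer(side, env)
            if not compatible(actual, operand):
                raise TypeCheckError(f"infix:<{op}> expects {operand}, got {actual}")
        return result
    raise ValueError(f"unknown node kind {kind!r}")


def check_my(declared, init, env):
    """`my x: T = init;` must match T; `my x = init;` deduces the type from init."""
    actual = infer(init, env)
    if declared is None:
        return actual
    if not compatible(actual, declared):
        raise TypeCheckError(f"cannot assign {actual} to a variable of type {declared}")
    return declared
```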

Other worthy ideas

Union types: Int | Str
Optional types: Maybe<Int>
Fixed array types: [Str, Str, Int]
Generics

Implement ADTs, and pattern matching

Here's the proposed syntax, based on the same declaration in an article by Oleg:

type Exp {
    | Var (Str)
    | App (Exp, Exp)
    | Lam (Var, Exp)
    | Let (Var, Exp, Exp)
}

Semantically, the type Exp declaration would introduce five new types into the current scope, the latter four of which are instantiable classes. (For example, Lam(Var("foo"), Var("foo")).) So far, nothing particularly exciting about that. As you can see, ADTs can be recursively defined without much fanfare. If they don't have a base case, you simply won't be able to instantiate one using a finite program.

The | things stand out a little and make it clear that this is a DSL. They're also a little bit easier on the eyes than ; at the end of the line, and visually hint that there is something very declarative going on here, rather than imperative ; stuff.

We also introduce a case statement:

sub depth(Exp e): Int {
    return case e {
        | Var (_): 0
        | Lam (_, e): depth(e) + 1
        | App (e1, e2): max(depth(e1), depth(e2))
        | Let (_, e1, e2): max(depth(e1), depth(e2))
    };
}

There's both a case statement and (as above) a case expression. Both require an xblock. Notice how the individual cases mirror the syntax of the ADT declaration itself, although here we're binding against values instead of declaring their types. The thing after the colon can be either an expression or a block.

In the case of the App and Let cases above, e1 and e2 are introduced as lexical variables whose scope is only the expression or block immediately following the colon. Written as imperative code, the App case would come out as something like this:

if type(e) == "App" {
    my e1 = e.getProp(0);
    my e2 = e.getProp(1);
    return max(depth(e1), depth(e2));
}

The underscores simply mean "no binding". (I wanted to do asterisks, following Perl 6's lead, but it didn't look nice. Dollar signs don't make sense in 007 because we don't have sigils.) In the case of Var we don't care about any of the properties, and we could elide the (_) completely. But even a paren full of underscores would do shape-checking on the type, which provides a little bit of extra consistency.

I guess we could also allow multiple matchers with | between them. Hm.

    return case e {
        | Var (_): 0
        | Lam (_, e): depth(e) + 1
        | App (e1, e2) | Let (_, e1, e2): max(depth(e1), depth(e2))
    };

In this particular instance that helps us. I'm not sure it's worth the extra complexity with the matching machinery, though. We'd need to throw an error if there was a type mismatch somewhere, or if two matchers didn't introduce exactly the same variables.

If you do otherwise as the last case, it will match when nothing else matched. Notice that you can't match on structure with this one, though; it's just a catch-all.

The compiler statically detects whether you've "covered all the cases". Given what we've said so far, that looks entirely tractable, even with things such as nested matching and multiple parameters. If you haven't covered all the cases, and don't have an otherwise, the compiler fails and describes in poignantly descriptive prose what you missed.

I'm sorely tempted to allow individual cases to happen right inside any function ~~or pointy block~~, just binding directly against its signature. Both the case statement itself and return could then be implicit. The depth sub could then be written as:

sub depth(Exp _): Int {
    | Var (_): 0
    | App (e1, e2): max(depth(e1), depth(e2))
    | Lam (_, e): depth(e) + 1
    | Let (_, e1, e2): max(depth(e1), depth(e2))
}

This kind of "implicit block case matching" would go a long way to compensate for our lack of multi things. In fact, I'd say it's a pretty competitive alternative, with a bunch of advantages of its own.

It is unclear how much we need visitors à la #26 when we have pattern matching like this. But let's do both and see which one wins. :) Who knows, maybe they'll end up occupying different niches.

Fail earlier and better on doomed parses

I just fixed two specific cases, namely when my and sub are followed by a non-identifier. Instead of backtracking and trying other rules (usually statement:expr, which is very general), the parser should just give up directly with a <.panic> call.

See these two commits for inspiration on how to do this.

This issue can be closed when the parser has been combed for other opportunities like this, and <.panic> calls installed. As much as possible, each such <.panic> should be accompanied by a test in t/syntax/errors.t.
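Here is the "commit and panic" pattern in miniature, as a hypothetical Python toy rather than the real Raku grammar (`parse_statement`, `panic` and `ParseError` are illustrative names). Once the parser has seen `my` or `sub`, anything other than an identifier is a doomed parse. It fails on the spot rather than falling back to the very general expression rule.

```python
import re

KEYWORD = re.compile(r"(my|sub)\b\s*")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


class ParseError(Exception):
    """Raised by panic(): a hard failure that stops the parse (no backtracking)."""


def panic(message):
    raise ParseError(message)


def parse_statement(source):
    """Toy statement parser: `my <ident>`, `sub <ident>`, else fall back to an expression."""
    keyword = KEYWORD.match(source)
    if keyword:
        rest = source[keyword.end():]
        ident = IDENTIFIER.match(rest)
        if ident is None:
            # Committed to `my`/`sub`: fail here instead of backtracking into the
            # catch-all expression rule.
            panic(f"Expected an identifier after '{keyword.group(1)}'")
        return (keyword.group(1), ident.group())
    return ("expr", source)
```

Each panic message is then a natural candidate for a test in t/syntax/errors.t.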

For loop bug: something assumes that it's always literals we're looping over

$ perl6 bin/007 -e='for [say] -> c { c("OH HAI") }'
Method 'value' not found for invocant of class 'Q::Identifier'
  in sub elements at /home/masak/ours/007/lib/_007/Q.pm:418

$ perl6 bin/007 -e='for [40 + 2] -> n { say(n) }'
Method 'value' not found for invocant of class 'Q::Infix::Addition'
  in sub elements at /home/masak/ours/007/lib/_007/Q.pm:418

Table in tutorial displays wrong

From tutorial page:

feature Perl 6 007 Python

braces yes yes no user-defined operators yes yes no variable declarations yes yes no macros yes yes no implicit typecasts yes no no sigils yes no no multis yes no no implicit returns yes no no

Add a 'man or boy' test

Man or boy, budded off from #9 so we can close that one.
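For reference, here is Knuth's man or boy test transcribed into Python. A 007 version would exercise the same features: a nested sub that closes over and decrements `k`, and subs passed around as thunks. The expected result for k = 10 is Knuth's published value, -67.

```python
import sys

# The test recurses deeply; raise Python's default recursion limit to be safe.
sys.setrecursionlimit(10_000)


def A(k, x1, x2, x3, x4, x5):
    """Knuth's man-or-boy function; x1..x5 are thunks (call-by-name arguments)."""
    def B():
        nonlocal k
        k -= 1
        return A(k, B, x1, x2, x3, x4)
    return x4() + x5() if k <= 0 else B()


def man_or_boy(k):
    return A(k, lambda: 1, lambda: -1, lambda: -1, lambda: 1, lambda: 0)
```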

Provide a command-line tool for linting and refactoring 007 code

Call the command spy, to continue the 007 theme.

It would have two major parts: a linter, and a refactoring tool.

linter

Called with spy --lint script.007, the tool could detect various things that are smells rather than errors in a 007 program:

refactoring tool

Extract/inline/rename variable/constant
Extract/inline/rename sub
Extract/inline/rename macro
Add/remove/reorder parameters
Introduce parameter (turn expression in sub into a parameter)

Being refactors, the actions that affect parameters should also make sure to do the "corresponding" things to arguments at all statically detectable call sites.

what's needed to get there

The compiler already generates a Qtree for consumption by the runtime. The thing that needs to be built on top of that is a layer of primitives for locating stuff in the Qtree, and to manipulate it. Also a greater awareness of the connection between Qtree and original source text is needed.

LTA error message on malformed sub

Discovered while writing tests:

$ perl6 bin/007 -e='sub fn()'
Missing identifier

It's not the identifier (fn) that's missing, it's the block after the parens.

I'm not sure why that panic triggers here. Probably something backtracks to that point... but rules shouldn't backtrack, should they?

Implement a 007 parser in 007

Something tells me that Sooner Or Later™ we're going to need something out of parsing that Perl 6 grammars won't easily provide for us (gasp!). At that point, we'll wish the parser was bootstrapped and written in 007.

So let's write a 007 parser in 007 before that happens.

We may never actually swap it in for the real thing. Depends what kind of bootstrapping crisis we actually encounter. But it'd be an interesting exercise in any case, and a way to find where 007 as it stands is lacking right now.

The base assumption is that a recursive-descent parser can be hand-coded with enough if statements, subroutines, index/substr/charat, and pure bloody-mindedness.

But we should also look out for opportunities: can we simplify the parser using macros somehow? What if we had the case expression syntax of #34? What if we thought in terms of parser combinators?

There are currently 189 tests in the test suite. Of these, 34 are parses-to and 27 are parse-error. So, 61 tests could form the basis for this 007 parser in 007. Probably quite a good starting point.

Implement object literals

Objects, if we get them in 007, will be immutable, just like arrays. (Implementing mutation isn't worth it for us, and having everything be immutable has some nice benefits, too.)

Here's what I'm proposing for syntax, based on ES6:

my obj = {
    prop1: 42,
    "quotes allow you to put a non-identifier as a property": "if that floats your goat",
    fn,                     # syntactic sugar for 'fn: fn'
    toString() {            # syntactic sugar for 'toString: sub () { ... }'
    },                      # trailing comma allowed
};

The run-time value would be a Val::Object.

I don't think there should be any JS-like this syntax/semantics. I think we should be running entirely on lexical scoping. That way, people would essentially get private attributes "for free". Since objects are immutable, closing over lexicals would be the only way to get mutable state anyway.

In other words, there's no such thing as "object methods" (much like JavaScript), and there's no "object context"/this (unlike JavaScript). If you want to refer to the object, just assign it (like we did above with obj) and it'll be available lexically.

Accessing properties could probably borrow obj[key] — so we do array indexing and object property lookup with the same postfix operator. Or we go the Perl route on that one and have obj{key}. I'd be fine either way. I'd be fine with a getProp(obj, key) setting function too, I think. If we do the postfix lookup, we should consider also having the syntactic sugar obj.key (like JavaScript); that last one would only work if key is a valid identifier, of course.

I'd just like to add that I foresee people wanting to mutate their object, and we should probably supply a setProp(obj, key, newVal) setting function for that. Objects stay immutable, but it gets easier to derive new objects from old ones.

Also, there should be a keys(obj) setting function, returning an array of the key names of obj. This function gives no guarantees whatsoever about the order of the keys. There's no big need for a clone setting function, because objects are immutable.
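A rough Python model of the proposed semantics, assuming objects are backed by a read-only mapping. `make_object`, `get_prop`, `set_prop` and `keys` are hypothetical stand-ins for the setting functions discussed above, not part of 007. The point is that `set_prop` derives a new object and leaves the old one alone.

```python
from types import MappingProxyType


def make_object(props):
    """Build an immutable object from a dict of properties (hypothetical helper)."""
    return MappingProxyType(dict(props))


def get_prop(obj, key):
    """Property lookup; a missing key raises KeyError."""
    return obj[key]


def set_prop(obj, key, new_val):
    """Return a *new* object with one property changed; `obj` is untouched."""
    updated = dict(obj)
    updated[key] = new_val
    return MappingProxyType(updated)


def keys(obj):
    """Property names; callers must not rely on their order."""
    return list(obj.keys())


obj = make_object({"prop1": 42, "quotes allow non-identifiers": "yes"})
obj2 = set_prop(obj, "prop1", 43)
```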

Things we're explicitly not adding/borrowing from ES6 and other languages:

Classes
~~Inheritance~~
__proto__ and prototypes
Computed properties

Remove 'method declare' hack and introduce AST strengthening

Apart from a run method, all the Q::Statement:: types have a declare method which gets called on the surrounding block's entry.

This method is a hack. More exactly, it was added at a time when we were running only off ASTs, without the parser creating static lexpads for us. Now that it does, the declare methods are superfluous and should be removed.

The interesting thing, though, is that if we just go ahead and remove all the declare methods, a lot of tests in t/semantics/ start failing. This is because the ASTs in these tests were never program text to begin with, and so they never got the static lexpads from the parser that would replace the declare method hacks.

To remedy this, we want to create a new helper routine check in _007::Test that traverses an AST and initializes static lexpads in the same way a parser would. Essentially doing a "parse" of the AST.

Note that we can cheat quite a bit in writing this routine, as we don't have to worry about erroneous conditions in the AST. (The name check primarily means "do stuff that belongs at CHECK time", not "make sure everything's OK".) We can assume the test author wrote a correct AST for the test, otherwise it's a bad test. As far as I can see, we only have to initialize the static lexpads. Then we can remove the declare methods.

Interestingly, the way we seem to be headed with synthetic ASTs (which were created using Q:: constructors in program code somewhere) is that these will also need a similar check routine. So it's probable that this routine actually becomes core, not just a test helper. At that point though, we do need to check everything the parser checks, because we don't trust the user the way we trust the test author.
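A minimal Python sketch of that traversal, assuming a toy AST where only `my` and `sub` declare names. `Block`, `My`, `Sub` and `check` are illustrative stand-ins, not the actual _007::Test API, and storing `None` as each lexpad value is an assumption. Like the proposed test helper, it trusts the AST and does no error checking.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    statements: list
    static_lexpad: dict = field(default_factory=dict)


@dataclass
class My:
    name: str


@dataclass
class Sub:
    name: str
    body: Block


def check(block):
    """Populate static lexpads the way the parser would (no error checking)."""
    for stmt in block.statements:
        if isinstance(stmt, (My, Sub)):
            # A declaration introduces its name into the enclosing block.
            block.static_lexpad[stmt.name] = None
        if isinstance(stmt, Sub):
            # Nested blocks get their own lexpads.
            check(stmt.body)
    return block
```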

Statement/expression mismatch and quasi scoping

A bit of a design issue more than an implementation one.

The thing to keep in mind when reading this is that in 007, statements can contain expressions, but expressions cannot really contain statements. That's to say, of course a block term will hold statements in it (Update: block terms don't exist anymore) (Update-update: but sub terms do), but from the point of view of the expression, that block is opaque. More to the point, there's none of Perl 6's do keyword in 007. (do turns the subsequent statement into something that can be used as an expression.)

A macro call is by necessity (part of) an expression (Update: this is too limiting an assumption). But sometimes we want to insert something that isn't just an expression: a whole statement, a sequence of statements, a block, a subroutine declaration, a macro declaration, etc.

In fact, I bet the case where we insert statement-level stuff rather than expression-level stuff is the common case.

I had a solution for this. It can be summarized in this table:

       \...INTO    a whole expr     part of an expr
        \
INSERTING\-----------------------------------------
          |
an expr   |         fine            fine
          |
a stmt    |         stutter         error

Inserting an expression into an expression is fine. Even inserting an expression into part of an expression is fine. (Mainly because this is what expressions do all day.) It even works out pretty well with precedence and stuff by default. Better than C's macros, anyway.

If a statement of basically any kind ends up at the top level of an expression, then that's an error condition in the AST and we have to do something. It's in this case that we "stutter" and simply peel away the outer Q::Statement::Expr, letting the inner statement break free of it as if from a chrysalis.

(We have to do a similar peeling-away process for the first two cases, too, because the result from the macro will be a block with a statement with the expression we want to insert, and that block and that statement have to go away.)

It's only the last case, trying to insert a statement-y thing in the middle of an expression, that should be an error.

Now all of the above was fine, and is doable. But here's the snag, and the reason I'm writing this down.

macro foo() {
    my x = 5;
    return quasi { x };
}

my x = 19;
say(foo());

In the fullness of time — and hopefully not too soon — the above 007 program should output "5", because of how scoping and macro insertion works.

But by what I said above, all that would be inserted into the say(...) expression would be the expression x, a textual identifier without any context associated with it. The result would be that "19" would be printed. (Or, worse, if I hadn't declared that outer x, a "variable undeclared" error at parse-time (hopefully) or runtime (in the worst case).)

Perl 6 solves this in the following way: what's inserted is always a block, which gets run as it gets evaluated. Blocks, blocks, blocks. Even in the middle of an expression. The beauty of this is that the block can be forced to have the appropriate OUTER (the macro), and the variable lookup would work out. (As has been noted elsewhere, this makes the unhygienic lookup more problematic. But let's not worry about that now.)

Now,

In 007, just putting a block there wouldn't do the right thing. Blocks don't self-evaluate.
So we insert a call to the block. Now the problem is that 007 blocks never actually return any value. (They are only used for their side effects.)
So we insert a call to a subroutine instead. Now the problem is that we have to define the subroutine, and since subroutines are always named, we have to gensym the name.

I think that last one actually works. But it feels very wrong. I thought we would be able to do macro insertion with proper scoping with the primitives that we have, and without resorting to generating a lot of extra stuff just to make everything fit together.

Just some random thoughts before going to bed. I suspect we're missing a primitive somewhere, or something. In the best case, the whole thing can be solved by allowing blocks to return the value of the last statement. (I think it can.) But that opens up a smallish can of worms that I had kept closed by doing it the way we have it right now (blocks always evaluate to None):

If blocks return values, then we have two notions of returning. There's the block return, and the return return from subs (and macros). Slightly higher conceptual burden all around.
If blocks return values, should conditionals and loops also return values? I kinda hope not, and letting them do so feels like the start of an edifice that leads to a total blending-together of statements and expressions. The pinnacle of which would be to sigh and add the do keyword. On the other hand, if we said that conditionals and loops always evaluate to None, then that's an exception in the language.
The notion of "last statement" is complicated. Last statement to be evaluated? Last textually?
Suddenly we have to care about the value of statements that we didn't have to care about before, just in case they're last in a block. What's the value of a my statement? Of a constant statement? Of a sub declaration? Again this takes us down the slippery slope of mixing up statements and expressions.

Make unquote lexical lookup possible

Currently have this failing test locally:

{
    my $program = q:to/./;
        macro foo() {
            my x = 7;
            return quasi {
                say(x);
            }
        }

        foo();
        .

    outputs $program, "7\n", "a variable is looked up in the quasi's environment";
}

It fails with:

Variable 'x' is not declared
  in method find at /home/masak/ours/007/lib/_007/Runtime.pm:78

Essentially because it's looking in the wrong lexical scope: the mainline's, not the macro's. We've decided that we want lexical scoping/hygiene for this by default (in Perl 6, and therefore in 007), and so it should look in the macro's scope.

Let's just pause and recognize that, on the face of it, it's absurd that it should do that. At the point of the lookup, the say(x) has been physically copied into the mainline, replacing the foo() macro call. Therefore, the current (wrong) behavior is certainly the natural one: the runtime would normally expect to look for x in the surrounding scopes, innermost first.

Some clever person may say "oh, no worries — let's just do what closures already do in that case to displace the lexical lookup!". But — newsflash — closures are great because they preserve the lexical lookup, while also allowing being passed around as first-class values between other, unrelated scopes. In other words, we get no help from closures, because all they do all day is be extremely consistent lookup-wise.

At a minimum, what the above example would need is two things:

A way for a Q::Block to mark what its OUTER is. (At least if it's different from expected.)
A way for a new Val::Block to be created so that it inherits this information.

Getting all these bits right (static blocks, dynamic blocks, outers, callers) is tricky. I need to mull it over some more before diving in.
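The two requirements above can be modeled in Python with frames that carry an explicit OUTER link. `Env` and its methods are illustrative assumptions, not 007's Runtime. The sketch only shows that the lookup from the failing test succeeds once the inserted code's frame points at the macro's frame instead of the mainline's.

```python
class Env:
    """A lexical frame: bindings plus a link to its OUTER frame."""

    def __init__(self, bindings, outer=None):
        self.bindings = dict(bindings)
        self.outer = outer

    def lookup(self, name):
        env = self
        while env is not None:  # innermost first, then OUTER, OUTER's OUTER...
            if name in env.bindings:
                return env.bindings[name]
            env = env.outer
        raise NameError(f"Variable '{name}' is not declared")


mainline = Env({})
macro_frame = Env({"x": 7}, outer=mainline)  # `my x = 7` inside foo()

# What happens now: the inserted say(x) runs in a frame hanging off the mainline.
naive = Env({}, outer=mainline)
# What we want: the inserted block records the macro's frame as its OUTER.
hygienic = Env({}, outer=macro_frame)
```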

Implement quasi unquotes

Hacker News wants unquotes. We happily oblige.

Unquotes in expressions

Whenever the parser is ready to parse a term, it should also expect an unquote.

quasi { say("Mr Bond!") }
quasi { say({{{greeting_ast}}}) }

Technically, I don't see why we shouldn't expect the same for operators. But we get into the interesting issue of what syntactic category it is.

Screw it, I'm tired of theorizing. Let's just steal the colon for this.

quasi { 2 + 2 }
quasi { 2 {{{infix: my_op}}} 2 }

quasi { -14 }
quasi { {{{prefix: my_op}}}14 }

quasi { array_potter[5] }
quasi { array_potter{{{postfix: my_op}}} }

Backporting this solution to terms, you could mark up an unquote as term if you want, but it's the default, so you don't have to:

quasi { say("Mr Bond!") }
quasi { say({{{term: greeting_ast}}}) }

At the time of evaluating the quasi (usually macro application time), we'll have the type of the unquoted Qtree. The runtime dies if you try to stick a square Qtree into a round unquote.

But the parser can sometimes reject things early on, too. For example, this shouldn't even parse:

quasi { sub {{{prefix: op}}}(n) { } }

(That slot doesn't hold an operator, it holds an identifier.)
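The run-time half of that check could look like the following Python sketch, where each unquote category maps to the Q type it accepts. The class names and the `CATEGORIES` table are assumptions made for illustration; 007's actual Q hierarchy and error messages differ.

```python
class Q:
    """Toy root of the Q hierarchy."""

class Term(Q): ...
class Literal(Term): ...
class Infix(Q): ...
class Prefix(Q): ...
class Postfix(Q): ...

# Hypothetical mapping from an unquote's category to the Q type it accepts.
CATEGORIES = {"term": Term, "infix": Infix, "prefix": Prefix, "postfix": Postfix}


def fill_unquote(category, qtree):
    """At quasi-evaluation time, check that the Qtree fits the unquote's slot."""
    expected = CATEGORIES[category]
    if not isinstance(qtree, expected):
        # The square-Qtree-into-round-unquote case.
        raise TypeError(f"expected {expected.__name__}, got {type(qtree).__name__}")
    return qtree
```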

Unquotes for identifiers

007 currently has 5 major places in the grammar where it expects an identifier:

Usages
- Terms in expressions (but that's handled by the previous section)
- is <ident> traits
Declarations
- my and constant
- sub and macro
- parameters in parameter lists (in subs and pointy blocks)

The traits one is kind of uninteresting right now, because we have four trait types. Someone who really wanted to play around with dynamic traits could write a case expression over those four. So let's skip traits — might reconsider this if we user-expose the traits more.

The three declaration cases are the really interesting ones. Notice that each of those has a side effect: introducing whatever name you give it into the surrounding lexical scope. (Handling that correctly is likely part of the #5 thing with Qtree strengthening.)

I would be fine with the {{{identifier: id}}} unquote accepting both Q::Identifier nodes and Str values. Q::Identifier is basically the only node type where I think this automatic coercion would make sense.

Unquotes in other places

These are the remaining things I can think of where an unquote would make sense:

Sequence of statements (Q::Statements)
Parameter list (Q::Parameters)
Argument list
Standalone block (Q::Block)
Trait (Q::Trait) — note: we should probably have a Q::Traits container, just like we do for statements and parameters
Individual statement (Q::Statement) — not sure what this one'll give us over sequence of statements, but it's possible it might be useful
Lambda — interesting; this one is in the grammar but not reified among the Q nodes
"Expr block" — ditto
Unquote (Q::Unquote) — mind explosion — note that this is only allowed inside a Q::Quasi

Implement a `walk` built-in

Visitors can help do dynamic dispatch on a Qtree. They provide a laid-back way to just match on structures rather than writing out the conditionals explicitly.

Example: counting statements in the program:

my statementCount = 0;

my visitor = {
    Q(q) {
        visitChildren(q, visitor);
    },
    Q::Statement(stmt, super) {
        statementCount = statementCount + 1;
        super(stmt);
    },
};

visit(currentCompunit, visitor);

This example assumes the object literal syntax in #25. We could probably achieve something similar with (a) arrays, or (b) an opaque visitor() constructor and some visitor-specific version of setProp. But it'd be clunky, which speaks in favor of implementing the literal syntax.

The visitor object sent to the visit function is supposed to contain keys corresponding to the available Qtree types. It would probably make sense to die on an unknown type, which is likely to be a typo.

By default, Qtree nodes match against the most specific function provided (like Q::Statement), but if that one isn't present, they search up the inheritance chain all the way up to Q. The first one found is called. If no matching function is found, then the node is simply dropped on the floor in silent success.

The functions defined inside the visitor object can have an optional first parameter, which will be populated with the visited Qtree node. They can also have an optional second parameter super, which will be a reference to the next visitor function found further up the inheritance hierarchy. (For example, Q::Statement::If would have a super of Q::Statement, which would have a super of Q.)

The visitChildren is a convenience setting function, something like this:

sub visitChildren(q, visitor) {
    for children(q) -> child {
        visit(child, visitor);
    }
}

We don't have a children() setting function, but we totally could have.
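The dispatch rules above (most specific handler first, super pointing at the next handler up the inheritance chain, silent success when nothing matches) can be modeled with Python's MRO. This is a sketch under the assumption that the visitor is a mapping from node classes to handlers. The toy classes stand in for real Q types, every handler takes both parameters for simplicity, and the second one is called `sup` to avoid shadowing Python's built-in `super`.

```python
class Q:
    def __init__(self, *children):
        self.children = list(children)

class Statement(Q): ...
class If(Statement): ...
class Expr(Q): ...


def visit(node, visitor):
    """Call the most specific handler; `sup` walks further up the chain."""
    chain = [cls for cls in type(node).__mro__ if cls in visitor]
    _dispatch(node, visitor, chain)


def _dispatch(node, visitor, chain):
    if not chain:
        return  # no handler anywhere up to Q: drop the node silently
    handler, rest = visitor[chain[0]], chain[1:]
    handler(node, lambda n: _dispatch(n, visitor, rest))


def visit_children(node, visitor):
    for child in node.children:
        visit(child, visitor)


counts = {"statements": 0}

def on_q(q, sup):
    visit_children(q, visitor)

def on_statement(stmt, sup):
    counts["statements"] += 1
    sup(stmt)  # falls through to on_q, which visits the children

visitor = {Q: on_q, Statement: on_statement}
program = Q(Statement(Expr()), If(Statement()), Expr())
visit(program, visitor)
```

Here the If node has no handler of its own, so it is counted by the Statement handler, and its nested statement is reached through super.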

Rename the value-getting subs to `value()`

Currently,

int(Q::Literal::Int(42));            # 42
str(Q::Literal::Str("Bond"));        # "Bond"
array(Q::Literal::Array([1, 2]));    # [1, 2]

At the time, I thought it kind of cute that we could re-use int and str as both coercers and destructors. Now it feels like the "cute today" kind of cute, though. (For some value of "today" in the past.)

I'd like to rename all these to value() instead. Reasons:

It's clearer what that actually does
It puts all literals under a unified interface, kind of (even Q::Literal::None!)
It looks better when written as method calls

Introduce a "defer" keyword

This one can wait for a while, but...

Perl 6 has a callsame keyword (and similar). The idea being that there is some "underlying" routine that we may want to dispatch back to. This happens in the following cases:

methods in the MRO chain
wrapper routines

007 has neither of those. But we do have a desire to override built-in functions, macros and operators. (Like in issue 12.) Getting ahold of the "old" thing and calling it is always possible by means of assigning to a variable:

my old;
old = somefn;
sub somefn() {
    if (weshouldhandle) {
        # do our thing
    }
    else {
        old();
    }
}

(We don't currently have else, but you get the idea.)

(If this were a macro, we'd have to do the assignment in a BEGIN block for it to have its correct value in time.)

I propose the keyword defer to avoid the boilerplate code with the variable. This is very implementable, since in the worst case we can always introduce a hidden symbol pointing back to the old routine. (But it should be possible to do better, since defer can be detected statically.)

A defer outside of a routine is always an error. So is a defer in a routine that doesn't shadow an older routine. All this is statically detectable.
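A Python sketch of the "hidden symbol" strategy: when a routine is installed over an existing one, it receives a reference to the routine it shadows. The `routines` table and the `override` decorator are hypothetical; a decorator only approximates what 007 could detect statically, but it does show both the boilerplate going away and the "shadows nothing" error.

```python
# A toy "setting": name -> routine.
routines = {"somefn": lambda: "old behaviour"}


def override(name):
    """Model of `defer`: the new routine gets the routine it shadows."""
    def install(fn):
        old = routines.get(name)  # the hidden reference, captured once
        if old is None:
            # Like a defer in a routine that doesn't shadow anything.
            raise NameError(f"{name!r} does not shadow an older routine")
        routines[name] = lambda *args: fn(old, *args)
        return fn
    return install


@override("somefn")
def somefn(defer, should_handle):
    if should_handle:
        return "our thing"
    return defer()  # dispatch back to the shadowed routine
```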

Make parsing of calls to postdeclared subs work

Just discovered this:

$ perl6 bin/007 -e='sub foo() { say("OH HAI") }; foo()'
OH HAI
$ perl6 bin/007 -e='foo(); sub foo() { say("OH HAI") }'
Variable 'foo' is not declared
  [...]

We even have a test for this to work, and it does, on the AST level. The problem is that the parser does not allow the program through.

The immediate fix is to defer checking whether a variable is declared until we discover that we're not making a call to it. So the statement foo; is wrong if foo is not a declared variable, but the statement foo(); gets a free pass. (I'm not sure where that means the declared-variable check should happen. It should still happen ASAP. Probably somewhere inside the expression parser.)

Of course, this should still be an invalid program:

$ perl6 bin/007 -e='{ bar() }; BEGIN { say("never get here") }'

Because we still do make declared-variable checks, but we defer them to the end of the block.

I'm undecided about what we should do with this type of situation:

$ perl6 bin/007 -e='bar(); my bar = -> { say("OH HAI") };'
$ perl6 bin/007 -e='bar(); my bar = 4;'

Strongly leaning to emitting a custom error for that, something like "you called this thing as if it was postdeclared, but then you didn't use sub to postdeclare it". I went through Rakudo core's exception types, and I didn't find a good one for that already existing.
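A Python sketch of the deferred check for a single block, under a hypothetical encoding where each statement is a `(kind, name)` tuple. Plain uses must already be declared; calls to undeclared names are postponed to the end of the block, where they must have been declared with sub. `SyntaxError` is only a placeholder for the custom error discussed above, since no existing exception type fits.

```python
def check_block(statements):
    """statements: list of ("sub" | "my" | "call" | "use", name) tuples."""
    declared = {}  # name -> declarator ("sub" or "my")
    pending = []   # names called before any declaration was seen
    for kind, name in statements:
        if kind in ("sub", "my"):
            declared[name] = kind
        elif kind == "call" and name not in declared:
            pending.append(name)  # free pass, for now
        elif kind == "use" and name not in declared:
            raise NameError(f"Variable '{name}' is not declared")
    # End of block: every postponed call must have been postdeclared with sub.
    for name in pending:
        if name not in declared:
            raise NameError(f"Variable '{name}' is not declared")
        if declared[name] != "sub":
            raise SyntaxError(f"'{name}' was called as if postdeclared, but not declared with sub")
```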

Make it possible to define array comprehensions

The idea is to be able to define a macro that makes this syntax legal.

my numbers = [1, 2, 3];
my squares = [ x <- numbers | x * x ];

Roughly, it'd happen something like this:

We define a macro term:<[>
It parses enough to find out if the first things after the [ are an identifier and an arrow <- (modulo whitespace)
If it isn't, then it defers control to the usual (right now hypothetical) term:<[> which knows how to parse normal arrays.

Assuming it succeeds, it parses the rest, and generates this code:

my res = [];
for numbers -> x {
    res.push(x * x);
}
res;

What's necessary for this to work?

Macro terms
Built-in terms being (conceptually) macros too, so that we can defer to them
Some is parsed functionality
A defer mechanism (which, in a given sub or macro, physically substitutes the current call with a call to the sub or macro that was shadowed)

And, crucially,

A way to avoid infinite macro recursion when the [] is parsed in the code to be generated.

I can see two ways to do that last bit. Either we don't write quasi code at all for that (because it's too complicated anyway), or we simply pull out [] into a constant (EMPTY_ARRAY, say) before the macro, and use that.
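The decision step and the desugaring can be sketched in Python, working on tokens and emitting 007 source text rather than Qtrees for readability. Everything here is hypothetical: `parse_bracket` plays the role of our term:<[>, the `("array", tokens)` result stands in for deferring to the built-in one, the source is assumed to be a single token, and the generated code uses the EMPTY_ARRAY constant from the previous paragraph to avoid infinite recursion.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"\s*(<-|[A-Za-z_]\w*|\d+|\S)")


def tokenize(src):
    return TOKEN.findall(src)


def parse_bracket(tokens):
    """Hypothetical term:<[>: handle comprehensions, defer everything else."""
    if len(tokens) > 2 and IDENT.fullmatch(tokens[1]) and tokens[2] == "<-":
        return comprehension(tokens)
    return ("array", tokens)  # stand-in for deferring to the built-in term:<[>


def comprehension(tokens):
    """Desugar [ var <- source | body ] into the loop shown above."""
    var, source = tokens[1], tokens[3]  # assumes a one-token source
    body = " ".join(tokens[tokens.index("|") + 1:-1])  # drop the closing ]
    return "\n".join([
        "my res = EMPTY_ARRAY;",  # avoids re-entering term:<[> for []
        f"for {source} -> {var} {{",
        f"    res.push({body});",
        "}",
        "res;",
    ])
```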

Eliminate some repetition in the setting

I just caught sight of this in the setting code:

abs      => -> $arg { Val::Int.new(:value($arg.value.abs)) },
min      => -> $a, $b { Val::Int.new(:value(min($a.value, $b.value))) },
max      => -> $a, $b { Val::Int.new(:value(max($a.value, $b.value))) },
chr      => -> $arg { Val::Str.new(:value($arg.value.chr)) },
ord      => -> $arg { Val::Int.new(:value($arg.value.ord)) },
chars    => -> $arg { Val::Int.new(:value($arg.value.Str.chars)) },
uc       => -> $arg { Val::Str.new(:value($arg.value.uc)) },
lc       => -> $arg { Val::Str.new(:value($arg.value.lc)) },
trim     => -> $arg { Val::Str.new(:value($arg.value.trim)) },
elems    => -> $arg { Val::Int.new(:value($arg.elements.elems)) },
reversed => -> $arg { Val::Array.new(:elements($arg.elements.reverse)) },
sorted   => -> $arg { Val::Array.new(:elements($arg.elements.sort)) },
join     => -> $a, $sep { Val::Str.new(:value($a.elements.join($sep.value.Str))) },
split    => -> $s, $sep { Val::Array.new(:elements($s.value.split($sep.value))) },
index    => -> $s, $substr { Val::Int.new(:value($s.value.index($substr.value) // -1)) },
substr   => sub ($s, $pos, $chars?) { Val::Str.new(:value($s.value.substr($pos.value, $chars.defined ?? $chars.value !! $s.value.chars))) },

Clearly we're manually wrapping a lot of these in the right Val type. How about we do that automatically once and for all somewhere, instead of in every single subroutine?

The easy way: define a wrap which does the right thing for the various Val types, and call it everywhere. Would still make things shorter.
The HOP way: define a wrap which takes a function that returns a Perl 6 value and returns a function that returns an appropriate Val value. Call it once from the loop that put-vars everything in the setting.

Oh, and I noticed two bugs while I was in there:

say      => -> $arg { self.output.say(~$arg) },
type     => sub ($arg) { return 'Sub' if $arg ~~ Val::Sub; $arg.^name.substr('Val::'.chars) },

type returns a Str, not a Val::Str. The HOP way mentioned above would make this implementation right without any further changes. say should probably return None, but right now it is likely to return a Perl 6 Bool, because that's what $*OUT.say returns:

$ perl6 -e 'say (say "hi").^name'
hi
Bool
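To make the HOP way concrete, here is a Python sketch in which builtins return plain host values and a single `wrap` converts the result into the matching Val type. The Val classes are toy stand-ins, and `to_val`, `wrap`, `raw_builtins` and `setting` are hypothetical names. Note how the type builtin comes out as a Val-wrapped string without any change to its own body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValNone:
    pass

@dataclass(frozen=True)
class ValInt:
    value: int

@dataclass(frozen=True)
class ValStr:
    value: str

@dataclass(frozen=True)
class ValArray:
    elements: tuple


def to_val(x):
    """Wrap a host value in the matching Val type; Vals pass through."""
    if isinstance(x, (ValNone, ValInt, ValStr, ValArray)):
        return x
    if x is None:
        return ValNone()
    if isinstance(x, int):
        return ValInt(x)
    if isinstance(x, str):
        return ValStr(x)
    if isinstance(x, (list, tuple)):
        return ValArray(tuple(to_val(e) for e in x))
    raise TypeError(f"no Val type for {type(x).__name__}")


def wrap(fn):
    """The HOP way: turn a host-returning builtin into a Val-returning one."""
    def wrapped(*args):
        return to_val(fn(*args))
    return wrapped


raw_builtins = {
    "abs": lambda arg: abs(arg.value),
    "uc": lambda arg: arg.value.upper(),
    "reversed": lambda arg: list(reversed(arg.elements)),
    "type": lambda arg: type(arg).__name__[len("Val"):],
}
# Done once, in the loop that installs everything in the setting.
setting = {name: wrap(fn) for name, fn in raw_builtins.items()}
```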

Update the tutorial with unquotes

We have them now. They're not very impressive yet, but they exist. The tutorial should mention them. :)

Get rid of the need for the .worthy-attributes hack

This is an internal thing, but kind of a technical debt issue.

method worthy-attributes {
    sub aname($attr) { $attr.name.substr(2) }
    sub avalue($attr) { $attr.get_value(self) }
    sub worthy($attr) {
        avalue($attr) !~~ Hash  # avoids showing static-lexpad
            && (aname($attr) ne "type" || avalue($attr) ne "")
    }

    return self.^attributes.grep(&worthy);
}

We filter away two things:

static-lexpad a thing in ~~Q::Statements~~ blocks ~~which we're not even sure belongs there or anywhere, really~~
Empty string type (because in the cases when they're empty, they don't add anything)

.worthy-attributes used to be just an inner convenience sub, at which point it was OK and nobody's business. But as part of 690f60a, it got promoted to a method, and this makes it ugly. To me it suggests that both static-lexpad and type should die a gruesome death so the method can be removed.

Removing static-lexpad is the big mystery. It's all tangled up in exactly how much "context" a Qtree really has (or should have). Related tickets are #20 and #47.

Removing type should be dead easy. I think it was there to make some nicer stringification, basically. But after #45 I bet it isn't even used. I could be wrong, though — maybe it's used in custom operator lookup? If it is, then the correct way forward is to allow it on all of them, again making this exception go away.

Weird test failure in t/integration/corner-cases.t

As part of the test reorganization for #17, the file t/syntax/corner-cases.t ended up in t/integration/corner-cases.t. It also absorbed a sole test from t/syntax/errors.t which looked like it belonged in a corner-cases.t test file anyway.

This additional test in corner-cases.t causes an earlier test to fail.

Yep, that's right. Adding this new test as test 24 in that file causes test 22 in that file to start failing.

The test failure seems sensitive to small perturbations in exactly what is added. It does, however, seem very consistent and reproducible.

Running perl6 with --optimize=0 doesn't make the problem go away.

$ perl6 --version
This is perl6 version 2015.09-183-g3fb8178 built on MoarVM version 2015.09-39-g1434283

Change Q data dumps to conform to the new object constructor syntax

A piece of code like this:

macro foo() {
    my x = 7;
    return quasi {
        say(x);
    };
}

foo();

has a Qtree which currently stringifies to this:

Statements
  Macro[foo]
    Parameters

    Statements
      My
        Identifier[x]
        Infix[=]
          Identifier[x]
          Int[7]
      Return
        Quasi
          Statements
            Expr
              Call
                Identifier[say]
                Arguments
                  Identifier[x]
  Expr
    Call
      Identifier[say]
      Arguments
        Identifier[x]

That stringification has served us well, but I think it's time to replace it with something more 007-native: the new proposed object constructor syntax. That same Qtree above would then render as this:

Q::Statements [
    Q::Statement::Macro {
        ident: Q::Identifier "foo",
        parameters: Q::Parameters [],
        statements: Q::Statements [
            Q::Statement::My {
                ident: Q::Identifier "x",
                assignment: Q::Infix::Assignment {
                    lhs: Q::Identifier "x",
                    rhs: Q::Literal::Int 7
                }
            },
            Q::Statement::Return {
                expr: Q::Quasi {
                    statements: Q::Statements [
                        Q::Statement::Expr {
                            expr: Q::Postfix::Call {
                                expr: Q::Identifier "say",
                                arguments: Q::Arguments [
                                    Q::Identifier "x"
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    },
    Q::Statement::Expr {
        expr: Q::Postfix::Call {
            expr: Q::Identifier "say",
            arguments: Q::Arguments [
                Q::Identifier "x"
            ]
        }
    }
]

A bit more verbose. But also much more in line with where 007 is heading in general.

Implement a 007 runtime in 007

Before we do #38, we could aim for the much easier goal of having a runtime written completely in 007. Basically a port of _007::Runtime.

The parser would need a runtime to be able to do constants and begin blocks and declarations and macros anyway.
It's basically how 007 got started: with some ASTs and a dream.
It might also be useful for #14, if we ever implement it fully in 007.

TODO meta-issue

These TODO items were moved from the README.md file to here. Please open individual issues for things you wish to take on.

TODO

setting wishlist

Q:: constructors/destructors in the setting

subroutines for ops

man or boy

unquotes

This issue can be closed when each of the above TODO items either has an issue of its own, or has been taken care of anyway.

Add an examples/ directory

It would be nice to provide a number of different example uses of 007. Here are ones I can think of straight away:

Eventually we might also be able to add 007 runtime and 007 parser and spy to that collection, but that's outside of the scope of this issue.

I imagine we'll find a number of primitives missing when we go through and implement these. One I can think of immediately, for those requiring interactivity, is prompt.

Variable not recognized inside `if` statement when declared as a block parameter

$ perl6 bin/007 -e="if [1, 2, 3] -> a { say(a) }"
Variable 'a' is not declared
  in method find at /home/masak/mine/007/lib/_007/Runtime.pm:74
  in method get-var at /home/masak/mine/007/lib/_007/Runtime.pm:83
  in method eval at /home/masak/mine/007/lib/_007/Q.pm:69
  in method dispatch:<hyper> at src/gen/m-CORE.setting:1351
  in method eval at /home/masak/mine/007/lib/_007/Q.pm:196
  in method run at /home/masak/mine/007/lib/_007/Q.pm:243
  in method run at /home/masak/mine/007/lib/_007/Q.pm:438
  in method run at /home/masak/mine/007/lib/_007/Q.pm:261
  in method run at /home/masak/mine/007/lib/_007/Q.pm:438
  in method run at /home/masak/mine/007/lib/_007/Runtime.pm:28
  in sub run_007 at bin/007:8
  in sub MAIN at bin/007:15
  in sub MAIN at bin/007:11
  in block <unit> at bin/007:15

Closable with a fix and a regression test.

Some minor problems with array stringification

When I borrowed Python's way of stringifying arrays, I thought printing the elements as-is was all it did. But it also does this:

$ python3 -c 'print(str([1, 2, "foo"]))'
[1, 2, 'foo']

We currently don't:

$ perl6 bin/007 -e='say([1, 2, "foo"])'
[1, 2, foo]

I hereby declare 007's current behavior Wrong and Python's Right. Hence, this is a bug.

Note that since we only support double quotes right now in 007, the more sensible output for us would be [1, 2, "foo"] with double quotes.

Also note that this feature is a bit "asymmetrical" in that the quote signs around strings only show up when the string is nested inside of something:

$ python3 -c 'print(str("foo"))'
foo
$ python3 -c 'print(str(["foo"]))'
['foo']

Which means one code path for stringification of a Val::Str, and another code path for stringification of a Val::Str nested in a Val::Array.

Lastly, note that Python doesn't just smack on quotes at the ends; it also escapes the string quotes and backslashes.

$ python3 -c 'print(""" '\'' " \\ """)'
 ' " \ 
$ python3 -c 'print(str([""" '\'' " \\ """]))'
[' \' " \\ ']

We should, too.
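A Python sketch of the two code paths: a top-level string prints as-is, while a string nested in an array is wrapped in double quotes (since 007 only has double-quoted strings) with double quotes and backslashes escaped. Treating only those two characters as needing escapes is an assumption; `stringify` and `escape` are hypothetical names.

```python
def escape(s):
    # Backslashes first, so the backslashes added before quotes aren't doubled.
    return s.replace("\\", "\\\\").replace('"', '\\"')


def stringify(value, nested=False):
    """Top-level strings print as-is; nested ones get quoted and escaped."""
    if isinstance(value, str):
        return f'"{escape(value)}"' if nested else value
    if isinstance(value, list):
        return "[" + ", ".join(stringify(v, nested=True) for v in value) + "]"
    return str(value)
```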

Make it possible to define the `amb` declarator

This is a tricky one. But awesome.

If we can override statement forms, then we can declare a new statement that starts with the amb keyword.

amb x <- [1, 2, 3, 4];
amb y <- [2, 3, 4];
assert x + y == 6;
assert x < y;
say(x);  # 2
say(y);  # 4

The code would translate internally to something very much like this:

for [1, 2, 3, 4] -> x {
    for [2, 3, 4] -> y {
        if !( x + y == 6 ) {
            next;
        }
        if !( x < y ) {
            next;
        }
        say(x);
        say(y);
    }
}

This one is tricky because amb is more global than local: it changes the code from where it's used to the end of the block. (Where do those for loops end? When the block the amb was used in ends.) Which kind of implies also hijacking/overriding the behavior of block/codeunit compilation.

Alternatively, maybe amb should be restricted to only be used in a certain block, call it ambable:

ambable {
    # put all your ambs here
}

Then we'd be in nested macros territory. I bet the book that vendethiel++ has recommended to me will have more answers on this.

Similarly (though less radically), the assert keyword (which we don't have yet, but which could be defined using die (which we don't have yet) in a macro) would need to be hijacked too, since it gets converted to nexts in the generated code. Again, this might be easier to pull off inside an ambable block. Otherwise, the check has to be "did we already see an amb in this block?"

It may turn out that we as a (Perl 6) community decide that something like amb is too tricky to be a macro. (Especially as it does control flow.) And that it should instead be a slang. Note, though, that this is just a distinction; it doesn't provide any further insights into how it should work implementation-wise.
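As a sanity check of the nested-loop translation given earlier, here is a brute-force Python model. `amb` is a hypothetical helper: variables are tried in the order given (outermost loop first), a failing constraint skips to the next combination like next does, and every surviving assignment is yielded, just as the translated loop body runs once per solution.

```python
from itertools import product


def amb(choices, constraints):
    """Yield every assignment of `choices` that satisfies all `constraints`."""
    names = list(choices)
    for values in product(*(choices[name] for name in names)):
        env = dict(zip(names, values))
        if all(check(env) for check in constraints):
            yield env


solutions = list(amb(
    {"x": [1, 2, 3, 4], "y": [2, 3, 4]},
    [lambda e: e["x"] + e["y"] == 6, lambda e: e["x"] < e["y"]],
))
```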

Subs and macros are never equal to themselves

These three all return 0, but I believe they should return 1.

$ perl6 bin/007 -e='sub foo() {}; say(foo == foo)'
0
$ perl6 bin/007 -e='macro foo() {}; say(foo == foo)'
0
$ perl6 bin/007 -e='say(infix:<+> == infix:<+>)'
0

Also, this one shouldn't fail. Nor should any of the other Q types. Generally, they should be compared "by value".

$ perl6 bin/007 -e='say(Q::Identifier("foo") == Q::Identifier("foo"))'
Cannot call equal-value(Q::Identifier, Q::Identifier); none of these signatures match:
    (Val, Val)
    (Val::None, Val::None)
    (Val::Int $r, Val::Int $l)
    (Val::Str $r, Val::Str $l)
    (Val::Array $r, Val::Array $l)

Arguably, this one should still be 0, though:

$ perl6 bin/007 -e='my a = []; for [1, 2] { sub fn() {}; a = [fn, a] }; say(a[1][0] == a[0])'
0

(That is, even though those two Val::Subs were created from the same Q::Statement::Sub, they were created in different frames, and so they're different.)
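One way to get both behaviors is to compare Q nodes field by field, and to treat two subs as equal only when they come from the same declaration *and* close over the same frame. The Python sketch below assumes exactly that. The class names are hypothetical stand-ins for Q::Identifier, Val::Sub and a runtime frame. Matching the declaration by identity (`is`) rather than by value is also a choice of this sketch, so that two textually identical but separate `sub` declarations stay distinct.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QIdentifier:
    """Q nodes compare by value: dataclass generates a field-wise __eq__."""
    name: str

class Frame:
    """A runtime frame. Default object equality, i.e. by identity."""

class ValSub:
    def __init__(self, q_node, outer_frame: Frame):
        self.q_node = q_node            # the Q::Statement::Sub it came from
        self.outer_frame = outer_frame  # the frame it closed over

    def __eq__(self, other):
        # Same declaration and same closure frame => same sub.
        return (isinstance(other, ValSub)
                and self.q_node is other.q_node
                and self.outer_frame is other.outer_frame)

    def __hash__(self):
        return hash((id(self.q_node), id(self.outer_frame)))

decl = object()             # stand-in for one Q::Statement::Sub node
f1, f2 = Frame(), Frame()   # e.g. two iterations of the for loop
print(QIdentifier("foo") == QIdentifier("foo"))  # True
print(ValSub(decl, f1) == ValSub(decl, f1))      # True
print(ValSub(decl, f1) == ValSub(decl, f2))      # False
```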

Implement module imports

Why modules and module imports? Because eventually we'll want to play around with macros that affect the parsing context that imported them.

I have no reasonable basis for choosing between (Perl) use and (Python) import, so let's go with use because it's short. (But I will keep referring to them as "imports".)

There are two sorts of import. The form that takes an identifier loads something from 007 itself.

use Runtime;      # and 007 knows what this is and knows how to provide it

The form that takes a string literal loads something from a path relative to the loading script or module.

use "./Foo";      # loads Foo.007 in the script's directory
use "Bar/Baz";    # loads Baz.007 in the Bar/ subdirectory in the script's directory

In either case, a symbol corresponding to the loaded module gets installed in the importing lexical scope. (It's a compile-time error for the file name sans .007 extension not to be a valid identifier.)
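The path rules for the string-literal form are mechanical enough to sketch. The Python function below is hypothetical and covers only that form. It resolves the literal against the importing file's directory, appends `.007`, and derives the symbol name from the file name. Python's `str.isidentifier` stands in for 007's identifier rules, which is an approximation.

```python
from pathlib import PurePosixPath

def resolve_import(importer: str, spec: str) -> tuple[str, str]:
    """Map a string-literal `use` to (file to load, symbol to install).

    importer: path of the script or module doing the import.
    spec: the string literal, e.g. "./Foo" or "Bar/Baz".
    """
    # Paths are relative to the directory of the importing file;
    # pathlib collapses the leading "./" when joining.
    target = PurePosixPath(importer).parent / (spec + ".007")
    name = target.stem  # file name sans .007 extension
    # Approximation: Python's identifier rules stand in for 007's.
    if not name.isidentifier():
        raise SyntaxError(f"cannot import {spec!r}: {name!r} is not a valid identifier")
    return str(target), name

print(resolve_import("/proj/main.007", "./Foo"))    # ('/proj/Foo.007', 'Foo')
print(resolve_import("/proj/main.007", "Bar/Baz"))  # ('/proj/Bar/Baz.007', 'Baz')
```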

Importing the module causes its 007 file to run. (Though in the case of internal 007 modules, this may be faked.) The symbol that gets installed is an object, but let's give it the object type Module. Its properties correspond to the variables defined in the topmost scope at the end of running the module.

A use counts as a variable declaration. Therefore it's a compile-time error to refer to an import before importing it, or to refer to an outer variable x and then import x on top of it. Aside from this, use statements can occur anywhere. The import logic happens at BEGIN time.

It's fine for a module to import other modules. Paths keep being relative to the thing that does the importing. At any given time, a number of compunits are pushed on a conceptual stack, each waiting for the thing it imported to finish loading. It's an immediate compile-time error to try to import something that's already on that stack.
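The conceptual stack is what turns circular imports into an immediate error. The Python sketch below models it, assuming a hypothetical `imports` table that maps each module path to the paths it imports. Whether a module imported twice should load once or twice is left open above, so the sketch does no caching. It only tracks the stack and reports the cycle.

```python
class ImportCycleError(Exception):
    pass

class Loader:
    def __init__(self, imports: dict[str, list[str]]):
        # Hypothetical input: module path -> paths it imports, in order.
        self.imports = imports
        self.stack: list[str] = []     # compunits waiting on an import
        self.finished: list[str] = []  # order in which modules finished loading

    def load(self, path: str) -> None:
        if path in self.stack:
            cycle = self.stack[self.stack.index(path):] + [path]
            raise ImportCycleError("circular import: " + " -> ".join(cycle))
        self.stack.append(path)
        try:
            for dep in self.imports.get(path, []):
                self.load(dep)
            # ...here the module body would run and its Module object be built...
        finally:
            self.stack.pop()
        self.finished.append(path)

loader = Loader({"main": ["a"], "a": ["b"], "b": []})
loader.load("main")
print(loader.finished)  # ['b', 'a', 'main']
```

With `{"main": ["a"], "a": ["b"], "b": ["a"]}` the same call raises `ImportCycleError("circular import: a -> b -> a")`.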

In the fullness of time, a module being loaded is meant to be able to influence its loading context more than just installing a single symbol into it. The exact mechanism for this I leave unspecified for now — but it could be something as simple as there always being a loader object available in a loaded module. Similarly, the parser of the loaded module could perhaps be accessed through a parser object.

While loops with pointy blocks

From what I can see, while loops can take pointy blocks:

token statement:while {
    while \s+ <xblock>
}

# "eXpr block"
token xblock {
    <EXPR> <pblock>
}

# "pointy block"
token pblock {
    | <lambda> <.newpad> <.ws>
        <parameters>
        <blockoid>
        <.finishpad>
    | <block>
}
token lambda { '->' }

But we never process the parameter(s) at all:

method run($runtime) { # in Q::Statement::While
    while truthy($.expr.eval($runtime)) {
        my $c = $.block.eval($runtime);
        $runtime.enter($c);
        $.block.statements.run($runtime);
        $runtime.leave;
    }
}

Also, I don't see any tests for it. We should have tests for at least the following things:

if we do -> x, then x is available inside the block
the value of x is whatever the expression evaluates to (that is, something truthy)
doing -> with no parameters is allowed (in analogy with our for loops)
doing -> with more than one parameter is a runtime error
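Here is a sketch of how the run method could honor the pointy block, written in Python rather than Perl 6 and covering the four behaviors listed above. It assumes the loop expression and body can be modeled as plain callables, and that the arity check happens when a value is about to be bound. `run_while` and `LoopArityError` are hypothetical names, and Python truthiness stands in for 007's `truthy`.

```python
class LoopArityError(Exception):
    pass

def run_while(condition, params, body):
    """Q::Statement::While.run with pointy-block parameters (sketch).

    condition: zero-argument callable evaluating the loop expression.
    params: parameter names after `->` (empty for `{...}` or a bare `->`).
    body: callable receiving the new frame's bindings as a dict.
    """
    while True:
        value = condition()
        if not value:  # Python truthiness stands in for 007's truthy()
            break
        if len(params) > 1:
            raise LoopArityError(
                f"while block takes at most 1 parameter, got {len(params)}")
        # Bind the (truthy) condition value to the single parameter, if any.
        frame = {params[0]: value} if params else {}
        body(frame)

countdown = [3]

def count_condition():
    return countdown[0]

def print_body(frame):
    print(frame)
    countdown[0] -= 1

run_while(count_condition, ["x"], print_body)  # {'x': 3}, then {'x': 2}, then {'x': 1}
```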

Block befuddlement

I started trying to finish #20 by moving the static-lexpad property up one level, but ran into conceptual problems most easily described as "masak got confused".

For a long time I've been wondering if there's something fishy about how we do blocks in 007. This issue is here to investigate that.

First, here's a complete list of all our blocks.

Val::Block — a runtime value that can be stored and passed around
Q::Literal::Block — a term in an expression representing such a runtime value
Q::Statement::Block — a statement form which contains Q::Statements inside { ... }
...and Q::CompUnit, which is a subtype of Q::Statement::Block, even though its program-spanning { ... } braces are implicit

I have no quibbles with Val::Block and Q::Statement::Block. Their roles are pretty clear.

We got rid of block literals in #11, but Q::Literal::Block stayed. This is part of the whole confusion thing — why do we still have it when we don't have block literals?

Let's look at the places we currently use Q::Literal::Block:

In BEGIN, if, for, and while, all of which have a block with braces as part of their syntax.
In Q::Statement::Block, but only as a transitional thing to create a Val::Block

Hm. As for the first, I think we should simply rename Q::Literal::Block to Q::Block (s/Q::Literal::Block/Q::Block/). These things are all blocks, but they're not block literals.

As for the second, Q::Statement::Block could also go back to wrapping a Q::Block. (Which means that either we accept that Q::CompUnit does too, or we make it not-a-subtype of Q::Statement::Block.)

Implement parentheses

The tutorial says that this should work:

10 + -(2 + int("3" ~ "4"))

But we currently don't parse parentheses at all.
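What's missing is one more alternative at the term level: `(`, a full expression, `)`. The Python sketch below is a tiny recursive-descent evaluator for exactly the tutorial expression, not 007's grammar. It assumes Perl 6's precedence, where infix `~` (concatenation) is looser than `+` and `-`, and it knows only the `int` builtin. All names in it are hypothetical.

```python
import re

# One token per match: integer, double-quoted string, identifier, or single char.
TOKEN = re.compile(r'\s*(?:(\d+)|"([^"]*)"|([A-Za-z_]\w*)|(\S))')

def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        num, string, ident, op = m.groups()
        if num is not None:
            tokens.append(("int", int(num)))
        elif string is not None:
            tokens.append(("str", string))
        elif ident is not None:
            tokens.append(("ident", ident))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens

class Parser:
    """Recursive descent, loosest precedence level first."""

    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("eof", None)

    def expect(self, op):
        if self.peek() != ("op", op):
            raise SyntaxError(f"expected {op!r}, got {self.peek()!r}")
        self.i += 1

    def concat(self):  # infix ~, looser than + and -
        value = self.additive()
        while self.peek() == ("op", "~"):
            self.i += 1
            value = str(value) + str(self.additive())
        return value

    def additive(self):  # infix + and -
        value = self.prefix()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.peek()[1]
            self.i += 1
            rhs = self.prefix()
            value = value + rhs if op == "+" else value - rhs
        return value

    def prefix(self):  # prefix -
        if self.peek() == ("op", "-"):
            self.i += 1
            return -self.prefix()
        return self.term()

    def term(self):
        kind, val = self.peek()
        if (kind, val) == ("op", "("):  # the missing piece: a parenthesized expression
            self.i += 1
            value = self.concat()
            self.expect(")")
            return value
        if kind in ("int", "str"):
            self.i += 1
            return val
        if (kind, val) == ("ident", "int"):  # the only builtin this sketch knows
            self.i += 1
            self.expect("(")
            arg = self.concat()
            self.expect(")")
            return int(arg)
        raise SyntaxError(f"unexpected token {self.peek()!r}")

def evaluate(src):
    parser = Parser(tokenize(src))
    value = parser.concat()
    if parser.peek()[0] != "eof":
        raise SyntaxError(f"trailing input at {parser.peek()!r}")
    return value

print(evaluate('10 + -(2 + int("3" ~ "4"))'))  # -26
```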

Allow trailing comma in arrays

$ perl6 bin/007 -e='[1, 2, 3]; say("without trailing comma works")'
without trailing comma works
$ perl6 bin/007 -e='[1, 2, 3,];  say("with trailing comma works")'
Could not parse program

The second syntax should work and be identical to the first.

We're going to allow trailing comma in object literals, so we should allow it for arrays too.
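In a Perl 6 grammar, the usual idiom for this is `<EXPR>* %% ','`, since `%%` (unlike `%`) permits one trailing separator. The Python sketch below spells out the same rule for a hypothetical array literal of integers. A single trailing comma is dropped, but an empty element anywhere else (`[1,,2]`, `[,]`) is still an error.

```python
import re

INT = re.compile(r"-?\d+")

def parse_int_array(src: str) -> list[int]:
    """Parse an array literal of integers, allowing one trailing comma."""
    src = src.strip()
    if not (src.startswith("[") and src.endswith("]")):
        raise SyntaxError("expected an array literal")
    inner = src[1:-1]
    if inner.strip() == "":
        return []  # [] is the empty array
    parts = [part.strip() for part in inner.split(",")]
    if parts[-1] == "":
        parts.pop()  # drop the single trailing comma
    if any(part == "" for part in parts):  # e.g. [1,,2] or [,]
        raise SyntaxError("empty array element")
    for part in parts:
        if not INT.fullmatch(part):
            raise SyntaxError(f"not an integer: {part!r}")
    return [int(part) for part in parts]

print(parse_int_array("[1, 2, 3]"))   # [1, 2, 3]
print(parse_int_array("[1, 2, 3,]"))  # [1, 2, 3]
```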

Backslashes not handled correctly in strings

The following are right:

$ perl6 bin/007 -e='say("OH HAI")'
OH HAI
$ perl6 bin/007 -e='say(chars("OH HAI"))'
6

And these are all wrong:

$ perl6 bin/007 -e='say("OH \"HAI")'
OH \"HAI
$ perl6 bin/007 -e='say(chars("OH \"HAI"))'
8
$ perl6 bin/007 -e='say("OH \\HAI")'
OH \\HAI
$ perl6 bin/007 -e='say(chars("OH \\HAI"))'
8

(I'd expect OH "HAI, 7, OH \HAI, and 7, respectively.)

The simplest hypothesis that fits the data is that we correctly parse backslashes in strings, but we don't un-escape them correctly when we store them as values.
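If that hypothesis holds, the fix is an un-escaping pass from the raw text between the quotes to the stored string value. Here is a minimal Python sketch. It assumes `\"` and `\\` are the only escapes (the only two exercised above) and rejects anything else. `unescape` is a hypothetical helper name.

```python
def unescape(body: str) -> str:
    """Turn the raw text between the quotes into the string value."""
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)  # the escaped character
            if nxt is None:
                raise SyntaxError("backslash at end of string")
            if nxt not in ('"', "\\"):
                raise SyntaxError(f"unrecognized escape: \\{nxt}")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)

print(unescape(r'OH \"HAI'), len(unescape(r'OH \"HAI')))  # OH "HAI 7
print(unescape(r'OH \\HAI'), len(unescape(r'OH \\HAI')))  # OH \HAI 7
```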

Introduce a Q::Compunit type

Q::Compunit would basically be very similar to Q::Literal::Block.

Right now the compunit is added at the last moment as the program is run.

    my $compunit = Val::Block.new(
        :$statements,
        :outer-frame(self.current-frame));
    self.enter($compunit);

Also, the Q::Statements type ends up holding the static lexpads, simply because there's no root level above the statements to hang them on.

The AST tests would all need to be modified, but in a very mechanical fashion.

add back Q::CompUnit
put the static lexpads on blocks, not statementlists
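A rough Python model of the two to-do items follows. It assumes the option from "Block befuddlement" where Q::CompUnit wraps a Q::Block instead of subtyping Q::Statement::Block. The static lexpad lives on the block, and the runtime builds the root frame from the parsed Q::CompUnit instead of synthesizing a Val::Block at the last moment. All class names are hypothetical Python stand-ins, and statements are modeled as plain callables.

```python
from dataclasses import dataclass, field

@dataclass
class QBlock:
    """Braces plus contents; owns the static lexpad, not the statement list."""
    statements: list
    static_lexpad: dict = field(default_factory=dict)

@dataclass
class QStatementBlock:
    block: QBlock  # wraps a Q::Block rather than being one

@dataclass
class QCompUnit:
    block: QBlock  # the implicit program-spanning braces

class Runtime:
    def __init__(self):
        self.frames = []

    def enter(self, block: QBlock):
        # A new frame starts out as a copy of the block's static lexpad.
        self.frames.append(dict(block.static_lexpad))

    def leave(self):
        self.frames.pop()

    def run(self, compunit: QCompUnit):
        # The root frame comes from the parsed Q::CompUnit.
        self.enter(compunit.block)
        try:
            for stmt in compunit.block.statements:
                stmt(self)
        finally:
            self.leave()

seen = []
program = QCompUnit(QBlock([lambda rt: seen.append(rt.frames[-1]["x"])], {"x": 42}))
Runtime().run(program)
print(seen)  # [42]
```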

Implement a None literal

I think we should just give up and have a None literal. It's currently the only value that we can't express in code.

Then someone could do

var b = None;

return None;

And these would be identical to simply not assigning or not returning.

Put in a syntax error explicitly for when you do `say <expr>`

I do this all the time, expecting that it'll work. It doesn't work in Python, and we're going to steal that doesn't-work part from Python.

$ python3
>>> print "OH HAI"
SyntaxError: invalid syntax

But we're going to steal a hypothetical error message from Perl 6. 😃 Something like Illegal use of listop function call syntax. Did you mean 'say(<expr>)' ?
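A crude version of the check can look at a statement's surface shape: an identifier, then whitespace, then something that starts a term. The Python sketch below assumes that shape test and uses a hypothetical, incomplete `KEYWORDS` set for words that may legitimately be followed by a term. A real version would live in the parser rather than in a regex.

```python
import re

# Hypothetical and incomplete: words that may legitimately be followed by a term.
KEYWORDS = {"my", "if", "for", "while", "return", "sub", "macro", "use", "BEGIN"}

# An identifier, then whitespace, then something that starts a term.
LISTOP_CALL = re.compile(r'\s*([A-Za-z_]\w*)\s+(?=["\d(\[A-Za-z_])')

def check_listop_call(statement: str):
    """Return an error message for `ident <expr>` statements, else None."""
    m = LISTOP_CALL.match(statement)
    if m is None or m.group(1) in KEYWORDS:
        return None
    return ("Illegal use of listop function call syntax. "
            f"Did you mean '{m.group(1)}(<expr>)' ?")

print(check_listop_call('say "OH HAI"'))
print(check_listop_call('say("OH HAI")'))  # None
```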