
vezel-dev / celerity

An expressive programming language for writing concurrent and maintainable software.

Home Page: https://docs.vezel.dev/celerity

License: BSD Zero Clause License

Languages: C# 98.56%, Smalltalk 0.07%, C 0.10%, Shell 0.04%, TypeScript 1.23%
Topics: celerity, compiler, csharp, dotnet, interpreter, jit, language, gc, runtime

celerity's People

Contributors: alexrp, dependabot[bot]

Forkers: wilsonk

celerity's Issues

Implement a basic set of code quality lints

(The warning from #31 should be turned into a lint.)

What's missing:

  • A pass that warns on undocumented public declarations.
  • A pass that warns on unused items (private declarations, parameters, bindings).
    • This will require that we track references on Symbols (should be trivial to do).
  • A pass that warns on obviously dead code.
  • A pass that warns on tests that lack assert statements.

Depends on:

Block expression parsing has a few bugs

  • If attributes are parsed but there are no skipped tokens, we simply forget to do anything with the attributes.
  • We should require at least one statement in a block, per the grammar.

Reorient the syntax layer around text spans and defer source location resolution

This will be an essential step on the way to supporting incremental parsing and code refactoring in the future. Most text/syntax APIs will operate in terms of text spans, which will only be resolved to source locations (path, line, character) when needed - e.g. when printing diagnostics.

The text/syntax APIs will change to something like the following.

Vezel.Celerity.Text

 public readonly struct SourceLocation :
     IEquatable<SourceLocation>, IEqualityOperators<SourceLocation, SourceLocation, bool>
 {
+    public SourceTextSpan Span { get; }
 }
 public abstract class SourceText
 {
+    public SourceTextLineList Lines { get; }
-    public IEnumerable<SourceTextLine> EnumerateLines();
+    public override string ToString();
+    public string ToString(SourceTextSpan span);
 }
+// Internally caches line position information after it has been computed.
+public sealed class SourceTextLineList : IReadOnlyList<SourceTextLine>
+{
+    public SourceText Text { get; }
+    public int Count { get; }
+    public SourceTextLine this[int index] { get; }
+    public IEnumerator<SourceTextLine> GetEnumerator();
+}
 public readonly struct SourceTextLine :
     IEquatable<SourceTextLine>, IEqualityOperators<SourceTextLine, SourceTextLine, bool>
 {
-    public SourceLocation Location { get; }
-    public string Text { get; }
+    public SourceText Text { get; }
+    public SourceTextSpan Span { get; }
+    public int Line { get; }
+    // Calls Text.ToString(Span).
+    public override string ToString();
 }
+public readonly struct SourceTextSpan :
+    IEquatable<SourceTextSpan>, IEqualityOperators<SourceTextSpan, SourceTextSpan, bool>
+{
+    // Mostly the same stuff as on System.Range.
+}

Vezel.Celerity.Syntax

 public sealed class SyntaxAnalysis
 {
+    public SourceText Text { get; }
 }

Vezel.Celerity.Syntax.Tree

 public abstract class SyntaxItem
 {
+    // Internally stored as the parent on the root node.
+    public SyntaxAnalysis Analysis { get; }
+    public abstract SourceTextSpan Span { get; }
+    public abstract SourceTextSpan FullSpan { get; }
+    // Resolves path, line, and character location for Span by querying Analysis.Text.
+    public SourceLocation GetLocation();
 }
 public sealed class SyntaxTrivia : SyntaxItem
 {
-    public SourceLocation Location { get; }
+    // Computed from an internal position + Text.Length. Span and FullSpan are equivalent on trivia.
+    public override SourceTextSpan Span { get; }
+    public override SourceTextSpan FullSpan { get; }
 }
 public sealed class SyntaxToken : SyntaxItem
 {
-    public SourceLocation Location { get; }
+    // Computed from an internal position + Text.Length.
+    public override SourceTextSpan Span { get; }
+    // Computed from Span, and SyntaxTrivia.Span from LeadingTrivia/TrailingTrivia.
+    public override SourceTextSpan FullSpan { get; }
 }
 public abstract class SyntaxNode : SyntaxItem
 {
+    // Computed from SyntaxToken.Span from any descendant tokens.
+    public override SourceTextSpan Span { get; }
+    // Computed from SyntaxToken.FullSpan from any descendant tokens.
+    public override SourceTextSpan FullSpan { get; }
 }

Language idea: Consider removing the mandatory `mod` directive

Is there actually a good reason to have this in the language? Module lookup is probably not going to need it at all. Right now, its only function is to provide a place to attach certain well-known attributes. Perhaps we could find a different way to do that.

Basic LSP implementation

The syntax/semantic analysis APIs should now be sufficient for a basic LSP implementation. We don't need to be super ambitious here - just performing semantic highlighting would be a great first step.

Depends on:

Overhaul lint API to be less constraining

Not all lint passes will fit into the current model. Lints should just be passed a SemanticTree rather than being called on different node kinds.

This means that we will have to fundamentally rethink how we suppress lint diagnostics, since we currently update the lint configuration as we descend into the tree.
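
A rough sketch of the direction (LintPass, LintContext, and Run are assumed names here, not the actual API):

```csharp
// Hypothetical shape for the less constraining lint API; all names are assumptions.
public abstract class LintPass
{
    // A pass receives the entire SemanticTree and walks it however it likes, instead of
    // being invoked by the framework for specific node kinds.
    public abstract void Run(SemanticTree tree, LintContext context);
}
```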

Support diagnostic IDs/names

All diagnostics issued by the compiler need to have a well-known ID (e.g. W0001, E0001, etc). Diagnostics from third-party analyses should use names (e.g. undocumented-declaration).

Implement parsing of union types and adjust syntax

This is currently completely missing. Same for the AST representation.

```ebnf
type ::= primary-type ('or' primary-type)*

return-type ::= none-type |
                type

primary-type ::= any-type |
                 literal-type |
                 boolean-type |
                 integer-type |
                 real-type |
                 atom-type |
                 string-type |
                 reference-type |
                 handle-type |
                 module-type |
                 record-type |
                 error-type |
                 tuple-type |
                 array-type |
                 set-type |
                 map-type |
                 function-type |
                 agent-type |
                 nominal-type
```

Implement semantic analysis of well-known attributes

  • @deprecated "reason"
    • Must have a reason string.
    • Allowed on modules, constant declarations, and function declarations.
  • @doc "text", @doc false
    • Must have a documentation string, or false to explicitly mark as undocumented.
    • Allowed on modules, constant declarations, and function declarations.
      • Allowed on private declarations, but has no effect.
  • @flaky "reason"
    • Must have a reason string.
    • Allowed on test declarations.
    • Indicates that a test might fail and should not be counted as a failure if it does.
  • @ignore "reason"
    • Must have a reason string.
    • Allowed on test declarations.
    • Indicates that a test should not be run.
  • @lint "name:severity"
    • Must have a string literal containing the lint name and severity.
      • Severity is one of: none, warning, error
    • Allowed anywhere.

Consider creating a larger set of standard diagnostic codes for missing tokens

// TODO: Create more specific diagnostics for certain kinds of missing tokens.
public static DiagnosticCode ExpectedToken { get; } = CreateCode();
public static DiagnosticCode MissingDeclaration { get; } = CreateCode();
public static DiagnosticCode MissingStatement { get; } = CreateCode();
public static DiagnosticCode MissingType { get; } = CreateCode();
public static DiagnosticCode MissingExpression { get; } = CreateCode();
public static DiagnosticCode MissingBinding { get; } = CreateCode();
public static DiagnosticCode MissingPattern { get; } = CreateCode();

For comparison, Roslyn has a vast sea of error codes of this form, starting with CS1001. I am not actually sure whether doing this adds meaningful value for end users, though. How many users are realistically going to look up the error code for a missing semicolon, equals sign, or the like? Presumably, the error message itself saying exactly which token is missing should be enough?

Need to think on this more.

Language idea: Friend modules

A module A can declare that module B is a friend, which allows B to access private members of A. The keyword is already reserved.

Something like this:

a.cel:

mod {
    friend B;

    fn foo() {
        42;
    }
}

b.cel:

mod {
    fn bar() {
        A.foo(); // OK; no panic.
    }
}

For this to work, the semantics of a field expression (. operator) would be changed to pass along the accessing module when looking up the member. The runtime would then check if the resolved module declares the accessing module as a friend.

This sounds inefficient, but I think object shapes and basic block versioning based on types would allow us to fully specialize most such cases. This feature can only realistically be prototyped and considered once we have a runtime capable of such optimizations.
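
A minimal sketch of the runtime-side check; every name here is an assumption rather than the actual Celerity runtime API:

```csharp
// Hypothetical member lookup for field expressions; ModuleValue, CelerityValue, and
// PanicException are assumed names.
public static CelerityValue AccessField(ModuleValue target, string name, ModuleValue accessor)
{
    // Private members are visible to the declaring module itself and to any module
    // that the declaring module lists as a friend.
    if (!target.IsPublicMember(name) && target != accessor && !target.Friends.Contains(accessor))
        throw new PanicException($"'{name}' is not accessible from the accessing module.");

    return target.GetMember(name);
}
```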

This is tentatively approved for 2.0, pending prototyping.

Language idea: `try`/`catch` expressions

Right now, if you need to repeat a bunch of catch arms and/or error handling logic for a set of different calls, there isn't really a great solution. A try expression (keyword reserved) would address this:

let result = try {
    foo()?;
    bar()?;
    baz()?;
} catch {
    err AError { ... } -> ...,
    err BError { ... } -> ...,
    err CError { ... } -> ...,
};

The idea is fairly straightforward: Any error raised within a try block, whether from a ? call or a raise expression, will transfer control to the catch block, instead of propagating the error up the call stack.

Notably, to keep the complexity and performance of the feature under control, there is no unwinding. An error raised within a try block must be handled by the corresponding catch block, or a panic occurs. If the catch block wants to propagate the error, it must explicitly raise it again. try blocks can still be nested, but the control transfer between them is explicit, and the runtime never needs to search for handlers when an error is raised.

Approved for 1.0.

Implement semantic analysis

Depends on:

Semantic analysis means things like use resolution, variable name binding, lambda captures, local mutability checks, loop break/next binding, etc...

Type analysis is out of scope here.

Initial interpreter and standard library essentials

In the interest of being able to run Celerity code ASAP and getting a suite of behavior tests done, the initial interpreter implementation will lean heavily on the .NET runtime for garbage collection and data structures (BigInteger, List<T>, Dictionary<TKey, TValue>, HashSet<T>, etc.). Eventually, these components will be swapped out for native ones shared between the interpreter and the JIT compiler.

Some essentials of the standard library will need to be implemented - mostly just stuff for manipulating the various data types of the language and interacting with agents.

Partially depends on:

Optimize syntax tree traversal methods

// TODO: Optimize some of these (e.g. avoid descending into trivia and tokens when possible).
public IEnumerable<SyntaxNode> DescendantNodes()
{
    return Descendants().OfType<SyntaxNode>();
}

public IEnumerable<SyntaxToken> DescendantTokens()
{
    return Descendants().OfType<SyntaxToken>();
}

public IEnumerable<SyntaxTrivia> DescendantTrivia()
{
    return Descendants().OfType<SyntaxTrivia>();
}
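
One possible shape for that optimization, assuming a hypothetical ChildNodes() helper that yields only the node children:

```csharp
// Sketch: only nodes can contain other nodes, so there is no need to descend into
// tokens and trivia and filter them out afterwards with OfType<SyntaxNode>().
public IEnumerable<SyntaxNode> DescendantNodes()
{
    foreach (var child in ChildNodes())
    {
        yield return child;

        foreach (var descendant in child.DescendantNodes())
            yield return descendant;
    }
}
```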

Language idea: Function contracts

It would be interesting to see if there's something we can do in this space. We have the assert statement currently, but I think there's room for a more principled feature here.

The feature would need to support preconditions and postconditions (with access to the return value). Preconditions would run before any code in the function, while postconditions would need to run after any defer and use statements in the function. Postconditions would only run when an error is not raised from the function.

Compile-time contract checking would be out of scope initially, but could always be done on a best-effort basis later down the line.

I have no idea what the syntax would look like yet.

Language idea: Generators (`yield fn`, `yield ret`, `yield break`)

The keyword is already reserved.

Something like:

yield fn range(x, y) {
    if y <= x {
        yield break;
    };
    let mut i = x;
    while i < y {
        yield ret i;
        i = i + 1;
    };
}
  • yield fns may use yield ret and yield break; normal fns may not.
  • yield fns must have at least one yield ret or yield break expression.
  • yield fns may not use raise expressions, normal ret expressions, or error-propagating calls.
  • yield fns do not have an implicit return value like normal fns.
  • yield fn is mutually exclusive with ext fn and err fn.
  • yield fn lambdas are supported.

The transformation into a state machine will happen when the module is loaded by the runtime. If a yield fn passes semantic analysis, it must be transformable.
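
For intuition only, the range generator above corresponds roughly to the following state machine, written here in C# rather than Celerity; the real transformation would operate on the runtime's IR, and none of these names are real:

```csharp
// Illustrative only: a hand-written state machine equivalent of the 'range' yield fn.
public sealed class RangeGenerator
{
    private readonly long _x;
    private readonly long _y;
    private long _i;
    private int _state;

    public long Current { get; private set; }

    public RangeGenerator(long x, long y) => (_x, _y) = (x, y);

    public bool MoveNext()
    {
        switch (_state)
        {
            case 0:
                if (_y <= _x)
                    goto default; // yield break

                _i = _x;
                _state = 1;
                goto case 1;
            case 1:
                if (_i >= _y)
                    goto default;

                Current = _i; // yield ret i
                _i = _i + 1;

                return true;
            default:
                _state = -1;

                return false;
        }
    }
}
```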

This is tentatively approved for 2.0.

Language idea: `rec with`/`err` ... `with` expressions

Fairly straightforward feature:

let r1 = rec {
    x = 1,
    y = 2,
};
let r2 = rec with r1 {
    x = 4,
    mut y = nil,
    z = 3,
};
assert r2 == rec {
    x = 4,
    y = nil,
    z = 3,
};
r2.y = 5; // OK; no panic.

(Variation on syntax suggested by @Roukanken42.)

A with expression (keyword reserved) basically just clones a record or error value and adds/replaces the specified fields.

Approved for 1.0.

Language idea: Allow `mut` on parameters

fn foo(mut bar) {
    bar = 42;
    bar;
}
assert foo("hi") == 42;

Basically, this just means using a pattern binding in the parameter grammar rules instead of a bespoke binding rule.

Language idea: `const` expressions

A const expression would be to a const declaration what an fn (lambda) expression is to an fn declaration. Basically, it is just an anonymous constant. It has all the same semantics that a regular constant does, but is anonymous and embedded directly in code.

Design and implement the shared linear IRs (HIR, MIR, LIR)

There will be 3 IRs in the runtime core. They will all be shared between the interpreter and the JIT compiler. Initially, we will implement HIR and MIR only, with the interpreter prototype (#58) consuming MIR. Later, as we reduce dependence on .NET types, we will implement and consume LIR in the interpreter. Finally, the JIT compiler will be implemented, which will transform LIR to AIR (#81), and then compile AIR to machine code.

The runtime will be based on lazy basic block versioning. This is important to keep in mind in order to understand the IR design and behaviors described below.

HIR

High-Level IR (HIR) is the first intermediate representation. It mainly focuses on linearizing the code, turning it into SSA form, and desugaring some high-level language concepts. HIR is constructed from the semantic tree upfront when a module is loaded, and never changes after that.

HIR features basic blocks (with parameters), upvalues, constants, and operations as building blocks. Operation value operands can be upvalues, basic block parameters, constants, and (non-void) operations; there are no explicit variables or temporaries. Code is in SSA form, with basic block parameters serving as Φ nodes. There is no propagation of explicit or inferred type information at this stage, but all type tests are made explicit.

Lowering to HIR gets rid of some high-level language concepts like pattern matching, agent send/receive syntax, defer statements, for expressions, try expressions, etc.

The HIR data structures will be very minimalistic and will not be amenable to analysis. For example, there will be no use/definition chains. HIR is only really meant to be walked during lowering to MIR. In other words, HIR serves as a template for specialization in MIR.
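
Purely as a mental model (none of these types exist yet; every name is an assumption), the HIR building blocks might be shaped roughly like this:

```csharp
// Hypothetical HIR shapes; the actual data structures are not designed yet.
public sealed class HirBlock
{
    // Block parameters act as the phi nodes of the SSA form.
    public List<HirValue> Parameters { get; } = new();

    public List<HirOperation> Operations { get; } = new();
}

public abstract class HirValue
{
    // A value is an upvalue, a block parameter, a constant, or a non-void operation.
}

public sealed class HirOperation : HirValue
{
    public required string Kind { get; init; }

    public List<HirValue> Operands { get; } = new();
}
```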

MIR

Mid-Level IR (MIR) is where type specialization and most optimizations happen. It is similar to HIR in the building blocks it has, but unlike HIR, everything now carries type information. Types are gathered from the running program through basic block versioning, entry point versioning, value shapes, etc.

Lowering from HIR to MIR happens on demand as the program executes code, and is done with basic block granularity. Due to type specialization, there can be many different versions of MIR code for a given HIR (extended) basic block. Lowering proceeds until a type test is encountered that cannot be resolved with the available type information, or until the end of the function is reached.

MIR will maintain use/definition chains and various other data structures that simplify transformation of the code. This will facilitate a classic set of optimizations (#61) that can be performed now that type information is available.

LIR

Low-Level IR (LIR) decomposes managed values into their constituent raw value and shape words. LIR mostly has the same building blocks as HIR and MIR, but at this stage, the only types that exist are 64-bit integers, 64-bit floats, and untyped pointers. All high-level operations will have been decomposed to primitive CPU-like operations. LIR is essentially a simple register transfer language.

LIR allows certain optimizations that would be harder to express at the MIR level. For example, in a series of small integer operations, it's obvious that copying the shape word for every intermediate operation is unnecessary. Yet, because MIR only operates on managed values, this notion cannot be expressed. At the LIR level, it is trivial to detect and remove such copies.

Note that, while LIR is very close to the machine, it is not architecture-specific.

Expand test suite

  • Create a harness for testing the command line driver.
  • Create a test for setting lint severity with @lint attributes.
  • Create more tests for the undocumented-public-declaration pass.

Language idea: More advanced string literals

We need to come up with a design for more advanced string literals. In particular, string literals that can span multiple lines are frequently useful.

String interpolation is out of scope for this.

Consider expanding `SyntaxItemList<T>` and `SeparatedSyntaxItemList<TElement, TSeparator>` API surface

We can at least expose Span and FullSpan properties, as well as ToString() and ToFullString() methods.

This raises the question, though: Should we also expose GetText() and GetFullText()? That would require a Parent property. But then that might signal that the list is a node in the tree, which is not actually the case. We'd then also have to consider tree traversal methods, at which point these lists start to look an awful lot like SyntaxItems in their own right...

Need to think on this one.

Switch to CommandLineParser and Cathode

We will eventually want the standard library's console API to be oriented around a terminal. The Spectre.Console API sits at too high a level for this to be practical. Further, using Spectre.Console.Cli locks us into using Spectre.Console.

We should switch to CommandLineParser as it allows us to supply a TextWriter, effectively decoupling it from any particular console API. Then, we can use Cathode for all the low-level console interaction.

Avoid keeping the `SourceText` instance around in `SyntaxTree`

public sealed class SyntaxAnalysis
{
    // TODO: We should eventually get rid of this. When we need the source text, we can reconstruct it from the tree.
    public SourceText Text { get; }
}

This is neat because, for the happy case, we don't need to access the SourceText at all. Only when there are diagnostics do we need to access line information from the SourceText, and it's reasonable to just reconstruct it for those cases.

Even then, we still have to keep the SourceText around during parsing in order to construct locations for diagnostics. To remedy that, we should probably also consider changing SourceDiagnostic to not carry a SourceLocation, but rather a SourceTextSpan and a SyntaxItem reference. A SourceDiagnostic.GetLocation() method could then be exposed which would resolve the SourceLocation.
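
Something along these lines, mirroring the declaration-only style of the API sketches above (the exact shape is undecided):

```csharp
// Sketch of the proposed change; names and shape are assumptions.
public sealed class SourceDiagnostic
{
    public SyntaxItem Item { get; }

    public SourceTextSpan Span { get; }

    // Resolves the location through the item's analysis and its source text, so the
    // happy path never has to touch line information.
    public SourceLocation GetLocation();
}
```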

Cache repeated string instances in the lexer

When lexing a typical source file, there will be a lot of repeated strings - identifiers, literals, white space, and so on. We can't intern these, but it would make good sense to cache token text up to a certain length and return the same string instance instead of building it up repeatedly.

To implement this, instead of building up the token string in a StringBuilder, we would keep track of where the token starts and ends. When creating the token, if the length is below our caching threshold, we first look it up in the token cache. For larger tokens, we shouldn't bother as the lookup will take too long to be worth it.
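
A minimal sketch of the idea; the names and threshold are assumptions, and a real implementation would use a span-keyed lookup so the cache-hit path does not allocate at all:

```csharp
// Sketch only: cache short token strings so repeated identifiers, literals, and white
// space runs share one string instance.
private const int MaxCachedTokenLength = 32;

private readonly Dictionary<string, string> _tokenCache = new(StringComparer.Ordinal);

private string GetTokenText(ReadOnlySpan<char> source, int start, int end)
{
    var span = source[start..end];

    // Long tokens are unlikely to repeat, and hashing them costs more than it saves.
    if (span.Length > MaxCachedTokenLength)
        return new string(span);

    var text = new string(span);

    if (_tokenCache.TryGetValue(text, out var cached))
        return cached;

    _tokenCache.Add(text, text);

    return text;
}
```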

Implement lexing and parsing

Nothing particularly difficult here. Just (very) boring implementation work.

Notably, though, we should try to do something to prevent stack overflows in the parser. Recursiont looks interesting here.

Consider merging language analysis assemblies into a single assembly

Splitting the language analysis layers into 5 separate assemblies might have been a bit overkill. Consider a new Vezel.Celerity.Analysis project with the following namespaces consolidated:

  • Vezel.Celerity.Quality
  • Vezel.Celerity.Quality.Passes
  • Vezel.Celerity.Semantics
  • Vezel.Celerity.Semantics.Binding
  • Vezel.Celerity.Semantics.Tree
  • Vezel.Celerity.Syntax
  • Vezel.Celerity.Syntax.Tree
  • Vezel.Celerity.Text
  • Vezel.Celerity.Typing

Support a `celerity.json` project configuration file

Something like:

{
    "name": "my-app", // Unique project identifier.
    "path": "src", // Optional path containing the project's own source files. Defaults to src.
    "kind": "executable", // Optional project kind (executable, library). Defaults to executable.
    "license": "0BSD", // Optional SPDX license expression.
    "version": "1.0.0", // Optional Semantic Versioning 2.0.0 version. Defaults to 0.0.0.

    // List of module search paths. The runtime will match the module path against the
    // prefixes listed here and then look up the remainder of the module path in the
    // specified directory.
    //
    // So e.g. LibA::Foo would find LibA here and then locate dep/lib-a/src/foo.cel,
    // whereas Company::LibB::Bar::Baz would locate dep/lib-b/src/bar/baz.cel.
    "paths": {
        "MyApp": "src", // Only necessary if the app itself uses e.g. MyApp::Main.
        "LibA": "dep/lib-a/src",
        "Company::LibB": "dep/lib-b/src",
    },

    // Overrides default lint severities.
    "lints": {
        "unused-local-symbol": null, // Don't run this pass at all.
        "test-without-assert": "none", // Hide diagnostics from this pass.
        "unreachable-code": "error" // Promote diagnostics from this pass to errors.
    }
}

The tooling APIs will pick this up and use it appropriately for the various celerity CLI commands.

Note that nothing about this file will flow transitively; we're intentionally keeping things super simple. A top-level executable project will have to declare module search paths for all dependencies, direct or transitive, that it needs. Also, the file is completely optional.
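
As a rough illustration of the prefix matching described in the comments above (the method name and the exact lower-casing rule are assumptions):

```csharp
// Hypothetical resolution of a module path like "Company::LibB::Bar::Baz" against the
// "paths" table; returns null if no prefix matches.
public static string? ResolveModuleFile(IReadOnlyDictionary<string, string> paths, string module)
{
    foreach (var (prefix, directory) in paths)
    {
        if (!module.StartsWith(prefix + "::", StringComparison.Ordinal))
            continue;

        // Map the remaining components to a relative file path, e.g. "Bar::Baz" -> "bar/baz.cel".
        var components = module[(prefix.Length + 2)..].Split("::");

        return Path.Combine(directory, Path.Combine(components).ToLowerInvariant() + ".cel");
    }

    return null;
}
```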

Language idea: Macros and AST quotation

The macro, quote, and unquote keywords are reserved for future exploration in this space.

Some obvious issues to tackle here:

  • How will macros work with a module system that loads modules lazily?
  • Should macros be able to generate declarations? Types? Statements?
    • I lean towards only allowing expression macros.
  • Do we want to formally specify the AST produced by quote expressions?
  • Should macros be hygienic? If so, how much?
  • Do we give up on type analysis when encountering macros?

Implement better error recovery for parsing of separated syntax lists

private (ImmutableArray<T>.Builder Elements, ImmutableArray<SyntaxToken>.Builder Separators) ParseSeparatedList<T>(
    Func<LanguageParser, T> parser,
    SyntaxTokenKind separator,
    SyntaxTokenKind closer,
    bool allowEmpty,
    bool allowTrailing)
    where T : SyntaxNode
{
    // TODO: The way we parse a parameter list (and other similar syntax nodes) causes the parser to misinterpret
    // the entire function body for some invalid inputs. We need to do better here.
    var result = SeparatedBuilder<T>();
    var (elems, seps) = result;

    bool NextIsRelevant()
    {
        return Peek1() is { IsEndOfInput: false } next && next.Kind != closer;
    }

    if (!allowTrailing)
    {
        if (allowEmpty && !NextIsRelevant())
            return result;

        elems.Add(parser(this));

        while (Optional(separator) is { } sep)
        {
            seps.Add(sep);
            elems.Add(parser(this));
        }

        return result;
    }

    if (!allowEmpty)
    {
        elems.Add(parser(this));

        if (Optional(separator) is not { } sep)
            return result;

        seps.Add(sep);
    }

    while (NextIsRelevant())
    {
        elems.Add(parser(this));

        if (Optional(separator) is not { } sep2)
            break;

        seps.Add(sep2);
    }

    return result;
}

Provide a way to specify existing bindings when analyzing an interactive document

Also, we need to process let statements in a similar fashion to what we do in block expressions:

public override InteractiveDocumentSemantics VisitInteractiveDocument(InteractiveDocumentSyntax node)
{
    var subs = ConvertList(node.Submissions, static (@this, sub) => @this.VisitSubmission(sub));

    return new(node, subs);
}

public override BlockExpressionSemantics VisitBlockExpression(BlockExpressionSyntax node)
{
    using var ctx = PushScope<BlockScope>();

    var stmts = Builder<StatementSemantics>(node.Statements.Count);

    // Let statements are somewhat special in that they introduce a 'horizontal' scope in the tree; that is,
    // bindings in a let statement become available to siblings to the right of the let statement.
    var lets = new List<ScopeContext<Scope>>();
    var defers = ctx.Scope.DeferStatements;

    foreach (var stmt in node.Statements)
    {
        if (stmt is LetStatementSyntax)
            lets.Add(PushScope<Scope>());

        var sema = VisitStatement(stmt);

        if (sema is DeferStatementSemantics defer)
            defers.Add(defer);

        stmts.Add(sema);
    }

    for (var i = lets.Count - 1; i >= 0; i--)
        lets[i].Dispose();

    defers.Reverse();

    return new(node, List(node.Statements, stmts), defers.DrainToImmutable());
}
