vezel-dev / celerity
An expressive programming language for writing concurrent and maintainable software.
Home Page: https://docs.vezel.dev/celerity
License: BSD Zero Clause License
See: dotnet/runtime#60639
This is going to require some work in CsWin32, so probably won't happen immediately when .NET 7 is out.
(The warning from #31 should be turned into a lint.)
What's missing:
- Symbols (should be trivial to do).
- assert statements.
Depends on:
Constant propagation, value numbering, dead code elimination, strength reduction, etc. All the classic stuff.
Depends on:
celerity/src/language/core/Syntax/LanguageParser.cs
Lines 408 to 418 in 4c64834
We need to skip bad tokens here or we'll get stuck.
We probably don't need to compute lines for the whole file most of the time.
This will be an essential step on the way to supporting incremental parsing and code refactoring in the future. Most text/syntax APIs will operate in terms of text spans, which will only be resolved to source locations (path, line, character) when needed - e.g. when printing diagnostics.
The text/syntax APIs will change to something like the following.
public readonly struct SourceLocation :
IEquatable<SourceLocation>, IEqualityOperators<SourceLocation, SourceLocation, bool>
{
+ public SourceTextSpan Span { get; }
}
public abstract class SourceText
{
+ public SourceTextLineList Lines { get; }
- public IEnumerable<SourceTextLine> EnumerateLines();
+ public override string ToString();
+ public string ToString(SourceTextSpan span);
}
+// Internally caches line position information after it has been computed.
+public sealed class SourceTextLineList : IReadOnlyList<SourceTextLine>
+{
+ public SourceText Text { get; }
+ public int Count { get; }
+ public SourceTextLine this[int index] { get; }
+ public IEnumerator<SourceTextLine> GetEnumerator();
+}
public readonly struct SourceTextLine :
IEquatable<SourceTextLine>, IEqualityOperators<SourceTextLine, SourceTextLine, bool>
{
- public SourceLocation Location { get; }
- public string Text { get; }
+ public SourceText Text { get; }
+ public SourceTextSpan Span { get; }
+ public int Line { get; }
+ // Calls Text.ToString(Span).
+ public override string ToString();
}
+public readonly struct SourceTextSpan :
+ IEquatable<SourceTextSpan>, IEqualityOperators<SourceTextSpan, SourceTextSpan, bool>
+{
+ // Mostly the same stuff as on System.Range.
+}
public sealed class SyntaxAnalysis
{
+ public SourceText Text { get; }
}
public abstract class SyntaxItem
{
+ // Internally stored as the parent on the root node.
+ public SyntaxAnalysis Analysis { get; }
+ public abstract SourceTextSpan Span { get; }
+ public abstract SourceTextSpan FullSpan { get; }
+ // Resolves path, line, and character location for Span by querying Analysis.Text.
+ public SourceLocation GetLocation();
}
public sealed class SyntaxTrivia : SyntaxItem
{
- public SourceLocation Location { get; }
+ // Computed from an internal position + Text.Length. Span and FullSpan are equivalent on trivia.
+ public override SourceTextSpan Span { get; }
+ public override SourceTextSpan FullSpan { get; }
}
public sealed class SyntaxToken : SyntaxItem
{
- public SourceLocation Location { get; }
+ // Computed from an internal position + Text.Length.
+ public override SourceTextSpan Span { get; }
+ // Computed from Span, and SyntaxTrivia.Span from LeadingTrivia/TrailingTrivia.
+ public override SourceTextSpan FullSpan { get; }
}
public abstract class SyntaxNode : SyntaxItem
{
+ // Computed from SyntaxToken.Span from any descendant tokens.
+ public override SourceTextSpan Span { get; }
+ // Computed from SyntaxToken.FullSpan from any descendant tokens.
+ public override SourceTextSpan FullSpan { get; }
}
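The lazy span-to-location resolution described above can be sketched as follows (illustrative Python, not the actual C# API): a cached list of line start offsets is computed on first use and then binary-searched, so the happy path never pays for line information.

```python
import bisect

class SourceText:
    """Minimal illustration of lazy line resolution for text spans."""

    def __init__(self, text):
        self._text = text
        self._line_starts = None  # Computed on first use, then cached.

    def _ensure_line_starts(self):
        if self._line_starts is None:
            starts = [0]
            for i, ch in enumerate(self._text):
                if ch == "\n":
                    starts.append(i + 1)
            self._line_starts = starts

    def get_location(self, position):
        """Resolve an absolute position to a zero-based (line, character) pair."""
        self._ensure_line_starts()
        line = bisect.bisect_right(self._line_starts, position) - 1
        return (line, position - self._line_starts[line])

text = SourceText("let x = 1;\nlet y = 2;\n")
# Position 15 is the 'y' on the second line.
print(text.get_location(15))  # (1, 4)
```

The real `SourceTextLineList` would cache the same information per `SourceText` instance, so repeated diagnostics against one file only compute line starts once.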
celerity/src/syntax/LanguageLexer.cs
Lines 224 to 227 in bf95975
Is there actually a good reason to have this in the language? Module lookup is probably not going to need it at all. Right now, the only function it serves is being able to attach certain well-known attributes to it. Perhaps we could find a different way to do that.
The syntax/semantic analysis APIs should now be sufficient for a basic LSP implementation. We don't need to be super ambitious here - just performing semantic highlighting would be a great first step.
Depends on:
Not all lint passes will fit into the current model. Lints should just be passed a SemanticTree
rather than being called on different node kinds.
This means that we will have to fundamentally rethink how we suppress lint diagnostics, since we currently update the lint configuration as we descend into the tree.
All diagnostics issued by the compiler need to have a well-known ID (e.g. W0001, E0001, etc.). Diagnostics from third-party analyses should use names (e.g. undocumented-declaration).
This is currently completely missing. Same for the AST representation.
celerity/doc/language/syntactic-structure/types.md
Lines 3 to 26 in e9819a0
- @deprecated "reason"
- @doc "text", or @doc false to explicitly mark as undocumented.
- @flaky "reason"
- @ignore "reason"
- @lint "name:severity", where severity is none, warning, or error.
celerity/src/language/core/StandardDiagnosticCodes.cs
Lines 31 to 44 in 0d5c2ab
For comparison, Roslyn has a vast sea of error codes of this form starting with CS1001. I am not actually sure if doing this adds meaningful value for end users, though. How many are realistically going to look up the error code for a missing semicolon, or equals sign, or whatever? Presumably, the error message itself saying which exact token is missing should be enough?
Need to think on this more.
celerity/src/syntax/LanguageLexer.cs
Lines 209 to 216 in bf95975
It only has the language grammar at the moment. It needs to be expanded into a proper language reference.
A module A can declare that module B is a friend and is thus allowed to access private members of A. The friend keyword is already reserved.
Something like this:
a.cel:
mod {
friend B;
fn foo() {
42;
}
}
b.cel:
mod {
fn bar() {
A.foo(); // OK; no panic.
}
}
For this to work, the semantics of a field expression (the . operator) would be changed to pass along the accessing module when looking up the member. The runtime would then check if the resolved module declares the accessing module as a friend.
This sounds inefficient, but I think object shapes and basic block versioning based on types would allow us to fully specialize most such cases. This feature can only realistically be prototyped and considered once we have a runtime capable of such optimizations.
This is tentatively approved for 2.0, pending prototyping.
Right now, if you need to repeat a bunch of catch
arms and/or error handling logic for a set of different calls, there isn't really a great solution. A try
expression (keyword reserved) would address this:
let result = try {
foo()?;
bar()?;
baz()?;
} catch {
err AError { ... } -> ...,
err BError { ... } -> ...,
err CError { ... } -> ...,
};
The idea is fairly straightforward: Any error raised within a try
block, whether from a ?
call or a raise
expression, will transfer control to the catch
block, instead of propagating the error up the call stack.
Notably, to keep the complexity and performance of the feature under control, there is no unwinding. An error raised within a try
block must be handled by the corresponding catch
block, or a panic occurs. If the catch
block wants to propagate the error, it must explicitly raise
it again. This does allow nesting try
blocks, but the control transfers between them is explicit, and the runtime does not need to search for handlers when an error is raised.
Approved for 1.0.
Depends on:
Semantic analysis means things like use resolution, variable name binding, lambda captures, local mutability checks, loop break/next binding, etc.
Type analysis is out of scope here.
celerity/Directory.Build.props
Lines 12 to 13 in 21722f5
celerity/Directory.Build.props
Lines 22 to 23 in 21722f5
celerity/Directory.Build.props
Lines 38 to 39 in 21722f5
In the interest of being able to run Celerity code ASAP and getting a suite of behavior tests done, the initial interpreter implementation will lean heavily on the .NET runtime for garbage collection and data structures (BigInteger
, List<T>
, Dictionary<TKey, TValue>
, HashSet<T>
etc). Eventually, these components will be swapped with native ones shared between the interpreter and JIT compiler.
Some essentials of the standard library will need to be implemented - mostly just stuff for manipulating the various data types of the language and interacting with agents.
Partially depends on:
Depends on:
celerity/src/syntax/Tree/SyntaxNode.cs
Lines 66 to 81 in bf95975
It would be interesting to see if there's something we can do in this space. We have the assert
statement currently, but I think there's room for a more principled feature here.
The feature would need to support preconditions and postconditions (with access to the return value). Preconditions would run before any code in the function, while postconditions would need to run after any defer
and use
statements in the function. Postconditions would only run when an error is not raised from the function.
Compile-time contract checking would be out of scope initially, but could always be done on a best-effort basis later down the line.
I have no idea what the syntax would look like yet.
The keyword is already reserved.
Something like:
yield fn range(x, y) {
if x >= y {
yield break;
};
let mut i = x;
while i < y {
yield ret i;
i = i + 1;
};
}
- yield fns may use yield ret and yield break; normal fns may not.
- yield fns must have at least one yield ret or yield break expression.
- yield fns may not use raise expressions, normal ret expressions, or error-propagating calls.
- yield fns do not have an implicit return value like normal fns.
- yield fn is mutually exclusive with ext fn and err fn.
- yield fn lambdas are supported.
The transformation into a state machine will happen when the module is loaded by the runtime. If a yield fn passes semantic analysis, it must be transformable.
This is tentatively approved for 2.0.
People should prefer 0o123 over 0O123, 0b101 over 0B101, etc.
Fairly straightforward feature:
let r1 = rec {
x = 1,
y = 2,
};
let r2 = rec with r1 {
x = 4,
mut y = nil,
z = 3,
};
assert r2 == rec {
x = 4,
y = nil,
z = 3,
};
r2.y = 5; // OK; no panic.
(Variation on syntax suggested by @Roukanken42.)
A with
expression (keyword reserved) basically just clones a record or error value and adds/replaces the specified fields.
Approved for 1.0.
fn foo(mut bar) {
bar = 42;
bar;
}
assert foo("hi") == 42;
Basically just using a pattern-binding
in parameter grammar rules instead of a bespoke binding rule.
A const
expression would be to a const
declaration what an fn
(lambda) expression is to an fn
declaration. Basically, it is just an anonymous constant. It has all the same semantics that a regular constant does, but is anonymous and embedded directly in code.
celerity/.github/workflows/build.yml
Line 45 in 8418984
Once we actually have some benchmarks.
There will be 3 IRs in the runtime core. They will all be shared between the interpreter and the JIT compiler. Initially, we will implement HIR and MIR only, with the interpreter prototype (#58) consuming MIR. Later, as we reduce dependence on .NET types, we will implement and consume LIR in the interpreter. Finally, the JIT compiler will be implemented, which will transform LIR to AIR (#81), and then compile AIR to machine code.
The runtime will be based on lazy basic block versioning:
This is important to know to understand the IR design and behaviors described below.
High-Level IR (HIR) is the first intermediate representation. It mainly focuses on linearizing the code, turning it into SSA form, and desugaring some high-level language concepts. HIR is constructed from the semantic tree upfront when a module is loaded, and never changes after that.
HIR features basic blocks (with parameters), upvalues, constants, and operations as building blocks. Operation value operands can be upvalues, basic block parameters, constants, and (non-void) operations; there are no explicit variables or temporaries. Code is in SSA form, with basic block parameters serving as Φ nodes. There is no propagation of explicit or inferred type information at this stage, but all type tests are made explicit.
Lowering to HIR gets rid of some high-level language concepts like pattern matching, agent send/receive syntax, defer
statements, for
expressions, try
expressions, etc.
The HIR data structures will be very minimalistic and will not be amenable to analysis. For example, there will be no use/definition chains. HIR is only really meant to be walked during lowering to MIR. In other words, HIR serves as a template for specialization in MIR.
Mid-Level IR (MIR) is where type specialization and most optimizations happen. It is similar to HIR in the building blocks it has, but unlike HIR, everything now carries type information. Types are gathered from the running program through basic block versioning, entry point versioning, value shapes, etc.
Lowering from HIR to MIR happens on demand as the program executes code, and is done with basic block granularity. Due to type specialization, there can be many different versions of MIR code for a given HIR (extended) basic block. Lowering proceeds until a type test is encountered that cannot be resolved with the available type information, or until the end of the function is encountered.
MIR will maintain use/definition chains and various other data structures that simplify transformation of the code. This will facilitate a classic set of optimizations (#61) that can be performed now that type information is available.
Low-Level IR (LIR) decomposes managed values into their constituent raw value and shape words. LIR mostly has the same building blocks as HIR and MIR, but at this stage, the only types that exist are 64-bit integers, 64-bit floats, and untyped pointers. All high-level operations will have been decomposed to primitive CPU-like operations. LIR is essentially a simple register transfer language.
LIR allows certain optimizations that would be harder to express at the MIR level. For example, in a series of small integer operations, it's obvious that copying the shape word for every intermediate operation is unnecessary. Yet, because MIR only operates on managed values, this notion cannot be expressed. At the LIR level, it is trivial to detect and remove such copies.
Note that, while LIR is very close to the machine, it is not architecture-specific.
- @lint attributes.
- undocumented-public-declaration pass.
We need to come up with a design for more advanced string literals. In particular, string literals that can span multiple lines are frequently useful.
String interpolation is out of scope for this.
This can happen when Zig reaches 1.0 (or at least when the language design is effectively frozen for 1.0).
We can at least expose Span
and FullSpan
properties, as well as ToString()
and ToFullString()
methods.
This raises the question, though: should we also expose GetText() and GetFullText()? That would require a Parent property. But then that might signal that the list is a node in the tree, which is not actually the case. We'd then also have to consider tree traversal methods, at which point these lists start to look an awful lot like SyntaxItems in their own right...
Need to think on this one.
We will eventually want the standard library's console API to be oriented around a terminal. The Spectre.Console API sits at too high a level for this to be practical. Further, using Spectre.Console.Cli locks us into using Spectre.Console.
We should switch to CommandLineParser as it allows us to supply a TextWriter
, effectively decoupling it from any particular console API. Then, we can use Cathode for all the low-level console interaction.
celerity/src/language/core/Syntax/SyntaxAnalysis.cs
Lines 6 to 9 in 18c5d49
This is neat because, for the happy case, we don't need to access the SourceText
at all. Only when there are diagnostics do we need to access line information from the SourceText
, and it's reasonable to just reconstruct it for those cases.
Even then, we still have to keep the SourceText
around during parsing in order to construct locations for diagnostics. To remedy that, we should probably also consider changing SourceDiagnostic
to not carry a SourceLocation
, but rather a SourceTextSpan
and a SyntaxItem
reference. A SourceDiagnostic.GetLocation()
method could then be exposed which would resolve the SourceLocation
.
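Sketched in the same diff style as the API changes above (a hypothetical shape, not a final design):

```
public sealed class SourceDiagnostic
{
-    public SourceLocation Location { get; }
+    public SourceTextSpan Span { get; }
+    public SyntaxItem Item { get; }
+    // Resolves the location lazily by querying the item's analysis text.
+    public SourceLocation GetLocation();
}
```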
When lexing a typical source file, there's going to be a lot of repeated strings - identifiers, literals, white space, and so on. We can't intern these, but it would make good sense to cache tokens up to a certain length and return the same instance instead of building them up repeatedly.
To implement this, instead of building up the token string in a StringBuilder
, we would keep track of where the token starts and ends. When creating the token, if the length is below our caching threshold, we first look it up in the token cache. For larger tokens, we shouldn't bother as the lookup will take too long to be worth it.
Nothing particularly difficult here. Just (very) boring implementation work.
Notably, though, we should try to do something to prevent stack overflows in the parser. Recursiont looks interesting here.
Splitting the language analysis layers into 5 separate assemblies might have been a bit overkill. Consider a new Vezel.Celerity.Analysis project with the following namespaces consolidated:
Something like:
{
"name": "my-app", // Unique project identifier.
"path": "src", // Optional path containing the project's own source files. Defaults to src.
"kind": "executable", // Optional project kind (executable, library). Defaults to executable.
"license": "0BSD", // Optional SPDX license expression.
"version": "1.0.0", // Optional Semantic Versioning 2.0.0 version. Defaults to 0.0.0.
// List of module search paths. The runtime will match the module path against the
// prefixes listed here and then look up the remainder of the module path in the
// specified directory.
//
// So e.g. LibA::Foo would find LibA here and then locate dep/lib-a/src/foo.cel,
// whereas Company::LibB::Bar::Baz would locate dep/lib-b/src/bar/baz.cel.
"paths": {
"MyApp": "src", // Only necessary if the app itself uses e.g. MyApp::Main.
"LibA": "dep/lib-a/src",
"Company::LibB": "dep/lib-b/src",
},
// Overrides default lint severities.
"lints": {
"unused-local-symbol": null, // Don't run this pass at all.
"test-without-assert": "none", // Hide diagnostics from this pass.
"unreachable-code": "error" // Promote diagnostics from this pass to errors.
}
}
The tooling APIs will pick this up and use it appropriately for the various celerity
CLI commands.
Note that nothing about this file will flow transitively; we're intentionally keeping things super simple. A top-level executable project will have to declare module search paths for all dependencies, direct or transitive, that it needs. Also, the file is completely optional.
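The described lookup could be sketched like this (illustrative Python; the longest-prefix preference and the lowercasing of the remaining segments are assumptions inferred from the examples above):

```python
def resolve_module(paths, module_path):
    """Map a module path like 'Company::LibB::Bar::Baz' to a source file,
    using the longest matching prefix from the project's 'paths' table."""
    segments = module_path.split("::")
    # Try the most specific (longest) prefix first.
    for count in range(len(segments) - 1, 0, -1):
        prefix = "::".join(segments[:count])
        directory = paths.get(prefix)
        if directory is not None:
            rest = [s.lower() for s in segments[count:]]
            return directory + "/" + "/".join(rest) + ".cel"
    return None  # No configured prefix matches.

paths = {
    "MyApp": "src",
    "LibA": "dep/lib-a/src",
    "Company::LibB": "dep/lib-b/src",
}
print(resolve_module(paths, "LibA::Foo"))                # dep/lib-a/src/foo.cel
print(resolve_module(paths, "Company::LibB::Bar::Baz"))  # dep/lib-b/src/bar/baz.cel
```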
The macro, quote, and unquote keywords are reserved for future exploration in this space.
Some obvious issues to tackle here:
- quote expressions?
celerity/src/syntax/LanguageParser.cs
Lines 209 to 265 in 53f45a6
Also, we need to process let
statements in a similar fashion to what we do in block expressions:
celerity/src/language/core/Semantics/LanguageAnalyzer.cs
Lines 279 to 284 in 8172942
celerity/src/language/core/Semantics/LanguageAnalyzer.cs
Lines 854 to 884 in 8172942
let forever = fn() -> this();
forever();