
qwertie / ecsharp


Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.

Home Page: http://ecsharp.net

License: Other

Languages: C# 88.64%, Batchfile 0.02%, HTML 11.34%
Topics: collections, geometry, syntax-trees, generic-collections, math, dotnet, csharp, visual-studio, parser-generator, programming-languages

ecsharp's People

Contributors: default0, jonathanvdc, qwertie

ecsharp's Issues

Proposal: modularize the standard macro library

Hi there. As you may have inferred from the title of this issue, I would like to modularize the standard macro library. I'll start off with some background information on how ecsc handles things, and then try to justify why I think the standard macro library should be split up. So please bear with me.

Background information

As ecsc has evolved, it has increasingly come to rely on a combination of macros and magic node types to lower existing C# constructs to lower-level EC# constructs. For example, this innocent-looking foreach loop

foreach (var item in col)
    Console.WriteLine(item);

gets macro-expanded to

#builtin_static_if(#builtin_static_is_array(#builtin_decltype(col), 1), #builtin_stash_locals(col, colLen, i, {
    var col = #builtin_restore_locals(col, colLen, i, col);
    var colLen = col.Length;
    for (#var(#builtin_decltype(colLen), i = 0); i < colLen; i++) {
        var item = col[i];
        #builtin_restore_locals(col, colLen, i, Console.WriteLine(item));
    }
}), #builtin_stash_locals(enumerator, {
    var enumerator = #builtin_restore_locals(enumerator, col).GetEnumerator();
    try {
        while (enumerator.MoveNext()) {
            var item = enumerator.Current;
            #builtin_restore_locals(enumerator, Console.WriteLine(item));
        }
    } finally {
        if (enumerator is System.IDisposable)
            ((System.IDisposable) enumerator).Dispose();
    }
}));

The lowered code is admittedly pretty cryptic, but it actually Does The Right Thing™. And you don't even have to take my word for it; if you stare at the expanded version long enough, then you'll see that first, a static if (#builtin_static_if) determines if col is a one-dimensional array (#builtin_static_is_array, #builtin_decltype). If this happens to be the case, then a simple for loop is used to iterate over col's values. Otherwise, it will use MoveNext/Current to iterate over col.

The calls to #builtin_stash_locals and #builtin_restore_locals are used to implement hygienic macros. #builtin_stash_locals hides local variable names – if the specified names are already in use by local variables, then they are removed from the local scope, and pushed onto a stack – which allows the inner expression to re-use those names safely. #builtin_restore_locals performs the opposite operations: it discards any locals that intersect with the names it specifies, and restores the old variables that were mapped to these names (if any).
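To make that bookkeeping concrete, here is a minimal C# sketch of the stash/restore idea (the names and data structures are illustrative, not ecsc's actual implementation):

using System.Collections.Generic;

// Illustrative model of #builtin_stash_locals / #builtin_restore_locals:
// each name maps to a stack of shadowed variable bindings.
class LocalScope
{
    readonly Dictionary<string, object> locals = new Dictionary<string, object>();
    readonly Dictionary<string, Stack<object>> stashed = new Dictionary<string, Stack<object>>();

    // Hide the given names so a macro expansion can reuse them safely:
    // existing bindings are pushed onto a per-name stack.
    public void StashLocals(params string[] names)
    {
        foreach (var name in names) {
            if (locals.TryGetValue(name, out var binding)) {
                if (!stashed.TryGetValue(name, out var stack))
                    stashed[name] = stack = new Stack<object>();
                stack.Push(binding);
                locals.Remove(name);
            }
        }
    }

    // The opposite operation: discard any locals that reuse the names,
    // then restore the old bindings that were stashed (if any).
    public void RestoreLocals(params string[] names)
    {
        foreach (var name in names) {
            locals.Remove(name);
            if (stashed.TryGetValue(name, out var stack) && stack.Count > 0)
                locals[name] = stack.Pop();
        }
    }
}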

A remarkable property of #builtin_static_if, #builtin_static_is_array, #builtin_decltype, #builtin_stash_locals and #builtin_restore_locals is that, from ecsc's perspective, these node types aren't special at all. Like any other node type, they simply get analyzed during the IRGen/semantic analysis phase, and that's all there is to it.

Problem

One thing that annoys me about this, though, is that – by introducing #builtin_static_if – both static_if and #builtin_static_if are now legal, and, more importantly, they behave differently. static_if is defined as a macro in LeMP.StdMacros, and it is far less powerful than #builtin_static_if because the latter construct can take full advantage of the fact that it is only evaluated during the IRGen/semantic analysis phase.

Had I used static_if in the macro expansion above, then LeMP would have told me that: "'static if' is incredibly limited right now. Currently it only supports a literal boolean or (x 'tree==' y)". That's unfortunate, especially since #builtin_static_if doesn't share those limitations.

Ideally, I'd like to "fix" this confusing situation by defining static_if as a macro that expands to #builtin_static_if. Unfortunately, I can't do that right now, because the standard macro library is structured as a single, monolithic binary; I can't just go ahead and define static_if in ecsc's macro library, as that re-definition would then conflict with the existing static_if definition from LeMP.StdMacros.

Proposal

So, here's my proposal: to divide the standard library into (roughly)

  • high-level commands, like alt class, and
  • low-level commands (the primitive operations or "primops"), like static_if.

Separate implementations of the low-level commands can then be created for LeMP and ecsc. The former would probably just be a re-packaging of the current macro implementations in LeMP.StdMacros, while the latter would expand nodes such as static_if to ecsc builtins.

The higher-level standard macro library can then depend on the lower-level macros without concerning itself with the specific implementation of the primitive operations.

The move to a low-level/high-level split in the standard macros can be gradual. Initially, we could just try moving static_if and #useSequenceExpressions into a low-level "primops" library. Later on, it might prove useful to implement certain operations as heuristics in the LeMP "primops" and as exact builtins in the ecsc "primops." When I say heuristics, I'm mostly referring to things like this.

static bool IsQuickBindLhs(LNode value)
{
    // Anything that isn't a simple identifier is accepted as a left-hand side.
    if (!value.IsId)
        return true;
    // For identifiers, guess by case convention: TryGet returns '\0'
    // if the name is empty, and char.IsUpper('\0') is false.
    return char.IsUpper(value.Name.Name.TryGet(0, '\0'));
}

LeMP can't do much better than this, but ecsc can tell with absolute certainty if the left-hand side is a value, and I think that it's a shame that we're not taking advantage of that information when it's readily available.

Anyway, no part of this proposal is mission-critical, but I do think that it'd be pretty neat if standard macros were able to take advantage of ecsc builtins under the hood. Thanks for considering my proposal.

Final changes to LES3

I'm changing a bunch of things, mainly in order to finalize LESv3. However, the changes to operator precedence/classification affect both LESv2 and LES3, partly because both languages share precedence-choosing code but also for the sake of consistency. These changes are not committed as of today.

!! to suffix operator

To reiterate, !! changed from an infix to a suffix operator based on Kotlin's !! operator. I'm duplicating this change now in LESv2. And I noticed that suffix operator printing seems broken in LESv2 (e.g. a++ came out as `'++suf`a - it parses fine but looks ugly) so I fixed that.

.dotKeywords will be stored as #hashKeywords and # is an id-char

To save us the trouble of changing the whole Loyc codebase, the dot in a .keyword will be changed into a # in the Loyc tree. Also, # will be an identifier character. Thus .return x is equivalent to #return(x). I'm also reverting the definition of LNode.IsSpecialName so that it excludes '.'.

.keyw0rds can now contain digits

.keywords will be allowed to be any valid non-backquoted identifier.

Number parsing change

In order to simplify the lexer by removing a lookahead loop, I'm removing the requirement to have a "p0" clause on hexadecimal floating-point numbers. Consequently, 0x1.Ep0 can be written as 0x1.E, which will be treated as a floating-point number instead of 0x1 . E. However, 0x1.Equals(1) now has the bizarre interpretation quals"0x1.E"(1), i.e. the number 0x1.E carrying the literal suffix quals, called with the argument 1.
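As a toy sketch of the simplified decision (illustrative C# only, not the real LES lexer):

using System;

static class HexLexRule
{
    // Toy illustration of the simplified rule: after the integer part of
    // a hex literal, a '.' followed by a hex digit continues the number;
    // no "p0" exponent clause is required.
    public static bool DotContinuesHexNumber(string text, int dotIndex)
    {
        // The old rule needed a lookahead loop to find a 'p'/'P' exponent;
        // the new rule needs only one character of lookahead.
        return dotIndex + 1 < text.Length && Uri.IsHexDigit(text[dotIndex + 1]);
    }
}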

Weird Operators

Given a "weird" operator like x *+ y, the precedence of *+ was previously based on the first character, +. I think the motivation for this was to act similar to Nim operators. But then I noticed something: MATLAB and Julia have "elementwise" operators like X .* Y which multiply pairs of scalars in matrices X and Y. So I checked several other languages. Apparently, most other languages do not have any "compound" operators, but Swift does. Swift has &*, &/, &+, &-, and &%. In Swift, MATLAB and Julia, the last character determines the precedence. Bacause Swift, MATLAB, etc. are more popular than Nim, I'm changing LES to decide precedence based on the last character instead of the first. Sound good @jonathanvdc ?

Earlier I added a !. operator with precedence of . so that Kotlin's !!. operator would have the correct precedence. This change makes the !. operator redundant. Null-dot ?. must still be special-cased as its precedence is lower than ..

Note: Long operators like |%| will continue to be based on the first and last character. In the case of combo operators, like x s>> y, the initial identifier is ignored for the purpose of deciding precedence so that this particular operator has the same precedence as >>.
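Putting these rules together, a sketch of the character-selection logic might look like this in C# (illustrative only, not the actual LES implementation):

using System;

static class LesPrecedence
{
    // Precedence is decided from the operator's first and last punctuation
    // characters, the last one being decisive for "weird" operators like
    // `*+` (which now binds like `+`, matching Swift's &+ and the
    // elementwise .* of MATLAB/Julia).
    public static (char First, char Last) PrecedenceCharsOf(string op)
    {
        // For combo operators like `s>>`, ignore the leading identifier,
        // so that s>> gets the same precedence as >>.
        int i = 0;
        while (i < op.Length && char.IsLetterOrDigit(op[i]))
            i++;
        if (i == op.Length)
            throw new ArgumentException("not an operator: " + op);
        return (op[i], op[op.Length - 1]);
    }
}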

Operators and fractions

I decided that, to better match other programming languages, x*.2 should be parsed as x * 0.2 rather than as x *. 2. However, the tricky thing is that 0..2 and 0...3 should still be parsed as ranges, 0 .. 2 and 0 ... 3 respectively. To achieve this I tried splitting Operator into two rules:

token DotOperator returns [object result] : 
	'.' OpChar* 
	{$result = ParseOp(out _type);};
token Operator returns [object result] : 
	// The `('.' ((~'0'..'9' | EOF) =>))` part uses a zero-width predicate:
	// a '.' is accepted as part of the operator only if the character
	// after it is not a digit (or is EOF).
	('$'|OpCharExceptDot) (OpCharExceptDot | '.' ((~'0'..'9' | EOF) =>))*
	{$result = ParseOp(out _type);};

[inline] extern OpCharExceptDot :
	'~'|'!'|'%'|'^'|'&'|'*'|'-'|'+'|'='|'|'|'<'|'>'|'/'|'?'|':';
[inline] extern OpChar : 
	'~'|'!'|'%'|'^'|'&'|'*'|'-'|'+'|'='|'|'|'<'|'>'|'/'|'?'|':'|'.';

This is an efficient implementation. However, with this grammar, if you write x.+.2 it is parsed as x .+. 2, which is not necessarily the desired result. It's a perfectly reasonable way to parse it, but I think it would be better to match Julia/MATLAB. So I thought of an alternative which actually looks simpler in the grammar:

token Operator returns [object result] : 
	( '$'? (OpCharExceptDot | '.' ((~'0'..'9' | EOF) =>) '.'*)+ / '$' )
	{$result = ParseOp(out _type);};

This turns out to generate more, and slower, code. But for the sake of compatibility, I'll accept that.

:: operator

I think the precedence of :: should be changed to match C++ and EC#. In LESv2 I chose the syntax x::Loyc.Symbol for variable declarations, which required :: to have a lower precedence than .. Why didn't I just use the syntax x: Loyc.Symbol? I think it was because it conflicted with LES's old "Python mode" where you could write

if c:
    print "c is true!"

Meaning if c { print "c is true!" }. This feature was removed on 2016-06-21, I think in order to make the language easier to parse and to ensure that colon would behave like any other operator.

So I will raise the precedence of :: up to ., but will make the change exclusive to LESv3 because I have a fairly large amount of LESv2 code relying on the old precedence.

Minor implementation detail

?. will now be classified as TokenType.Dot while .* will no longer be classified as TokenType.Dot.

Arrow operators

I have reduced the precedence of arrow operators -> <-. Technically their precedence is now above && but below | ^. My thinking is that some people would like to use arrows as assignment operators: flag <- x > y || y < 0 gets the structure (flag <- (x > y)) || (y < 0).

The previous precedence of -> <- was sort of a compromise between C (which wants high precedence) and other languages that want lower precedence. But arguably the old precedence served neither case very well. Now I'm thinking that people wanting to use a C++-style arrow operator as in obj->f() should pick another operator, such as obj*.f().

Continuators

The set of possible continuators should be carefully considered because it cannot be compatibly changed later. Since no one has offered an opinion, I am going to suggest that the set be the ten words else, elseif, elsif, catch, except, finally, while, until, plus and using, together with the set of all identifiers that begin with two underscores (__etc). Note that instead of double underscores it would seem more natural to add #hashKeywords to the set; however, this creates an ambiguity in case of code like the following:

    .foo c { f() }
    #bar(x)

I think it's more likely this was intended to be two separate statements, rather than that #bar(x) is intended to continue the .foo statement. I selected __, a traditional "potential reserved word" marker in C++, because it is not currently used for any purpose in Loyc. Unlike continuators like else that will be stored with # in the Loyc tree (#else), double-underscore identifiers will be stored unchanged. Though it could equally be argued that #s should not be added even in the former case.

Since continuators are not allowed to be used as binary operators, I removed and, or, but from the continuator set, thinking that some users would prefer to use them as binary operators.

And introducing token lists / prefix-expressions / whatever you call 'em

I proposed that LESv3 not have token literals (unlike LESv2 and EC#) and instead adopt "token lists" such as ' + x 1. No one offered an opinion on this, or about whether '(+ x y) should be represented as `'`(`'()`(`'+`, x, y)) or as `'`(`'+`(x, y)) so I'm somewhat arbitrarily selecting the first representation.

Decisions on other questions previously raised

  • Should attributes be allowed in the middle of an expression, as in foo: @immutable string? Still undecided; currently, no.
  • Should comma be allowed as a separator within braces, as in { a, b, c }? I don't think so. The parser isn't currently complaining about it, but I may add a check, possibly a check dependent on whether { is followed by a string (to carve out an exception for JSON syntax).
  • Should comma be allowed as a separator within tuples as in (x, y)? Yes, but (x,) will be a syntax error and parsed as (x, ``).
  • What should # mean? It will be treated as an identifier character like an underscore.
  • Should continuator keywords be bona fide keywords? No, I think not.
  • Edit: Should non-ASCII identifiers be supported without backquotes? I suppose that can wait for the next version.

Idea

Numeric literals can have any identifier as a suffix, including backquoted identifiers. This feature could be used to support my favorite feature that few languages allow: compound unit types like 1.2`kB/record` or 3e8`m/s` . Currently an expression like size `kB` is meaningless, but I suppose it could be defined as some sort of suffix operator which could then be used for unit types. However, this idea has the disadvantage that a numeric value with units would have a different syntactic structure than any other expression with units, and 3`px` would have a different meaning than 3 `px` .

How to make macros available for more purposes

like define a new syntax element:

repository hellorep {

}

that will be replaced with:

public class hellorep : IRepository {
..other stuff...
}

for example the Excess Project can do this but the Project is dead

Creating a parser for expression trees

I just very recently learned about LLLPG. I think it is really great work, and it inspired me to tackle a challenge that, incredibly, no one seems to have ever tackled comprehensively before: a parser for expression trees.

What is the goal?
Expression trees are presumably capable of representing every single C# expression, including LINQ, lambdas, anonymous type declarations, etc. The goal is to have a parser that can take any C# expression and produce the equivalent expression tree.

Why expression trees?
I believe expression trees are hands-down the most powerful and balanced runtime code generation tool across the C# stack today. They:

  • work in virtually every platform because of LINQ-query compatibility;
  • have excellent debugger support;
  • can generate fast and garbage-collectible code;
  • provide multiple extensibility points for mixing an expression tree with an existing host environment.

At first glance they are the PERFECT candidate for lightweight C# scripting, except for one huge catch: apart from the C# compiler's built-in support for generating Expression<TLambda> instances from lambdas, there is no equally lightweight support for parsing expression trees from text.
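For reference, the built-in support mentioned above looks like this: the C# compiler itself converts a lambda into an expression tree, which can then be compiled into a delegate at run time:

using System;
using System.Linq.Expressions;

class ExpressionTreeDemo
{
    static void Main()
    {
        // The C# compiler converts this lambda into an expression tree...
        Expression<Func<int, int>> square = x => x * x;

        // ...which can be inspected like any other data structure...
        Console.WriteLine(square.Body); // (x * x)

        // ...and compiled at run time into a fast delegate.
        Func<int, int> f = square.Compile();
        Console.WriteLine(f(7)); // 49
    }
}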

Why not use Roslyn?
Roslyn is an absolutely amazing piece of software; a death star to detonate the entire constellation of .NET code generation problems. This is also its greatest weakness, because by necessity it is not lightweight at all. Taking a dependency on Roslyn is a huge commitment and effectively eliminates many of the platforms and scenarios where you might want to deploy this (such as mobile).

Are you sure no one has tried to do this before?
Actually, they have. The best attempt so far is ExpressionEvaluator, which fulfills the requirement of being relatively lightweight (except for the Antlr runtime). However, it has no support for LINQ or lambdas yet, which in my opinion are the most interesting uses of scripting C# expressions (for manipulating collections, etc.). In studying its source code I realized that there is much I need to relearn about parsers (it's been a while since I last wrote parsers and lexers, for university projects) and eventually got to the point where I was essentially starting from scratch again (grammar files).

It was at this point that I learned about LLLPG. I realized it raised the serious possibility of generating parsers with no extra runtime dependencies (e.g. without Antlr), which is a huge bonus, and it looks self-contained and lightweight enough that I could actually start relearning parser writing. Making this parser a reality would be the best learning exercise.

My question is: have any of you ever thought about or tried to do this? Am I crazy in thinking that this would be insanely useful and a great contribution to industry and independent developers? I post this issue thinking that it is in line with the spirit of Loyc. I have gone through the EC# grammar, parser and lexer and am thinking of adapting them to the purposes of this exercise, but I would have to extend them (eventually) to handle LINQ and so on.

I would love to get some feedback/suggestions/comments before I get started, especially if there is any particular dead end that you imagine I may be getting myself into. I have a reasonable amount of time to learn at the moment, so if there is any time to do this, now is it.

Thanks for developing this amazing tool; I hope that somehow this is the start of seeing the dream of a C# expression tree parser come true!

EC# parser bug: lost 'new' attribute

I think I encountered an EC# parser bug while I was implementing abstract/virtual/sealed/override/new in ecsc. It's definitely not a deal-breaker bug, but it is somewhat annoying.

This file

public abstract class Base
{
    public Base() { }
    public virtual int g()
    {
        return 2;
    }
}

public class Derived : Base
{
    public Derived() { }
    public new int g()
    {
        return 3;
    }
}

is parsed as:

@[#public, #abstract] #class(Base, #(), {
    @[#public] #cons(@``, Base, #(), {
        });
    @[#public, #virtual] #fn(#int32, g, #(), {
        #return(2);
    });
});
@[#public] #class(Derived, #(Base), {
    @[#public] #cons(@``, Derived, #(), {
        });
    @[#public] #fn(#int32, g, #(), {
        #return(3);
    });
});

The new attribute seems to be gone. This does not modify the program's behavior, but the missing attribute causes ecsc to report the warning below, despite the fact that g is clearly marked new.

Overrides.cs:15:20: warning: member hiding: method 'g' hides a base method. Consider using the 'new' keyword if hiding was intentional. [-Whidden-member]

    public new int g()
                   ^  
remark: hidden method: Overrides.cs:6:24

    public virtual int g()
                       ^  

I'm assuming that this is the parser's fault.

I don't have much time on my hands at the moment - and this bug is not top priority for me right now - but I wouldn't mind taking a stab at fixing this when I have some spare time.

Anyway, can you by any chance reproduce this bug?

Thanks in advance.

replacePP doing mysterious substitution outside of macro scope

Ok, one more issue -- sorry to be finding so many! It's entirely possible this is my misunderstanding, but here's what I found nonetheless.

The following code will, amongst other things, emit these two lines:

public CSVar this[CSVar x, CSVar y, CSVar z] => GetExpr(x, y, z);
public CSVar this[CSVar x, CSVar y, CSExpr z] => GetExpr(x, y, z);

This is surprising, since neither of the 2 macros that do the work (Indexer2D and Indexer3D) should emit anything that starts with public CSVar (they should only emit public CSExpr). If you comment out this line:

ExecMacro2D(Indexer2D);

Then the errant lines disappear (along with all the 2D indexer code). Similarly, if I rename arg1, arg2 & arg3 to some other names (say, arga, argb & argc), the problem goes away.

My understanding was that the replacePP function would be limited to its current scope, but even if it weren't, I can't understand how it would replace the CSExpr return type, since that should be a single, complete token.

Code showing the problem follows:

replace(AllTypes => (Var, Expr));

// Execute the given macro with all permutations of types for 2 arguments.
define ExecMacro2D($macro2d) {
    unroll (T1 in AllTypes) {
        unroll(T2 in AllTypes) {
            $macro2d(T1, T2);
        }
    }
}

// Execute the given macro with all permutations of types for 3 arguments.
define ExecMacro3D($macro3d) {
    unroll (T1 in AllTypes) {
        unroll(T2 in AllTypes) {
            unroll(T3 in AllTypes) {
                $macro3d(T1, T2, T3);
            }
        }
    }
}

define Indexer2D($T1, $T2) {
    replacePP(arg1 => concatId(CS, $T1));
    replacePP(arg2 => concatId(CS, $T2));
    public CSExpr this[arg1 x, arg2 y] => GetExpr(x, y);
}
ExecMacro2D(Indexer2D);

define Indexer3D($T1, $T2, $T3) {
    replacePP(arg1 => concatId(CS, $T1));
    replacePP(arg2 => concatId(CS, $T2));
    replacePP(arg3 => concatId(CS, $T3));
    public CSExpr this[arg1 x, arg2 y, arg3 z] => GetExpr(x, y, z);
}
ExecMacro3D(Indexer3D);

Annotating LNodes with comments

Hi.

TL;DR: I'm trying to prefix LNodes with comments that the LES printer will output. Is that something that can be done? If so, how?

Why I need this

I've recently created an LNode-based on-disk IR for Flame. I figured I could use the binary encoding in my fork of Loyc to handle the typical compile-link scenario: a Flame-based compiler such as dsc or fecs generates binary IR files (I gave them the *.flo extension), which are then linked together later on. Here's a quick example:

dsc A.ds -platform ir -indirect-platform clr -o A.flo
dsc B.ds -platform ir -indirect-platform clr -o B.flo
dsc A.flo B.flo -platform clr -o Program.exe

This all went smoothly. As usual, Loyc proved to be a pleasure to work with. I just generate sequences of LNodes, which I then pass to the binary encoder.

My design goals for Flame's IR were twofold, however: on one hand, I wanted an efficient binary format that could be used as a target platform, source language, and library dependency; on the other hand, I also thought this was a great opportunity to see what's actually going on. dsc applies a number of transformations to its input, and some kind of textual IR would be ideal for seeing these passes at work.

LES seemed like a great candidate for that: all I had to do was hand my LNodes to the LES printer, and store the resulting string in the output file. *.fir files were born. Here's HelloWorld.fir, which I wrote manually. You can also make dsc generate them like so: dsc A.ds -platform ir -S -indirect-platform clr -o A.fir.

The IR's syntax is atrocious, I know. But it's also unambiguous and pretty regular, and humans were never its intended audience. It also stores types, methods and fields in lookup tables, to avoid having to parse and resolve the same method references over and over again.

Sadly, this design makes files with big type/method/field tables tough to read for humans. For example, a generic type instance in the type table (#of node in the table) refers to its generic declaration and type arguments as indices in that same type table. This makes sense for computers, because an IR file that uses the List<int> type probably also needs the List<> and int types at some point. So why not share them? Clearly, however, #of(#type_table_reference(0), #type_table_reference(1)) is nowhere near as readable as (the unfortunately ambiguous and tough to resolve) List<int>.

I came up with a simple solution, which is to prefix every type/method/field table entry with a comment that contains a human-readable type name. The hello world example I referred to earlier demonstrates the technique.

Which brings us to the question I'd like to ask: is there some mechanism in Loyc that allows me to prefix LNodes with comments that the LES printer will output? If so, how would I go about creating such prefixed nodes?

Okay, okay. I know. That's a pretty long use-case compared to a relatively simple question. I just wanted to be as thorough as possible here. Also, if you have any comments/suggestions/questions about the IR in general, I'd love to hear them.

Thanks in advance.

Set up AppVeyor & NuGet packages

I think we should have NuGet packages for each component: Loyc.Essentials, Loyc.Collections, Loyc.Syntax, Loyc.Utilities, Loyc.Ecs, LeMP and LLLPG. But right now there is only LoycCore (see Core/LoycCore.nuspec) and I've been too distracted to set up new packages. Hmm, I wonder if NuGet has a way of letting multiple people (U&I) have permission to update a given package. Anyway, since I have to update NuGet packages manually, I do it less often than I should.

I saw AppVeyor once and it looks cool... I am interested in having independent automated builds since it's too easy to push something that only works on my machine, or that has the .NET 3.5 build broken (which I guess I'll remove soon anyway... but after all other spring cleaning is done). But I always figured I'd have to give it some special thought because ideally you'd want to not just build EC#/LeMP but also make sure that it compiles itself correctly. It's a horribly manual process right now:

  1. I build Release .NET 4 and close Visual Studio
  2. I run the batch file that rebuilds the VS extensions and offers to reinstall them
  3. I reopen the solution and re-run LeMP on a bunch of source files
  4. I use Git Extensions to check the diffs and make sure any changes are as expected (usually only the header comment changes to reflect the new version number)
  5. If I have lots of presence of mind, I make sure it still builds OK
  6. I push the new binaries

But just doing 1 & 2 in CI would be a nice start.

So @jonathanvdc, help setting up AppVeyor and NuGet packages would be appreciated. Do you know how to set up AppVeyor such that it can auto-increment the build number when changes are pushed? I saw a video where a guy had set up an auto-increment feature, but I have no idea how it works on a technical level.

Another thing we should do is set up a Loyc organization on GH to hold a repository of finished features, with our partly done features under our usernames. Except I don't know what to call the organization since some zero-commit jagoff took the name "Loyc".

operator true and operator false trigger syntax errors

Classes using operator true or operator false trigger a syntax error in LeMP:

#ecs;

// A class that defines operator true
public class Class
{
	public static bool operator true(Class c) {
		return false;
	}
}

LeMP returns

  Syntax error in expression at 'operator'; possibly missing semicolon

Microsoft's docs on operator true and false are here

Can't get a simple test case to work

Just found the mention of EC# in a Roslyn issue thread and decided to give it a try.
I've installed LoycSyntaxForVs and run LLLPG, created a simple console app, and added a .ecs file with the following content (taken from ecsharp/Doc/EC#.cs):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication8
{
    class Alice_Bob 
    {
        public new(public int Alice, public int Bob) { }
    }
}

Then I assigned the LeMP custom tool to the .ecs file and tried to build the solution. It failed with this:

1>------ Rebuild All started: Project: ConsoleApplication8, Configuration: Debug Any CPU ------
1>d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs(11,13,11,14): error CS1519: Invalid token '(' in class, struct, or interface member declaration
1>d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs(11,32,11,38): error CS1001: Identifier expected
1>d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs(11,32,11,38): error CS1002: ; expected
1>d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs(11,46,11,47): error CS1003: Syntax error, ',' expected
1>d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs(11,50,11,51): error CS1002: ; expected
1>d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs(13,1,13,2): error CS1022: Type or namespace definition, or end-of-file expected
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

And:

Warning The custom tool 'LeMP' failed. MissingMethodException: Method not found: 'Void Loyc.Ecs.EcsNodePrinter.PrintPlainCSharp(Loyc.Syntax.LNode, System.Text.StringBuilder, Loyc.IMessageSink, System.Object, System.String, System.String)'.
at Loyc.VisualStudio.LeMP.Generate(String inputFilePath, String inputFileContents, String defaultNamespace, IVsGeneratorProgress progressCallback)
at Loyc.CustomToolBase.Generate(String inputFilePath, String inputFileContents, String defaultNamespace, IntPtr[] outputFileContents, UInt32& outputSize, IVsGeneratorProgress progressCallback) ConsoleApplication8 d:\AllThatJazz\VS\Projects\ConsoleApplication8\ConsoleApplication8\Test.ecs

Proposal: change trivia marker to a single character.

Originally Loyc trees had a convention that there was a single character # to mark all "special" identifiers. In this system, trivia (non-semantic information such as comments) were attached to things as attributes and required to have a Name that starts with #trivia_, e.g. #trivia_SLComment for a single-line comment.

Later I decided to switch to a different special prefix for operators, an apostrophe (why an apostrophe? To minimize visual noise - I liked that it was a small character. Also, I didn't want to use a punctuation mark that already connoted a specific operator). I didn't want to slow down the check that distinguishes "normal" and "special" names so rather than check if name[0] == '\'' || name[0] == '#', LNode.IsSpecialName checks if name[0] <= '\''. Thus any prefix below ASCII 40 is reserved for special names.
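In code, that check reads roughly like this (a simplified sketch of what the paragraph above describes; the real property lives on LNode):

// Simplified form of the fast check: '#' is ASCII 35 and '\'' is ASCII 39,
// so one comparison covers both prefixes, and everything at or below
// ASCII 39 (i.e. below ASCII 40) is reserved for special names.
static bool IsSpecialName(string name)
{
    return name.Length > 0 && name[0] <= '\'';
}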

Now I'm thinking that #trivia_ is a bit clunky - why not define a single-character prefix for trivia? At first I was thinking that the space character would be a good prefix, but then I remembered that #trivia_ does have one virtue, it greps well: code referring to #trivia_ is easy to find. With that in mind I think the best prefix is %, because % is a fairly rare character in most code, especially if that code is compiler-related. What do you think @jonathanvdc?

The next question then is whether to keep an alphanumeric representation of trivia concepts, like %SLComment for "single-line comment", or whether to go with a compact symbolic representation like %// - with the understanding that if one is parsing a language where comments are denoted # like this or (* like this *) it would still be recommended to use %// and %/**/ to represent those comments in the Loyc tree.

Custom tool error ... NullReferenceException

I went through the installation instructions on the installation page:
http://loyc.net/lemp/install.html

Installed into VS2013.

example.out.cs was created, but it only includes using statements, and there were two errors:

Error   2   Custom tool error: Bug: unhandled exception in parser - Object reference not set to an instance of an object. (NullReferenceException)  C:\Sketchbook\LoycTest\src\HelloWorld\example.ecs   6   1   HelloWorld
Error   1   Custom tool error: An expected condition was false: nsName.Calls(S.Assign, 2) && (aliasedType = nsName[1]) != @null C:\Sketchbook\LoycTest\src\HelloWorld\example.ecs   6   1   HelloWorld

Pointer-to-pointer/exponentiation operator ambiguity

I was doing some low-level programming just now and thought I'd write a simple echo program:

public static unsafe class Program
{
    public static extern int puts(byte* str);

    public static int Main(int argc, byte** argv)
    {
        for (int i = 1; i < argc; i++)
        {
            puts(argv[i]);
        }
        return 0;
    }
}

Seems reasonable, right? Well actually, the EC# parser parses byte** argv as @'**(#uint8, argv) instead of #var(#of(@'*, #of(@'*, #uint8)), argv). (I guess any expression a ** b is ambiguous in this way.)

I've resorted to writing byte* * argv for now, but that's just poor style. Would you consider dropping the @'** from EC#? There's not even a macro for @'** right now: removing the operator shouldn't break any code.

Support for special LongSparseAList

SparseAList is a very solid collection, and I use it for mapping lots of sequential data to different integer positions. Sadly, though, my main requirement is 64-bit indexes.

It seems that extending SparseAList to support a 64-bit indexer is not possible, since it is based on AList and the whole thing would have to be changed to support 64-bit indexing. @qwertie told me that this adds an additional 50% overhead in size.

However, in my case the benefit of SparseAList outweighs the additional overhead of a 64-bit version. I would still like to have this collection, but since its complexity is at monster level (it seems 😃), I'm looking for an approach that minimizes (or completely avoids) the mistakes that manually copying every bit of code would invite.

As suggested by @qwertie, this can be achieved with LeMP, and I would like to do this.

Comments not emitted based on surrounding whitespace

While researching the test-cases in #58, I discovered that newlines surrounding comments can affect whether or not the comments are emitted in the output. For example:

#ecs;

namespace Test
{
	internal class Class
	{
		// #region test region
		
		[DllImport(Constants.Lib, EntryPoint = "testfunction")]
		public static extern int TestFunction(int i);
		//#endregion
	}
}

Will omit the first comment, whereas the comment is preserved in either of the following two versions:

#ecs;

namespace Test
{
	internal class Class
	{
		// #region test region
		[DllImport(Constants.Lib, EntryPoint = "testfunction")]
		public static extern int TestFunction(int i);
		//#endregion
	}
}

and:

#ecs;

namespace Test
{
	internal class Class
	{

		// #region test region

		[DllImport(Constants.Lib, EntryPoint = "testfunction")]
		public static extern int TestFunction(int i);
		//#endregion
	}
}

Reference-free parser

Is it possible to use LLLPG to generate a parser that has no external references (i.e. is self-contained)? If not, please consider this a feature request! 😄

StackOverflowException in LLLPG in certain left-recursive grammars

An indirectly left-recursive grammar resulted in a StackOverflowException while evaluating the IsNullable property. This wouldn't be such a big deal, except that for some reason this exception is not caught while running in Visual Studio 2010, causing the entire IDE to crash.

VS plugin won't compile

Hello, I have some EC# code that compiles in the REPL but not in Visual Studio as part of the assembly.

#region erases newlines

I posed this question over on Stack Overflow, but in the course of investigating it, I came to think it might be a bug. Here's a minimal test case that shows the issue:

#ecs;

namespace Test
{
	internal class Class
	{
		#region test region
		[DllImport(Constants.Lib, EntryPoint = "testfunction")]
		public static extern int TestFunction(int i);
		#endregion
	}
}

When the region markers are in place, the DllImport + method prototype are emitted like so:

		[DllImport(Constants.Lib, EntryPoint = "testfunction")] public static extern int TestFunction(int i);

(note: no newline, and #region, #endregion lines are omitted)

Whereas if you comment out the region markers, the newline is preserved:

		[DllImport(Constants.Lib, EntryPoint = "testfunction")] 
		public static extern int TestFunction(int i);

Though the emitted code still compiles and does what it's supposed to, region markers are super useful in classes large enough to be generated by a macro system. It would be great to have them emitted, and to have them not affect the whitespace of the enclosed code.

Holy crap!

Is this the appropriate place to say, 'Holy crap! You are a genius!'? And I get your comment about Enhanced C# being a gimmick. There is a bigger picture. Nevertheless, any professional C# developer who tolerates C# during the day to earn a salary and then goes home and codes in something better, more expressive, and more flexible should really appreciate what you are doing. Thanks!

Installation page a little hard to find

Hi David,

I hope you don't mind this feedback. I was poking around the documentation to understand it better. I just realized (I think) that you were just updating the documentation. I was going to mention the sourceforge links but I think you just switched them. :)

I found the installation page, but I had to go to
http://loyc.net/ --> Click LeMP Home Page --> Scroll to the bottom

Unity3d support

Is it possible to integrate with Unity3D, which uses Mono/IL2CPP?

Immutability 'with'

Could Loyc macros implement concise immutability syntax (F# style) in C#, including 'with' to make changes easily to immutable types?
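For concreteness, here is one shape such a macro's output might take; everything below (the class, the With method, and its parameters) is invented for illustration, not an existing LeMP feature:

// Hypothetical expansion (all names invented): an immutable type plus a
// generated With method that copies every field, overriding some.
public sealed class Point
{
    public readonly int X, Y;
    public Point(int x, int y) { X = x; Y = y; }

    // A macro could emit one optional parameter per field, so that an
    // F#-style `p with { Y = 5 }` becomes `p.With(y: 5)`.
    public Point With(int? x = null, int? y = null)
        => new Point(x ?? X, y ?? Y);
}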

Has anybody written an s-expression macro?

There are several JavaScript transpilers that take an s-expression version of JavaScript and compile it to standard JavaScript. Would it be possible to do something similar with LeMP macros for C#? Has anybody tried this yet?

LeMP throws away variable in cast

This code:

#ecs;

var a = "I'm a string!";
if(a is string s) {
    Console.WriteLine(s);
}

Will produce the following:

// Generated from Untitled.ecs by LeMP 2.6.2.3.
var a = "I'm a string!";
if (a is string) {
	Console.WriteLine(s);
}

Note that the 's' has been dropped from the end of the a is string if condition. AFAIK this shouldn't happen -- the original code compiled and worked just fine.

BaseParserForList<Token, int> EOF Issue

Hello,

I would like to start off by saying this is a pretty impressive project; I have spent the last week or two digesting the sheer amount of code it involves. May the odds be ever in your favor!

I am trying to use LLLPG 1.3.2, with the following parser definition (partial code, obviously)

public partial class GpcParser : BaseParserForList<Token, int>
{
    LLLPG(parser(laType(TokenType), matchType(int)));

    alias("{" = TT.LBrace);
    alias("}" = TT.RBrace);
    alias(";" = TT.Semicolon);

    private token ScanToEndOfStmt() @[
        // Used for error recovery
        (greedy(~(";"|"{")))*
        greedy(";" | "{" "}"?)?
    ];
}

The following code is generated for the ScanToEndOfStmt function

void ScanToEndOfStmt()
{
    TokenType la0;
    // Line 50: greedy(~(EOF|TT.LBrace|TT.Semicolon))*
    for (;;) {
        la0 = (TokenType) LA0;
        if (!(la0 == EOF || la0 == TT.LBrace || la0 == TT.Semicolon))
            Skip();
        else
            break;
    }
    // Line 51: greedy(TT.Semicolon | TT.LBrace (TT.RBrace)?)?
    la0 = (TokenType) LA0;
    if (la0 == TT.Semicolon)
        Skip();
    else if (la0 == TT.LBrace) {
        Skip();
        // Line 51: (TT.RBrace)?
        la0 = (TokenType) LA0;
        if (la0 == TT.RBrace)
            Skip();
    }
}

Which fails to compile with the following error:

Operator '==' cannot be applied to operands of type 'TokenType' and 'int'

on this segment la0 == EOF

EOF is defined as MatchType EOF in BaseParser<Token, MatchType>
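For reference, here is a self-contained reduction of the type clash, plus the explicit cast that would make the comparison compile (illustrative names only; whether patching generated code is the right fix is another matter):

enum TokenType { LBrace, RBrace, Semicolon }

static class EofDemo
{
    const int EOF = -1; // stands in for BaseParser's MatchType-typed EOF

    static bool IsStopToken(TokenType la0)
    {
        // return la0 == EOF;           // error CS0019, as reported above
        return la0 == (TokenType) EOF   // compiles: cast the int to the enum
            || la0 == TokenType.LBrace
            || la0 == TokenType.Semicolon;
    }
}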

Is there something I am missing regarding usage of this base class, or is this a true bug?

I am trying to read my way through the EcsParser as an example, which does not use the new base classes, and adapt it for my own language implementation.

Thanks,
Austin Morton

How to control parameter type with a macro?

Have been experimenting with letting LeMP handle yet more of my code generation and stumbled onto this case:

define TypeForArg($T1) {
    static if($T1 `code==` Int) int;
    else $T1;
}

public static void F(TypeForArg(Int) i);

This will output the following:

public static void F(int);

Note that the i has been dropped from the output.

'using' conversion semantics

Hi there. I implemented the EC# using conversion in ecsc a while ago, and I'd just like to make sure that I got its semantics right.

In my understanding, x using T does the exact same thing as (T)x, except in the specific case where the (T)x cast is resolved as a downcast, i.e. it is compiled as a castclass T or unbox.any T opcode, in which case x using T results in a compile-time error.

This is exactly what ecsc does right now. Note that the paragraph above implies that if x using T is resolved as an implicit or explicit user-defined conversion, then it will be compiled identically to (T)x – there is no compile-time error – despite the fact that this may result in a run-time exception.

Is that about right? Here's an overview of some using conversions. I've commented the illegal using conversion out.

using static System.Console;
using System.Numerics;

public class A
{
    public A()
    { }
}

public class B : A
{
    public B()
    { }
}

public static class Program
{
    public static void Main()
    {
        B x = new B();
        WriteLine(x using A);
        WriteLine(x using B);
        // WriteLine((x using A) using B); // <-- error: downcast
        WriteLine(2.5 using int);
        WriteLine(2 using double);
        var bignum = new BigInteger(long.MaxValue);
        WriteLine(bignum using int); // legal, but will throw
    }
}

User-defined conversions haven't actually been implemented yet in ecsc, but I'd like to get a clear understanding of what the using conversion does first.

Proposal: BigInteger literals

I'm proposing to add BigInteger literals to LES. First, I'll describe my use case, and then I'll propose a syntax for these literals.

All right, so I've recently added (theoretical) support for arbitrarily-sized integer types to Flame. There's just a small catch: the new integer types don't play nice with on-disk Flame IR, because those IR files are LES-based. And LES only supports the C# primitive types. For example, I can encode UInt64.MaxValue as a ulong literal right now, but there's just no appropriate literal type for UInt128.MaxValue at the moment. A hypothetical BigInteger literal, on the other hand, should be capable of handling any integer size.

I could solve this problem with an ugly workaround, and encode BigInteger values as byte arrays. But I'd really rather not, because that would only make reading and writing IR files harder.

So instead I propose to add built-in support for BigInteger literals to LES. These are then to be represented as normal integers, with some suffix that marks them as BigInteger values. I was thinking of using 'B' for this purpose, but any suffix will do, really.

The UInt128.MaxValue constant would, under this scheme, be encoded as 340282366920938463463374607431768211455B (that is, 2^128 - 1).
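For what it's worth, System.Numerics.BigInteger can already round-trip such values on the C# side, so the proposed literal maps naturally onto it (a small demonstration):

using System;
using System.Numerics;

class BigIntegerLiteralDemo
{
    static void Main()
    {
        // BigInteger handles integers of any size, so a 'B'-suffixed
        // LES literal would have a natural in-memory representation.
        var uint128Max = BigInteger.Parse(
            "340282366920938463463374607431768211455"); // 2^128 - 1
        Console.WriteLine(uint128Max);
    }
}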

Do you think that this proposal has merit? If so, would you mind if I implemented this feature and sent you a pull request?

Add mechanism to mark unused variables as ok

Consider the following:

replace(BufferAndNativeTypes => (
    (Int, int),
    (Float, float),
    (Short, short),
    (UShort, ushort),
    (Byte, byte)));

unroll((BufferType, NativeType) in BufferAndNativeTypes) {
    if (typeof(T) == typeof(NativeType)) {
        return;
    }
}

It will generate a warning, because BufferType isn't used.

Generally when I iterate over BufferAndNativeTypes I use both entries in the tuple, but here I don't. I generally like to eliminate all warnings where possible, so that it's meaningful when I do get one. I'd love to have a way to tell LeMP that BufferType being unused is expected, so it doesn't need to generate a warning.

At present I'm doing this:

                #rawText("// "); #rawText(stringify(BufferType));

which works, but it's a bit of a hack.
