
Comments (10)

qwertie commented on May 29, 2024

Before I read your message in its entirety, have you tried the MacroMode.PriorityOverride option to override static_if?

from ecsharp.

qwertie commented on May 29, 2024

Whew! I just committed support for preserving newlines and comments in EC#, a feature that was much harder than I thought it would be, particularly on the printer side. And I fixed a bug in LLLPG that broke the parser and had me baffled for awhile. Hopefully trivia preservation will be much easier in LES, but sadly there are two different printers to deal with (I thought I could deal with all three printers and parsers in one day, hahahaha.. er.. nope, not by a long shot).

The problem I have with "modularizing" StdMacros is in deciding where to draw the lines. It is not obvious to me what counts as "high" or "low" level. I mean, for most of the macros in StdMacros you can point to one or more programming languages where the feature implemented by that macro is built into the language, and one or more other languages where it is an "add-on". But I guess I can agree with your examples: static_if feels like a low-level feature while alt class feels like, and operates as, an add-on, even though ADTs are built into many languages as a low-level feature.

Hopefully for now you can solve the problem with MacroMode.PriorityOverride - but actually you should use PriorityInternalOverride as PriorityOverride is meant for end-users.


qwertie commented on May 29, 2024

Doesn't implementing IsQuickBindLhs correctly [in a macro] require an enhanced macro system like Nemerle has, with a second macro-processor pass that provides access to more type & member information? I haven't thought about that feature for a long time. (I'd love to have Nemerle-like macro hygiene, btw, but it involves concepts like "imports" and "colored identifiers" or "private symbols" or whatever the kids are calling them these days, and it seems to require closer integration with a compiler, which LeMP is not currently designed to provide.)


qwertie commented on May 29, 2024

Say... is there a reason you're not using macros (quote {}) to write RequiredMacros.cs?


jonathanvdc commented on May 29, 2024

Thanks for your replies! I'll try to respond to each of them in some arbitrary order below.

  • Actually, I must confess that I didn't even know MacroMode.PriorityInternalOverride existed until now. It seems like that enum value could solve most (if not all) of my macro-related problems, so I'll definitely give it a try.
  • But I disagree that implementing IsQuickBindLhs requires an enhanced macro system. All it takes is a few builtins, which are evaluated at compile-time. Admittedly, this does require compiler support for the builtins, but – strictly speaking – there's no need for an additional macro processing phase. Like foreach statements, the :: operator can be lowered to static_if (#isQuickBindLhs(<lhs>)) { <quickbind> } else { <scope resolution> }. LeMP could implement the hypothetical #isQuickBindLhs node as a macro that evaluates to a Boolean value, and process the static_if at macro expansion time. ecsc can map both #isQuickBindLhs and static_if to builtins, which are only evaluated during the IRGen phase.
  • Also, ecsc defines builtins that can be used to build hygienic macros, and ecsc's macro library actually uses those builtins to make sure that local variable definitions do not interfere with the outer scope's locals (and vice-versa). Again, these builtins are ecsc-specific, but perhaps LeMP can use a few heuristics to obtain similar results.
  • I did consider using quote { } in RequiredMacros.cs, but I sort of decided to hold off on that for now. Keeping the *.out.cs files in sync with the *.ecs files is a bit of a pain, so my plan is to just wait a little longer until ecsc can compile its own macro library, and then start using quote { } in the macro library.
  • Oh, and trivia preservation sounds great, by the way. Flame actually has some (theoretical) support for comment preservation and printing, but none of the current front-ends support it. I might just look into embedding comments in the IR that ecsc generates – once the next version of Loyc lands, that is. I've always wanted to include comments from the source programming language into, for example, generated C++ code, and this is the missing piece I needed to do just that. Awesome!
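The :: lowering sketched in the second bullet might look roughly like this (a sketch only; #isQuickBindLhs is the hypothetical builtin discussed above, and the exact shape of the expansion is illustrative):

```csharp
// Hypothetical lowering of `coll::c` in EC#. Whether `coll` is a quick-bind
// left-hand side is decided at compile time via the #isQuickBindLhs builtin;
// LeMP could expand it as a macro, while ecsc would defer it to IRGen.
static_if (#isQuickBindLhs(coll)) {
    // quick-bind interpretation: bind the value of `coll` to a new local `c`
    var c = coll;
} else {
    // scope-resolution interpretation: ordinary alias/namespace lookup
    coll::c;
}
```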


jonathanvdc commented on May 29, 2024

Update: I have defined overrides for the static_if and @[#static] #if macros in RedefinedMacros.cs, and the override mechanism seems to work beautifully. Thanks for helping me out!

(I'll close this issue now, given that my problem has been solved.)


qwertie commented on May 29, 2024

I guess I have waited long enough to comment...

It's pretty interesting how you've stretched out the lexical macro system to execute what would otherwise be conventional compiler passes, but something about this approach makes me uncomfortable. Let's see...

  • I have a vague uneasiness that macros don't compose or chain very well (e.g. if a macro emits a call to #foo() thinking that there is a macro called #foo, often it is not guaranteed that the intended version of #foo is actually called... #foo's namespace might somehow not be imported at the call site, or maybe there's a name collision as there are two macros called #foo).
  • Your foreach macro builds upon a series of compiler builtins. In this case it works, and potentially it could allow third parties to add their own new constructs with "smart" behavior - as you mentioned, #useSequenceExpressions could use these builtins to act smarter. But I feel like there must be limits to how much one can accomplish this way, and even if those limits were overcome by making your compile-time builtins Turing-complete (and whatever else they would need), programming this way has a couple of major downsides:
    • The macro author has to write code in a different (and more clunky) way than if he were writing an ordinary compiler pass or a macro that has access to semantic information.
    • Some of the macro's logic is encoded in Loyc trees. Creating those trees wouldn't be necessary in a compiler pass, and processing those trees is an act of interpretation, so it necessarily takes more CPU time than if compiled code (like a compiler pass) did the same task.

So... I think ultimately we should move away from using lexical macros to do semantic tasks, though I'm not prepared at the moment to suggest what to do instead.

Symbols and hygiene

I'd like to share an idea I originally had about how a Loyc compiler would work. As you know, LNode.Name is a Symbol, and while currently Symbol is basically just a string (it has an Id too, which maybe I should delete), it is deliberately not sealed, and in the back of my mind my plan was always to use Symbol for resolved references in source code.

Imagine if your compiler built its symbol-tables as usual, but the symbols in the symbol tables were actually Symbols, i.e. derived from class Symbol. Now, if a method contains this code:

var x = 23;
Foo(x);

The EC# parser, of course, produces a Loyc tree for Foo(x) which refers to a pair of ordinary global symbols. My idea was that a "symbol resolution" pass would scan over the source code, replacing global symbols with resolved symbols. The output would still be a Loyc tree and it would still print out as Foo(x), but the symbol with Name = "x" would actually "be" the local variable x and include type information so that you could look up the type of x directly from the LNode, something like ((LocalVariable)x.Name).VarType, if that makes sense. Similarly, the Symbol with Name = "Foo" would actually represent the method Foo(int).
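A minimal sketch of such a resolved-symbol class, assuming a derivable Symbol base with a name-taking constructor (the real Loyc Symbol routes construction through a SymbolPool, so the constructor here is an assumption):

```csharp
using Loyc;        // Symbol
using Loyc.Syntax; // LNode

// Hypothetical resolved symbol: the Symbol *is* the local variable.
public class LocalVariable : Symbol
{
    // Assumption: a constructor taking just the name is available to
    // derived classes; the real API is more involved.
    public LocalVariable(string name, LNode varType) : base(name)
        => VarType = varType;

    public LNode VarType { get; } // the variable's type, as a Loyc tree
}

// After the resolution pass, given an LNode `x` whose Name is the resolved symbol:
//     LNode type = ((LocalVariable)x.Name).VarType;
```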

Hygiene could also be achieved by a macro simply producing Symbols that are in a "local" SymbolPool (although I wonder if Symbol Pools are even a useful concept - non-global Symbols could be created without a pool, that's how it works in JavaScript). This doesn't work in the usual LeMP workflow (that I use), since EcsNodePrinter just prints the Symbol.Name without regard for whether it's a global symbol or not, so two different Symbols called x end up with the same name in the output. But inside a compiler that problem need not occur, and when outputting plain C# I could fix the problem by adding a unique-naming pass, before printing, whose job would be to ensure that all non-global symbols get a unique name string.
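The unique-naming pass could be sketched as a tree rewrite along these lines (LNode.ReplaceRecursive, WithName, Symbol.Pool, and GSymbol are real Loyc names, but the exact signatures here are from memory, so treat this as a sketch):

```csharp
// Sketch: before printing, replace every non-global Symbol with a fresh,
// uniquely named global Symbol, so that two distinct Symbols that are both
// named "x" don't collide in the text output.
static LNode AssignUniqueNames(LNode root)
{
    var renamed = new Dictionary<Symbol, Symbol>();
    int counter = 0;
    return root.ReplaceRecursive(node =>
    {
        if (!node.IsId || node.Name.Pool == GSymbol.Pool)
            return null; // leave unchanged; keep recursing into children
        if (!renamed.TryGetValue(node.Name, out Symbol fresh))
            renamed[node.Name] = fresh = GSymbol.Get(node.Name.Name + "__" + counter++);
        return node.WithName(fresh);
    });
}
```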

So... does such a design sound like a good idea to you?


jonathanvdc commented on May 29, 2024

Looks like the discussion is sort of diverging into two separate subjects. I've split my response into two sections accordingly.

Semantic macros

Truth be told, semantic macros as a replacement for a lexical macro-based design are a tough sell for me, so I'll argue against them now. (Sorry!)

Let's start off with a point of agreement. I think you're absolutely right that writing a #foreach macro is pretty clunky. It's just that the alternative so far – baking it into the compiler – is so much worse. Perhaps it's worth explaining my two reasons for writing #foreach as a macro:

  1. foreach is inherently platform-dependent. For example, the .NET runtime has interface IEnumerable and interface IEnumerable<T>, with foreach being little more than syntactic sugar for a loop that calls methods on these interfaces. But ecsc is not as biased toward the .NET runtime as csc and mcs, and can (at least theoretically) generate code for other platforms. By implementing foreach as a macro, a different platform-specific definition can be written for foreach. This wouldn't have been possible if I had baked foreach into the ecsc source code.
  2. Writing foreach as a lexical macro is, for all its clunkiness, still significantly less clunky than writing it as a "normal" expression/statement that the compiler can analyze. ecsc's lock implementation is significantly more complex than the foreach macro, and that complexity buys ecsc very little. Admittedly, error diagnostics for lock are much clearer than the error diagnostics for the foreach macro, but the foreach macro can be platform-specific, whereas the lock implementation is forever dependent on the existence of a type named System.Threading.Monitor, regardless of the platform for which code is generated.

These two points, especially the first, favor neither lexical nor semantic macros. Both can be used to create platform-specific definitions for (essentially platform-agnostic) constructs such as lock, using and foreach. I don't think a semantic macro will be more concise than a lexical macro with builtins (especially if that lexical macro uses quote { }), but I suppose that a semantic macro might be able to provide better diagnostics than the current lexical macros with builtins.

But I'm not convinced by your other criticisms. Specifically:

  • On chaining macros. RequiredMacros.cs defines all dependencies for the #foreach macro in the same namespace, in the same file, in the same assembly. It's all pretty cohesive. There's just no way that #foreach will get imported without its dependencies.
    And I don't think it's reasonable to expect #foreach (or any other macro, for that matter) to continue to work fine when someone writes a conflicting definition (with an equal priority) for its dependencies. If that happens, then I expect LeMP to either report an error, or just silently pick one of those macro definitions and have ecsc diagnose any errors that this may cause. Besides, the macros in LeMP.StdMacros are just as vulnerable to this sort of thing as #foreach. I could easily redefine if as a macro that always picks the then-branch, and I'm sure that'd break no small number of macros in LeMP.StdMacros.
  • The performance argument. You're right that encoding the macro's logic in Loyc trees results in additional processing time. Interpreting these trees is probably slower than handling the macro's logic in the compiler itself. But I believe semantic macros would be slower still, because they necessitate an additional intermediate representation: an IR that is annotated with type and symbol information. Embedding that type of information in classes derived from Symbol doesn't seem like a bad design, but that IR would still need to be constructed and then lowered to Flame's IR (ecsc translates Loyc trees directly into Flame IR at the moment). The construction/deconstruction passes would quite possibly impact performance far more negatively than a handful of builtin nodes ever could.

My main argument against semantic macros, though, is their complexity. They add yet another major pass to the compilation process. That means that anybody who seeks to fully understand how EC# works must also learn how semantic macros work, which constructs they can affect, the order in which they are expanded, what the structure and rules of the IR they operate on are, etc.

Also, semantic macros complicate macro development. Right now, an EC# macro is always a lexical macro. But if semantic macros are introduced as well, then programmers will suddenly have to ask themselves which type of macro they should create. If semantic macros are strictly more powerful than lexical macros, then they might have to re-write entire lexical macros as semantic macros when they realize they need a feature that only semantic macros offer.

Plus, semantic macros are arguably harder to verify by the compiler than lexical macros. Right now, the (Flame) IR generated by ecsc is correct by construction. If the input LNodes contain errors that must be diagnosed at compile-time, then ecsc will do just that, and will subsequently terminate the compilation process; ecsc either reports an error or produces a correct IR tree. But a semantic macro could sneakily insert a semantically invalid construct, without the compiler taking notice. This would result in very hard-to-detect bugs.

As a final point in favor of lexical macros with builtins: this type of thing has precedent in other languages. The D programming language, for example, defines std.traits, a module of function templates that answer questions about the source code. That is eerily similar to ecsc's builtins. For example, isArray is the D equivalent of #builtin_is_array_type. If I recall correctly, C++ has similar metaprogramming libraries.

Going forward, I think that it would be better to wrap ecsc's builtins into a standardized macro library, just like D has done. (Apparently, Phobos uses dmd builtins under the hood, as well: isNested seems like a fair example).
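Such a wrapper might be as simple as a lexical macro that expands to the builtin (a sketch; the [LexicalMacro] attribute and IMacroContext are LeMP's real extension points, but the attribute arguments and macro body here are illustrative):

```csharp
// Sketch: a std.traits-style wrapper that hides the ecsc builtin
// behind a friendlier macro name.
[LexicalMacro("isArrayType(T)", "Evaluates to true if T is an array type.")]
public static LNode isArrayType(LNode node, IMacroContext context)
{
    if (node.ArgCount != 1)
        return null; // not applicable; let other macros have a try
    // Expand to the compiler builtin, which ecsc evaluates during IRGen.
    return LNode.Call((Symbol)"#builtin_is_array_type", LNode.List(node.Args[0]));
}
```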

Symbols and hygiene

I very much like the idea of using separate SymbolPools to guarantee that symbols don't overlap. It certainly is a lot cleaner than the builtin-based design. I don't have a lot of spare time right now, but I think it's definitely worth implementing. I'll try to implement it in ecsc and its macro library once I get the opportunity to do so.

Or do you think that it makes more sense to just use LeMP's (future) renaming pass in ecsc? That would have the extra advantage of producing less confusing output when ecsc is instructed to print macro-expanded source code (via the -E switch).


qwertie commented on May 29, 2024

Semantic macros

Let me start here because I suspect you misunderstand what I was suggesting:

If the input LNodes contain errors that must be diagnosed at compile-time, then ecsc will do just that, and will subsequently terminate the compilation process; ecsc either reports an error or produces a correct IR tree. But a semantic macro could sneakily insert a semantically invalid construct, without the compiler taking notice. This would result in very hard-to-detect bugs.

I'm suggesting the same kind of semantic macros as Nemerle has. The macros would run after the list of types and methods has been built, but before method bodies have been converted to IR. You can't sneak anything invalid in if macros run before semantic analysis. (I think in Nemerle you might be able to do semantic analysis on local variables too, and create new class members.... I don't know how that works and I haven't easily found details online; if we do this we should install Nemerle and do some experiments to understand the details.)

And the second-stage macros should operate on Loyc trees so that users can easily switch which kind of macro they are writing, and so that the same MacroProcessor can still be used.

As a final point in favor of lexical macros with builtins: this type of thing has precedent in other languages. The D programming language, for example, defines std.traits, a module that defines function templates which answer questions about the source code.

Isn't that quite different? I didn't try any serious metaprogramming in D but IIRC, you can write code that calls those trait (builtin) functions directly and immediately acts on the results. That's more powerful and general than generating a syntax tree that eventually calls trait functions, as a lexical macro must do.

They add yet another major pass to the compilation process.

Since some of the compiler's own work could be implemented with them, it's not necessarily an additional pass beyond what you'd do anyway, is it? How many passes do you use already?

On chaining macros. RequiredMacros.cs defines all dependencies for the #foreach macro in the same namespace, in the same file, in the same assembly. It's all pretty cohesive. There's just no way that #foreach will get imported without its dependencies

I'll tell you a secret... I originally planned to let people write fully qualified macro names like Namespace.MacroName(...). I don't remember actually implementing that, but I'm seeing some code to support it in MacroProcessorTask.GetApplicableMacros, so, yeah, maybe you can already invoke a macro without its namespace being imported.

Symbols and hygiene

Or do you think that it makes more sense to just use LeMP's (future) renaming pass in ecsc?

I'm confused because this is not an either/or question. If multiple symbol pools are used then a renaming pass is required to avoid name collisions in the text output.

Another thought, perhaps one should be able to write quote(pool) {...} to use a specified symbol pool for all identifiers in a quotation.
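For illustration, usage of that hypothetical quote(pool) form might look like this (everything here is hypothetical syntax following the suggestion above):

```csharp
// Hypothetical: all identifiers introduced inside the quotation come from
// `pool`, so `tmp` cannot collide with a user variable also named tmp.
var pool = new SymbolPool();
return quote(pool) {
    var tmp = $(node[0]);
    DoSomethingWith(tmp);  // hypothetical helper, for illustration
};
```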

One more thought. For hygienic namespace lookup, to accomplish the same thing that Nemerle does, macros could instruct LeMP to look up macros from specific namespaces by wrapping their output in a certain command, say, #macroNamespaceContext().

As an example, let's say macro1 is in NamespaceA and it wants to return a call to macro2 in NamespaceB - without using a fully-qualified name - and there is also a macro2 in NamespaceC. macro1 is called from user code like this:

#importMacros(NamespaceA);
#importMacros(NamespaceC);
class Foo { ... }
macro1(otherMacro(Foo));

The user is also trying to call otherMacro from NamespaceC.

Let's say the macro originally returns something nonhygienic like this:

return quote { macro2($(node[0])) };

That is, macro1 is a useless macro that forwards the call to macro2. But the point is it wants macro2 from NamespaceB, while otherMacro should still be resolved from NamespaceC. The returned syntax tree in this case is

macro2(otherMacro(Foo));

Whereas the hygienic return would look more like this:

SymbolPool pool = new SymbolPool();
return F.Call("#macroNamespaceContext", 
    F.Literal(pool), 
    F.Literal(new DList<Symbol> {(Symbol)"NamespaceC"}),
    F.Call(pool["macro2"], node[0]));

so the first argument is a Literal containing a SymbolPool, the second argument is a Literal containing an IReadOnlyCollection<Symbol> of namespaces*, and the third argument is the actual output from the macro. Finally, the macro processor would use each Symbol's SymbolPool to decide in which list of namespaces to search for that Symbol.

(* the macro processor currently doesn't support qualified names stored as LNode - the qualified name Foo.Bar is just a Symbol with a literal dot in its name - and maybe it should stay that way since Symbols are dramatically faster than LNode as a dictionary key, or for comparisons.)

Obtaining hygiene is a bit clunky this way, but a helper function and/or macro could help.


jonathanvdc commented on May 29, 2024

My reply turned out to be significantly longer than I hoped it would be. Sorry about that.

Semantic macros

The macros would run after the list of types and methods has been built, but before method bodies have been converted to IR. You can't sneak anything invalid in if macros run before semantic analysis.

Right, but isn't that a little contradictory? I mean, suppose that a type Foo has been analyzed, and we now encounter an expression Foo.Bar. Foo in Foo.Bar could be a type, a namespace or a value, and there's really only one way to ascertain what Foo is in the context wherein it appears: by analyzing the expression Foo.Bar. And semantic macros need to be able to answer questions like: "is Foo a type?" You probably also want semantic macros to be able to discover what Foo's type is, if it is a value rather than a type, which requires all preceding statements and expressions to have been analyzed.

So, really, semantic macros would have to be evaluated in the middle of the semantic analysis process.

Which brings up the problem of what exactly a semantic macro returns. For example, a #foreach macro has to know what the type of its collection expression is, so it'll want to evaluate its collection argument node first, check what that collection's type is, build a loop with one or more induction variables, and only then evaluate the loop body node in the context of that loop.

At the moment, that logic is captured by the #foreach lexical macro as a Loyc tree, and the compiler then analyzes the resulting tree.

Now suppose that a semantic macro has to do the same job. There are basically two options here:

  1. It could produce an IR tree, but then the macro is given the opportunity to sneak in illegal IR.
  2. It could return a Loyc tree which is then fed to the remainder of the semantic analysis process. But then the collection expression is evaluated twice: once by the semantic macro to figure out which construction it should use, and once more by the semantic analysis, as the semantic macro will have to embed the collection expression in the Loyc tree it produces. This isn't just bad for performance; it also implies that any (warning/error) diagnostic related to the collection expression is now printed twice.

Alternatively, the semantic macro could produce some kind of wishy-washy Loyc tree that contains both unanalyzed Loyc trees and IR trees. But that's just option one in disguise, as the macro could easily insert invalid IR trees in that mixed Loyc/IR tree.

I also don't feel comfortable with imposing Flame IR on the macro writer, because that implies that any breaking change to Flame is a breaking change to EC#. But if Flame IR is not used in semantic macros, then they'd require (at the very minimum) an entire type system to insulate the semantic macros from Flame's type system. That'd be a lot of work, would definitely hurt performance, make it harder for (other) people to write an EC# compiler, and would add little value to the language.

Anyway, I just like the (apparent) simplicity of lexical macros. They're fairly easy to write and understand, and I don't feel the same way about semantic macros. Maybe I'm wrong in thinking that, but the only way to know for sure what EC# semantic macros would be like is to build a prototype, and then take things from there. That'd be a massive undertaking, though, and I'd rather just keep on extending ecsc up to the point where it starts becoming a reasonable alternative to csc and mcs for some projects. Can we put semantic macros on hold until we get to that point? There's still a lot of work to be done before ecsc is mature enough to compile itself.

Isn't that quite different? I didn't try any serious metaprogramming in D but IIRC, you can write code that calls those trait (builtin) functions directly and immediately acts on the results. That's more powerful and general than generating a syntax tree that eventually calls trait functions, as a lexical macro must do.

I don't completely understand what you mean by that. Can you give me an example of "code that calls those trait (builtin) functions directly and immediately acts on the results?"

Since some of the compiler's own work could be implemented with them, it's not necessarily an additional pass beyond what you'd do anyway, is it? How many passes do you use already?

I was referring to the passes specified by the language itself: preprocessing, lexing, parsing, lexical macro expansion and semantic analysis. In hindsight, I was wrong to say that semantic macros necessitate another pass. They probably just extend the semantic analysis pass, but in an awkward way.

I'll tell you a secret... I originally planned to let people write fully qualified macro names like Namespace.MacroName(...), I don't remember actually implementing that, but I'm seeing some code to support it in MacroProcessorTask.GetApplicableMacros so, yeah, maybe you can already invoke a macro without its namespace being imported.

Well, I sure didn't see that one coming. I'm glad you're considering hygienic macro imports, though. I wouldn't mind switching to those once they become available.

Symbols and hygiene

I'm confused because this is not an either/or question. If multiple symbol pools are used then a renaming pass is required to avoid name collisions in the text output.

What I meant was that storing resolved Symbol instances in ecsc's symbol tables would solve a problem that has already been solved if LeMP is going to rename all locals anyway.

Another thought, perhaps one should be able to write quote(pool) {...} to use a specified symbol pool for all identifiers in a quotation.

Sounds like a great idea.

Obtaining hygiene is a bit clunky this way, but a helper function and/or macro could help.

Yeah, that syntax is sort of clunky. But I suppose that's okay if we can find an appropriate macro to build on top of it. I think a variant of quote (pool) { ... } that implicitly creates a pool and then imports a set of namespaces could work nicely. Here's an example of what I imagine that might look like (I know #quoteWithMacroImports is a bit silly; I haven't put much thought into what to call this macro).

return #quoteWithMacroImports (NamespaceC, OtherMacros)
{
    macro2($(node[0]))
};

