inspirer / textmapper

Lexer and Parser generator

Home Page: http://textmapper.org

License: MIT License

Shell 0.01% C++ 1.40% Tcl 1.49% Yacc 4.66% Go 92.36% Starlark 0.08%

textmapper's People

Contributors

ahhhhmed, dependabot[bot], inspirer, jdb, jmorcos, luckygeck, mewmew, pwaller, superbobry, takamori, vii

textmapper's Issues

Problems trying to convert PostgreSQL-16 grammar

While trying to convert a working PostgreSQL-16 grammar to Textmapper, I found several issues:

  • A lexer rule like PRIVILEGES : /[pP][rR][iI][vV][iI][lL][eE][gG][eE][sS]/ prevents having a parser rule like privileges : privilege_list ... and fails with this error message: postgresql-16.tm,2441: redeclaration of terminal: privileges

The original grammar has no unresolved shift/reduce or reduce/reduce conflicts, but Textmapper reports:

postgresql-16.tm,4198: input: MODE_TYPE_NAME x_ucharacter
reduce/reduce conflict (next: eoi, IDENT, ABORT_P, ABSENT, ABSOLUTE_P, ACCESS, ACTION, ADD_P, ADMIN, AFTER, AGGREGATE, ALL, ALSO, ALTER, ALWAYS, ANALYSE, ANALYZE, AND, ANY, ARRAY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTACH, ATTRIBUTE, AUTHORIZATION, BACKWARD, BEFORE, BEGIN_P, BETWEEN, BIGINT, BINARY, BIT, BOOLEAN_P, BOTH, BREADTH, BY, CACHE, CALL, CALLED, CASCADE, CASCADED, CASE, CAST, CATALOG_P, CHAIN, CHARACTERISTICS, CHECK, CHECKPOINT, CLASS, CLOSE, CLUSTER, COALESCE, COLLATE, COLLATION, COLUMN, COLUMNS, COMMENT, COMMENTS, COMMIT, COMMITTED, COMPRESSION, CONCURRENTLY, CONFIGURATION, CONFLICT, CONNECTION, CONSTRAINT, CONSTRAINTS, CONTENT_P, CONTINUE_P, CONVERSION_P, COPY, COST, CREATE, CROSS, CSV, CUBE, CURRENT_P, CURRENT_CATALOG, CURRENT_DATE, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_USER, CURSOR, CYCLE, DATA_P, DATABASE, DEALLOCATE, DEC, DECIMAL_P, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINER, DELETE_P, DELIMITER, DELIMITERS, DEPENDS, DEPTH, DESC, DETACH, DICTIONARY, DISABLE_P, DISCARD, DISTINCT, DO, DOCUMENT_P, DOMAIN_P, DOUBLE_P, DROP, EACH, ELSE, ENABLE_P, ENCODING, ENCRYPTED, END_P, ENUM_P, ESCAPE, EVENT, EXCEPT, EXCLUDE, EXCLUDING, EXCLUSIVE, EXECUTE, EXISTS, EXPLAIN, EXPRESSION, EXTENSION, EXTERNAL, EXTRACT, FALSE_P, FAMILY, FETCH, FINALIZE, FIRST_P, FLOAT_P, FOLLOWING, FOR, FORCE, FOREIGN, FORMAT, FORWARD, FREEZE, FROM, FULL, FUNCTION, FUNCTIONS, GENERATED, GLOBAL, GRANT, GRANTED, GREATEST, GROUP_P, GROUPING, GROUPS, HANDLER, HAVING, HEADER_P, HOLD, IDENTITY_P, IF_P, ILIKE, IMMEDIATE, IMMUTABLE, IMPLICIT_P, IMPORT_P, IN_P, INCLUDE, INCLUDING, INCREMENT, INDENT, INDEX, INDEXES, INHERIT, INHERITS, INITIALLY, INLINE_P, INNER_P, INOUT, INPUT_P, INSENSITIVE, INSERT, INSTEAD, INT_P, INTEGER, INTERSECT, INTERVAL, INTO, INVOKER, IS, ISNULL, ISOLATION, JOIN, JSON, JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT, JSON_OBJECTAGG, KEY, KEYS, LABEL, LANGUAGE, LARGE_P, LAST_P, LATERAL_P, LEADING, 
LEAKPROOF, LEAST, LEFT, LEVEL, LIKE, LIMIT, LISTEN, LOAD, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATION, LOCK_P, LOCKED, LOGGED, MAPPING, MATCH, MATCHED, MATERIALIZED, MAXVALUE, MERGE, METHOD, MINVALUE, MODE, MOVE, NAME_P, NAMES, NATIONAL, NATURAL, NCHAR, NEW, NEXT, NFC, NFD, NFKC, NFKD, NO, NONE, NORMALIZE, NORMALIZED, NOT, NOTHING, NOTIFY, NOTNULL, NOWAIT, NULL_P, NULLIF, NULLS_P, NUMERIC, OBJECT_P, OF, OFF, OFFSET, OIDS, OLD, ON, ONLY, OPERATOR, OPTION, OPTIONS, OR, ORDER, ORDINALITY, OTHERS, OUT_P, OUTER_P, OVERLAY, OVERRIDING, OWNED, OWNER, PARALLEL, PARAMETER, PARSER, PARTIAL, PARTITION, PASSING, PASSWORD, PLACING, PLANS, POLICY, POSITION, PRECEDING, PREPARE, PREPARED, PRESERVE, PRIMARY, PRIOR, PRIVILEGES, PROCEDURAL, PROCEDURE, PROCEDURES, PROGRAM, PUBLICATION, QUOTE, RANGE, READ, REAL, REASSIGN, RECHECK, RECURSIVE, REF_P, REFERENCES, REFERENCING, REFRESH, REINDEX, RELATIVE_P, RELEASE, RENAME, REPEATABLE, REPLACE, REPLICA, RESET, RESTART, RESTRICT, RETURN, RETURNING, RETURNS, REVOKE, RIGHT, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINES, ROW, ROWS, RULE, SAVEPOINT, SCALAR, SCHEMA, SCHEMAS, SCROLL, SEARCH, SECURITY, SELECT, SEQUENCE, SEQUENCES, SERIALIZABLE, SERVER, SESSION, SESSION_USER, SET, SETOF, SETS, SHARE, SHOW, SIMILAR, SIMPLE, SKIP, SMALLINT, SNAPSHOT, SOME, SQL_P, STABLE, STANDALONE_P, START, STATEMENT, STATISTICS, STDIN, STDOUT, STORAGE, STORED, STRICT_P, STRIP_P, SUBSCRIPTION, SUBSTRING, SUPPORT, SYMMETRIC, SYSID, SYSTEM_P, SYSTEM_USER, TABLE, TABLES, TABLESAMPLE, TABLESPACE, TEMP, TEMPLATE, TEMPORARY, TEXT_P, THEN, TIES, TIME, TIMESTAMP, TRAILING, TRANSACTION, TRANSFORM, TREAT, TRIGGER, TRIM, TRUE_P, TRUNCATE, TRUSTED, TYPE_P, TYPES_P, UESCAPE, UNBOUNDED, UNCOMMITTED, UNENCRYPTED, UNION, UNIQUE, UNKNOWN, UNLISTEN, UNLOGGED, UNTIL, UPDATE, USER, USING, VACUUM, VALID, VALIDATE, VALIDATOR, VALUE_P, VALUES, VARCHAR, VARIADIC, VERBOSE, VERSION_P, VIEW, VIEWS, VOLATILE, WHEN, WHERE, WHITESPACE_P, WINDOW, WITH, WITHOUT, WORK, WRAPPER, WRITE, XML_P, 
XMLATTRIBUTES, XMLCONCAT, XMLELEMENT, XMLEXISTS, XMLFOREST, XMLNAMESPACES, XMLPARSE, XMLPI, XMLROOT, XMLSERIALIZE, XMLTABLE, YES_P, ZONE, LESS_EQUALS, GREATER_EQUALS, NOT_EQUALS, TYPECAST, FORMAT_LA, NULLS_LA, NOT_LA, Op, ';', '=', ')', ',', '*', '/', '+', '-', '%', '[', ']', '^', '<', '>', ':')
    SimpleTypename : x_ucharacter
    CharacterWithoutLength : x_ucharacter

postgresql-16.tm,4273: input: MODE_TYPE_NAME x_ucharacter
shift/reduce conflict (next: '(')
    CharacterWithoutLength : x_ucharacter

postgresql-16.tm,4273: input: SELECT distinct_clause x_ucharacter
shift/reduce conflict (next: '(')
    CharacterWithoutLength : x_ucharacter

postgresql-16.tm,4259: input: SELECT distinct_clause CharacterWithLength
reduce/reduce conflict (next: SCONST)
    x_ucharacter : CharacterWithLength
    ConstCharacter : CharacterWithLength

postgresql-16.tm,4260: input: SELECT distinct_clause CharacterWithoutLength
reduce/reduce conflict (next: SCONST)
    x_ucharacter : CharacterWithoutLength
    ConstCharacter : CharacterWithoutLength

conflicts: 2 shift/reduce and 483 reduce/reduce
lalr: 0.585s, text: 1.394s, parser: 6221 states, 3011KB

The converted grammar is attached:
postgresql-16.tm.zip

Also a working grammar can be seen here https://meimporta.eu/lalr-playground/

ast: Foos cannot be a list, since it precedes Foo

I've been unable to find a good way of representing the following grammar:

CatchSwitchTerm -> CatchSwitchTerm
	: 'catchswitch' 'within' Scope=ExceptionScope '[' Handlers=(Label separator ',')+ ']' 'unwind' UnwindTarget=UnwindTarget Metadata=(',' MetadataAttachment)+?
;

%interface UnwindTarget;

UnwindTarget -> UnwindTarget
	: 'to' 'caller' -> UnwindToCaller
	| Label
;

Running the above grammar through Textmapper results in the following error:

$ make
ll.tm,2739: `Handlers` cannot be a list, since it precedes UnwindTarget
lalr: 0.221s, text: 0.521s, parser: 2170 states, 226KB
make: *** [Makefile:7: gen] Error 1

Any suggestions would be warmly welcome.

parser: undefined: ignoredTokens

I get the compile time error parser.go:54:2: undefined: ignoredTokens when trying to compile the Go generated parser for the following grammar (foo.tm):

Note: this is on rev 78fc54e of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar foo.tm

language foo(go);

lang = "foo"
package = "github.com/mewspring/foo"

::lexer

'foobar' : /foobar/

::parser

input : 'foobar' ;

Support {eoi} in patterns

According to the docs, it's valid to use {eoi} in a pattern as long as it remains at the end. Yet anything other than just /{eoi}/ results in a syntax error.

Reproducer:

<initial> ignored_empty_line: /[\f\t ]*({newline}|{eoi})/ (space)

Prune nonterminals and terminals not reachable from the start state of the grammar, also report as warnings

In relation to #15, it would also be useful if Textmapper output warnings when a grammar contained unused rules, that is, rules that cannot be reached from the start state of the grammar.

Consider the following grammar (qux.tm):

language qux(go);

lang = "qux"
package = "github.com/mewspring/foo/qux"

:: lexer

'foo' : /foo/
'bar' : /bar/

:: parser

input : Foo ;

Foo : 'foo' ;

Bar : 'bar' ;

Bar is not reachable from input, thus the non-terminal Bar can be pruned from the parser table. This step could be performed automatically by Textmapper when generating the parser tables.

To check which states are reachable, a depth-first search from the start state of the grammar could be used as an initial implementation.

While pruning non-terminals, it would be very useful if this information was presented as warnings to the user. Users could then refine and simplify their grammar or, perhaps just as likely, discover bugs where rules were left unused even though the intention was to use them.

Similarly, tokens (e.g. bar) may be pruned from the list of tokens, and the lexer table may be refined to exclude tokens that are not present in any rule that is reachable from the start state of the grammar.
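
To make the suggestion concrete, here is a minimal sketch of the depth-first reachability check in Go. It is not Textmapper's implementation: the Grammar type and the reachable and unreachable helpers are hypothetical names invented for this illustration, and the qux.tm grammar is written out by hand rather than parsed from the .tm file.

```go
package main

import (
	"fmt"
	"sort"
)

// Grammar maps each nonterminal to its alternatives; every alternative is a
// sequence of symbol names. Symbols without an entry are treated as terminals.
// (Hypothetical representation, not Textmapper's internal data structure.)
type Grammar map[string][][]string

// reachable returns the set of symbols (terminals and nonterminals) that can
// be reached from start, using a depth-first search.
func reachable(g Grammar, start string) map[string]bool {
	seen := map[string]bool{}
	var visit func(sym string)
	visit = func(sym string) {
		if seen[sym] {
			return
		}
		seen[sym] = true
		for _, alt := range g[sym] {
			for _, s := range alt {
				visit(s)
			}
		}
	}
	visit(start)
	return seen
}

// unreachable returns, in sorted order, the symbols of all that are not
// reachable from start.
func unreachable(g Grammar, all []string, start string) []string {
	seen := reachable(g, start)
	var out []string
	for _, sym := range all {
		if !seen[sym] {
			out = append(out, sym)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// The rules of qux.tm, written out by hand.
	g := Grammar{
		"input": {{"Foo"}},
		"Foo":   {{"'foo'"}},
		"Bar":   {{"'bar'"}},
	}
	all := []string{"input", "Foo", "Bar", "'foo'", "'bar'"}
	for _, sym := range unreachable(g, all, "input") {
		fmt.Printf("warning: symbol %s not reachable from input, it can be pruned\n", sym)
	}
}
```

For qux.tm, this reports both the nonterminal Bar and the terminal 'bar' as unreachable, which matches the two warnings suggested in this issue.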

Edit:
Current output of Textmapper for qux.tm:

$ textmapper
lalr: 0.013s, text: 0.072s, parser: 5 states, 0KB

Suggested output of Textmapper when pruning non-terminals and terminals:

$ textmapper
warning: nonterminal Bar not reachable from input state, it has been pruned
warning: terminal bar not reachable from input state, it has been pruned
lalr: 0.012s, text: 0.08s, parser: 5 states, 0KB

Or something along those lines.

tm-parsers/tm_test: undefined: tm.SOFT

The following test case is failing on revision b962fbe.

$ go test -count=1 ./...
FAIL	github.com/inspirer/textmapper/tm-parsers/tm [build failed]
# github.com/inspirer/textmapper/tm-parsers/tm_test [github.com/inspirer/textmapper/tm-parsers/tm.test]
tm-parsers/tm/lexer_test.go:146:3: undefined: tm.SOFT

Are %shift modifiers still supported to resolve ambiguities?

From my understanding of http://textmapper.org/documentation.html#grammar-ambiguities, %shift modifiers may be used to resolve shift/reduce ambiguities in grammars.

From the example, I tried to derive a minimal working grammar. However, when I tried to generate the parser using Textmapper, I got the following error:

$ textmapper shiftreduce.tm 
shiftreduce.tm,34: input: 'if' '(' Expr ')' Stmt
shift/reduce conflict (next: 'else')
    IfStmt : 'if' '(' Expr ')' Stmt

conflicts: 1 shift/reduce and 0 reduce/reduce
lalr: 0.019s, text: 0.105s, parser: 18 states, 0KB

From https://github.com/mewspring/foo/blob/427ecc6ddb2a95906cea7570c5b28c33abc26544/shiftreduce/shiftreduce.tm#L33:

IfStmt
	: 'if' '(' Expr ')' Stmt %shift 'else'
	| 'if' '(' Expr ')' Stmt 'else' Stmt
;

Is that the correct syntax for using %shift modifiers, and if so, why is there a shift/reduce error being reported?

If I remove 'if' '(' Expr ')' Stmt %shift 'else' from the grammar, Textmapper is able to successfully produce both lexers and parsers.

Edit: this is on version v0.9.22 of Textmapper.

u@x1 ~> textmapper --version
textmapper v0.9.22/java build 2018
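
For readers unfamiliar with the ambiguity in question, the following Go sketch illustrates what "preferring shift on 'else'" means for the dangling-else case. It is a hand-written recursive-descent parser, not the LALR machinery Textmapper generates, and it drops the '(' Expr ')' part of the rule for brevity; the parser type and its methods are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// parser is a tiny hand-written recursive-descent parser over a token slice.
type parser struct {
	toks []string
	pos  int
}

// peek returns the current token, or "" at the end of the input.
func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// next consumes and returns the current token.
func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// parseStmt parses the simplified grammar
//
//	Stmt : 'if' Stmt | 'if' Stmt 'else' Stmt | 's'
//
// and returns a parenthesized rendering of the resulting tree.
func (p *parser) parseStmt() string {
	if p.peek() != "if" {
		return p.next() // a plain statement
	}
	p.next() // consume 'if'
	then := p.parseStmt()
	// Greedily consume a following 'else'. This is the analogue of choosing
	// shift over reduce: the 'else' binds to the nearest enclosing 'if'.
	if p.peek() == "else" {
		p.next()
		return "(if " + then + " else " + p.parseStmt() + ")"
	}
	return "(if " + then + ")"
}

func main() {
	p := &parser{toks: []string{"if", "if", "s", "else", "s"}}
	fmt.Println(p.parseStmt()) // prints "(if (if s else s))"
}
```

The alternative reading, (if (if s) else s), is what a parser would produce if it reduced the inner IfStmt before looking at 'else'.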

Collaborate to develop a good parser tool

Hello!
I just discovered your project. I'm also interested in developing a good parser tool and am currently working on https://github.com/mingodad/lalr/tree/playground. I have several grammars, in different stages of completeness, at https://meimporta.eu/lalr-playground/, including a Textmapper grammar that can be tested online (select Textmapper parser from the Examples dropdown and then click on the Parse button).

Let's talk about how we can cooperate to evolve our projects.

Cheers!

jar: error while generating a sublexer that matches {eoi} as invalid_token

While trying to generate the Go lexer via textmapper.jar (master @ 08d03f4 compiled with openjdk version "13.0.1" 2019-10-15), I got the following error:

$ ls
test.tm
$ java -jar textmapper.jar
jar:file:textmapper.jar!/org/textmapper/tool/templates/go_token.ltp,19: Evaluation of `syntax.symbols[i].isConstant()` failed for [common.Context]: (caused by java.lang.IllegalArgumentException): null
jar:file:textmapper.jar!/org/textmapper/tool/templates/go_token.ltp,28: Evaluation of `syntax.symbols[i].isConstant()` failed for [common.Context]: (caused by java.lang.IllegalArgumentException): null
lalr: 0.009s, text: 0.146s

The grammar that triggers the issue (reduced to a minimal reproducer) is:

language test(go);
lang = "test"

:: lexer

%s initial;
%x inComment;

invalid_token:
error:

whitespace: /[\n\r\t\f\v ]+/ (space)

<initial, inComment> EnterBlockComment:  /\(\*/ (space)

<initial> invalid_token: /\*\)/

<inComment> {
invalid_token: /{eoi}/
ExitBlockComment: /\*\)/ (space)
BlockComment: /[^\(\)\*]+|[\*\(\)]/ (space)
}

The issue seems to be triggered by <inComment> invalid_token: /{eoi}/: I'm assuming the rule is allowed since compiling the same grammar with tm-go/cmd/textmapper works perfectly.

parse to ast error

Hi,
when parse.go processes the following part of an LLVM IR file, it fails with this error:
unable to parse "/Users/admin/test/test.ll" into an AST: syntax error at line 1

define void @testcase_1dep.Foo.Bar({ { i8*, i64 }, { i8*, i64 } }* nocapture sret({ { i8*, i64 }, { i8*, i64 } }) %sret.formal.0, i8* nest nocapture readnone %nest.0, %Foo.0* readnone %f, i8* %command.chunk0, i64 %command.chunk1) #0 !dbg !6 {
entry:
  %tmp.0 = alloca { { i8*, i64 }, { i8*, i64 } }, align 8
  %tmpv.0 = alloca [2 x { i8*, i64 }], align 8
  %tmpv.1 = alloca { { i8*, i64 }, { i8*, i64 } }, align 8
  call void @llvm.dbg.value(metadata %Foo.0* %f, metadata !28, metadata !DIExpression()), !dbg !29
  call void @llvm.dbg.value(metadata i8* %command.chunk0, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !29
  call void @llvm.dbg.value(metadata i64 %command.chunk1, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !29
  %cast.38 = bitcast [2 x { i8*, i64 }]* %tmpv.0 to i8*, !dbg !31
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(16) %cast.38, i8* noundef nonnull align 8 dereferenceable(16) bitcast ({ i8*, i64 }* @const.16 to i8*), i64 16, i1 false), !dbg !31
  %command.addr.sroa.0.0.cast.39.sroa_idx = getelementptr inbounds [2 x { i8*, i64 }], [2 x { i8*, i64 }]* %tmpv.0, i64 0, i64 1, i32 0, !dbg !31
  store i8* %command.chunk0, i8** %command.addr.sroa.0.0.cast.39.sroa_idx, align 8, !dbg !31
  %command.addr.sroa.4.0.cast.39.sroa_idx4 = getelementptr inbounds [2 x { i8*, i64 }], [2 x { i8*, i64 }]* %tmpv.0, i64 0, i64 1, i32 1, !dbg !31
  store i64 %command.chunk1, i64* %command.addr.sroa.4.0.cast.39.sroa_idx4, align 8, !dbg !31
  %call.0 = call { i8*, i64 } @runtime.concatstrings(i8* nest undef, i8* null, i8* nonnull %cast.38, i64 2), !dbg !31
  call void @llvm.dbg.value(metadata i8* undef, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !29
  call void @llvm.dbg.value(metadata i64 undef, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !29
  %icmp.0 = icmp eq %Foo.0* %f, null, !dbg !32
  br i1 %icmp.0, label %then.0, label %else.0, !make.implicit !5

It seems the parser cannot correctly handle the line define void @testcase_1dep.Foo.Bar({ { i8*, i64 }, { i8*, i64 } }* nocapture sret({ { i8*, i64 }, { i8*, i64 } }) %sret.formal.0, i8* nest nocapture readnone %nest.0, %Foo.0* readnone %f, i8* %command.chunk0, i64 %command.chunk1) #0 !dbg !6 {
The debug values are as follows:
state: 614
action: -2

Build instructions for textmapper tool

I updated to the latest build which fixes the errors for lexer and parser generation. However, I must confess that I don't know exactly what the right steps are to build Java projects.

Specifically, I would like to know which commands are intended for building Textmapper, for testing Textmapper, and for re-generating the parsers of the Textmapper repo. I would also like to know where the build artifact of the Textmapper tool (i.e. textmapper-0.9.21.jar or tool-0.9.21-SNAPSHOT-all.jar) is supposed to be located.

I have installed Maven and Ant.

From the tm-tool directory, this is what I get when building with mvn deploy:

Note: this still produces the build artifact that I need for using Textmapper (tm-tool/tool/target/tool-0.9.21-SNAPSHOT-all.jar), and all test cases seem to pass.

$ mvn deploy
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Textmapper Master 0.9.21-SNAPSHOT .................. SUCCESS [  0.668 s]
[INFO] Textmapper templates ............................... SUCCESS [  1.412 s]
[INFO] Lapg ............................................... SUCCESS [  1.723 s]
[INFO] Textmapper ......................................... SUCCESS [  2.977 s]
[INFO] Textmapper tool 0.9.21-SNAPSHOT .................... FAILURE [  4.174 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.720 s
[INFO] Finished at: 2018-10-13T11:21:39+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.4:deploy (injected-nexus-deploy) on project tool: Failed to deploy artifacts: Could not transfer artifact org.textmapper:lapg:jar:0.9.21-20181013.092137-1 from/to sonatype-nexus-snapshots (https://oss.sonatype.org/content/repositories/snapshots/): Failed to transfer file: https://oss.sonatype.org/content/repositories/snapshots/org/textmapper/lapg/0.9.21-SNAPSHOT/lapg-0.9.21-20181013.092137-1.jar. Return code is: 401, ReasonPhrase: Unauthorized. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tool

From the tm-tool directory, this is what I get when building with ant deploy:

Note: this still produces the build artifact that I need for using Textmapper (tm-tool/libs/textmapper-0.9.21.jar). The build artifact is located in a different directory as compared to mvn. What is the canonical directory for the Textmapper tool artifact? Also, this does not seem to run test cases?

u@x1 ~/g/s/g/i/t/tm-tool> ant deploy
Buildfile: /home/u/goget/src/github.com/inspirer/textmapper/tm-tool/build.xml

rev:
     [echo] revision: 76054ba15463585cf1beb0acebb8fe63bcff1909

build:
    [mkdir] Created dir: /home/u/goget/src/github.com/inspirer/textmapper/build/java.out/textmapper
     [copy] Copying 31 files to /home/u/goget/src/github.com/inspirer/textmapper/build/java.out/textmapper
    [javac] Compiling 446 source files to /home/u/goget/src/github.com/inspirer/textmapper/build/java.out/textmapper
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
      [jar] Building jar: /home/u/goget/src/github.com/inspirer/textmapper/build/textmapper.jar

source:
    [mkdir] Created dir: /home/u/goget/src/github.com/inspirer/textmapper/build/java.src/textmapper
     [copy] Copying 454 files to /home/u/goget/src/github.com/inspirer/textmapper/build/java.src/textmapper
      [jar] Building jar: /home/u/goget/src/github.com/inspirer/textmapper/build/textmapper-src.jar

deploy:
     [copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-tool/libs
     [copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-tool/libs
     [copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-idea/org.textmapper.idea/lib
     [copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-idea/org.textmapper.idea/lib
     [copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-eclipse/plugins/org.textmapper
     [copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-eclipse/plugins/org.textmapper

BUILD SUCCESSFUL
Total time: 3 seconds

throw exception in plugin for idea ultimate on mac os x

error:
update failed for AnAction with ID=ShowUmlDiagram: org/textmapper/tool/parser/TMLexer

stacktrace:

update failed for AnAction with ID=ShowUmlDiagram: org/textmapper/tool/parser/TMLexer
java.lang.NoClassDefFoundError: org/textmapper/tool/parser/TMLexer
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.lang.ClassLoader.defineClass(ClassLoader.java:465)
at com.intellij.util.lang.UrlClassLoader._defineClass(UrlClassLoader.java:153)
at com.intellij.util.lang.UrlClassLoader.defineClass(UrlClassLoader.java:149)
at com.intellij.util.lang.UrlClassLoader._findClass(UrlClassLoader.java:125)
at com.intellij.ide.plugins.cl.PluginClassLoader.d(PluginClassLoader.java:102)
at com.intellij.ide.plugins.cl.PluginClassLoader.loadClass(PluginClassLoader.java:63)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.textmapper.idea.lang.syntax.lexer.LapgLexerAdapter.start(Unknown Source)
at com.intellij.lexer.Lexer.start(Lexer.java:45)
at com.intellij.lang.impl.PsiBuilderImpl.a(PsiBuilderImpl.java:204)
at com.intellij.lang.impl.PsiBuilderImpl.<init>(PsiBuilderImpl.java:177)
at com.intellij.lang.impl.PsiBuilderImpl.<init>(PsiBuilderImpl.java:151)
at com.intellij.lang.impl.PsiBuilderImpl.<init>(PsiBuilderImpl.java:185)
at com.intellij.lang.impl.PsiBuilderFactoryImpl.createBuilder(PsiBuilderFactoryImpl.java:52)
at com.intellij.psi.tree.ILazyParseableElementType.doParseContents(ILazyParseableElementType.java:62)
at com.intellij.psi.tree.IFileElementType.parseContents(IFileElementType.java:43)
at com.intellij.psi.impl.source.tree.LazyParseableElement.e(LazyParseableElement.java:165)
at com.intellij.psi.impl.source.tree.LazyParseableElement.getFirstChildNode(LazyParseableElement.java:209)
at com.intellij.psi.impl.source.tree.CompositeElement.countChildren(CompositeElement.java:493)
at com.intellij.psi.impl.source.tree.CompositeElement.getChildrenAsPsiElements(CompositeElement.java:455)
at com.intellij.psi.impl.source.PsiFileImpl.getChildren(PsiFileImpl.java:741)
at com.intellij.uml.java.JavaUmlElementManager.findInDataContext(JavaUmlElementManager.java:84)
at com.intellij.uml.java.JavaUmlElementManager.findInDataContext(JavaUmlElementManager.java:47)
at com.intellij.diagram.DiagramProvider.findProvider(DiagramProvider.java:148)
at com.intellij.uml.core.actions.ShowDiagramBase.update(ShowDiagramBase.java:61)
at com.intellij.openapi.actionSystem.ex.ActionUtil.performDumbAwareUpdate(ActionUtil.java:111)
at com.intellij.openapi.actionSystem.impl.Utils.a(Utils.java:167)
at com.intellij.openapi.actionSystem.impl.Utils.updateGroupChild(Utils.java:226)
at com.intellij.openapi.actionSystem.impl.Utils.a(Utils.java:200)
at com.intellij.openapi.actionSystem.impl.Utils.expandActionGroup(Utils.java:136)
at com.intellij.openapi.actionSystem.impl.Utils.expandActionGroup(Utils.java:85)
at com.intellij.openapi.actionSystem.impl.Utils.fillMenu(Utils.java:241)
at com.intellij.openapi.actionSystem.impl.ActionPopupMenuImpl$MyMenu.show(ActionPopupMenuImpl.java:96)
at com.intellij.ide.ui.customization.CustomizationUtil$3.invokePopup(CustomizationUtil.java:284)
at com.intellij.ui.PopupHandler.mousePressed(PopupHandler.java:48)
at java.awt.AWTEventMulticaster.mousePressed(AWTEventMulticaster.java:263)
at java.awt.AWTEventMulticaster.mousePressed(AWTEventMulticaster.java:262)
at java.awt.Component.processMouseEvent(Component.java:6411)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3275)
at com.intellij.ui.treeStructure.Tree.processMouseEvent(Tree.java:420)
at com.intellij.ide.dnd.aware.DnDAwareTree.processMouseEvent(DnDAwareTree.java:51)
at java.awt.Component.processEvent(Component.java:6179)
at java.awt.Container.processEvent(Container.java:2083)
at java.awt.Component.dispatchEventImpl(Component.java:4776)
at java.awt.Container.dispatchEventImpl(Container.java:2141)
at java.awt.Component.dispatchEvent(Component.java:4604)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4619)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4277)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4210)
at java.awt.Container.dispatchEventImpl(Container.java:2127)
at java.awt.Window.dispatchEventImpl(Window.java:2489)
at java.awt.Component.dispatchEvent(Component.java:4604)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:717)
at java.awt.EventQueue.access$400(EventQueue.java:82)
at java.awt.EventQueue$2.run(EventQueue.java:676)
at java.awt.EventQueue$2.run(EventQueue.java:674)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:86)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:690)
at java.awt.EventQueue$3.run(EventQueue.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:86)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:687)
at com.intellij.ide.IdeEventQueue.d(IdeEventQueue.java:700)
at com.intellij.ide.IdeEventQueue._dispatchEvent(IdeEventQueue.java:521)
at com.intellij.ide.IdeEventQueue.dispatchEvent(IdeEventQueue.java:348)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:296)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Caused by: java.lang.ClassNotFoundException: org.textmapper.tool.parser.TMLexer PluginClassLoader[org.textmapper.idea, 0.9.2]
at com.intellij.ide.plugins.cl.PluginClassLoader.loadClass(PluginClassLoader.java:77)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)

ast: comma separated list is parsed as empty (even when present in input)

Using the parser generated by the latest revision of Textmapper (c3f3e1b), I've run into a bug where the AST Indices method of GetElementPtrInst instructions returns an empty slice, even when the input text contains indices.

Given the following input text inst_memory.ll:

Note: in the input text below , i64 0, i64 0 corresponds to the list of indices (where i64 0 is reduced as a TypeValue).

@s = constant [4 x i8] c"foo\00"

define void @f() {
	%1 = getelementptr [4 x i8], [4 x i8]* @s, i64 0, i64 0
	ret void
}

And the relevant parts of the grammar ll.tm:

GetElementPtrInst -> GetElementPtrInst
	: 'getelementptr' InBoundsopt ElemType=Type ',' Src=TypeValue Indices=(',' TypeValue)* Metadata=(',' MetadataAttachment)+?
;

To reproduce the above observation, run the following commands.

$ go get -u github.com/mewspring/foo/indices
$ cd $GOPATH/src/github.com/mewspring/foo/indices/cmd/indices
$ go run main.go inst_memory.ll
text: "getelementptr [4 x i8], [4 x i8]* @s, i64 0, i64 0"
elem type: "[4 x i8]"
src: "[4 x i8]* @s"
len(indices): 0

The expected output is len(indices): 2. So, for some reason, the indices are not present in the AST.

Note, no changes have been made to the generated parser, which is located at https://github.com/llir/ll

The only hand-written code is that of main.go

Potential optimization for listener.go, use ... notation to use arrays instead of slices for interface types

For interface types (i.e. set/union types), lookup tables are generated to be used by selector.go. These lookup tables are currently generated as slices.

Since the size of the lookup tables is fixed, it should be possible to use the [...] notation to allocate arrays of the required size to keep all elements of the array literals.

From generated listener.go

-var Type = []NodeType{
+var Type = [...]NodeType{
 	ArrayType,
 	FloatType,
 	FuncType,
 	IntType,
 	LabelType,
 	MMXType,
 	MetadataType,
 	NamedType,
 	PackedStructType,
 	PointerType,
 	StructType,
 	TokenType,
 	VectorType,
 	VoidType,
 }

Used in selector.go:

	Type                            = OneOf(ll.Type...)

Potentially this could remove a few bounds checks.

However, as I'm writing this, it seems more likely that these lookup tables are always used as slices (e.g. passed as an argument to OneOf). Therefore, having them as slices rather than arrays makes more sense, as otherwise we would have to generate slice descriptors for each invocation of OneOf.
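
The trade-off can be seen in a small, self-contained Go sketch. NodeType and OneOf here are simplified stand-ins written for this illustration; the actual definitions in the generated listener.go and selector.go may differ.

```go
package main

import "fmt"

// NodeType is a simplified stand-in for the generated node type enum.
type NodeType int

const (
	ArrayType NodeType = iota
	FloatType
	FuncType
)

// typeSlice mirrors the current generated form: a slice.
var typeSlice = []NodeType{ArrayType, FloatType, FuncType}

// typeArray mirrors the proposed form: an array whose length (3) is
// inferred from the literal by the [...] notation.
var typeArray = [...]NodeType{ArrayType, FloatType, FuncType}

// OneOf returns a predicate that accepts any of the given node types.
// (Hypothetical signature, modelled on the usage shown above.)
func OneOf(types ...NodeType) func(NodeType) bool {
	return func(t NodeType) bool {
		for _, candidate := range types {
			if candidate == t {
				return true
			}
		}
		return false
	}
}

func main() {
	// A slice can be spread into a variadic parameter directly.
	fromSlice := OneOf(typeSlice...)

	// An array cannot: OneOf(typeArray...) does not compile. It has to be
	// sliced first, which creates a slice descriptor at each call site.
	fromArray := OneOf(typeArray[:]...)

	fmt.Println(fromSlice(FloatType), fromArray(FuncType), len(typeArray))
}
```

So with arrays, every call such as OneOf(ll.Type...) would have to become OneOf(ll.Type[:]...), which is the extra slice descriptor mentioned above.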

Well, feel free to close this issue. I just wanted to write it down, so we can evaluate or discard the idea.

Cheers,
Robin

Node as nonterminal name

Let's consider the following .tm file:

language prop(java);

prefix = "AST"
package = "ru.aptu.xml"
gentree=true
genast=true
positions="offset, line"
endpositions="offset"

:: lexer

identifier(String): /[a-zA-Z_][a-zA-Z_0-9]*/   { $symbol = current(); }
openChar:       /</
closeChar:      />/
_skip:          /[\n\t\r ]+/ (space)

:: parser

input ::= root=node;
node ::= openChar identifier closeChar;

This compiles successfully without any warning but leads to a semantically wrong program. In /ast/AstNode we will get

public class AstNode extends AstNode {
}

listener: missing value in const declaration

I get the compile time error listener.go:14:2: missing value in const declaration when trying to compile the Go generated listener for the following grammar (bar.tm):

Note: this is on rev 78fc54e of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar bar.tm

language bar(go);

lang = "bar"
package = "github.com/mewspring/foo/bar"
eventBased = true

::lexer

'foobar' : /foobar/

::parser

input : 'foobar' ;

tm-parsers/js/js.tm: test case failing with "The on-disk content differs from the generated one"

I just synced with the latest revision of textmapper (4d45f61), and when running go test ./..., I got the following error:

u@x220 ~/g/s/g/i/textmapper> go test ./...
?   	github.com/inspirer/textmapper/tm-go/cmd/textmapper	[no test files]
--- FAIL: TestGenerate (0.40s)
    --- FAIL: TestGenerate/../../tm-parsers/js/js.tm (0.31s)
        gen_test.go:45: The on-disk content differs from the generated one.
            --- ../../tm-parsers/js/lexer_tables.go
            +++ lexer_tables.go (generated)
            @@ -304,12 +304,18 @@
             		54, 54, 54, 54, 54, 54, 57,
             	}},
             	{3664, 3674, 59, nil},
            -	{3713, 3802, 59, []uint8{
            -		54, 54, 1, 54, 1, 1, 54, 54, 1, 54, 1, 1, 54, 1, 1, 1, 1, 1, 1, 54, 54, 54,
            -		54, 1, 54, 54, 54, 54, 54, 54, 54, 1, 54, 54, 54, 1, 54, 1, 54, 1, 1, 54,
...

bug: Incorrect parent and firstChild set for leftmost and rightmost nonterminals of zero length

Creating a dedicated issue to track the bug identified at #17 (comment)

Edit: For added context, the only hand-written files are parser.go and tree.go, both of which are mostly identical copies of the original parser.go and tree.go of the TextMapper project. Which is why this bug is interesting to fix as it pertains to TextMapper itself.

I'll add my latest debug session below, feel free to skim as it's quite long :)

The debug session below is debugging the none command, which uses the mini.tm grammar, and is invoked on the example.ll input file.

Specifically the following commands may be used to reproduce the debug session:

$ go get -u github.com/mewspring/foo/none/cmd/none
$ go get -u github.com/derekparker/delve/cmd/dlv
$ go get -u github.com/aarzilli/gdlv
$ cd $GOPATH/src/github.com/mewspring/foo/none/cmd/none
$ $GOPATH/bin/gdlv debug ../../example.ll

Cheers,
Robin

Debug session

Is it correct that next should be overwritten here?
Then we will skip /* empty */ nonterminals that are in between other nonterminals.

2018-10-27-010734_1920x1175_scrot

Should it really be o >= endoffset? And not o > endoffset? Now, we decrease the end, so that InstMetadata will not be part of the call instruction.

2018-10-27-011328_1920x1175_scrot

OperandBundles and InstMetadata were not assigned CallInst as parent. They should have been. Note that end was too small, most likely because of using if o >= offset { end-- } instead of if o > offset { end-- }

2018-10-27-011841_1920x1175_scrot

This is probably wrong, as now OperandBundles and InstMetadata won't have CallInst as parent.

2018-10-27-013545_1920x1175_scrot

This is where it goes wrong. Really wrong. As the OperandBundles and InstMetadata of the previous CallInst are added to the current CallInst.

The only reason this happens is that OperandBundles and InstMetadata are zero in length, so the offset alone is not enough to distinguish which parent nonterminal they belong to.

2018-10-27-013740_1920x1175_scrot

This is the incorrect firstChild. It points to InstMetadata, but should really point to

2018-10-27-013942_1920x1175_scrot

Incorrect parents have now been set for InstMetadata and OperandBundles.

2018-10-27-014217_1920x1175_scrot

No parent was set for OperandBundles and InstMetadata. They should have both been assigned the last CallInst as parent (i.e. index 18).

2018-10-27-014315_1920x1175_scrot

Since InstMetadata (at index 10) is now the firstChild of the CallInst at index 18, and since the InstMetadata does not have a next (i.e. next is 0), the CallInst will not have any other children than the incorrectly added InstMetadata child. Therefore, calling Typ on CallInst will result in NONE type.

2018-10-27-014532_1920x1175_scrot

InstMetadata (at index 17) of previous instruction being added to have RetTerm as parent. The correct InstMetadata to add to RetTerm is at index 20.

2018-10-27-014915_1920x1175_scrot

FuncMetadata incorrectly added to have FuncBody as parent.

2018-10-27-015614_1920x1175_scrot

InstMetadata incorrectly added to have FuncBody as parent (should have been part of TermRet).

2018-10-27-015730_1920x1175_scrot

Panic with unknown node type NONE, since CallInst has InstMetadata as first child.

2018-10-27-020137_1920x1175_scrot

~ character in the package name breaks generated code

I've noticed strange behavior with the code generator for my grammar.

I have a "~" char in the package name I define inside my "tm" file, and as a result the code produced by textmapper is invalid.

Example:
Package name inside tm file:

package = "git.sr.ht/~rn/lang"

Generated lexer.go (the problem exists in other files too):

import (
	"git.sr.ht/~rn/lang"
	"strings"
	"unicode/utf8"
)

// generated by Textmapper; DO NOT EDIT

package git.sr.ht/~rn/lang

Textmapper generates valid code once I remove the "~" char from the package name.
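
For reference, a Go package clause takes a plain identifier, not an import path. One way a generator could derive it is to take the last element of the import path. This is only a sketch of that idea, not what Textmapper does internally; packageName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// packageName returns the last element of a Go import path, which is the
// conventional package name for that path. Hypothetical helper.
func packageName(importPath string) string {
	return path.Base(importPath)
}

func main() {
	fmt.Println(packageName("git.sr.ht/~rn/lang")) // lang
}
```

With this approach the "~" only ever appears inside the quoted import path, where it is legal.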

Idiomatic way to distinguish presence of optional nonterminal in AST?

I'm curious about whether the intention is for optional AST nodes to be distinguishable as nil pointers or if they should be determined by invoking the Text method and checking whether the text is empty.

For a concrete example, see how the fob command handles an input where CallingConv is present (input.txt) vs. absent (no_input.txt).

From what I can tell, a pointer type is used for optional nonterminals; i.e. *ast.CallingConv is used for CallingConvopt. However, when CallingConv is not present in the input text, I was expecting the pointer to be nil, however that's not the case.

In particular, for no_input.txt, neither cc, nor cc.Node is nil even when the input does not contain a calling convention:

/*
path: no_input.txt
type:  *ast.CallingConv
value: &{{<nil> 0}}
cc: &ast.CallingConv{}
cc.Node == nil: false
cc.Node: main.node{}
text:  ""
*/

The grammar is defined as follows (fob.tm):

language fob(go);

lang = "fob"
package = "github.com/mewspring/foo/fob"
eventBased = true
eventFields = true

:: lexer

'x86_fastcallcc' : /x86_fastcallcc/
'x86_stdcallcc' : /x86_stdcallcc/
'x86_thiscallcc' : /x86_thiscallcc/

:: parser

input : FuncHeader ;

FuncHeader -> FuncHeader
	: CallingConvopt
;

CallingConv -> CallingConv
	: 'x86_fastcallcc'
	| 'x86_stdcallcc'
	| 'x86_thiscallcc'
;

lexer: label restart defined and not used

I get the compile time error lexer.go:64:1: label restart defined and not used when trying to compile the Go generated lexer for the following grammar (foo.tm):

Note: this is on rev 78fc54e of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar foo.tm

language foo(go);

lang = "foo"
package = "github.com/mewspring/foo"

::lexer

'foobar' : /foobar/

::parser

input : 'foobar' ;

IAE in textmapper IDEA plugin if file has more than one CRLF at end of file

On Windows, textmapper v. 0.9.2, IDEA v. 12.1.4:

From the console, if prop.tm has two CRLFs at the end of the file:
textmapper.sh prop.tm
Welcome to Git (version 1.8.1.2-preview20130201)
Run 'git help git' to display the help index.
Run 'git help ' to display help for specific commands.
prop.tm,1: syntax error before line 1

In the IDEA plugin, an IAE is reported before the first line:
lapg: internal error: java.lang.IllegalArgumentException

The error goes away if I change the line endings in IDEA from CRLF to CR or LF, or remove the CRLFs from the end of the file.

Unable to parse grammar with unordered sequences using ampersand (‘&’) operator

I tried to use the ampersand operator to handle unordered sequences, but when I try to generate the parser, Textmapper reports a syntax error. Are unordered sequences still supported by Textmapper?

$ java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar a.tm 
a.tm,17: syntax error before line 17

For the following grammar:

language foobar(go);

lang = "foobar"
package = "github.com/mewmew/foobar"

::lexer

'foo' : /foo/
'bar' : /bar/
'baz' : /baz/

::parser

input : ABC ;

# accepts all permutations of A, B and C as well as an empty string
ABC : (A & B & C)? ;

A : 'foo' ;
B : 'bar' ;
C : 'baz' ;

Idiomatic way of distinguishing between two alternatives in the grammar

I would wish to use the issue tracker for reporting issues, however this is more of a question.

Given the grammar:

StructType -> StructType
	: '{' Fields=(Type separator ',')+? '}'
	| '<' '{' Fields=(Type separator ',')+? '}' '>'
;

What would be an idiomatic way of differentiating the two alternatives? The AST Node of the first alternative should essentially have a boolean member Packed set to false, while the second should have Packed set to true.

I tried using something like:

StructType<flag Packed = false> -> StructType
	: '{' Fields=(Type separator ',')+? '}'
	| [Packed] '<' '{' Fields=(Type separator ',')+? '}' '>'
;

It compiled just fine, but seems to be used for something else. At least, the AST package did not contain any reference of Packed.

Edit: The only method I can think of is to use something like this, which works but looks quite ugly:

StructType -> StructType
	: '{' Fields=(Type separator ',')+? '}'
	| Packed '{' Fields=(Type separator ',')+? '}' '>'
;

Packed -> Packed
	: '<'
;

Edit2: This also comes up in Params, where I would like to have a boolean Variadic field indicating whether '...' is present in the first alternative and whether ',' '...' is present in the second alternative of:

Params -> Params
	: '...'?
	| Params=(Param separator ',')+ (',' '...')?
;

ast: wrong type for Type method; have Type() Type, want Type() baz.NodeType

I get the compile time errors listed below when trying to compile the Go generated ast for the following grammar (baz.tm):

Textmapper did not report any error for the grammar, which is why I was surprised to see an error reported from the Go compiler.

Note: this is on rev cbc923c of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar baz.tm

$ textmapper
lalr: 0.018s, text: 0.088s, parser: 29 states, 0KB

$ go install ./...
# github.com/mewspring/foo/baz/ast
ast/ast.go:122:41: invalid type for composite literal: TopLevelEntity
ast/factory.go:16:21: cannot use &ArrayType literal (type *ArrayType) as type BazNode in return argument:
	*ArrayType does not implement BazNode (wrong type for Type method)
		have Type() Type
		want Type() baz.NodeType
ast/factory.go:28:19: cannot use &TypeDef literal (type *TypeDef) as type BazNode in return argument:
	*TypeDef does not implement BazNode (wrong type for Type method)
		have Type() Type
		want Type() baz.NodeType

report line and column of syntax error when parsing

To help troubleshoot the cause of syntax errors when parsing, it would be really helpful if the parser generated by textmapper reported both the line and the column of each syntax error.

Currently only the line number is reported, and this creates an additional step for users of the parser to figure out the exact cause of the parse error. One recent such example is llir/llvm#105 (comment), the extract of which is included here for completeness.

By @dannypsnl:

By the way, would you like to use official IR Parser by submodule LLVM-IR-parser and mapping ast only? textmapper didn't provide a reasonable result for parsing error. I didn't know which step it stuck.

All I got was: syntax error at line 1

Here is the test code:

package asm

import (
	"testing"

	"github.com/llir/ll/ast"
)

func TestParse(t *testing.T) {
	testCode := `!7 = !DIExpression(DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_LLVM_convert, 32, DW_ATE_signed)`
	_, err := ast.Parse("", testCode)
	if err != nil {
		t.Error(err)
	}
}

To get a better parse error, we ended up breaking the line into multiple lines, resulting in syntax error at line 5, which was more helpful. The same could be achieved with a line:column pair when reporting syntax errors.

ref: llir/llvm#105 (comment)

!7
=
!DIExpression(DW_OP_LLVM_convert,
16,
DW_ATE_unsigned,
DW_OP_LLVM_convert,
32,
DW_ATE_signed
)

Note, this might be me failing to use functionality already included in Textmapper. I simply use the generated parser and don't try to do anything fancy handling errors. So perhaps this information is already available?

Cheers,
Robin

Lookahead tokens don't understand optional prefixes

%lookahead flag x;

prog: optionallyPrefixed<+x>;
optionallyPrefixed: prefix? usesLookahead ;
usesLookahead: [x] y | z;

This reported that x was unused in optionallyPrefixed, until I changed that production to

optionallyPrefixed: usesLookahead | prefix usesLookahead ;

proposal: use boolean in multiple return value to distinguish optional fields on AST nodes

The current generation of the AST inserts /*opt*/ comments for optional fields on AST nodes. Besides the comments, no checks will help ensure that the AST node is not used incorrectly, except for always checking IsValid, which is not required for mandatory fields.

The suggestion is to change the method of optional fields from:

func (n AShrExpr) Exact() /*opt*/ Exact {
	return Exact{n.Child(selector.Exact)}
}

to:

func (n AShrExpr) Exact() (Exact, bool) {
	field := n.Child(selector.Exact)
	return Exact{field}, field.IsValid()
}

This would help distinguish optional AST node fields from mandatory fields in the package documentation, and also produce compile time errors if an optional field is used in the same way as a mandatory field.

Currently, to use an optional field, users would do something along the lines of:

if exact := n.Exact(); exact.IsValid() {
	// use exact field.
}

After the proposed change, users would instead write something along the lines of:

if exact, ok := n.Exact(); ok {
	// use exact field.
}

The change may seem rather minimal, but it does help a lot to make sure that optional fields are never used without first checking for validity.
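
A self-contained sketch of the proposed shape follows. The Node type here is a hypothetical stand-in for the generated node handle (the real one uses n.Child(selector.Exact)); only the (value, bool) return convention is the point.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the generated AST node handle.
type Node struct{ text string }

// IsValid reports whether the node is present. It is safe on a nil receiver.
func (n *Node) IsValid() bool { return n != nil }

// Exact wraps the optional child node.
type Exact struct{ *Node }

// AShrExpr holds an optional Exact child; nil means absent.
type AShrExpr struct{ exact *Node }

// Exact returns the optional field and whether it is present.
func (n AShrExpr) Exact() (Exact, bool) {
	return Exact{n.exact}, n.exact.IsValid()
}

func main() {
	with := AShrExpr{exact: &Node{text: "exact"}}
	without := AShrExpr{}

	if exact, ok := with.Exact(); ok {
		fmt.Println("present:", exact.text)
	}
	if _, ok := without.Exact(); !ok {
		fmt.Println("absent")
	}
}
```

Callers that forget the second return value get a compile error, which is the safety property the proposal is after.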

Proposal: return value types in To*Node() to reduce allocations

For sake of exposition I'm going to write in terms of the LLVM parser we're working on in llir/ll:

Please consider the ToLlvmNode function:

func ToLlvmNode(n *Node) LlvmNode {
	switch n.Type() {
	case ll.AShrExpr:
		return &AShrExpr{n}
	case ll.AShrInst:
		return &AShrInst{n}
	case ll.AddExpr:
[... many more elided ...]

At the moment, for parsing a large input with LLVM, this is one of the larger sources of allocations. (>30% of allocations from the parser) - see llir/llvm#55 for code to reproduce the measurement for yourself.

I note that the allocation can be completely eliminated by instead returning values of the node types (e.g, return AShrExpr{n}). This seems like a reasonable thing to do because all the node types do is embed a pointer to a node, and the value of the pointer doesn't ever need to be modified later, so it is fine to pass the pointer around by value. Because the node types structs only contain this pointer, they actually fit into the second value of the interface data, and therefore the returned value doesn't require an allocation either.

I did a brief experiment on the LLVM codebase a while back and observed about a 10% wall clock speedup and a significant reduction in allocations - giving (IIRC) a significant reduction in CPU and GC cost. The number of modifications I had to make to the LLIR parser was minimal.

I've run out of time for this moment. I intend to supply more (and more specific) details later, and perhaps a patch, if I get a spare moment. But for now I want to put this proposal out there and see what the response is.

For what it's worth I have some other tricks up my sleeve to reduce allocations, though they may be a bit too much of micro-optimization for some tastes. I'd be interested if you'd like to hear of these, but for now let's just consider this one which seems like the lowest hanging fruit.

Question: status of Go version of Textmapper?

Hi Evgeny,

I just came across Textmapper, and having read the Language Reference and the motivation behind the project, it seems to be exactly what I was looking for. Essentially an LR version of ANTLR for Go. I can tell that you have a lot of experience in this domain, as the architecture is well thought out. I still have to dive deep and examine the minute details of the implementation, but my initial reaction of Textmapper is very positive!

Now, of course, I'd like to take tm out for a spin! However, looking at the implementation of tm-go/cmd/textmapper/generate.go, I noticed a TODO in the generate function.

I noticed that you recently ported Tarjan's algorithm for detecting strongly connected components (in rev 78fc54e). My question is, how far is the Go version of Textmapper from being ready for use?

I'd love to try it out!

Cheerful regards,
Robin

bug: parser and Parser return parameters in different order

A simple example:

...
%input Expr;
Expr {int} -> Expr :
    num '+' num {
        $$ = operAdd($0, $2)
        println($$)
    }
;

It will generate:

...
func (p *Parser) Parse(lexer *Lexer) (error, int) {
	err, v := p.parse(0, 5, lexer)
	val, _ := v.(int)                      // Error occurred!
	return err, val
}

func (p *Parser) parse(start, end int8, lexer *Lexer) (interface{}, error) {
...

Go version of textmapper: 'noUnwind' and 'nounwind' get the same ID in generated code (potential workaround?)

Hej Evgeny,

Very excited to see that the Go version of textmapper has reached feature parity with the Java version (#6). Congrats on the persistence and perseverance. You've done an amazing job.

Of course, with the official Go release of textmapper, I wanted to use it to generate the grammar and parser for the llir project. The Java version was capable of generating the lexer and parser from the LLVM IR grammar (ll.tm), however, given the explicit restriction of the Go version of textmapper to avoid allowing tokens with different casing, we were unable to generate the lexer and parser from the ll.tm grammar using the Go version of textmapper.

More details below:

re: #6 (comment)

I tried to use the latest release of textmapper Go version to generate a lexer and parser for the LLVM IR grammar.

Unfortunately, the official LLVM IR grammar contains cases where the only difference between two tokens is lower case vs. camel case. It contains the nounwind function attribute and the noUnwind function flag.

And, as mentioned in the release notes of textmapper:

similar names in the grammar (capitalization, camel vs snake case, etc.) cause a grammar compilation error to avoid confusion and actual compilation errors down the road

This causes the following error:

$ textmapper generate ll.tm
lalr: 95ms (ll.tm)
ll.tm:346:1: 'noUnwind' and 'nounwind' get the same ID in generated code

I was wondering if there is a potential solution to this issue? E.g. introducing two distinct NonTerminal names in the ll.tm grammar to differentiate the two tokens?

As I'm not controlling the official grammar of LLVM IR, I cannot "fix" the original grammar, but instead have to make it work in textmapper.

Cheers,
Robin

Report error (or warning) when grammar contains two rules with the same name

I noticed today that the grammar I was writing contained a typo where two different rules had the same name.

From mewspring/l-tm@2734890

Prior to this commit AlignStack was defined both as

AlignStack -> AlignStack
: 'alignstack'
;

and

AlignStack -> AlignStack
: 'alignstack' '=' N=UintLit
;

However, no warning was emitted by Textmapper. This seems like a bug?

What surprised me was that Textmapper did not report any error or warning for this grammar, so it may have gone unnoticed for much longer.

I would like to suggest that Textmapper reports an error or outputs a warning when a grammar contains two rules of the same name. It is quite possible that a valid grammar may contain two rules with the same name for some of the more advanced use cases of Textmapper. Of this, I'm not yet aware. However, in the context of a simple grammar, a warning would be helpful.

Cheers,
Robin

Collision between terminal and AST node identifier in token.go and listener.go

Consider the following example collision.tm.

The terminal 'gc' and the ast Node GC collide as they share the Go identifier GC in both token.go and listener.go.

https://github.com/mewspring/foo/blob/1d1fc88d005370365c8cec836e1c01344e6c15be/collision/listener.go#L15

https://github.com/mewspring/foo/blob/1d1fc88d005370365c8cec836e1c01344e6c15be/collision/token.go#L18

There are two solutions to this problem as a user of Textmapper, either rename the token, or rename the AST node. Both of them work well, but it would be preferable if the work-around was not needed, and the original collision would be resolved instead (perhaps by adding a prefix to token names, e.g. TMTokenGC).

Rename token:

u@x1 ~/D/g/s/g/m/f/collision> git diff collision.tm 
diff --git a/collision/collision.tm b/collision/collision.tm
index 2fcc70f..a15736b 100644
--- a/collision/collision.tm
+++ b/collision/collision.tm
@@ -7,7 +7,7 @@ eventFields = true
 
 :: lexer
 
-'gc' : /gc/
+foobar : /gc/
 
 string_lit : /["][^"]*["]/
 
@@ -29,6 +29,6 @@ FuncHeader -> FuncHeader
 #       previous declaration at ./listener.go:15:2
 #    make: *** [Makefile:9: gen] Error 2
 
 GC -> GC
-       : 'gc' string_lit
+GC -> GC
+       : foobar string_lit
 ;

Rename AST node:

u@x1 ~/D/g/s/g/m/f/collision> git diff collision.tm 
diff --git a/collision/collision.tm b/collision/collision.tm
index 2fcc70f..f5fb601 100644
--- a/collision/collision.tm
+++ b/collision/collision.tm
@@ -29,6 +29,6 @@ FuncHeader -> FuncHeader
 #       previous declaration at ./listener.go:15:2
 #    make: *** [Makefile:9: gen] Error 2
 
-GC -> GC
+GC -> GCNode
        : 'gc' string_lit
 ;

Grammar:

language collision(go);

lang = "collision"
package = "github.com/mewspring/foo/collision"
eventBased = true
eventFields = true

:: lexer

'gc' : /gc/

string_lit : /["][^"]*["]/
:: parser
input : FuncHeader ;
FuncHeader -> FuncHeader
	: GCopt
;
# TODO: Rename GCNode to GC when collision with token 'gc' has been resolved.
#
# GC is defined as an identifier in both listener.go and token.go.
#
#    lalr: 0.014s, text: 0.084s, parser: 8 states, 0KB
#    # github.com/mewspring/foo/collision
#    ./token.go:18:2: GC redeclared in this block
#       previous declaration at ./listener.go:15:2
#    make: *** [Makefile:9: gen] Error 2
GC -> GC
	: 'gc' string_lit
;

tm-go: avoid collision of token names for character literals; error: '|' and 'or' get the same ID in generated code

I tried running the latest version of Textmapper today (the Go version at rev 9261aa2).

Doing so, I received the following error for the grammar ll.tm:

u@x1 ~/D/g/s/g/l/ll2> ~/goget/bin/textmapper generate ll.tm 
ll.tm:576:1: '|' and 'or' get the same ID in generated code

This is because the LLVM IR grammar contains both | and or as two distinct keywords/tokens.

Edit: My suggestion would be to use token.Pipe for |, however, I also realize that a more general solution may be required.

Edit2: @inspirer, any idea what the right approach would be to resolve this?

I updated to Textmapper rev cd73b6b today and the issue is still present.

$ textmapper generate ll.tm
ll.tm:582:1: '|' and 'or' get the same ID in generated code

Would it make sense to add a prefix to these generated token IDs to avoid collision? E.g. | -> Character_Or (or Character_Pipe)?
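
To make the suggestion concrete, here is a sketch of one possible naming scheme. It is purely an assumption: the charNames table, the Char prefix, and tokenID are hypothetical and say nothing about how Textmapper actually derives IDs. Keywords keep a capitalized form, while literals containing punctuation get a prefix, so '|' and 'or' no longer collide.

```go
package main

import (
	"fmt"
	"strings"
)

// charNames maps punctuation to identifier fragments (hypothetical table).
var charNames = map[rune]string{
	'|': "Pipe",
	'&': "Amp",
	'=': "Assign",
	'<': "Lt",
	'>': "Gt",
}

// tokenID derives an identifier for a token literal. Literals containing
// punctuation from charNames get a "Char" prefix; plain keywords are
// capitalized. Assumes ASCII keywords.
func tokenID(lit string) string {
	if lit == "" {
		return ""
	}
	var b strings.Builder
	punct := false
	for _, r := range lit {
		if name, ok := charNames[r]; ok {
			b.WriteString(name)
			punct = true
		} else {
			b.WriteRune(r)
		}
	}
	if punct {
		return "Char" + b.String()
	}
	s := b.String()
	return strings.ToUpper(s[:1]) + s[1:]
}

func main() {
	fmt.Println(tokenID("|"), tokenID("or"))
}
```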
