inspirer / textmapper
Lexer and Parser generator
Home Page: http://textmapper.org
License: MIT License
While trying to convert a working PostgreSQL-16 grammar to Textmapper, I found several issues:
A terminal such as PRIVILEGES : /[pP][rR][iI][vV][iI][lL][eE][gG][eE][sS]/ prevents declaring a rule like privileges : privilege_list ... and fails with this error message: postgresql-16.tm,2441: redeclaration of terminal: privileges
The original grammar has no unresolved shift/reduce or reduce/reduce conflicts, but Textmapper reports:
postgresql-16.tm,4198: input: MODE_TYPE_NAME x_ucharacter
reduce/reduce conflict (next: eoi, IDENT, ABORT_P, ABSENT, ABSOLUTE_P, ACCESS, ACTION, ADD_P, ADMIN, AFTER, AGGREGATE, ALL, ALSO, ALTER, ALWAYS, ANALYSE, ANALYZE, AND, ANY, ARRAY, AS, ASC, ASENSITIVE, ASSERTION, ASSIGNMENT, ASYMMETRIC, AT, ATOMIC, ATTACH, ATTRIBUTE, AUTHORIZATION, BACKWARD, BEFORE, BEGIN_P, BETWEEN, BIGINT, BINARY, BIT, BOOLEAN_P, BOTH, BREADTH, BY, CACHE, CALL, CALLED, CASCADE, CASCADED, CASE, CAST, CATALOG_P, CHAIN, CHARACTERISTICS, CHECK, CHECKPOINT, CLASS, CLOSE, CLUSTER, COALESCE, COLLATE, COLLATION, COLUMN, COLUMNS, COMMENT, COMMENTS, COMMIT, COMMITTED, COMPRESSION, CONCURRENTLY, CONFIGURATION, CONFLICT, CONNECTION, CONSTRAINT, CONSTRAINTS, CONTENT_P, CONTINUE_P, CONVERSION_P, COPY, COST, CREATE, CROSS, CSV, CUBE, CURRENT_P, CURRENT_CATALOG, CURRENT_DATE, CURRENT_ROLE, CURRENT_SCHEMA, CURRENT_TIME, CURRENT_TIMESTAMP, CURRENT_USER, CURSOR, CYCLE, DATA_P, DATABASE, DEALLOCATE, DEC, DECIMAL_P, DECLARE, DEFAULT, DEFAULTS, DEFERRABLE, DEFERRED, DEFINER, DELETE_P, DELIMITER, DELIMITERS, DEPENDS, DEPTH, DESC, DETACH, DICTIONARY, DISABLE_P, DISCARD, DISTINCT, DO, DOCUMENT_P, DOMAIN_P, DOUBLE_P, DROP, EACH, ELSE, ENABLE_P, ENCODING, ENCRYPTED, END_P, ENUM_P, ESCAPE, EVENT, EXCEPT, EXCLUDE, EXCLUDING, EXCLUSIVE, EXECUTE, EXISTS, EXPLAIN, EXPRESSION, EXTENSION, EXTERNAL, EXTRACT, FALSE_P, FAMILY, FETCH, FINALIZE, FIRST_P, FLOAT_P, FOLLOWING, FOR, FORCE, FOREIGN, FORMAT, FORWARD, FREEZE, FROM, FULL, FUNCTION, FUNCTIONS, GENERATED, GLOBAL, GRANT, GRANTED, GREATEST, GROUP_P, GROUPING, GROUPS, HANDLER, HAVING, HEADER_P, HOLD, IDENTITY_P, IF_P, ILIKE, IMMEDIATE, IMMUTABLE, IMPLICIT_P, IMPORT_P, IN_P, INCLUDE, INCLUDING, INCREMENT, INDENT, INDEX, INDEXES, INHERIT, INHERITS, INITIALLY, INLINE_P, INNER_P, INOUT, INPUT_P, INSENSITIVE, INSERT, INSTEAD, INT_P, INTEGER, INTERSECT, INTERVAL, INTO, INVOKER, IS, ISNULL, ISOLATION, JOIN, JSON, JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT, JSON_OBJECTAGG, KEY, KEYS, LABEL, LANGUAGE, LARGE_P, LAST_P, LATERAL_P, LEADING, 
LEAKPROOF, LEAST, LEFT, LEVEL, LIKE, LIMIT, LISTEN, LOAD, LOCAL, LOCALTIME, LOCALTIMESTAMP, LOCATION, LOCK_P, LOCKED, LOGGED, MAPPING, MATCH, MATCHED, MATERIALIZED, MAXVALUE, MERGE, METHOD, MINVALUE, MODE, MOVE, NAME_P, NAMES, NATIONAL, NATURAL, NCHAR, NEW, NEXT, NFC, NFD, NFKC, NFKD, NO, NONE, NORMALIZE, NORMALIZED, NOT, NOTHING, NOTIFY, NOTNULL, NOWAIT, NULL_P, NULLIF, NULLS_P, NUMERIC, OBJECT_P, OF, OFF, OFFSET, OIDS, OLD, ON, ONLY, OPERATOR, OPTION, OPTIONS, OR, ORDER, ORDINALITY, OTHERS, OUT_P, OUTER_P, OVERLAY, OVERRIDING, OWNED, OWNER, PARALLEL, PARAMETER, PARSER, PARTIAL, PARTITION, PASSING, PASSWORD, PLACING, PLANS, POLICY, POSITION, PRECEDING, PREPARE, PREPARED, PRESERVE, PRIMARY, PRIOR, PRIVILEGES, PROCEDURAL, PROCEDURE, PROCEDURES, PROGRAM, PUBLICATION, QUOTE, RANGE, READ, REAL, REASSIGN, RECHECK, RECURSIVE, REF_P, REFERENCES, REFERENCING, REFRESH, REINDEX, RELATIVE_P, RELEASE, RENAME, REPEATABLE, REPLACE, REPLICA, RESET, RESTART, RESTRICT, RETURN, RETURNING, RETURNS, REVOKE, RIGHT, ROLE, ROLLBACK, ROLLUP, ROUTINE, ROUTINES, ROW, ROWS, RULE, SAVEPOINT, SCALAR, SCHEMA, SCHEMAS, SCROLL, SEARCH, SECURITY, SELECT, SEQUENCE, SEQUENCES, SERIALIZABLE, SERVER, SESSION, SESSION_USER, SET, SETOF, SETS, SHARE, SHOW, SIMILAR, SIMPLE, SKIP, SMALLINT, SNAPSHOT, SOME, SQL_P, STABLE, STANDALONE_P, START, STATEMENT, STATISTICS, STDIN, STDOUT, STORAGE, STORED, STRICT_P, STRIP_P, SUBSCRIPTION, SUBSTRING, SUPPORT, SYMMETRIC, SYSID, SYSTEM_P, SYSTEM_USER, TABLE, TABLES, TABLESAMPLE, TABLESPACE, TEMP, TEMPLATE, TEMPORARY, TEXT_P, THEN, TIES, TIME, TIMESTAMP, TRAILING, TRANSACTION, TRANSFORM, TREAT, TRIGGER, TRIM, TRUE_P, TRUNCATE, TRUSTED, TYPE_P, TYPES_P, UESCAPE, UNBOUNDED, UNCOMMITTED, UNENCRYPTED, UNION, UNIQUE, UNKNOWN, UNLISTEN, UNLOGGED, UNTIL, UPDATE, USER, USING, VACUUM, VALID, VALIDATE, VALIDATOR, VALUE_P, VALUES, VARCHAR, VARIADIC, VERBOSE, VERSION_P, VIEW, VIEWS, VOLATILE, WHEN, WHERE, WHITESPACE_P, WINDOW, WITH, WITHOUT, WORK, WRAPPER, WRITE, XML_P, 
XMLATTRIBUTES, XMLCONCAT, XMLELEMENT, XMLEXISTS, XMLFOREST, XMLNAMESPACES, XMLPARSE, XMLPI, XMLROOT, XMLSERIALIZE, XMLTABLE, YES_P, ZONE, LESS_EQUALS, GREATER_EQUALS, NOT_EQUALS, TYPECAST, FORMAT_LA, NULLS_LA, NOT_LA, Op, ';', '=', ')', ',', '*', '/', '+', '-', '%', '[', ']', '^', '<', '>', ':')
SimpleTypename : x_ucharacter
CharacterWithoutLength : x_ucharacter
postgresql-16.tm,4273: input: MODE_TYPE_NAME x_ucharacter
shift/reduce conflict (next: '(')
CharacterWithoutLength : x_ucharacter
postgresql-16.tm,4273: input: SELECT distinct_clause x_ucharacter
shift/reduce conflict (next: '(')
CharacterWithoutLength : x_ucharacter
postgresql-16.tm,4259: input: SELECT distinct_clause CharacterWithLength
reduce/reduce conflict (next: SCONST)
x_ucharacter : CharacterWithLength
ConstCharacter : CharacterWithLength
postgresql-16.tm,4260: input: SELECT distinct_clause CharacterWithoutLength
reduce/reduce conflict (next: SCONST)
x_ucharacter : CharacterWithoutLength
ConstCharacter : CharacterWithoutLength
conflicts: 2 shift/reduce and 483 reduce/reduce
lalr: 0.585s, text: 1.394s, parser: 6221 states, 3011KB
The converted grammar is attached:
postgresql-16.tm.zip
A working version of the grammar can also be seen at https://meimporta.eu/lalr-playground/
I've been unable to find a good way of representing the following grammar:
CatchSwitchTerm -> CatchSwitchTerm
: 'catchswitch' 'within' Scope=ExceptionScope '[' Handlers=(Label separator ',')+ ']' 'unwind' UnwindTarget=UnwindTarget Metadata=(',' MetadataAttachment)+?
;
%interface UnwindTarget;
UnwindTarget -> UnwindTarget
: 'to' 'caller' -> UnwindToCaller
| Label
;
Running the above grammar through Textmapper results in the following error:
$ make
ll.tm,2739: `Handlers` cannot be a list, since it precedes UnwindTarget
lalr: 0.221s, text: 0.521s, parser: 2170 states, 226KB
make: *** [Makefile:7: gen] Error 1
Any suggestions would be warmly welcome.
I get the compile-time error parser.go:54:2: undefined: ignoredTokens when trying to compile the Go parser generated for the following grammar (foo.tm):
Note: this is on rev 78fc54e of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar foo.tm
language foo(go);
lang = "foo"
package = "github.com/mewspring/foo"
::lexer
'foobar' : /foobar/
::parser
input : 'foobar' ;
According to the docs, it's valid to use {eoi} in a pattern as long as it remains at the end. Yet anything other than just /{eoi}/ results in a syntax error.
Reproducer:
<initial> ignored_empty_line: /[\f\t ]*({newline}|{eoi})/ (space)
In relation to #15, it would also be useful if Textmapper output warnings when a grammar contained unused rules, that is, rules that cannot be reached from the start state of the grammar.
Consider the following grammar (qux.tm):
language qux(go);
lang = "qux"
package = "github.com/mewspring/foo/qux"
:: lexer
'foo' : /foo/
'bar' : /bar/
:: parser
input : Foo ;
Foo : 'foo' ;
Bar : 'bar' ;
Bar is not reachable from input, thus the non-terminal Bar can be pruned from the parser table. This step could be performed automatically by Textmapper when generating the parser tables.
To check which states are reachable, a depth-first search from the start state of the grammar could be used as an initial implementation.
While pruning non-terminals, it would be very useful if this information was presented as warnings to the user, as they may then refine and simplify their grammar or, perhaps just as likely, discover bugs where rules were unintentionally left unused.
Similarly, tokens (e.g. bar) may be pruned from the list of tokens, and the lexer table may be refined to exclude tokens that are not present in any rule that is reachable from the start state of the grammar.
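The depth-first search suggested above can be sketched in Go as follows. This is only an illustration and not Textmapper's implementation: the grammar type, the reachable and unreachable helpers, and the convention that quoted names are terminals are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// grammar maps each nonterminal to its alternatives (hypothetical
// representation; quoted names such as 'foo' stand for terminals).
type grammar map[string][][]string

// reachable runs a depth-first search from start and returns every
// symbol (terminal or nonterminal) that some derivation can reach.
func reachable(g grammar, start string) map[string]bool {
	seen := map[string]bool{}
	var visit func(sym string)
	visit = func(sym string) {
		if seen[sym] {
			return
		}
		seen[sym] = true
		for _, alt := range g[sym] {
			for _, s := range alt {
				visit(s)
			}
		}
	}
	visit(start)
	return seen
}

// unreachable lists the nonterminals and terminals never visited from start.
func unreachable(g grammar, terminals []string, start string) []string {
	seen := reachable(g, start)
	var out []string
	for nt := range g {
		if !seen[nt] {
			out = append(out, nt)
		}
	}
	for _, t := range terminals {
		if !seen[t] {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// The qux.tm grammar from this report, written out by hand.
	g := grammar{
		"input": {{"Foo"}},
		"Foo":   {{"'foo'"}},
		"Bar":   {{"'bar'"}},
	}
	for _, sym := range unreachable(g, []string{"'foo'", "'bar'"}, "input") {
		fmt.Printf("warning: %s not reachable from input\n", sym)
	}
}
```

For qux.tm this reports both Bar and 'bar', which matches the two symbols proposed for pruning.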
Edit:
Current output of Textmapper for qux.tm:
$ textmapper
lalr: 0.013s, text: 0.072s, parser: 5 states, 0KB
Suggested output of Textmapper when pruning non-terminals and terminals:
$ textmapper
warning: nonterminal Bar not reachable from input state, it has been pruned
warning: terminal bar not reachable from input state, it has been pruned
lalr: 0.012s, text: 0.08s, parser: 5 states, 0KB
Or something along those lines.
The following test case is failing on revision b962fbe.
$ go test -count=1 ./...
FAIL github.com/inspirer/textmapper/tm-parsers/tm [build failed]
# github.com/inspirer/textmapper/tm-parsers/tm_test [github.com/inspirer/textmapper/tm-parsers/tm.test]
tm-parsers/tm/lexer_test.go:146:3: undefined: tm.SOFT
From my understanding of http://textmapper.org/documentation.html#grammar-ambiguities, %shift modifiers may be used to resolve shift/reduce ambiguities in grammars.
From the example, I tried to derive a minimal working grammar. However, when I tried to generate the parser using Textmapper, I got the following error:
$ textmapper shiftreduce.tm
shiftreduce.tm,34: input: 'if' '(' Expr ')' Stmt
shift/reduce conflict (next: 'else')
IfStmt : 'if' '(' Expr ')' Stmt
conflicts: 1 shift/reduce and 0 reduce/reduce
lalr: 0.019s, text: 0.105s, parser: 18 states, 0KB
IfStmt
: 'if' '(' Expr ')' Stmt %shift 'else'
| 'if' '(' Expr ')' Stmt 'else' Stmt
;
Is that the correct syntax for using %shift modifiers, and if so, why is a shift/reduce conflict being reported?
If I remove 'if' '(' Expr ')' Stmt %shift 'else' from the grammar, Textmapper successfully produces both the lexer and the parser.
Edit: this is on version v0.9.22 of Textmapper.
u@x1 ~> textmapper --version
textmapper v0.9.22/java build 2018
Hello!
I just discovered your project. I'm also interested in developing a good parser tool and am currently working on https://github.com/mingodad/lalr/tree/playground. I have several grammars, in different stages of completeness, at https://meimporta.eu/lalr-playground/, including a Textmapper grammar that can be tested online: select Textmapper parser from the Examples dropdown and then click the Parse button.
Shall we talk about how we can cooperate to evolve our projects?
Cheers!
While trying to generate the Go lexer via textmapper.jar (master @ 08d03f4, compiled with openjdk version "13.0.1" 2019-10-15), I got the following error:
$ ls
test.tm
$ java -jar textmapper.jar
jar:file:textmapper.jar!/org/textmapper/tool/templates/go_token.ltp,19: Evaluation of `syntax.symbols[i].isConstant()` failed for [common.Context]: (caused by java.lang.IllegalArgumentException): null
jar:file:textmapper.jar!/org/textmapper/tool/templates/go_token.ltp,28: Evaluation of `syntax.symbols[i].isConstant()` failed for [common.Context]: (caused by java.lang.IllegalArgumentException): null
lalr: 0.009s, text: 0.146s
The grammar that triggers the issue (reduced to a minimal reproducer) is:
language test(go);
lang = "test"
:: lexer
%s initial;
%x inComment;
invalid_token:
error:
whitespace: /[\n\r\t\f\v ]+/ (space)
<initial, inComment> EnterBlockComment: /\(\*/ (space)
<initial> invalid_token: /\*\)/
<inComment> {
invalid_token: /{eoi}/
ExitBlockComment: /\*\)/ (space)
BlockComment: /[^\(\)\*]+|[\*\(\)]/ (space)
}
The issue seems to be triggered by <inComment> invalid_token: /{eoi}/. I'm assuming the rule is allowed, since compiling the same grammar with tm-go/cmd/textmapper works perfectly.
Hi,
when parse.go processes the following part of an LLVM IR file, it fails with this error:
unable to parse "/Users/admin/test/test.ll" into an AST: syntax error at line 1
define void @testcase_1dep.Foo.Bar({ { i8*, i64 }, { i8*, i64 } }* nocapture sret({ { i8*, i64 }, { i8*, i64 } }) %sret.formal.0, i8* nest nocapture readnone %nest.0, %Foo.0* readnone %f, i8* %command.chunk0, i64 %command.chunk1) #0 !dbg !6 { entry: %tmp.0 = alloca { { i8*, i64 }, { i8*, i64 } }, align 8 %tmpv.0 = alloca [2 x { i8*, i64 }], align 8 %tmpv.1 = alloca { { i8*, i64 }, { i8*, i64 } }, align 8 call void @llvm.dbg.value(metadata %Foo.0* %f, metadata !28, metadata !DIExpression()), !dbg !29 call void @llvm.dbg.value(metadata i8* %command.chunk0, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !29 call void @llvm.dbg.value(metadata i64 %command.chunk1, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !29 %cast.38 = bitcast [2 x { i8*, i64 }]* %tmpv.0 to i8*, !dbg !31 call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 8 dereferenceable(16) %cast.38, i8* noundef nonnull align 8 dereferenceable(16) bitcast ({ i8*, i64 }* @const.16 to i8*), i64 16, i1 false), !dbg !31 %command.addr.sroa.0.0.cast.39.sroa_idx = getelementptr inbounds [2 x { i8*, i64 }], [2 x { i8*, i64 }]* %tmpv.0, i64 0, i64 1, i32 0, !dbg !31 store i8* %command.chunk0, i8** %command.addr.sroa.0.0.cast.39.sroa_idx, align 8, !dbg !31 %command.addr.sroa.4.0.cast.39.sroa_idx4 = getelementptr inbounds [2 x { i8*, i64 }], [2 x { i8*, i64 }]* %tmpv.0, i64 0, i64 1, i32 1, !dbg !31 store i64 %command.chunk1, i64* %command.addr.sroa.4.0.cast.39.sroa_idx4, align 8, !dbg !31 %call.0 = call { i8*, i64 } @runtime.concatstrings(i8* nest undef, i8* null, i8* nonnull %cast.38, i64 2), !dbg !31 call void @llvm.dbg.value(metadata i8* undef, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 0, 64)), !dbg !29 call void @llvm.dbg.value(metadata i64 undef, metadata !30, metadata !DIExpression(DW_OP_LLVM_fragment, 64, 64)), !dbg !29 %icmp.0 = icmp eq %Foo.0* %f, null, !dbg !32 br i1 %icmp.0, label %then.0, label %else.0, !make.implicit !5
It seems the parser cannot correctly handle define void @testcase_1dep.Foo.Bar({ { i8*, i64 }, { i8*, i64 } }* nocapture sret({ { i8*, i64 }, { i8*, i64 } }) %sret.formal.0, i8* nest nocapture readnone %nest.0, %Foo.0* readnone %f, i8* %command.chunk0, i64 %command.chunk1) #0 !dbg !6 {
The debug values are as follows:
state: 614
action: -2
I updated to the latest build which fixes the errors for lexer and parser generation. However, I must confess that I don't know exactly what the right steps are to build Java projects.
Specifically, I would like to know which commands are intended for building Textmapper, for testing Textmapper, and for regenerating the parsers of the Textmapper repo. I would also like to know where the build artifact of the Textmapper tool (i.e. textmapper-0.9.21.jar or tool-0.9.21-SNAPSHOT-all.jar) is supposed to be located.
I have installed Maven and Ant.
From the tm-tool directory, this is what I get when building with mvn deploy:
Note: this still produces the build artifact that I need for using Textmapper (tm-tool/tool/target/tool-0.9.21-SNAPSHOT-all.jar), and all test cases seem to pass.
$ mvn deploy
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Textmapper Master 0.9.21-SNAPSHOT .................. SUCCESS [ 0.668 s]
[INFO] Textmapper templates ............................... SUCCESS [ 1.412 s]
[INFO] Lapg ............................................... SUCCESS [ 1.723 s]
[INFO] Textmapper ......................................... SUCCESS [ 2.977 s]
[INFO] Textmapper tool 0.9.21-SNAPSHOT .................... FAILURE [ 4.174 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.720 s
[INFO] Finished at: 2018-10-13T11:21:39+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.sonatype.plugins:nexus-staging-maven-plugin:1.6.4:deploy (injected-nexus-deploy) on project tool: Failed to deploy artifacts: Could not transfer artifact org.textmapper:lapg:jar:0.9.21-20181013.092137-1 from/to sonatype-nexus-snapshots (https://oss.sonatype.org/content/repositories/snapshots/): Failed to transfer file: https://oss.sonatype.org/content/repositories/snapshots/org/textmapper/lapg/0.9.21-SNAPSHOT/lapg-0.9.21-20181013.092137-1.jar. Return code is: 401, ReasonPhrase: Unauthorized. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :tool
From the tm-tool directory, this is what I get when building with ant deploy:
Note: this still produces the build artifact that I need for using Textmapper (tm-tool/libs/textmapper-0.9.21.jar). The build artifact is located in a different directory than with mvn. What is the canonical directory for the Textmapper tool artifact? Also, this does not seem to run the test cases.
u@x1 ~/g/s/g/i/t/tm-tool> ant deploy
Buildfile: /home/u/goget/src/github.com/inspirer/textmapper/tm-tool/build.xml
rev:
[echo] revision: 76054ba15463585cf1beb0acebb8fe63bcff1909
build:
[mkdir] Created dir: /home/u/goget/src/github.com/inspirer/textmapper/build/java.out/textmapper
[copy] Copying 31 files to /home/u/goget/src/github.com/inspirer/textmapper/build/java.out/textmapper
[javac] Compiling 446 source files to /home/u/goget/src/github.com/inspirer/textmapper/build/java.out/textmapper
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[jar] Building jar: /home/u/goget/src/github.com/inspirer/textmapper/build/textmapper.jar
source:
[mkdir] Created dir: /home/u/goget/src/github.com/inspirer/textmapper/build/java.src/textmapper
[copy] Copying 454 files to /home/u/goget/src/github.com/inspirer/textmapper/build/java.src/textmapper
[jar] Building jar: /home/u/goget/src/github.com/inspirer/textmapper/build/textmapper-src.jar
deploy:
[copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-tool/libs
[copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-tool/libs
[copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-idea/org.textmapper.idea/lib
[copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-idea/org.textmapper.idea/lib
[copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-eclipse/plugins/org.textmapper
[copy] Copying 1 file to /home/u/goget/src/github.com/inspirer/textmapper/tm-eclipse/plugins/org.textmapper
BUILD SUCCESSFUL
Total time: 3 seconds
error:
update failed for AnAction with ID=ShowUmlDiagram: org/textmapper/tool/parser/TMLexer
stacktrace:
update failed for AnAction with ID=ShowUmlDiagram: org/textmapper/tool/parser/TMLexer
java.lang.NoClassDefFoundError: org/textmapper/tool/parser/TMLexer
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.lang.ClassLoader.defineClass(ClassLoader.java:465)
at com.intellij.util.lang.UrlClassLoader._defineClass(UrlClassLoader.java:153)
at com.intellij.util.lang.UrlClassLoader.defineClass(UrlClassLoader.java:149)
at com.intellij.util.lang.UrlClassLoader._findClass(UrlClassLoader.java:125)
at com.intellij.ide.plugins.cl.PluginClassLoader.d(PluginClassLoader.java:102)
at com.intellij.ide.plugins.cl.PluginClassLoader.loadClass(PluginClassLoader.java:63)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.textmapper.idea.lang.syntax.lexer.LapgLexerAdapter.start(Unknown Source)
at com.intellij.lexer.Lexer.start(Lexer.java:45)
at com.intellij.lang.impl.PsiBuilderImpl.a(PsiBuilderImpl.java:204)
at com.intellij.lang.impl.PsiBuilderImpl.<init>(PsiBuilderImpl.java:177)
at com.intellij.lang.impl.PsiBuilderImpl.<init>(PsiBuilderImpl.java:151)
at com.intellij.lang.impl.PsiBuilderImpl.<init>(PsiBuilderImpl.java:185)
at com.intellij.lang.impl.PsiBuilderFactoryImpl.createBuilder(PsiBuilderFactoryImpl.java:52)
at com.intellij.psi.tree.ILazyParseableElementType.doParseContents(ILazyParseableElementType.java:62)
at com.intellij.psi.tree.IFileElementType.parseContents(IFileElementType.java:43)
at com.intellij.psi.impl.source.tree.LazyParseableElement.e(LazyParseableElement.java:165)
at com.intellij.psi.impl.source.tree.LazyParseableElement.getFirstChildNode(LazyParseableElement.java:209)
at com.intellij.psi.impl.source.tree.CompositeElement.countChildren(CompositeElement.java:493)
at com.intellij.psi.impl.source.tree.CompositeElement.getChildrenAsPsiElements(CompositeElement.java:455)
at com.intellij.psi.impl.source.PsiFileImpl.getChildren(PsiFileImpl.java:741)
at com.intellij.uml.java.JavaUmlElementManager.findInDataContext(JavaUmlElementManager.java:84)
at com.intellij.uml.java.JavaUmlElementManager.findInDataContext(JavaUmlElementManager.java:47)
at com.intellij.diagram.DiagramProvider.findProvider(DiagramProvider.java:148)
at com.intellij.uml.core.actions.ShowDiagramBase.update(ShowDiagramBase.java:61)
at com.intellij.openapi.actionSystem.ex.ActionUtil.performDumbAwareUpdate(ActionUtil.java:111)
at com.intellij.openapi.actionSystem.impl.Utils.a(Utils.java:167)
at com.intellij.openapi.actionSystem.impl.Utils.updateGroupChild(Utils.java:226)
at com.intellij.openapi.actionSystem.impl.Utils.a(Utils.java:200)
at com.intellij.openapi.actionSystem.impl.Utils.expandActionGroup(Utils.java:136)
at com.intellij.openapi.actionSystem.impl.Utils.expandActionGroup(Utils.java:85)
at com.intellij.openapi.actionSystem.impl.Utils.fillMenu(Utils.java:241)
at com.intellij.openapi.actionSystem.impl.ActionPopupMenuImpl$MyMenu.show(ActionPopupMenuImpl.java:96)
at com.intellij.ide.ui.customization.CustomizationUtil$3.invokePopup(CustomizationUtil.java:284)
at com.intellij.ui.PopupHandler.mousePressed(PopupHandler.java:48)
at java.awt.AWTEventMulticaster.mousePressed(AWTEventMulticaster.java:263)
at java.awt.AWTEventMulticaster.mousePressed(AWTEventMulticaster.java:262)
at java.awt.Component.processMouseEvent(Component.java:6411)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3275)
at com.intellij.ui.treeStructure.Tree.processMouseEvent(Tree.java:420)
at com.intellij.ide.dnd.aware.DnDAwareTree.processMouseEvent(DnDAwareTree.java:51)
at java.awt.Component.processEvent(Component.java:6179)
at java.awt.Container.processEvent(Container.java:2083)
at java.awt.Component.dispatchEventImpl(Component.java:4776)
at java.awt.Container.dispatchEventImpl(Container.java:2141)
at java.awt.Component.dispatchEvent(Component.java:4604)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4619)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4277)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4210)
at java.awt.Container.dispatchEventImpl(Container.java:2127)
at java.awt.Window.dispatchEventImpl(Window.java:2489)
at java.awt.Component.dispatchEvent(Component.java:4604)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:717)
at java.awt.EventQueue.access$400(EventQueue.java:82)
at java.awt.EventQueue$2.run(EventQueue.java:676)
at java.awt.EventQueue$2.run(EventQueue.java:674)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:86)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:97)
at java.awt.EventQueue$3.run(EventQueue.java:690)
at java.awt.EventQueue$3.run(EventQueue.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:86)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:687)
at com.intellij.ide.IdeEventQueue.d(IdeEventQueue.java:700)
at com.intellij.ide.IdeEventQueue._dispatchEvent(IdeEventQueue.java:521)
at com.intellij.ide.IdeEventQueue.dispatchEvent(IdeEventQueue.java:348)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:296)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:211)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:201)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:196)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:188)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Caused by: java.lang.ClassNotFoundException: org.textmapper.tool.parser.TMLexer PluginClassLoader[org.textmapper.idea, 0.9.2]
at com.intellij.ide.plugins.cl.PluginClassLoader.loadClass(PluginClassLoader.java:77)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Using the parser generated by the latest revision of Textmapper (c3f3e1b), I've run into a bug where the AST Indices method of GetElementPtrInst instructions returns an empty slice, even when the input text contains indices.
Given the following input text inst_memory.ll:
Note: in the input text below, the trailing ", i64 0, i64 0" corresponds to the list of indices (where i64 0 is reduced as a TypeValue).
@s = constant [4 x i8] c"foo\00"
define void @f() {
%1 = getelementptr [4 x i8], [4 x i8]* @s, i64 0, i64 0
ret void
}
And the relevant parts of the grammar ll.tm:
GetElementPtrInst -> GetElementPtrInst
: 'getelementptr' InBoundsopt ElemType=Type ',' Src=TypeValue Indices=(',' TypeValue)* Metadata=(',' MetadataAttachment)+?
;
To reproduce the above observation, run the following commands.
$ go get -u github.com/mewspring/foo/indices
$ cd $GOPATH/src/github.com/mewspring/foo/indices/cmd/indices
$ go run main.go inst_memory.ll
text: "getelementptr [4 x i8], [4 x i8]* @s, i64 0, i64 0"
elem type: "[4 x i8]"
src: "[4 x i8]* @s"
len(indices): 0
The expected output is len(indices): 2. So, for some reason, the indices are not present in the AST.
Note, no changes have been made to the generated parser, which is located at https://github.com/llir/ll
The only hand-written code is that of main.go
For interfaces (i.e. set/union types), lookup tables are generated to be used by selector.go. These lookup tables are currently generated as slices.
Since the size of the lookup tables is fixed, it should be possible to use the [...] notation to declare arrays whose length is inferred from the number of elements in the array literal.
From generated listener.go
-var Type = []NodeType{
+var Type = [...]NodeType{
ArrayType,
FloatType,
FuncType,
IntType,
LabelType,
MMXType,
MetadataType,
NamedType,
PackedStructType,
PointerType,
StructType,
TokenType,
VectorType,
VoidType,
}
Used in selector.go:
Type = OneOf(ll.Type...)
Potentially this could remove a few bounds checks.
However, as I'm writing this, it seems more likely that these lookup tables are always used as slices (e.g. passed as an argument to OneOf). Therefore, having them as slices rather than arrays makes more sense, as otherwise we would have to create a slice descriptor for each invocation of OneOf.
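To make the trade-off concrete, here is a small Go sketch of both declarations. The NodeType constants and the oneOf function are hypothetical stand-ins for the generated listener.go table and the selector's OneOf; only the language behaviour itself is certain.

```go
package main

import "fmt"

// NodeType and its constants are simplified stand-ins for the generated ones.
type NodeType int

const (
	ArrayType NodeType = iota + 1
	FloatType
	FuncType
)

// typeSlice mirrors the currently generated form: a slice literal.
var typeSlice = []NodeType{ArrayType, FloatType, FuncType}

// typeArray is the proposed form: [...] makes the compiler infer the
// array length (3 here) from the number of elements in the literal.
var typeArray = [...]NodeType{ArrayType, FloatType, FuncType}

// oneOf is a hypothetical variadic selector in the spirit of OneOf.
func oneOf(types ...NodeType) func(NodeType) bool {
	return func(t NodeType) bool {
		for _, c := range types {
			if c == t {
				return true
			}
		}
		return false
	}
}

func main() {
	fromSlice := oneOf(typeSlice...)
	// An array cannot be spread directly; it has to be sliced first,
	// which is the extra step mentioned above.
	fromArray := oneOf(typeArray[:]...)
	fmt.Println(fromSlice(FloatType), fromArray(FloatType), len(typeArray))
}
```

Since every call site needs typeArray[:] anyway, the array form buys little when the table is only ever passed to a variadic function.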
Well, feel free to close this issue. I just wanted to write it down, so we can evaluate or discard the idea.
Cheers,
Robin
Let's consider the following .tm file:
language prop(java);
prefix = "AST"
package = "ru.aptu.xml"
gentree=true
genast=true
positions="offset, line"
endpositions="offset"
:: lexer
identifier(String): /[a-zA-Z_][a-zA-Z_0-9]*/ { $symbol = current(); }
openChar: /</
closeChar: />/
_skip: /[\n\t\r ]+/ (space)
:: parser
input ::= root=node;
node ::= openChar identifier closeChar;
This compiles successfully without any warning but produces a semantically invalid program. In /ast/AstNode we get:
public class AstNode extends AstNode {
}
I get the compile-time error listener.go:14:2: missing value in const declaration when trying to compile the Go listener generated for the following grammar (bar.tm):
Note: this is on rev 78fc54e of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar bar.tm
language bar(go);
lang = "bar"
package = "github.com/mewspring/foo/bar"
eventBased = true
::lexer
'foobar' : /foobar/
::parser
input : 'foobar' ;
I just synced with the latest revision of textmapper (4d45f61), and when running go test ./..., I got the following error:
u@x220 ~/g/s/g/i/textmapper> go test ./...
? github.com/inspirer/textmapper/tm-go/cmd/textmapper [no test files]
--- FAIL: TestGenerate (0.40s)
--- FAIL: TestGenerate/../../tm-parsers/js/js.tm (0.31s)
gen_test.go:45: The on-disk content differs from the generated one.
--- ../../tm-parsers/js/lexer_tables.go
+++ lexer_tables.go (generated)
@@ -304,12 +304,18 @@
54, 54, 54, 54, 54, 54, 57,
}},
{3664, 3674, 59, nil},
- {3713, 3802, 59, []uint8{
- 54, 54, 1, 54, 1, 1, 54, 54, 1, 54, 1, 1, 54, 1, 1, 1, 1, 1, 1, 54, 54, 54,
- 54, 1, 54, 54, 54, 54, 54, 54, 54, 1, 54, 54, 54, 1, 54, 1, 54, 1, 1, 54,
...
Creating a dedicated issue to track the bug identified at #17 (comment)
Edit: For added context, the only hand-written files are parser.go and tree.go, both of which are mostly identical copies of the original parser.go and tree.go of the Textmapper project. This is why the bug is interesting to fix: it pertains to Textmapper itself.
I'll add my latest debug session below, feel free to skim as it's quite long :)
The debug session below is debugging the none command, which uses the mini.tm grammar, and is invoked on the example.ll input file.
Specifically the following commands may be used to reproduce the debug session:
$ go get -u github.com/mewspring/foo/none/cmd/none
$ go get -u github.com/derekparker/delve/cmd/dlv
$ go get -u github.com/aarzilli/gdlv
$ cd $GOPATH/src/github.com/mewspring/foo/none/cmd/none
$ $GOPATH/bin/gdlv debug ../../example.ll
Cheers,
Robin
Is it correct that next should be overwritten here?
Then we will skip /* empty */ nonterminals that are in between other nonterminals.
Should it really be o >= endoffset, and not o > endoffset? Now we decrease the end, so that InstMetadata will not be part of the call instruction.
OperandBundles and InstMetadata were not assigned CallInst as their parent, but they should have been. Note that end was too small, most likely as a result of using if o >= offset { end-- } instead of if o > offset { end-- }
This is probably wrong, as now OperandBundles and InstMetadata won't have CallInst as parent.
This is where it goes wrong. Really wrong. As the OperandBundles and InstMetadata of the previous CallInst are added to the current CallInst.
The only reason this happens is that OperandBundles and InstMetadata are zero in length, so the offset is not enough to distinguish which parent nonterminal they belong to.
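The ambiguity can be reproduced in isolation with a small Go sketch. It is not the code from parser.go or tree.go: the span type, the containing helper and the concrete offsets are made up for illustration, and it assumes the empty node is reported at the offset where one instruction ends and the next begins.

```go
package main

import "fmt"

// span is a hypothetical, simplified node record: a name plus offsets.
type span struct {
	name              string
	offset, endoffset int
}

// containing returns the names of all parents whose [offset, endoffset]
// range includes the child, i.e. every parent the offsets alone allow.
func containing(parents []span, child span) []string {
	var out []string
	for _, p := range parents {
		if child.offset >= p.offset && child.endoffset <= p.endoffset {
			out = append(out, p.name)
		}
	}
	return out
}

func main() {
	// Two adjacent instructions (offsets are invented for the example).
	parents := []span{
		{"CallInst#1", 0, 10},
		{"CallInst#2", 10, 20},
	}
	// A non-empty child sits strictly inside one parent.
	fmt.Println(containing(parents, span{"Args", 12, 15}))
	// A zero-length child at the boundary fits both parents.
	fmt.Println(containing(parents, span{"InstMetadata", 10, 10}))
}
```

With offsets alone, the empty InstMetadata is a valid child of either CallInst, so some additional information (such as the order of events) is needed to pick the right parent.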
This is the incorrect firstChild. It points to InstMetadata, but should really point to
Incorrect parents have now been set for InstMetadata and OperandBundles.
No parent was set for OperandBundles and InstMetadata. They should have both been assigned the last CallInst as parent (i.e. index 18).
Since InstMetadata (at index 10) is now the firstChild of the CallInst at index 18, and since the InstMetadata does not have a next (i.e. next is 0), the CallInst will not have any other children than the incorrectly added InstMetadata child. Therefore, calling Typ on CallInst will result in NONE type.
InstMetadata (at index 17) of previous instruction being added to have RetTerm as parent. The correct InstMetadata to add to RetTerm is at index 20.
FuncMetadata incorrectly added to have FuncBody as parent.
InstMetadata incorrectly added to have FuncBody as parent (should have been part of TermRet).
Panic with unknown node type NONE, since CallInst has InstMetadata as first child.
I've noticed strange behavior with the code generator for my grammar.
I have a "~" character in the package name I define inside my "tm" file, and as a result the code produced by Textmapper is invalid.
Example:
Package name inside tm file:
package = "git.sr.ht/~rn/lang"
Generated lexer.go (the problem exists in other files too):
import (
"git.sr.ht/~rn/lang"
"strings"
"unicode/utf8"
)
// generated by Textmapper; DO NOT EDIT
package git.sr.ht/~rn/lang
Textmapper generates valid code once I remove the "~" character from the package name.
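For illustration, here is one way a generator could derive the package clause from an import path such as the one above. This is a sketch under assumptions: `packageName` is a hypothetical helper and does not describe how Textmapper actually computes the name.

```go
package main

import (
	"fmt"
	"go/token"
	"path"
)

// packageName is a hypothetical helper: it derives the identifier for the
// package clause from an import path by taking the last path element and
// checking that it is a valid Go identifier.
func packageName(importPath string) (string, error) {
	name := path.Base(importPath)
	if !token.IsIdentifier(name) {
		return "", fmt.Errorf("cannot derive package name from %q", importPath)
	}
	return name, nil
}

func main() {
	name, err := packageName("git.sr.ht/~rn/lang")
	fmt.Println(name, err) // lang <nil>
}
```

With this approach, characters such as "~" in earlier path elements never reach the package clause, since only the final element is used.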
I'm curious whether the intention is for optional AST nodes to be distinguishable as nil pointers, or whether their presence should be determined by invoking the Text method and checking whether the text is empty.
For a concrete example, see how the fob command handles an input where CallingConv is present (input.txt) vs. absent (no_input.txt).
From what I can tell, a pointer type is used for optional nonterminals; i.e. *ast.CallingConv is used for CallingConvopt. However, when CallingConv is not present in the input text, I was expecting the pointer to be nil, and that's not the case.
In particular, for no_input.txt, neither cc nor cc.Node is nil, even though the input does not contain a calling convention:
/*
path: no_input.txt
type: *ast.CallingConv
value: &{{<nil> 0}}
cc: &ast.CallingConv{}
cc.Node == nil: false
cc.Node: main.node{}
text: ""
*/
The grammar is defined as follows (fob.tm):
language fob(go);
lang = "fob"
package = "github.com/mewspring/foo/fob"
eventBased = true
eventFields = true
:: lexer
'x86_fastcallcc' : /x86_fastcallcc/
'x86_stdcallcc' : /x86_stdcallcc/
'x86_thiscallcc' : /x86_thiscallcc/
:: parser
input : FuncHeader ;
FuncHeader -> FuncHeader
: CallingConvopt
;
CallingConv -> CallingConv
: 'x86_fastcallcc'
| 'x86_stdcallcc'
| 'x86_thiscallcc'
;
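The behaviour reported above can be mimicked with a small stand-alone program. This is only a sketch: `Node`, `node`, `CallingConv` and `present` are hypothetical stand-ins for the generated types, and the assumption that an absent node is a non-nil wrapper with empty text comes from the debug output quoted above.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the generated node interface.
type Node interface{ Text() string }

// node is a hypothetical concrete node holding the matched source text.
type node struct{ text string }

func (n node) Text() string { return n.text }

// CallingConv mirrors the shape of a generated wrapper: it embeds a Node.
type CallingConv struct{ Node }

// present reports whether an optional node matched any input, assuming an
// absent node is represented as a non-nil wrapper with empty text.
func present(cc *CallingConv) bool {
	return cc != nil && cc.Node != nil && cc.Text() != ""
}

func main() {
	absent := &CallingConv{Node: node{}}
	found := &CallingConv{Node: node{text: "x86_fastcallcc"}}

	fmt.Println(absent != nil, absent.Node != nil, present(absent)) // true true false
	fmt.Println(present(found))                                     // true
}
```

Under that assumption, a nil check alone cannot tell the two inputs apart, while the text-based check can.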
From https://github.com/inspirer/textmapper/wiki:
Lapg is a combined lexer and parser generator, which converts a description for a context-free LALR grammar into a source file to parse the grammar.
The readme seems to be more up to date. Perhaps the Wiki page could just be removed in order to avoid confusion?
I get the compile time error lexer.go:64:1: label restart defined and not used when trying to compile the Go generated lexer for the following grammar (foo.tm):
Note: this is on rev 78fc54e of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar foo.tm
language foo(go);
lang = "foo"
package = "github.com/mewspring/foo"
::lexer
'foobar' : /foobar/
::parser
input : 'foobar' ;
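For context, the Go compiler rejects any label that is declared but never referenced. A lexer loop usually needs such a label only when some rule jumps back to it, for example to skip whitespace; foo.tm defines no such rule, which may be why the label ends up unused. The sketch below is a hypothetical miniature of that pattern, not Textmapper's generated code.

```go
package main

import "fmt"

// nextByte returns the next non-space byte of src at or after *pos, or 0 at
// the end of input. It is a hypothetical miniature of a lexer's main loop.
func nextByte(src string, pos *int) byte {
restart:
	if *pos >= len(src) {
		return 0
	}
	ch := src[*pos]
	*pos++
	if ch == ' ' {
		// Referencing the label here is what makes its declaration legal.
		goto restart
	}
	return ch
}

func main() {
	pos := 0
	src := "  f o"
	for ch := nextByte(src, &pos); ch != 0; ch = nextByte(src, &pos) {
		fmt.Printf("%c", ch)
	}
	fmt.Println() // prints "fo"
}
```

Removing the `goto restart` statement from this program reproduces the same class of error as the one reported above.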
On the Windows platform, with textmapper v. 0.9.2 and IDEA v. 12.1.4:
From the console, if prop.tm has two CRLFs at the end of the file:
textmapper.sh prop.tm
Welcome to Git (version 1.8.1.2-preview20130201)
Run 'git help git' to display the help index.
Run 'git help ' to display help for specific commands.
prop.tm,1: syntax error before line 1
In the IDEA plugin, an IllegalArgumentException (IAE) is reported before the first line:
lapg: internal error: java.lang.IllegalArgumentException
The error goes away if I change the line endings in IDEA from CRLF to CR or LF, or remove the CRLFs from the end of the file.
I tried to use the ampersand operator to handle unordered sequences, but when I try to generate the parser, Textmapper reports a syntax error. Are unordered sequences still supported by Textmapper?
$ java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar a.tm
a.tm,17: syntax error before line 17
For the following grammar:
language foobar(go);
lang = "foobar"
package = "github.com/mewmew/foobar"
::lexer
'foo' : /foo/
'bar' : /bar/
'baz' : /baz/
::parser
input : ABC ;
# accepts all permutations of A, B and C as well as an empty string
ABC : (A & B & C)? ;
A : 'foo' ;
B : 'bar' ;
C : 'baz' ;
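As a point of reference, the comment in the grammar says the rule should accept every permutation of A, B and C. The sketch below simply enumerates those orderings, which is what a hand-written expansion into explicit alternatives would have to cover. The `permutations` helper is hypothetical and unrelated to Textmapper itself.

```go
package main

import (
	"fmt"
	"strings"
)

// permutations returns every ordering of items, built recursively by picking
// each element in turn as the head and permuting the remainder.
func permutations(items []string) [][]string {
	if len(items) <= 1 {
		return [][]string{append([]string(nil), items...)}
	}
	var out [][]string
	for i := range items {
		rest := make([]string, 0, len(items)-1)
		rest = append(rest, items[:i]...)
		rest = append(rest, items[i+1:]...)
		for _, p := range permutations(rest) {
			out = append(out, append([]string{items[i]}, p...))
		}
	}
	return out
}

func main() {
	// Prints one ordering per line, six in total for three symbols.
	for _, p := range permutations([]string{"A", "B", "C"}) {
		fmt.Println(strings.Join(p, " "))
	}
}
```

The number of orderings grows factorially, so spelling them out by hand is only practical for very short sequences.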
I would wish to use the issue tracker for reporting issues, however this is more of a question.
Given the grammar:
StructType -> StructType
: '{' Fields=(Type separator ',')+? '}'
| '<' '{' Fields=(Type separator ',')+? '}' '>'
;
What would be an idiomatic way of differentiating the two alternatives? The AST Node of the first alternative should essentially have a boolean member Packed set to false, while the second should have Packed set to true.
I tried using something like:
StructType<flag Packed = false> -> StructType
: '{' Fields=(Type separator ',')+? '}'
| [Packed] '<' '{' Fields=(Type separator ',')+? '}' '>'
;
It compiled just fine, but seems to be used for something else. At least, the AST package did not contain any reference of Packed.
Edit: The only method I can think of is to use something like this, which works but looks quite ugly:
StructType -> StructType
: '{' Fields=(Type separator ',')+? '}'
| Packed '{' Fields=(Type separator ',')+? '}' '>'
;
Packed -> Packed
: '<'
;
Edit2: This also comes up in Params, where I would like to have a boolean Variadic field indicating whether '...' is present in the first alternative and whether ',' '...' is present in the second alternative of:
Params -> Params
: '...'?
| Params=(Param separator ',')+ (',' '...')?
;
I get the compile time errors listed below when trying to compile the Go generated ast for the following grammar (baz.tm):
Textmapper did not report any error for the grammar, which is why I was surprised to see an error reported from the Go compiler.
Note: this is on rev cbc923c of Textmapper, using the following command to process the grammar @java -jar ${TM_DIR}/tm-tool/libs/textmapper-0.9.21.jar baz.tm
$ textmapper
lalr: 0.018s, text: 0.088s, parser: 29 states, 0KB
$ go install ./...
# github.com/mewspring/foo/baz/ast
ast/ast.go:122:41: invalid type for composite literal: TopLevelEntity
ast/factory.go:16:21: cannot use &ArrayType literal (type *ArrayType) as type BazNode in return argument:
*ArrayType does not implement BazNode (wrong type for Type method)
have Type() Type
want Type() baz.NodeType
ast/factory.go:28:19: cannot use &TypeDef literal (type *TypeDef) as type BazNode in return argument:
*TypeDef does not implement BazNode (wrong type for Type method)
have Type() Type
want Type() baz.NodeType
To help troubleshoot the cause of syntax errors when parsing, it would be really helpful if the parser generated by textmapper reported both the line and the column of each syntax error.
Currently only the line number is reported, and this creates an additional step for users of the parser to figure out the exact cause of the parse error. One recent such example is llir/llvm#105 (comment), the extract of which is included here for completeness.
By @dannypsnl:
By the way, would you like to use official IR Parser by submodule LLVM-IR-parser and mapping ast only? textmapper didn't provide a reasonable result for parsing error. I didn't know which step it stuck.
All I got was: syntax error at line 1
Here is the test code:
package asm
import (
	"testing"
	"github.com/llir/ll/ast"
)
func TestParse(t *testing.T) {
	testCode := `!7 = !DIExpression(DW_OP_LLVM_convert, 16, DW_ATE_unsigned, DW_OP_LLVM_convert, 32, DW_ATE_signed)`
	_, err := ast.Parse("", testCode)
	if err != nil {
		t.Error(err)
	}
}
To get a better parse error, we ended up breaking the line into multiple lines, resulting in syntax error at line 5, which was more helpful. The same could be achieved with a line:column pair when reporting syntax errors.
!7
=
!DIExpression(DW_OP_LLVM_convert,
16,
DW_ATE_unsigned,
DW_OP_LLVM_convert,
32,
DW_ATE_signed
)
Note, this might be me failing to use functionality already included in Textmapper. I simply use the generated parser and don't try to do anything fancy handling errors. So perhaps this information is already available?
Cheers,
Robin
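To make the request concrete, here is a minimal sketch of how a line:column pair can be computed from a byte offset. It assumes the parser's error carries a byte offset into the source, which is an assumption about the generated code; `lineCol` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// lineCol converts a byte offset in src into 1-based line and column numbers.
// Columns count bytes, not runes; offset is assumed to lie within src.
func lineCol(src string, offset int) (line, col int) {
	prefix := src[:offset]
	line = 1 + strings.Count(prefix, "\n")
	// LastIndex returns -1 when there is no newline, giving a 1-based column.
	col = offset - strings.LastIndex(prefix, "\n")
	return line, col
}

func main() {
	src := "!7 =\n!DIExpression(DW_OP_LLVM_convert, 16)"
	offset := strings.Index(src, "16")
	line, col := lineCol(src, offset)
	fmt.Printf("syntax error at %d:%d\n", line, col)
}
```

A caller could apply the same conversion after the fact, as long as the offset of the offending token is available.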
%lookahead flag x;
prog: optionallyPrefixed<+x>;
optionallyPrefixed: prefix? usesLookahead ;
usesLookahead: [x] y | z;
This reported that x was unused in optionallyPrefixed, until I changed that production to
optionallyPrefixed: usesLookahead | prefix usesLookahead ;
The current generation of the AST inserts /*opt*/ comments for optional fields on AST nodes. Besides the comments, no checks will help ensure that the AST node is not used incorrectly, except for always checking IsValid, which is not required for mandatory fields.
The suggestion is to change the method of optional fields from:
func (n AShrExpr) Exact() /*opt*/ Exact {
return Exact{n.Child(selector.Exact)}
}
to:
func (n AShrExpr) Exact() (Exact, bool) {
field := n.Child(selector.Exact)
return Exact{field}, field.IsValid()
}
This would help distinguish optional AST node fields from mandatory fields in the package documentation, and also produce compile time errors if an optional field is used in the same way as a mandatory field.
Currently, to use an optional field, users would do something along the lines of:
if exact := n.Exact(); exact.IsValid() {
// use exact field.
}
After the proposed change, users would instead write something along the lines of:
if exact, ok := n.Exact(); ok {
// use exact field.
}
The change may seem rather minimal, but it does help a lot to make sure that optional fields are never used without first checking for validity.
For the sake of exposition I'm going to write in terms of the LLVM parser we're working on in llir/ll:
Please consider the ToLlvmNode function:
func ToLlvmNode(n *Node) LlvmNode {
switch n.Type() {
case ll.AShrExpr:
return &AShrExpr{n}
case ll.AShrInst:
return &AShrInst{n}
case ll.AddExpr:
[... many more elided ...]
At the moment, for parsing a large input with LLVM, this is one of the larger sources of allocations. (>30% of allocations from the parser) - see llir/llvm#55 for code to reproduce the measurement for yourself.
I note that the allocation can be completely eliminated by instead returning values of the node types (e.g., return AShrExpr{n}). This seems like a reasonable thing to do because all the node types do is embed a pointer to a node, and the value of the pointer doesn't ever need to be modified later, so it is fine to pass the pointer around by value. Because the node type structs only contain this pointer, they actually fit into the second word (the data word) of the interface value, and therefore the returned value doesn't require an allocation either.
I did a brief experiment on the LLVM codebase a while back and observed about a 10% wall clock speedup and a significant reduction in allocations - giving (IIRC) a significant reduction in CPU and GC cost. The number of modifications I had to make to the LLIR parser was minimal.
I've run out of time for this moment. I intend to supply more (and more specific) details later, and perhaps a patch, if I get a spare moment. But for now I want to put this proposal out there and see what the response is.
For what it's worth I have some other tricks up my sleeve to reduce allocations, though they may be a bit too much of micro-optimization for some tastes. I'd be interested if you'd like to hear of these, but for now let's just consider this one which seems like the lowest hanging fruit.
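The difference between the two return styles can be measured with a small stand-alone program. This is a sketch under assumptions: `Node`, `LlvmNode` and `AShrExpr` are simplified, hypothetical stand-ins for the generated types, and the measurement relies on the gc compiler storing single-pointer structs directly in the interface value.

```go
package main

import (
	"fmt"
	"testing"
)

// Node, LlvmNode and AShrExpr are hypothetical stand-ins for generated types.
type Node struct{ typ int }

type LlvmNode interface{ LlvmNode() *Node }

// AShrExpr only embeds a pointer, as the generated node types do.
type AShrExpr struct{ *Node }

func (n AShrExpr) LlvmNode() *Node { return n.Node }

// sink keeps results reachable so the compiler cannot discard them.
var sink LlvmNode

func byPointer(n *Node) LlvmNode { return &AShrExpr{n} }
func byValue(n *Node) LlvmNode   { return AShrExpr{n} }

// pointerAllocs measures average allocations per call when returning a pointer.
func pointerAllocs() float64 {
	n := &Node{}
	return testing.AllocsPerRun(1000, func() { sink = byPointer(n) })
}

// valueAllocs measures average allocations per call when returning a value.
func valueAllocs() float64 {
	n := &Node{}
	return testing.AllocsPerRun(1000, func() { sink = byValue(n) })
}

func main() {
	fmt.Println("allocations per call, pointer:", pointerAllocs())
	fmt.Println("allocations per call, value:  ", valueAllocs())
}
```

Both variants satisfy the interface, since the method has a value receiver, so callers that only use the interface would not need to change.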
Hi Evgeny,
I just came across Textmapper, and having read the Language Reference and the motivation behind the project, it seems to be exactly what I was looking for. Essentially an LR version of ANTLR for Go. I can tell that you have a lot of experience in this domain, as the architecture is well thought out. I still have to dive deep and examine the minute details of the implementation, but my initial reaction of Textmapper is very positive!
Now, of course, I'd like to take tm out for a spin! However, looking at the implementation of tm-go/cmd/textmapper/generate.go, I noticed a TODO in the generate function.
I noticed that you recently ported Tarjan's algorithm for detecting strongly connected components (in rev 78fc54e). My question is, how far is the Go version of Textmapper from being ready for use?
I'd love to try it out!
Cheerful regards,
Robin
A simple example:
...
%input Expr;
Expr {int} -> Expr :
num '+' num {
$$ = operAdd($0, $2)
println($$)
}
;
It will generate:
...
func (p *Parser) Parse(lexer *Lexer) (error, int) {
err, v := p.parse(0, 5, lexer)
val, _ := v.(int) // Error occurred!
return err, val
}
func (p *Parser) parse(start, end int8, lexer *Lexer) (interface{}, error) {
...
ref: http://textmapper.org/documentation.html#semantic-actions
-IfExpression {*IfNode} : 'if' '(' expr ')' stmt { $$ = &IfNode{$expr, $stmt) } ;
+IfExpression {*IfNode} : 'if' '(' expr ')' stmt { $$ = &IfNode{$expr, $stmt} } ;
Note the use of } rather than ).
Hej Evgeny,
Very excited to see that the Go version of textmapper has reached feature parity with the Java version (#6). Congrats on the persistence and perseverance. You've done an amazing job.
Of course, with the official Go release of textmapper, I wanted to use it to generate the grammar and parser for the llir project. The Java version was capable of generating the lexer and parser from the LLVM IR grammar (ll.tm); however, given the explicit restriction of the Go version of textmapper to avoid allowing tokens with different casing, we were unable to generate the lexer and parser from the ll.tm grammar using the Go version of textmapper.
More details below:
re: #6 (comment)
I tried to use the latest release of textmapper Go version to generate a lexer and parser for the LLVM IR grammar.
Unfortunately, the official LLVM IR grammar contains cases where the only difference between two tokens is lower case vs. camel case. It contains the nounwind function attribute and the noUnwind function flag.
And, as mentioned in the release notes of textmapper:
similar names in the grammar (capitalization, camel vs snake case, etc.) cause a grammar compilation error to avoid confusion and actual compilation errors down the road
This causes the following error:
$ textmapper generate ll.tm
lalr: 95ms (ll.tm)
ll.tm:346:1: 'noUnwind' and 'nounwind' get the same ID in generated code
I was wondering if there is a potential solution to this issue? E.g. introducing two distinct NonTerminal names in the ll.tm grammar to differentiate the two tokens?
As I'm not controlling the official grammar of LLVM IR, I cannot "fix" the original grammar, but instead have to make it work in textmapper.
Cheers,
Robin
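To illustrate how two token names can end up with the same ID, here is a minimal sketch. It assumes, purely for illustration, that IDs are derived by upper-casing the token name; the real derivation used by Textmapper is not shown in this report, and `tokenID` and `collisions` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tokenID is a hypothetical normalization: it upper-cases the token name,
// which is one way two differently cased names could become identical.
func tokenID(name string) string { return strings.ToUpper(name) }

// collisions groups token names that map to the same identifier and drops
// every identifier that is used by only one name.
func collisions(names []string) map[string][]string {
	byID := make(map[string][]string)
	for _, name := range names {
		id := tokenID(name)
		byID[id] = append(byID[id], name)
	}
	for id, group := range byID {
		if len(group) < 2 {
			delete(byID, id)
			continue
		}
		sort.Strings(group)
	}
	return byID
}

func main() {
	fmt.Println(collisions([]string{"nounwind", "noUnwind", "alignstack"}))
	// prints "map[NOUNWIND:[noUnwind nounwind]]"
}
```

Any scheme that keeps the original casing in the generated identifier, or appends a disambiguating suffix, would leave this map empty for the two tokens in question.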
I noticed today that the grammar I was writing contained a typo where two different rules had the same name.
Prior to this commit AlignStack was defined both as
AlignStack -> AlignStack
: 'alignstack'
;
and
AlignStack -> AlignStack
: 'alignstack' '=' N=UintLit
;
However, no warning was emitted by Textmapper. This seems like a bug?
What surprised me was that Textmapper did not report any error or warning for this grammar, so it may have gone unnoticed for much longer.
I would like to suggest that Textmapper reports an error or outputs a warning when a grammar contains two rules of the same name. It is quite possible that a valid grammar may contain two rules with the same name for some of the more advanced use cases of Textmapper. Of this, I'm not yet aware. However, in the context of a simple grammar, a warning would be helpful.
Cheers,
Robin
Consider the following example collision.tm.
The terminal 'gc' and the AST node GC collide, as they share the Go identifier GC in both token.go and listener.go.
There are two solutions to this problem as a user of Textmapper: either rename the token, or rename the AST node. Both of them work well, but it would be preferable if the workaround was not needed, and the original collision was resolved instead (perhaps by adding a prefix to token names, e.g. TMTokenGC).
Rename token:
u@x1 ~/D/g/s/g/m/f/collision> git diff collision.tm
diff --git a/collision/collision.tm b/collision/collision.tm
index 2fcc70f..a15736b 100644
--- a/collision/collision.tm
+++ b/collision/collision.tm
@@ -7,7 +7,7 @@ eventFields = true
:: lexer
-'gc' : /gc/
+foobar : /gc/
string_lit : /["][^"]*["]/
@@ -29,6 +29,6 @@ FuncHeader -> FuncHeader
# previous declaration at ./listener.go:15:2
# make: *** [Makefile:9: gen] Error 2
GC -> GC
- : 'gc' string_lit
+GC -> GC
+ : foobar string_lit
;
Rename AST node:
u@x1 ~/D/g/s/g/m/f/collision> git diff collision.tm
diff --git a/collision/collision.tm b/collision/collision.tm
index 2fcc70f..f5fb601 100644
--- a/collision/collision.tm
+++ b/collision/collision.tm
@@ -29,6 +29,6 @@ FuncHeader -> FuncHeader
# previous declaration at ./listener.go:15:2
# make: *** [Makefile:9: gen] Error 2
-GC -> GC
+GC -> GCNode
: 'gc' string_lit
;
Grammar:
language collision(go);
lang = "collision"
package = "github.com/mewspring/foo/collision"
eventBased = true
eventFields = true
:: lexer
'gc' : /gc/
string_lit : /["][^"]*["]/
:: parser
input : FuncHeader ;
FuncHeader -> FuncHeader
: GCopt
;
# TODO: Rename GCNode to GC when collision with token 'gc' has been resolved.
#
# GC is defined as an identifier in both listener.go and token.go.
#
# lalr: 0.014s, text: 0.084s, parser: 8 states, 0KB
# # github.com/mewspring/foo/collision
# ./token.go:18:2: GC redeclared in this block
# previous declaration at ./listener.go:15:2
# make: *** [Makefile:9: gen] Error 2
GC -> GC
: 'gc' string_lit
;
I tried running the latest version of Textmapper today (the Go version at rev 9261aa2).
Doing so, I received the following error for the grammar ll.tm:
u@x1 ~/D/g/s/g/l/ll2> ~/goget/bin/textmapper generate ll.tm
ll.tm:576:1: '|' and 'or' get the same ID in generated code
This is because the LLVM IR grammar contains both | and or as two distinct keywords/tokens.
Edit: My suggestion would be to use token.Pipe for |; however, I also realize that a more general solution may be required.
Edit2: @inspirer, any idea what the right approach would be to resolve this?
I updated to Textmapper rev cd73b6b today and the issue is still present.
$ textmapper generate ll.tm
ll.tm:582:1: '|' and 'or' get the same ID in generated code
Would it make sense to add a prefix to these generated token IDs to avoid collision? E.g. | -> Character_Or (or Character_Pipe)?
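The prefix idea can be sketched as follows. Everything here is hypothetical: the `punctNames` table, the `Char` prefix and the `tokenIdent` helper are illustrative choices, not the naming scheme Textmapper uses.

```go
package main

import (
	"fmt"
	"strings"
)

// punctNames is a hypothetical table that gives punctuation tokens a name.
var punctNames = map[string]string{"|": "Pipe", "&": "Amp", "=": "Assign"}

// tokenIdent returns an identifier for a token literal. Punctuation gets a
// "Char" prefix so that it cannot clash with a keyword of the same meaning.
func tokenIdent(literal string) string {
	if name, ok := punctNames[literal]; ok {
		return "Char" + name
	}
	return strings.ToUpper(literal)
}

func main() {
	fmt.Println(tokenIdent("|"), tokenIdent("or")) // CharPipe OR
}
```

Because punctuation and keywords are named through separate paths, '|' and 'or' can no longer map to the same identifier in this scheme.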