Running an Antlr parser in Python is slow.
This tool generates a Python extension that runs your parser using Antlr's C++ target, and then translates the parsed tree back into Python.
See the Speedy Antlr Tool Documentation for more details.
Generate an accelerator extension that makes your Antlr parser in Python super-fast!
License: BSD 3-Clause "New" or "Revised" License
I've used this grammar:
https://github.com/antlr/grammars-v4/tree/master/java/java8
and followed the tutorials, but the build fails:
src/ASAI/grammar/cpp_src/sa_java8_translator.h:8:10: fatal error: Java8BaseVisitor.h: No such file or directory
#include "Java8BaseVisitor.h"
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Java8BaseVisitor.h is not actually created by
java -jar ../../../antlr-4.8-complete.jar -Dlanguage=Cpp -visitor -no-listener -o cpp_src Java8Lexer.g4
java -jar ../../../antlr-4.8-complete.jar -Dlanguage=Cpp -visitor -no-listener -o cpp_src Java8Parser.g4
which instead create
Java8ParserBaseVisitor.h
Java8ParserBaseVisitor.cpp
So I edited
sa_java8_translator.h:8:10
from
#include "Java8BaseVisitor.h"
to
#include "Java8ParserBaseVisitor.h"
and java8ParserBaseVisitor.h:15
from
class java8ParserBaseVisitor : public java8BaseParserVisitor {
to
class java8ParserBaseVisitor : public java8ParserVisitor {
After that, the build and install appear to complete fine, but then
NameError: name 'sa_java8_cpp_parser' is not defined
occurs.
(venv) [sparrow@localhost sparrowai]$ find . -name "*sa_java8_cpp_parser*"
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/sa_java8_cpp_parser.cpython-36m-x86_64-linux-gnu.so
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/sa_java8_cpp_parser.py
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/__pycache__/sa_java8_cpp_parser.cpython-36.pyc
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
./src/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
./build/lib.linux-x86_64-3.6/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
./build/lib.linux-x86_64-3.6/ASAI/grammar/sa_java8_cpp_parser.cpython-36m-x86_64-linux-gnu.so
./build/temp.linux-x86_64-3.6/src/ASAI/grammar/cpp_src/sa_java8_cpp_parser.o
./build/lib/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
What should I do?
Hi!
First of all thanks a lot for this great tool!
My question is:
I need (in Python) a CommonTokenStream(lexer) instance so I can access hidden tokens such as comments via the stream's .getHiddenTokensToLeft/Right(...) methods. Am I right that there is no easy way / workaround to get such an instance from do_parse?
Sergey.
ANTLR 4.10 switched to std::any, which seems to be causing these compilation failures:
src/grammar/cpp/sa_zoia_cpp_parser.cpp: In function ‘PyObject* do_parse(PyObject*, PyObject*)’:
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:44: error: ‘class std::any’ has no member named ‘as’
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^~
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:56: error: expected primary-expression before ‘*’ token
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:57: error: expected primary-expression before ‘>’ token
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:59: error: expected primary-expression before ‘)’ token
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^
I altered the generated files manually to fix this for now, replacing the as() calls with std::any_cast, but I'm not a C++ expert so I have no idea if that's the right fix 🤷♀️
Hi,
I'm getting this error when trying to speed up the PlSql grammar:
error(126): PlSqlParser.g4:4587:14: cannot create implicit token for string literal in non-combined grammar: ')'
Is there any support for the PlSqlParser grammar, or a way to fix the above issue?
Thank you
Hi,
I get error C2039 due to accessing a member of a subclass (*Context) from an instance of its superclass (Context).
For example, in the following code the compiler error is: 'blockLabel': is not a member of 'JavaParser::StatementContext'
Indeed, the ctx object cannot access blockLabel, which is defined in one of the subclasses of StatementContext:
antlrcpp::Any SA_JavaTranslator::visitStatement(JavaParser::StatementContext *ctx){
    speedy_antlr::LabelMap labels[] = {
        {"blockLabel", static_cast<void*>(ctx->blockLabel)},
        {"statementExpression", static_cast<void*>(ctx->statementExpression)},
        {"identifierLabel", static_cast<void*>(ctx->identifierLabel)}
    };
    if(!StatementContext_cls) StatementContext_cls = PyObject_GetAttrString(translator->parser_cls, "StatementContext");
    PyObject *py_ctx = translator->convert_ctx(this, ctx, StatementContext_cls, labels, 3);
    return py_ctx;
}
blockLabel is defined in Statement0Context, and statementExpression is defined in another subclass of StatementContext:
class Statement0Context : public StatementContext {
public:
    Statement0Context(StatementContext *ctx);
    JavaParser::BlockContext *blockLabel = nullptr;
    BlockContext *block();
    virtual void enterRule(antlr4::tree::ParseTreeListener *listener) override;
    virtual void exitRule(antlr4::tree::ParseTreeListener *listener) override;
    virtual antlrcpp::Any accept(antlr4::tree::ParseTreeVisitor *visitor) override;
};
I tried static and dynamic casting, but I got runtime errors. The following code compiles but causes a runtime parsing error:
antlrcpp::Any SA_JavaTranslator::visitStatement(JavaParser::StatementContext *ctx){
    JavaParser::Statement0Context *ctx1 = static_cast<JavaParser::Statement0Context*>(ctx);
    JavaParser::Statement15Context *ctx2 = static_cast<JavaParser::Statement15Context*>(ctx);
    JavaParser::Statement16Context *ctx3 = static_cast<JavaParser::Statement16Context*>(ctx);
    speedy_antlr::LabelMap labels[] = {
        {"blockLabel", static_cast<void*>(ctx1->blockLabel)},
        {"statementExpression", static_cast<void*>(ctx2->statementExpression)},
        {"identifierLabel", static_cast<void*>(ctx3->identifierLabel)}
    };
    if(!StatementContext_cls) StatementContext_cls = PyObject_GetAttrString(translator->parser_cls, "StatementContext");
    PyObject *py_ctx = translator->convert_ctx(this, ctx, StatementContext_cls, labels, 3);
    return py_ctx;
}
Every rule context has a .start and a .stop attribute which contains a token. Some rules are missing one or both of these tokens.
python -m pip install .
python example.py
None
Thanks for creating this package! Hopefully I'm not doing something obviously wrong; I'm a total ANTLR and C++ beginner.
As requested in #5
It would be great if you could support lexer-only grammars as well. I have one that I use as an ANTLR4 lexer-only grammar. I imagine this would produce a token stream object rather than a parse tree.
If there are many rules in the input grammar, the if-then-else statements lead to fatal error C1061 when compiling the sa_*_cpp_parser.cpp file. It seems that the C++ compiler cannot compile more than 128 nested blocks:
https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/fatal-error-c1061?view=msvc-160
This is a great tool! Quite helpful for creating a faster parser with a Python binding.
It seems, though, that it does not support separate lexer/parser grammars. When separate lexer/parser grammar files are used, I see the following problem.
I have a grammar for a parser named A with the following file/grammar names:
ALexer.g4
AParser.g4
ANTLR4 Python3 target generates these python files:
ALexer.interp
ALexer.py
ALexer.tokens
AParser.interp
AParser.py
AParser.tokens
ANTLR4 Cpp target generates these cpp files:
ALexer.cpp
ALexer.h
ALexer.interp
ALexer.tokens
AParserBaseVisitor.cpp
AParserBaseVisitor.h
AParser.cpp
AParser.h
AParser.interp
AParser.tokens
AParserVisitor.cpp
AParserVisitor.h
Running speedy_antlr_tool.generate and passing AParser.py as argument generates:
sa_a.py
sa_A_cpp_parser.cpp
sa_A_translator.cpp
sa_A_translator.h
speedy_antlr.cpp
speedy_antlr.h
speedy_antlr.o
It seems the templates expect a visitor header named ABaseVisitor.h, but the CPP target generates a header named AParserBaseVisitor.h:
In file included from sa_a_cpp_parser.cpp:16:
sa_a_translator.h:8:10: fatal error: ABaseVisitor.h: No such file or directory
#include "ABaseVisitor.h"
^~~~~~~~~~~~~~~~
Hello Alex,
First of all, thank you so much for creating such a tool. I am the creator of Fugue. Fugue has its own DSL called Fugue SQL, which is a higher-level language on top of standard SQL. We are using antlr4 to create the language parser. It works great, but it is slow. That is why I am looking for alternative solutions that can be faster. Your tool seems to be exactly what I need.
I created an experimental repo and added you to access it. There are a couple of issues:
- The += list operator in antlr will create private members (example). I think it is almost useless, but it is causing issues because the C++ target does not have that, so the tool will throw exceptions. Currently, I have a simple workaround.
- Sometimes .stop is not found, sometimes parentCtx is not found. I think this is the major issue I need your help with. To reproduce, you can clone the repo.
To rebuild the SQL
make sql
To install locally:
pip install -e .
To run the test
pytest tests/test_equivalent.py
If you run the test step multiple times, you may see different error messages. (You may need to fix the EOF handling issue first to see the various other issues.)
Thanks!
It seems that the tree.parser attribute is not set to its correct reference when using USE_CPP_BACKEND to create a parse tree; it returns None. However, this attribute is set to a valid reference to the ANTLR parser when using the Python backend.
It is required to initialize the TokenStreamRewriter class with the parse tree's tokens in the following code (line 4 raises an exception):
'NoneType' object has no attribute 'getInputStream'
file_stream = FileStream(java_file_path)
sa_javalabeled.USE_CPP_IMPLEMENTATION = config.USE_CPP_BACKEND
tree = sa_javalabeled.parse(file_stream, 'compilationUnit')
tokens = tree.parser.getInputStream()
rewriter = TokenStreamRewriter(tokens)
I compared the parsing time of real programming languages' grammars on some large inputs. The best parsing time is ANTLR with the Java backend. What is the reason behind choosing C++ for Speedy ANTLR?
Hello everyone,
I get the following error when trying to run python setup.py install to build a CPP backend parser for my grammar.
sa_javalabeled_cpp_parser.cpp(91): error C2440: '=': cannot convert from 'antlrcpp::Any' to 'PyObject *'
sa_javalabeled_cpp_parser.cpp(91): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
The error points to the following lines in sa_javalabeled_cpp_parser.cpp file:
// Translate Parse tree to Python
SA_JavaLabeledTranslator visitor(&translator);
result = visitor.visit(parse_tree);
I use the latest versions of speedy-antlr and ANTLR, with Python 3.8.x and MSVC++ 14 on Windows 10.