Running an Antlr parser in Python is slow.
This tool generates a Python extension that runs your parser using Antlr's C++ target, and then translates the parsed tree back into Python.
See the Speedy Antlr Tool Documentation for more details.
Generate an accelerator extension that makes your Antlr parser in Python super-fast!
License: BSD 3-Clause "New" or "Revised" License
I've used this grammar:
https://github.com/antlr/grammars-v4/tree/master/java/java8
and followed the tutorials, but the build fails:
src/ASAI/grammar/cpp_src/sa_java8_translator.h:8:10: fatal error: Java8BaseVisitor.h: No such file or directory
#include "Java8BaseVisitor.h"
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Java8BaseVisitor.h is not actually created by
java -jar ../../../antlr-4.8-complete.jar -Dlanguage=Cpp -visitor -no-listener -o cpp_src Java8Lexer.g4
java -jar ../../../antlr-4.8-complete.jar -Dlanguage=Cpp -visitor -no-listener -o cpp_src Java8Parser.g4
which instead create
Java8ParserBaseVisitor.h
Java8ParserBaseVisitor.cpp
So I edited
sa_java8_translator.h:8:10
from
#include "Java8BaseVisitor.h"
to
#include "Java8ParserBaseVisitor.h"
and java8ParserBaseVisitor.h:15
from
class java8ParserBaseVisitor : public java8BaseParserVisitor {
to
class java8ParserBaseVisitor : public java8ParserVisitor {
After that, the build and install appear to complete fine, but then
NameError: name 'sa_java8_cpp_parser' is not defined
occurs.
(venv) [sparrow@localhost sparrowai]$ find . -name "*sa_java8_cpp_parser*"
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/sa_java8_cpp_parser.cpython-36m-x86_64-linux-gnu.so
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/sa_java8_cpp_parser.py
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/__pycache__/sa_java8_cpp_parser.cpython-36.pyc
./venv/lib/python3.6/site-packages/ASAI-1.0.0-py3.6-linux-x86_64.egg/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
./src/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
./build/lib.linux-x86_64-3.6/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
./build/lib.linux-x86_64-3.6/ASAI/grammar/sa_java8_cpp_parser.cpython-36m-x86_64-linux-gnu.so
./build/temp.linux-x86_64-3.6/src/ASAI/grammar/cpp_src/sa_java8_cpp_parser.o
./build/lib/ASAI/grammar/cpp_src/sa_java8_cpp_parser.cpp
What should I do?
Hi!
First of all thanks a lot for this great tool!
My question is:
I need (in Python) a CommonTokenStream(lexer) instance so I can access hidden tokens such as comments via the stream's .getHiddenTokensToLeft/Right(...) methods. Am I right that there is no easy way / workaround to get such an instance from do_parse?
Sergey.
ANTLR 4.10 switched to std::any, which seems to be causing these compilation failures:
src/grammar/cpp/sa_zoia_cpp_parser.cpp: In function ‘PyObject* do_parse(PyObject*, PyObject*)’:
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:44: error: ‘class std::any’ has no member named ‘as’
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^~
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:56: error: expected primary-expression before ‘*’ token
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:57: error: expected primary-expression before ‘>’ token
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^
src/grammar/cpp/sa_zoia_cpp_parser.cpp:100:59: error: expected primary-expression before ‘)’ token
100 | result = visitor.visit(parse_tree).as<PyObject *>();
| ^
I altered the generated files manually to fix this for now, replacing the as() calls with std::any_cast, but I'm not a C++ expert so I have no idea if that's the right fix 🤷♀️
Hi,
I'm getting this error when trying to speed up the PlSql grammar:
error(126): PlSqlParser.g4:4587:14: cannot create implicit token for string literal in non-combined grammar: ')'
Is there any support for the PlSqlParser grammar, or a way to fix the above issue?
Thank you
Hi,
I get error C2039 due to accessing a member of a subclass (*Context) from an instance of its superclass (Context).
For example, in the following code the compiler error is: 'blockLabel': is not a member of 'JavaParser::StatementContext'
Indeed, the ctx object cannot access blockLabel, which is defined in one of the subclasses of StatementContext:
antlrcpp::Any SA_JavaTranslator::visitStatement(JavaParser::StatementContext *ctx){
    speedy_antlr::LabelMap labels[] = {
        {"blockLabel", static_cast<void*>(ctx->blockLabel)},
        {"statementExpression", static_cast<void*>(ctx->statementExpression)},
        {"identifierLabel", static_cast<void*>(ctx->identifierLabel)}
    };
    if(!StatementContext_cls) StatementContext_cls = PyObject_GetAttrString(translator->parser_cls, "StatementContext");
    PyObject *py_ctx = translator->convert_ctx(this, ctx, StatementContext_cls, labels, 3);
    return py_ctx;
}
blockLabel is defined in Statement0Context, and statementExpression is defined in another subclass of StatementContext:
class Statement0Context : public StatementContext {
public:
    Statement0Context(StatementContext *ctx);
    JavaParser::BlockContext *blockLabel = nullptr;
    BlockContext *block();
    virtual void enterRule(antlr4::tree::ParseTreeListener *listener) override;
    virtual void exitRule(antlr4::tree::ParseTreeListener *listener) override;
    virtual antlrcpp::Any accept(antlr4::tree::ParseTreeVisitor *visitor) override;
};
I tried static and dynamic casting, but I got runtime errors. The following code compiles but causes a runtime parsing error:
antlrcpp::Any SA_JavaTranslator::visitStatement(JavaParser::StatementContext *ctx){
    JavaParser::Statement0Context *ctx1 = static_cast<JavaParser::Statement0Context*>(ctx);
    JavaParser::Statement15Context *ctx2 = static_cast<JavaParser::Statement15Context*>(ctx);
    JavaParser::Statement16Context *ctx3 = static_cast<JavaParser::Statement16Context*>(ctx);
    speedy_antlr::LabelMap labels[] = {
        {"blockLabel", static_cast<void*>(ctx1->blockLabel)},
        {"statementExpression", static_cast<void*>(ctx2->statementExpression)},
        {"identifierLabel", static_cast<void*>(ctx3->identifierLabel)}
    };
    if(!StatementContext_cls) StatementContext_cls = PyObject_GetAttrString(translator->parser_cls, "StatementContext");
    PyObject *py_ctx = translator->convert_ctx(this, ctx, StatementContext_cls, labels, 3);
    return py_ctx;
}
Every rule context has a .start and a .stop attribute which contains a token. Some rules are missing one or both of these tokens.
python -m pip install .
python example.py
None
Thanks for creating this package! Hopefully I'm not doing something obviously wrong; I'm a total ANTLR and C++ beginner.
As requested in #5
It would be great if you could support lexer-only grammars as well. I have one that I use as an ANTLR4 lexer-only grammar. I imagine this would produce a token stream object rather than a parse tree.
If there are many rules in the input grammar, the if-then-else statements lead to fatal error C1061 when compiling the sa_*_cpp_parser.cpp file. It seems that the C++ compiler cannot compile more than 128 nested blocks:
https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/fatal-error-c1061?view=msvc-160
This is a great tool! Quite helpful for creating a faster parser with a Python binding.
It seems, though, that it does not support separate lexer/parser grammars. When separate lexer/parser grammar files are used, I see the following problem.
I have a grammar for a parser named A with the following file/grammar names:
ALexer.g4
AParser.g4
ANTLR4 Python3 target generates these python files:
ALexer.interp
ALexer.py
ALexer.tokens
AParser.interp
AParser.py
AParser.tokens
ANTLR4 Cpp target generates these cpp files:
ALexer.cpp
ALexer.h
ALexer.interp
ALexer.tokens
AParserBaseVisitor.cpp
AParserBaseVisitor.h
AParser.cpp
AParser.h
AParser.interp
AParser.tokens
AParserVisitor.cpp
AParserVisitor.h
Running speedy_antlr_tool.generate and passing AParser.py as argument generates:
sa_a.py
sa_A_cpp_parser.cpp
sa_A_translator.cpp
sa_A_translator.h
speedy_antlr.cpp
speedy_antlr.h
speedy_antlr.o
It seems the templates expect a visitor header named ABaseVisitor.h, but the CPP target generates a header named AParserBaseVisitor.h:
In file included from sa_a_cpp_parser.cpp:16:
sa_a_translator.h:8:10: fatal error: ABaseVisitor.h: No such file or directory
#include "ABaseVisitor.h"
^~~~~~~~~~~~~~~~
Hello Alex,
First of all, thank you so much for creating such a tool. I am the creator of Fugue. Fugue has its own DSL called Fugue SQL, which is a higher-level language on top of standard SQL. We are using antlr4 to create the language parser. It works great, but it is slow. That is why I am looking for alternative solutions that can be faster. Your tool seems to be exactly what I need.
I created an experimental repo and added you to access it. There are a couple of issues:
- The += list operator in antlr will create private members (example). I think it is almost useless, but it is causing issues because the C++ target does not have that, so the tool will throw exceptions. Currently, I have a simple workaround.
- Sometimes .stop is not found, sometimes parentCtx is not found. I think this is the major issue I need your help with. To reproduce, you can clone the repo.
To rebuild the SQL
make sql
To install locally:
pip install -e .
To run the test
pytest tests/test_equivalent.py
If you run the test step multiple times, you may see different error messages. (You may need to fix the EOF handling issue first to see the various other issues.)
Thanks!
It seems that the tree.parser attribute is not set to its correct reference when using USE_CPP_BACKEND to create a parse tree; it returns None. However, this attribute is set to a valid reference to the ANTLR parser when using the Python backend.
It is required to initialize the TokenStreamRewriter class with the parse tree's tokens in the following code (line 4 raises an exception):
'NoneType' object has no attribute 'getInputStream'
file_stream = FileStream(java_file_path)
sa_javalabeled.USE_CPP_IMPLEMENTATION = config.USE_CPP_BACKEND
tree = sa_javalabeled.parse(file_stream, 'compilationUnit')
tokens = tree.parser.getInputStream()
rewriter = TokenStreamRewriter(tokens)
I compared the parsing time of real programming languages' grammars on some large inputs. The best parsing time is ANTLR with the Java backend. What is the reason behind choosing C++ for Speedy ANTLR?
Hello everyone,
I get the following error when trying to run python setup.py install to build a CPP backend parser for my grammar.
sa_javalabeled_cpp_parser.cpp(91): error C2440: '=': cannot convert from 'antlrcpp::Any' to 'PyObject *'
sa_javalabeled_cpp_parser.cpp(91): note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
The error points to the following lines in sa_javalabeled_cpp_parser.cpp file:
// Translate Parse tree to Python
SA_JavaLabeledTranslator visitor(&translator);
result = visitor.visit(parse_tree);
I use the latest versions of speedy-antlr and ANTLR, with Python 3.8.x and MSVC++ 14 on Windows 10.