Comments (6)

amykyta3 commented on June 29, 2024

Ah that makes more sense.

Reminder that the speedy-antlr accelerator works as follows:

  • Create an input stream in Python and hand it off to the C++ accelerator
  • Process the stream in the C++ domain using the generated Antlr lexer/parser
  • Traverse the C++ parse tree, and re-construct an equivalent in Python

The pure C++ parser is indeed much faster than Java. Unfortunately, a lot of the performance improvement is lost in the last phase, when the parsed tree structure is converted to Python. There are significant overheads in Python object creation (extensive use of dynamic allocation). I cache as much information as I can, but a lot of this overhead is unavoidable. There may still be some minor enhancements possible, but they will be minimal.
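
To make the cost of that last phase concrete, here is a minimal sketch in plain Python. It is not speedy-antlr code: the nested tuples merely stand in for a tree built on the C++ side, and `Node` and `rebuild` are hypothetical names. The point it illustrates is that reconstruction creates at least one Python object per parse-tree node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Python parse-tree node."""
    rule: str
    children: list = field(default_factory=list)


def rebuild(native):
    """Convert a (rule, [children]) tuple tree into Node objects.

    Returns the root Node and the number of Node objects created.
    """
    rule, kids = native
    node = Node(rule)
    created = 1
    for kid in kids:
        child, count = rebuild(kid)
        node.children.append(child)
        created += count
    return node, created


# Assumed stand-in for a tree produced in the C++ domain.
native_tree = (
    "expr",
    [("term", [("NUM", [])]), ("PLUS", []), ("term", [("NUM", [])])],
)

root, created = rebuild(native_tree)
print(created)  # one Node per parse-tree node, plus a list for each node's children
```

For a real input the node count scales with the size of the source file, which is why this phase is hard to make cheap.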

In the end, a pure C++ solution (no Python) will always be the fastest, even compared to Java.

from speedy-antlr-tool.

Thrameos commented on June 29, 2024

I can throw in my two cents, but your mileage may vary. As the author of the JPype backend, I can assure you that a C++/CPython solution should be faster than a Java/CPython solution. But this really depends on how the interface to CPython is written.

The original JPype solution used a C++ backend in which most methods went through Python code to call C++ methods on the backend, and it was dog slow. The interface used Python capsules stored in Python classes, which were accessed through a small set of CPython module-level function calls. Converting the entire backend to pure C++/CPython, in which only the startup and infrequently used routines contained Python code, and writing custom classes with dedicated slots for all major actions improved speed by factors of 4 to 12. Specializing iterators for list comprehensions gave another factor of 4 on top of that for working with lists. Adding dedicated bulk transfer methods using the buffer protocol sped up array operations by a factor of about 400 for primitives (which is how numpy operates). Basically, anything that avoids the Python object bottleneck by limiting the number of objects created to the minimum required to communicate with Python will run much faster. Simple, silly things like storing to and reading from a dict, getting hashes from freshly created strings, or packing tuples for a call all create many objects, which results in slow performance. Python internally uses bit flags, dedicated slots, and all manner of tricks to make its native classes perform well, and to get good performance in a CPython module you have to do the same. About the only thing I have seen on par is Cython, but that is just another way to get at C.
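
The bulk-transfer point can be sketched with nothing but the standard library. This is not JPype code; it only contrasts copying an array one boxed element at a time with copying it through the buffer protocol. The function names are made up for the illustration.

```python
from array import array


def copy_per_element(src):
    """Copy by creating one Python float object per element."""
    dst = array("d")
    for value in src:  # each iteration boxes a C double into a Python float
        dst.append(value)
    return dst


def copy_bulk(src):
    """Copy the raw bytes in one step via the buffer protocol."""
    dst = array("d")
    # One intermediate bytes object instead of one float object per element.
    dst.frombytes(memoryview(src).tobytes())
    return dst


src = array("d", [0.5 * i for i in range(1000)])
assert copy_per_element(src) == copy_bulk(src) == src
```

Both functions produce the same array; the difference is how many Python objects are created along the way. Timing them with `timeit` on a large array should show the gap, though the exact ratio depends on the machine and Python version.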

When using JPype, I always recommend the "light" wrapper approach, in which JPype directly exposes a Java class as a module, rather than the "heavy" wrapper that many modules use. In a heavy wrapper the classes are written in Python: they convert types, call JPype as a backend, and then convert all the types back using Python code. You end up paying all of the object cost of Python, then calling into Java (through C++ and JNI), just to then unwrap the result back into Python and pay another set of object costs. If instead you either use the existing Java API or create a "Python friendly" Java API which can be directly exposed, you will get much higher performance. The "heavy" wrapper only saves time if Java can do a bunch of work that would otherwise be prohibitive if coded directly in Python. If you are calling methods to do small amounts of work, then the heavy wrapper is much slower and at times is worse than writing the whole thing in Python.

JPype can operate remarkably quickly so long as bulk transfers are used or the majority of the data crosses the boundary between Python and Java only once. One would think that running Java and Python in the same virtual machine would be quicker, but that has usually not proven to be the case. Some users have reported JPype/CPython speedups over Jython solutions of more than an order of magnitude (jobs taking hours reduced to minutes). This is not because of anything wrong with Jython, but because the JVM object model is not very compatible with the Python object model. If CPython, with a specialized object lifespan model optimized to handle the object creation/deletion bottleneck, is still a drag, then porting it to the JVM, where object creation has an even larger penalty and cleanup depends on a stop-the-world garbage collector, results in a huge performance penalty.

I suffered much the same problem when I ported a C++ quaternion geometry calculator over to Java. If I wrote it the way that I would in C++, the object penalty was overwhelming and the garbage collector was running non-stop. Reformulating it to manually reuse result objects was much faster (but much less readable). The JVM is not well suited to the Python approach of frequently creating many small objects.
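
The "reuse result objects" reformulation looks roughly like the sketch below. The original code was Java and is not shown in this thread, so this is only an illustration of the pattern in Python; `qmul` and `qmul_into` are invented names, and quaternions are assumed to be stored as `[w, x, y, z]` lists.

```python
def qmul(a, b):
    """Hamilton product of two quaternions; allocates a new result list."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return [
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    ]


def qmul_into(a, b, out):
    """Same product, written into a caller-supplied list instead of a new one.

    Inputs are read into locals first, so `out` may alias `a` or `b`.
    """
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    out[0] = aw * bw - ax * bx - ay * by - az * bz
    out[1] = aw * bx + ax * bw + ay * bz - az * by
    out[2] = aw * by - ax * bz + ay * bw + az * bx
    out[3] = aw * bz + ax * by - ay * bx + az * bw
    return out


i = [0, 1, 0, 0]
j = [0, 0, 1, 0]
scratch = [0, 0, 0, 0]  # allocated once, reused across calls
assert qmul(i, j) == qmul_into(i, j, scratch) == [0, 0, 0, 1]  # i * j = k
```

The second form avoids one container allocation per call at the cost of readability, which is the trade-off described above. In Python the individual numbers are still objects, so the saving is smaller than it would be in Java.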

Ultimately, since using Java from Python requires going through a method dispatch to resolve overloads and then the marshalling/unmarshalling process of going through JNI, a dedicated C++ to CPython API should always beat it hands down on speed. However, the effort of writing all of the primitives and infrastructure required to get good speed can be surprisingly large. C++ lacks much of the reflection needed to make wrapping easy, so it is all custom code at the interface layer. Transferring exceptions, managing memory lifetimes between C++ and Python, etc. is a non-trivial development cost. That is the reason I developed JPype and use it on a daily basis: everything in Java is exposed automatically (via reflection) without requiring additional effort on the user's part. I am supporting a group of physicists for whom programming is a secondary skill, hence making Java work seamlessly is the goal, with speed a secondary objective.

If your goal is best speed, then you would need to dedicate similar levels of effort to a C++/CPython solution.

amykyta3 commented on June 29, 2024

Are you comparing Java vs pure C++? Or C++ using Speedy Antlr?

amykyta3 commented on June 29, 2024

Part of the reason is that Python natively supports extensions written in C/C++.

m-zakeri commented on June 29, 2024

Are you comparing Java vs pure C++? Or C++ using Speedy Antlr?

I compared ANTLR Java only with C++ using Speedy ANTLR. I think C++ must be faster than Java but perhaps the C++ implementation of ANTLR is not optimized like its Python implementation!

m-zakeri commented on June 29, 2024

Thank you for your complete explanation. What is your opinion about delegating parse tree construction to ANTLR Java using infrastructures such as JPype or Jython to overcome "C++ Python object creation"? Is it possible or a good approach?
