Refactor to support Apache Arrow (octosql, OPEN)

rupurt commented on May 23, 2024
Refactor to support Apache Arrow

from octosql.

Comments (3)

cube2222 commented on May 23, 2024

Hey @rupurt!

I'm not planning to support Apache Arrow or to port OctoSQL to a vectorised execution engine; it's not worth the effort. Re the thesis: it's old, and OctoSQL has been rewritten from scratch since. It's around 100x faster now, thanks to static typing and how the execution phase now works.

In general, if you're working with data where the speedup of a columnar engine would be worth it, just use https://github.com/apache/arrow-datafusion or a project built around it. It has much more manpower behind it. Arrow is a PITA to code around, especially when you want union types, repetition, and deeply nested data structures. Additionally, the Go Arrow library is way behind the Rust one (or others).

To answer your last question, if you'd like to port OctoSQL to Arrow, please fork. As far as improvements go, they're welcome! However, please first create issues to discuss the details of the contributions.

If you'd like to attempt this redesign on your own, your best bet is to keep the physical phase but rewrite the execution phase almost completely. Here's an experiment you can use as inspiration: https://github.com/cube2222/octosql/tree/vectorization-experiment2

rupurt commented on May 23, 2024

Thank you for the info and leads @cube2222. I definitely noticed that you added a dataflow engine which is one of the reasons I'm stoked to comprehend and work with your project!

I'm totally with you with regard to DataFusion having a larger community and more progress. But my personal belief is that no one has really nailed the serving layer for small/medium/big data, besides maybe Presto, which is JVM-based. I also believe that once we see the right tool, every language will implement a version, and Go is a sleeping giant with a huge and growing fan base.

Also, sometimes it's just fun to hack on interesting stuff 😄

gedw99 commented on May 23, 2024

DataFusion and Arrow are certainly cool and have momentum.

OctoSQL is a nice, low-ceremony alternative. I prefer OctoSQL. I'll try to make PRs for things.

@cube2222 it would be good to have a roadmap, and to triage issues into agreed bits of work that people can then pick up.
