
Comments (5)

hosseinmoein avatar hosseinmoein commented on May 12, 2024

I don't. But if you have the means to do a meaningful comparison in terms of both performance and scalability, it would be great.
Also @yssource

from dataframe.

yssource avatar yssource commented on May 12, 2024

pandas is developed in Python, which is certainly slower than C/C++.
For performance reasons, I decided to replace the algorithmic part of my Python code with C++ DataFrame.
Then I use pybind11 to port it back to Python.

@backtradercn
For simple speed testing, these may help you:

  1. Python timeit module
    import timeit
  2. C++ Boost timer
    #include <boost/timer.hpp>
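
For the Python side, a minimal sketch of what a timeit measurement could look like. The column data and the `column_mean` function are hypothetical stand-ins, not the actual pandas test script; they only show the shape of the measurement.

```python
import random
import statistics
import timeit

# Hypothetical stand-in for one DataFrame column; the size is arbitrary.
column = [random.random() for _ in range(100_000)]


def column_mean():
    """The operation being measured: the mean of one column."""
    return statistics.fmean(column)


# repeat() returns one total time per run (each run makes `number` calls).
# The minimum is usually the least noisy figure to report.
runs = timeit.repeat(column_mean, number=10, repeat=5)
best_per_call = min(runs) / 10
print(f"best time per call: {best_per_call:.6f} s")
```

Timing only the call of interest keeps data generation out of the measurement, which matters when setup and computation have very different costs.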


hosseinmoein avatar hosseinmoein commented on May 12, 2024

@backtradercn, @yssource
I have added a performance section to the README file explaining how the new performance test runs.


qingtiandalaoye avatar qingtiandalaoye commented on May 12, 2024

MacBook> time python pandas_performance.py
All memory allocations are done. Calculating means ...

real 17m18.916s
user 4m47.113s
sys 5m31.901s
MacBook>
MacBook>
MacBook> time ../bin/Linux.GCC64/dataframe_performance
All memory allocations are done. Calculating means ...

real 6m40.222s
user 2m54.362s
sys 2m14.951s

So it seems the C++ version is only about 2 times faster than the Python one?


hosseinmoein avatar hosseinmoein commented on May 12, 2024

@qingtiandalaoye,
I think you are misinterpreting the specs, probably because I wasn’t clear in my writeup. A few points:

  1. The Pandas performance script is not really Python. I believe almost everything there is done in NumPy, which is written in C. That means DataFrame is more than 2x faster than NumPy/C.
  2. As I mentioned in “The interesting part” section, DataFrame is more than 2x faster than Pandas/NumPy in generating the same random numbers and loading them into column vectors. But DataFrame was about 10x faster in calculating means.
  3. You load data only once but calculate statistics many times. So in general DataFrame is about 10x faster than the parts of Pandas that are in NumPy. The parts of Pandas that are purely in Python should be much slower still.
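
To make the arithmetic concrete, here is a small sketch that converts the two `real` figures posted above into an overall ratio. The helper `to_seconds` is a hypothetical name introduced only for this illustration. The result covers each whole run (random number generation, loading, and means together), so it cannot show the per-phase 10x figure; that would need separate timings around the mean calculation alone.

```python
import re


def to_seconds(stamp):
    """Convert a shell `time` value such as '17m18.916s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times copied from the two runs posted in this thread.
pandas_real = to_seconds("17m18.916s")
dataframe_real = to_seconds("6m40.222s")

# Ratio of whole-run wall-clock times, not of any single phase.
speedup = pandas_real / dataframe_real
print(f"overall speedup: {speedup:.2f}x")  # overall speedup: 2.60x
```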

