Comments (5)
I don't. But if you have the means to do a meaningful comparison in terms of both performance and scalability, that would be great.
Also @yssource
from dataframe.
pandas is developed in Python, which is definitely slower than C/C++.
For performance reasons, I decided to replace the algorithmic part of my Python code with C++ DataFrame, and then use pybind11 to port it back to Python.
@backtradercn
For a simple speed test, maybe this will help:
- Python: the timeit module (import timeit)
- C++: the Boost timer (#include <boost/timer.hpp>)
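As a rough illustration of the Python side of that suggestion, here is a minimal sketch using the standard-library timeit module. The function mean_of_squares is a hypothetical stand-in workload, not code from this thread or from DataFrame; substitute whatever you actually want to measure.

```python
import timeit


def mean_of_squares(n):
    """Hypothetical workload: mean of i*i for i in range(n)."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total / n


CALLS_PER_TRIAL = 100

# timeit.repeat runs the callable CALLS_PER_TRIAL times per trial and
# returns one total elapsed time (in seconds) for each of the 5 trials.
trials = timeit.repeat(lambda: mean_of_squares(10_000),
                       number=CALLS_PER_TRIAL, repeat=5)

# The minimum is usually the least noisy estimate of the per-call cost.
best = min(trials) / CALLS_PER_TRIAL
print(f"best of {len(trials)} trials: {best * 1e6:.1f} microseconds per call")
```

Taking the minimum over several trials reduces the influence of other processes on the machine; a single run can be misleading.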
@backtradercn, @yssource
I have added a performance section to the README file explaining how the new performance test runs.
MacBook> time python pandas_performance.py
All memory allocations are done. Calculating means ...
real 17m18.916s
user 4m47.113s
sys 5m31.901s
MacBook>
MacBook>
MacBook> time ../bin/Linux.GCC64/dataframe_performance
All memory allocations are done. Calculating means ...
real 6m40.222s
user 2m54.362s
sys 2m14.951s
So it seems C++ is only about 2 times faster than Python?
@qingtiandalaoye,
I think you are misinterpreting the specs, probably because I wasn’t clear in my writeup. A few points:
- The Pandas performance script is not really running in Python. I believe almost everything there is done in NumPy, which is implemented in C. That means DataFrame is more than 2x faster than NumPy/C.
- As I mentioned in “The interesting part” section, DataFrame is more than 2x faster than Pandas/NumPy at generating the same random numbers and loading them into column vectors. But DataFrame was about 10x faster at calculating means.
- You only load data once but calculate statistics many times. So in general DataFrame is about 10x faster than the parts of Pandas that are in NumPy. Parts of Pandas that are purely in Python should be much slower.
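To make the arithmetic behind these points concrete, here is a small sketch. It computes the ratios from the `time` output posted above, then shows how a 2x loading speedup and a 10x statistics speedup blend into one overall figure. The load_fraction parameter (the share of the Pandas runtime spent loading) is an assumption for illustration; the thread does not report it.

```python
def to_seconds(stamp):
    """Convert a `time`-style stamp such as '17m18.916s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Timings copied from the two runs posted earlier in this thread.
pandas_run = {"real": "17m18.916s", "user": "4m47.113s", "sys": "5m31.901s"}
dataframe_run = {"real": "6m40.222s", "user": "2m54.362s", "sys": "2m14.951s"}

ratios = {key: to_seconds(pandas_run[key]) / to_seconds(dataframe_run[key])
          for key in pandas_run}
for key, ratio in ratios.items():
    print(f"{key}: Pandas took {ratio:.2f}x as long as DataFrame")


def overall_speedup(load_fraction, load_speedup=2.0, stats_speedup=10.0):
    """Amdahl-style blend of the two per-phase speedups.

    load_fraction is the ASSUMED share of baseline runtime spent
    generating and loading data; the rest is spent on statistics.
    """
    stats_fraction = 1.0 - load_fraction
    return 1.0 / (load_fraction / load_speedup + stats_fraction / stats_speedup)


# The more the run is dominated by loading, the closer the total gets to 2x.
for fraction in (0.9, 0.5, 0.1):
    print(f"load fraction {fraction:.1f}: overall {overall_speedup(fraction):.2f}x")
```

This is why a whole-program timing can show roughly 2x even when the statistics step alone is about 10x faster: a benchmark dominated by one-time loading hides the speedup of the repeated work.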