performance
The following was run on a MacBook with 2 cores and 8 GB of memory with all applications closed.
(benchmarks)benchmarks master > ./run-benchmarks.sh
sys:1: DtypeWarning: Columns (0,19) have mixed types. Specify dtype option on import or set low_memory=False.
pandas read csv: 11.1477160454s
pandas apply transforms: 0.938997983932s
2016-04-15 13:10:16,467 [INFO] sframe.cython.cy_server, 172: SFrame v1.8.5 started. Logging /tmp/sframe_server_1460722216.log
------------------------------------------------------
Inferred types from first line of file as
column_type_hints=[int,float,float,float,float,str,str,float,str,str,str,str,str,float,str,str,str,str,str,str,str,str,str,str,float,float,str,float,float,float,float,float,float,str,float,str,float,float,float,float,float,float,float,float,float,str,float,str,str,float,str,float,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Unable to parse line "Loans that do not meet the credit policy,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
Read 89872 lines. Lines per second: 35505.4
Read 426460 lines. Lines per second: 56880.2
1 lines failed to parse correctly
Finished parsing file /Users/samuelhopkins/cp/benchmarks/data/lc_big.csv
Parsing completed. Parsed 756878 lines in 12.2473 secs.
sframe read csv: 17.1927471161s
sframe apply transforms: 16.9669880867s
node apply transforms: 466.212ms
As you can see, applying the operations takes about 1s in pandas, about 17s in sframe, and about 0.5s in node.js.
The performance difference between pandas and sframe is probably because I can use the native pandas functions isin and map, which I am guessing are highly optimized, while with sframe I am simply using apply, which is being supplied with pure Python functions.
However, I can confirm that sframe is using all cores, which leads me to believe that if I can perform my individual operations more efficiently, I should see better results.
Node.js is the winner so far, but its scalability and predictability are a bit limited, so we are willing to take a hit of a few milliseconds if we can get something robust that uses all cores.
I guess the question here is: 1) Did I miss something in the sframe documentation that provides functionality equivalent to pandas map and isin, and 2) if not, how can I optimize the given operations?
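To make question 2 concrete, here is a minimal sketch of the kind of per-value callables I mean, written in plain Python so it runs without pandas or sframe. The lookup values and names (GOOD_STATUSES, GRADE_TO_SCORE, is_good_status, grade_to_score) are hypothetical stand-ins, not the actual transforms from the benchmark. The idea is to build the set and dict once, outside the function handed to apply, so each call does only a hash lookup. I have not measured whether this closes the gap with pandas.

```python
# Hypothetical lookups; the real benchmark transforms are not shown here.
GOOD_STATUSES = frozenset(["Fully Paid", "Current"])
GRADE_TO_SCORE = {"A": 1, "B": 2, "C": 3}


def is_good_status(value):
    """Stand-in for isin: constant-time membership test against a prebuilt set."""
    return value in GOOD_STATUSES


# Stand-in for map: the bound dict.get method avoids an extra Python-level
# wrapper function; keys missing from the dict come back as None.
grade_to_score = GRADE_TO_SCORE.get

# Small made-up inputs to show the behaviour of both callables.
statuses = ["Current", "Late", "Fully Paid"]
grades = ["A", "C", "Z"]

status_flags = [is_good_status(s) for s in statuses]
grade_scores = [grade_to_score(g) for g in grades]

print(status_flags)
print(grade_scores)
```

Callables like is_good_status and grade_to_score are what I would pass to apply in place of functions that rebuild a list or dict on every call.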