benchmarks's People

Contributors

nfcampos

benchmarks's Issues

performance

The following was run on a MacBook with 2 cores and 8 GB of memory with all applications closed.

(benchmarks)benchmarks master > ./run-benchmarks.sh 
sys:1: DtypeWarning: Columns (0,19) have mixed types. Specify dtype option on import or set low_memory=False.
pandas read csv: 11.1477160454s
pandas apply transforms: 0.938997983932s
2016-04-15 13:10:16,467 [INFO] sframe.cython.cy_server, 172: SFrame v1.8.5 started. Logging /tmp/sframe_server_1460722216.log
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[int,float,float,float,float,str,str,float,str,str,str,str,str,float,str,str,str,str,str,str,str,str,str,str,float,float,str,float,float,float,float,float,float,str,float,str,float,float,float,float,float,float,float,float,float,str,float,str,str,float,str,float,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Unable to parse line "Loans that do not meet the credit policy,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
Read 89872 lines. Lines per second: 35505.4
Read 426460 lines. Lines per second: 56880.2
1 lines failed to parse correctly
Finished parsing file /Users/samuelhopkins/cp/benchmarks/data/lc_big.csv
Parsing completed. Parsed 756878 lines in 12.2473 secs.
sframe read csv: 17.1927471161s
sframe apply transforms: 16.9669880867s
node apply transforms: 466.212ms

As you can see, applying the operations takes about 1s in pandas, about 17s in sframe, and about 0.5s in node.js.
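To put those numbers side by side, here is a small sketch that computes the relative slowdowns from the "apply transforms" timings in the log above. The variable names are made up for illustration; only the three timings come from the run.

```python
# "apply transforms" timings copied from the log above, in seconds.
timings = {
    "pandas": 0.938997983932,
    "sframe": 16.9669880867,
    "node": 0.466212,  # reported as 466.212ms
}

# Express every timing as a multiple of the fastest one.
fastest = min(timings.values())
relative = {name: seconds / fastest for name, seconds in timings.items()}

for name, factor in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {timings[name]:.3f}s ({factor:.1f}x the fastest)")
```

By this measure pandas is roughly 2x slower than node.js and sframe is roughly 36x slower, which is the gap the rest of this issue is about.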

The performance difference between pandas and sframe is probably because, in pandas, I can use the native functions isin and map, which I am guessing are highly optimized, while with sframe I am simply using apply, which is being supplied with pure Python functions.
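For readers unfamiliar with the distinction, here is a minimal pandas-only sketch of the two styles. The column names and values are hypothetical and are not taken from the benchmark data or scripts; it shows only that isin and map give the same results as per-element apply calls with Python functions, not how fast either one is.

```python
import pandas as pd

# Hypothetical data, for illustration only (not the benchmark's columns).
df = pd.DataFrame({
    "grade": ["A", "B", "C", "A", "D"],
    "term": ["36 months", "60 months", "36 months", "60 months", "36 months"],
})

good_grades = ["A", "B"]
term_months = {"36 months": 36, "60 months": 60}

# Native pandas operations: membership test and dictionary lookup
# are handled inside pandas rather than by a per-row Python callback.
is_good_native = df["grade"].isin(good_grades)
months_native = df["term"].map(term_months)

# The same transforms written as apply with pure Python functions,
# which calls the lambda once per element.
is_good_apply = df["grade"].apply(lambda g: g in good_grades)
months_apply = df["term"].apply(lambda t: term_months[t])

# Both styles produce identical results; only the execution path differs.
assert is_good_native.equals(is_good_apply)
assert months_native.equals(months_apply)
```

Whether this explains the sframe timing is still a guess; the sketch only illustrates what "native functions versus apply with Python functions" means on the pandas side.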

However, I can confirm that sframe is using all cores, which leads me to believe that if I can perform the individual operations more efficiently, I should see better results.

Node.js is the winner so far, but its scalability and predictability are a bit limited, so we are willing to take a hit of a few milliseconds if we can get something robust that uses all cores.

I guess the question here is: 1) Did I miss something in the sframe documentation that provides functionality equivalent to pandas map and isin, and 2) if not, how can I optimize the given operations?
