alexanderyastrebov / 1brc-simd

This project forked from lehuyduc/1brc-simd

Process 1 billion rows of text data as fast as possible

Shell 0.51% C++ 94.94% C 4.54%

1brc-simd

https://www.morling.dev/blog/one-billion-row-challenge/


Discussion thread: gunnarmorling/1brc#138

Use sha256sum to check that the output matches the reference output. Running sha256sum result.txt should print:

016930801788eb421a15cf6def8ea435b4b47fb5f41df09e02ecdd7fbc9ac92b result.txt

I used this file (generated by ./create_measurements.sh 1000000000) to test: https://drive.google.com/file/d/1HEyNw4M453n0tnuaAm9nwaCiLydQYnpo/view?usp=sharing

To run, download the file above, extract it, then run ./run_cpp.sh. To run with 8 threads for comparison with other submissions, set N_THREADS = 8 in 1brc_final_valid.cpp.


Main ideas:

  • Unsigned int overflow hashing: cheapest hash method possible.
  • SIMD hashing
  • SIMD for string comparison in hash table probing
  • Notice properties of actual data
    • 99% of station names have length <= 16, so use a compiler hint and a SIMD implementation for this specific case. Names longer than 16 go through a fallback path, which still meets the requirement of MAX_KEY_LENGTH = 100.
    • -99.9 <= temperature <= 99.9 is guaranteed, so the parser can use branchless code that relies on this property.
  • Use mmap for fast file reading
  • Use multithreading for both parsing the file and aggregating the data
  • Other random tricks (intentional ordering of variable assignments)
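
Two of the ideas above can be illustrated with a small portable sketch. This is not the code from 1brc_final_valid.cpp: the function names overflow_hash and parse_temp_tenths, the multiplier 31, and the choice to return the temperature in tenths of a degree are all assumptions made for illustration. The hash relies on unsigned arithmetic wrapping modulo 2^32, and the parser relies on the value always having one or two integer digits followed by exactly one fractional digit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the repository): a multiply-add hash that
// simply lets the 32-bit unsigned accumulator wrap around. Unsigned overflow
// is well defined in C++, so no masking or modulo step is needed.
inline std::uint32_t overflow_hash(const char* key, std::size_t len) {
    assert(len > 0);  // station names are never empty
    std::uint32_t h = 0;
    for (std::size_t i = 0; i < len; ++i) {
        h = h * 31u + static_cast<unsigned char>(key[i]);
    }
    return h;
}

// Hypothetical helper (not from the repository): parse a temperature in the
// range -99.9 .. 99.9 into tenths of a degree without if/else.
// Accepted shapes: "d.d", "dd.d", "-d.d", "-dd.d".
// The comparisons below produce 0 or 1 and are used as arithmetic values;
// whether the compiler emits truly branch-free machine code is up to it.
inline int parse_temp_tenths(const char* p) {
    const int neg = (p[0] == '-');       // 1 if there is a leading minus sign
    p += neg;                            // skip the sign when present
    const int two = (p[1] != '.');       // 1 if there are two integer digits
    const int tens = two * (p[0] - '0'); // tens digit, or 0 for "d.d"
    const int units = p[two] - '0';      // last integer digit
    const int frac = p[two + 2] - '0';   // digit after the decimal point
    const int value = tens * 100 + units * 10 + frac;
    // Two's-complement negation without a branch: (v ^ -1) + 1 == -v.
    return (value ^ -neg) + neg;
}
```

For example, parse_temp_tenths("-5.7") yields -57 and parse_temp_tenths("12.3") yields 123. Keeping the value as an integer number of tenths avoids floating-point work in the hot loop; the division by 10 can wait until the final output.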

Contributors: lehuyduc
