eth-advanced-systems-lab-ss2022's Introduction

ETH Advanced Systems Lab SS2021 - Triangle Listing

In-depth optimizations on three common single-core triangle listing algorithms by improving memory layouts, reducing branch mispredictions, and applying SIMD instructions. Evaluations are performed on both generated graphs with various densities and real-world graphs, achieving significant speedups of up to 15x in comparison to the straightforward implementations.

โš ๏ธ If you are enrolled in the source, make sure you won't break the academic integrity regulations before proceeding.

=> Our report

NOTICE: Test data has been removed because of its large size.

Authors

Team 02:

Input File Format

<node count>
<adjacent node count of node 0> <node> <node> ...
<adjacent node count of node 1> <node> <node> ...
...
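
A minimal sketch of a reader for the format above. The names `AdjacencyList` and `read_graph` are hypothetical and not taken from the repository; the sketch only assumes whitespace-separated unsigned integers as described.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory representation: one neighbor list per node.
using AdjacencyList = std::vector<std::vector<unsigned>>;

// Reads "<node count>" followed by one "<count> <node> <node> ..." record per node.
AdjacencyList read_graph(std::istream& in) {
    std::size_t node_count = 0;
    if (!(in >> node_count)) throw std::runtime_error("missing node count");

    AdjacencyList graph(node_count);
    for (std::size_t u = 0; u < node_count; ++u) {
        std::size_t degree = 0;
        if (!(in >> degree)) throw std::runtime_error("missing adjacent node count");
        graph[u].resize(degree);
        for (std::size_t i = 0; i < degree; ++i) {
            if (!(in >> graph[u][i])) throw std::runtime_error("truncated neighbor list");
        }
    }
    return graph;
}
```

Passing a `std::ifstream` opened on a graph file, or a `std::istringstream` for a quick check, yields one vector of neighbors per node.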

How to run:

./benchmark -num_warmups $WARMUP -num_runs $RUN  -num_phases $PHASE -o result.csv -algorithm $ALGOS -graph $INPUTDIR/$graph.txt

IMPORTANT: Pass the flags -no_pre_cut -no_pre_sort only when the selected algorithm is compiled with the sorting / cutting code in place (only needed for EI and F). This is done by defining SORTING (see common.h). Otherwise the program might segfault, because the algorithm expects input that has already been cut.

Counting vs Listing

Triangles can be counted instead of listed by commenting out "#define COLLECT_TRIANGLES" in main.cpp. Counting significantly improves the performance (that is, reduces the runtime) of some versions.
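
To illustrate how such a compile-time switch can work, here is a sketch under assumptions: only the macro name COLLECT_TRIANGLES comes from this README, while `Triangle` and `TriangleSink` are hypothetical names that need not match the repository.

```cpp
#include <cstddef>
#include <vector>

#define COLLECT_TRIANGLES  // comment this line out to count only

struct Triangle { unsigned a, b, c; };

// Hypothetical sink: always counts, and stores triangles only when listing is enabled.
struct TriangleSink {
    std::size_t count = 0;
#ifdef COLLECT_TRIANGLES
    std::vector<Triangle> triangles;
#endif

    void add(unsigned a, unsigned b, unsigned c) {
        ++count;
#ifdef COLLECT_TRIANGLES
        triangles.push_back({a, b, c});  // the memory write that counting avoids
#else
        (void)a; (void)b; (void)c;       // silence unused-parameter warnings
#endif
    }
};
```

In counting mode the store to `triangles` disappears at compile time, which is one plausible reason counting runs faster.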

Instrumentation

Disable & Enable

Instrumentation can be removed by uncommenting "#define NO_INSTRUMENTATION" in common.h. Otherwise every version is also run in instrumented mode, which might be much slower than the actual version.

AVX Instrumentation

AVX instructions are instrumented by adding them to "instrumented_intrinsics.json", which specifies the number of operations each intrinsic performs. "code_generator.py" then generates a mocked intrinsics header, "instrumented_immintrin.h", which uses the instrumented intrinsics only if "INSTRUMENTED" is defined.
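
The idea of a counting wrapper can be sketched as follows. This is not the generated header: to stay portable it uses a plain `std::array` as a stand-in for a 256-bit vector, and `Vec8`, `add8`, `instrumented_add8`, `ADD8` and `g_op_count` are all hypothetical names. Only the INSTRUMENTED macro is taken from the text above.

```cpp
#include <array>
#include <cstddef>

#define INSTRUMENTED  // remove to compile the uninstrumented path

using Vec8 = std::array<int, 8>;    // stand-in for a 256-bit vector of 8 ints

static std::size_t g_op_count = 0;  // hypothetical global operation counter

// Plain 8-lane addition: the "real" operation.
inline Vec8 add8(const Vec8& a, const Vec8& b) {
    Vec8 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// Instrumented variant: same result, but charges 8 operations to the counter.
inline Vec8 instrumented_add8(const Vec8& a, const Vec8& b) {
    g_op_count += 8;
    return add8(a, b);
}

// Call sites use the macro, so the choice is made once at compile time.
#ifdef INSTRUMENTED
#define ADD8(a, b) instrumented_add8((a), (b))
#else
#define ADD8(a, b) add8((a), (b))
#endif
```

The per-intrinsic operation count (8 here) corresponds to the value one would record for that intrinsic in the JSON file.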

Compiler Flags

-fargument-noalias-anything -fno-strict-aliasing works for ei_u4 but does not work for forward.

Versions

Edge Iterator

  • ei_base: pre-sort, no pre-cut, has s < t checks everywhere
  • ei_va: pre-sort, pre-cut, removes all s < t checks; all other versions are based on this one
  • ei_v1
  • ei_v2: exponential search + set intersection compare with (1 set, 2 elements stride) at a time = unroll the set intersection loop.
  • ei_u4: unroll t-loop by 4, set intersection compare with (4 sets, 1 element stride) at a time.
    Unrolling by 3 is a little better
    Compiling with -fno-strict-aliasing is better.
  • ei_u5: u4 + unfold the loop by hand + store commonly accessed items in variables.
    This optimization should be done by the compiler, but the compiler only seems to apply it when -fno-strict-aliasing is used.
  • ei_vec4: apply forward_v4 to ei
  • ei_vec5: apply forward_v5 to ei
    Both vec4 and vec5 have a slightly different gt/lt/ge/le comparison chunk than forward_v4/v5; no significant speedup.
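
As a reference point for the versions above, a straightforward edge iterator can be sketched as below. It assumes that "pre-sort" means every neighbor list is in ascending order and that "pre-cut" means each list keeps only neighbors with a larger node id; both readings are assumptions, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

using AdjacencyList = std::vector<std::vector<unsigned>>;

// Counts the common elements of two ascending sequences with a linear merge.
std::size_t intersect_sorted(const std::vector<unsigned>& a,
                             const std::vector<unsigned>& b) {
    std::size_t i = 0, j = 0, common = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;
        } else if (a[i] > b[j]) {
            ++j;
        } else {
            ++common;
            ++i;
            ++j;
        }
    }
    return common;
}

// For every edge (s, t), each common neighbor of s and t closes one triangle.
// Because the lists are assumed cut, no s < t check is needed in the loops.
std::size_t count_triangles_edge_iterator(const AdjacencyList& adj) {
    std::size_t total = 0;
    for (std::size_t s = 0; s < adj.size(); ++s) {
        for (unsigned t : adj[s]) {
            total += intersect_sorted(adj[s], adj[t]);
        }
    }
    return total;
}
```

On the complete graph with four nodes, stored cut as {1,2,3}, {2,3}, {3}, {}, this counts each of the four triangles exactly once.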

Forward

  • f_base
  • f_va
  • f_v1: Removes all branches
  • f_v2: Unroll and vectorize by a factor of 4
  • f_v3: Unroll and vectorize by a factor of 8
  • f_v4: vectorized version of u4. Unroll and vectorize by a factor of 2x8=16
  • f_v5: Unroll and vectorize by a factor of 4x8=32
    Compiling v4 and v5 with -fno-strict-aliasing is somehow worse.
  • f_v6: f_v5 + scalar replacement and gather instead of set optimization, as explained in the report.
  • f_u4: apply ei_u4 to forward. Same performance characteristics.
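
One common way to remove data-dependent branches from a sorted-set intersection is to turn the comparisons into arithmetic on the indices. The sketch below shows that general technique; whether f_v1 does exactly this is not stated here, and the function name is hypothetical.

```cpp
#include <cstddef>

// Counts common elements of two ascending arrays. The loop body has no
// if/else: comparison results (0 or 1) are added directly to the counters.
// Only the loop-termination test remains as a branch.
std::size_t intersect_branchless(const unsigned* a, std::size_t na,
                                 const unsigned* b, std::size_t nb) {
    std::size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        const unsigned x = a[i];
        const unsigned y = b[j];
        common += (x == y);  // 1 when the elements match
        i += (x <= y);       // advance a when its element is not larger
        j += (y <= x);       // advance b when its element is not larger
    }
    return common;
}
```

The comparisons typically compile to flag-setting instructions rather than jumps, which avoids mispredictions on unpredictable inputs.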

Forward Hashed

  • fh_base
  • fh_va
  • fh_v1: put the first hash item in the bucket, reduce one indirection
  • fh_v2:
  • fh_v3: vectorized one item hash table lookup
  • fh_v4: vectorized one item hash table lookup
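
The fh_v1 idea of keeping the first item inside the bucket can be illustrated with the following sketch. It is an assumption-laden illustration, not the repository's data structure: `Bucket`, `NodeSet`, and the modulo hash are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bucket: the first item lives inline, so the common
// single-item case needs no pointer dereference into a separate chain.
struct Bucket {
    bool occupied = false;
    unsigned first = 0;              // first item, stored in the bucket itself
    std::vector<unsigned> overflow;  // further colliding items
};

class NodeSet {
public:
    // bucket_count must be greater than zero.
    explicit NodeSet(std::size_t bucket_count) : buckets_(bucket_count) {}

    void insert(unsigned v) {
        Bucket& b = buckets_[v % buckets_.size()];
        if (!b.occupied) {
            b.occupied = true;
            b.first = v;
        } else {
            b.overflow.push_back(v);
        }
    }

    bool contains(unsigned v) const {
        const Bucket& b = buckets_[v % buckets_.size()];
        if (!b.occupied) return false;
        if (b.first == v) return true;  // answered without touching overflow
        for (unsigned x : b.overflow) {
            if (x == v) return true;
        }
        return false;
    }

private:
    std::vector<Bucket> buckets_;
};
```

When most buckets hold at most one item, a lookup is decided by the inline field alone.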

Other optimizations

  • put adjacency.count into adjacency.neighbors, similar for forward helper.
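
A sketch of what storing the count inside the neighbor array could look like, assuming a layout where slot 0 holds the count and the neighbors follow. The helper names are hypothetical and the repository's actual layout may differ.

```cpp
#include <cstddef>
#include <vector>

// Packed layout: [count, neighbor 0, neighbor 1, ...] in one contiguous array,
// so reading the count and the first neighbors touches the same cache line.
std::vector<unsigned> pack_adjacency(const std::vector<unsigned>& neighbors) {
    std::vector<unsigned> packed;
    packed.reserve(neighbors.size() + 1);
    packed.push_back(static_cast<unsigned>(neighbors.size()));
    packed.insert(packed.end(), neighbors.begin(), neighbors.end());
    return packed;
}

// Accessors for the packed layout.
inline unsigned packed_count(const unsigned* packed) { return packed[0]; }
inline const unsigned* packed_begin(const unsigned* packed) { return packed + 1; }
```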

Contributors

jenny011, liuzikai, matthiaas, nblach
