zstang / matmul

This project forked from attractivechaos/matmul


Benchmarking matrix multiplication implementations

Languages: Makefile 6.07%, C++ 25.40%, C 68.53%

matmul's Introduction

This repo evaluates different matrix multiplication implementations given two large square matrices (2000-by-2000 in the following example):

| Implementation | Description |
|---|---|
| Naive | Most obvious implementation |
| Transposed | Transposing the second matrix for cache efficiency |
| sdot w/o hints | Replacing the inner loop with BLAS sdot() |
| sdot with hints | sdot() with a partially unrolled loop |
| SSE sdot | sdot() vectorized with explicit SSE instructions |
| SSE+tiling sdot | SSE sdot() with loop tiling |
| OpenBLAS sdot | sdot() provided by OpenBLAS |
| OpenBLAS sgemm | sgemm() provided by OpenBLAS |
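The first four rows differ mainly in memory access order and in how the inner loop is written. Below is a minimal C sketch of those ideas, assuming row-major n-by-n single-precision matrices; the function names (mat_mul_naive, sdot_plain, sdot_unrolled, mat_mul_transposed) are hypothetical and not taken from this repo's source.

```c
#include <stdlib.h>

/* Hypothetical sketch of the "Naive", "Transposed"/"sdot w/o hints" and
 * "sdot with hints" rows. Matrices are n-by-n, row-major, single precision. */

typedef float (*dot_fn)(int n, const float *x, const float *y);

/* Naive: c[i][j] = sum_k a[i][k] * b[k][j]. The inner loop reads b column by
 * column, striding n floats per step, which is cache-unfriendly for large n. */
static void mat_mul_naive(int n, const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float s = 0.0f;
            for (int k = 0; k < n; ++k)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Plain dot product: the loop that a BLAS-style sdot() replaces. */
static float sdot_plain(int n, const float *x, const float *y)
{
    float s = 0.0f;
    for (int k = 0; k < n; ++k)
        s += x[k] * y[k];
    return s;
}

/* Dot product unrolled by 4 with independent partial sums, so consecutive
 * additions do not all wait on one accumulator; a tail loop handles n % 4. */
static float sdot_unrolled(int n, const float *x, const float *y)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    int k = 0;
    for (; k + 4 <= n; k += 4) {
        s0 += x[k] * y[k];
        s1 += x[k + 1] * y[k + 1];
        s2 += x[k + 2] * y[k + 2];
        s3 += x[k + 3] * y[k + 3];
    }
    for (; k < n; ++k)
        s0 += x[k] * y[k];
    return (s0 + s1) + (s2 + s3);
}

/* Transposed: copy b into bt = b^T once, so row i of a and row j of bt are both
 * contiguous and each c[i][j] becomes a dot product of two rows.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int mat_mul_transposed(int n, const float *a, const float *b, float *c, dot_fn dot)
{
    float *bt = malloc((size_t)n * n * sizeof(float));
    if (bt == NULL)
        return -1;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            bt[j * n + i] = b[i * n + j];
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            c[i * n + j] = dot(n, &a[i * n], &bt[j * n]);
    free(bt);
    return 0;
}
```

In this sketch, mat_mul_transposed(n, a, b, c, sdot_unrolled) corresponds roughly to the "sdot with hints" row. Because the unrolled dot product keeps four partial sums, it adds the terms in a different order than sdot_plain, so for non-integer data the results may differ in the last bits.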

To compile the evaluation program:

make CBLAS=/path/to/cblas/prefix

or omit the CBLAS setting if you don't have CBLAS. After compilation, run

./matmul -h

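to see the available options. Here are the results on my machines (the -a column gives the value of the -a option that selects each implementation; an empty cell means no result was reported):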

| Implementation | -a | Linux, -n2000 | Linux, -n4000 | Linux/icc, -n4000 | Mac, -n2000 |
|---|---|---|---|---|---|
| Naive | 0 | 7.53 sec | 188.85 sec | 173.76 sec | 77.45 sec |
| Transposed | 1 | 6.66 sec | 55.48 sec | 21.04 sec | 9.73 sec |
| sdot w/o hints | 4 | 6.66 sec | 55.04 sec | 21.35 sec | 9.70 sec |
| sdot with hints | 3 | 2.41 sec | 29.47 sec | 21.69 sec | 2.92 sec |
| SSE sdot | 2 | 1.36 sec | 21.79 sec | 22.18 sec | 2.92 sec |
| SSE+tiling sdot | 7 | 1.11 sec | 10.84 sec | 10.97 sec | 1.90 sec |
| OpenBLAS sdot | 5 | 2.69 sec | 28.87 sec | | 5.61 sec |
| OpenBLAS sgemm | 6 | 0.63 sec | 4.91 sec | | 0.86 sec |
| uBLAS | | 7.43 sec | 165.74 sec | | |
| Eigen | | 0.61 sec | 4.76 sec | 5.01 sec | 0.85 sec |
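One way to read these timings is as floating-point throughput. Assuming every implementation uses the classical algorithm (n^3 multiplications and n^2(n-1) additions, so about 2n^3 floating-point operations), the Linux/gcc columns work out as follows for the slowest and fastest OpenBLAS-based entries:

$$
\begin{aligned}
\text{throughput} &\approx \frac{2n^3}{t}\\
n=2000:\quad 2n^3 = 1.6\times10^{10} &\;\Rightarrow\; \text{Naive: } \tfrac{1.6\times10^{10}}{7.53\,\text{s}} \approx 2.1\ \text{GFLOP/s},\quad \text{OpenBLAS sgemm: } \tfrac{1.6\times10^{10}}{0.63\,\text{s}} \approx 25.4\ \text{GFLOP/s}\\
n=4000:\quad 2n^3 = 1.28\times10^{11} &\;\Rightarrow\; \text{Naive: } \tfrac{1.28\times10^{11}}{188.85\,\text{s}} \approx 0.68\ \text{GFLOP/s},\quad \text{OpenBLAS sgemm: } \tfrac{1.28\times10^{11}}{4.91\,\text{s}} \approx 26.1\ \text{GFLOP/s}
\end{aligned}
$$

So sgemm keeps roughly the same rate when n doubles, while the naive loop drops from about 2.1 to about 0.68 GFLOP/s, consistent with its column-wise access to the second matrix missing the cache more often at the larger size.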

The machine configurations are as follows:

| Machine | CPU | OS | Compiler |
|---|---|---|---|
| Linux | 2.6 GHz Xeon E5-2697 | CentOS 6 | gcc-4.4.7/icc-15.0.3 |
| Mac | 1.7 GHz Intel Core i5-2557M | OS X 10.9.5 | clang-600.0.57/LLVM-3.5svn |

On both machines, OpenBLAS-0.2.18 is compiled with the following options (no AVX or multithreading):

TARGET=CORE2
BINARY=64
USE_THREAD=0
NO_SHARED=1
ONLY_CBLAS=1
NO_LAPACK=1
NO_LAPACKE=1
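With AVX disabled, both this OpenBLAS build (TARGET=CORE2) and the hand-written SSE kernels are limited to 128-bit vectors holding four floats. Below is a minimal sketch of the SSE+tiling sdot idea, assuming row-major matrices and a transposed copy of the second matrix. The names sdot_sse and mat_mul_tiled_sse and the tile size TILE = 16 are hypothetical, the code falls back to scalar arithmetic when the compiler does not define __SSE__, and tiling over the output block is one possible scheme, not necessarily the one this repo uses.

```c
#include <stdlib.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

/* Dot product: with SSE, four lanes are multiplied and accumulated per step;
 * the horizontal sum of the lanes and the n % 4 tail are done in scalar code. */
static float sdot_sse(int n, const float *x, const float *y)
{
    int k = 0;
    float s = 0.0f;
#ifdef __SSE__
    __m128 acc = _mm_setzero_ps();
    for (; k + 4 <= n; k += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(x + k), _mm_loadu_ps(y + k)));
    float lane[4];
    _mm_storeu_ps(lane, acc);
    s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
#endif
    for (; k < n; ++k)
        s += x[k] * y[k];
    return s;
}

#define TILE 16 /* hypothetical tile size; usually needs tuning per machine */

/* Tiled variant: compute c in TILE-by-TILE blocks so the TILE rows of bt used
 * by a block stay cached while every row i of the block reuses them.
 * Returns 0 on success, -1 if the transposed copy cannot be allocated. */
static int mat_mul_tiled_sse(int n, const float *a, const float *b, float *c)
{
    float *bt = malloc((size_t)n * n * sizeof(float));
    if (bt == NULL)
        return -1;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            bt[j * n + i] = b[i * n + j];
    for (int i0 = 0; i0 < n; i0 += TILE)
        for (int j0 = 0; j0 < n; j0 += TILE) {
            int i1 = i0 + TILE < n ? i0 + TILE : n; /* clip the last block */
            int j1 = j0 + TILE < n ? j0 + TILE : n;
            for (int i = i0; i < i1; ++i)
                for (int j = j0; j < j1; ++j)
                    c[i * n + j] = sdot_sse(n, &a[i * n], &bt[j * n]);
        }
    free(bt);
    return 0;
}
```

For n = 2000 each row of bt holds 2000 floats (8 KB), so the 16 rows touched by one block occupy about 128 KB; whether that fits a given cache level depends on the CPU, which is why the tile size is a tuning parameter.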
