merinjo / parallel-matrix-multiply

In this parallel strategy, the rows and columns are each partitioned into 4, so matrices A and B are each divided into a 4*4 grid of blocks. For a matrix width of 512, the tile width is 512/4 = 128. One block of A and one block of B are brought into shared memory at a time, and all threads compute on that data. 64 threads work on a block in parallel, and each thread handles 128/64 = 2 columns of the block of B. The block sequence is: first block of first row of C = first block of first row of C + (first block of first row of A * first block of first column of B) + (second block of first row of A * second block of first column of B) + … + (fourth block of first row of A * fourth block of first column of B). This technique exploits both spatial locality, because adjacent data are used together, and temporal locality, because the same block is reused across many computations.
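The block sequence described above can be sketched in C++ as follows. This is a minimal illustration and not the repository's actual code: the names `Matrix` and `tiled_multiply` and the parameters `blocks` and `workers` are assumptions made for the example. The loop over `w` runs sequentially here; in a parallel version each value of `w` would correspond to one thread, which is safe because each worker writes to its own columns of C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Computes C += A * B tile by tile.
// blocks  : number of tiles per dimension (4 in the description above)
// workers : number of workers sharing one tile (64 in the description above)
void tiled_multiply(const Matrix& A, const Matrix& B, Matrix& C,
                    std::size_t n, std::size_t blocks, std::size_t workers) {
    assert(n % blocks == 0);
    const std::size_t tile = n / blocks;                 // e.g. 512 / 4 = 128
    assert(tile % workers == 0);
    const std::size_t cols_per_worker = tile / workers;  // e.g. 128 / 64 = 2

    for (std::size_t bi = 0; bi < blocks; ++bi) {          // block row of C
        for (std::size_t bj = 0; bj < blocks; ++bj) {      // block column of C
            for (std::size_t bk = 0; bk < blocks; ++bk) {  // accumulate A(bi,bk) * B(bk,bj)
                // Sequential stand-in for the threads: worker w owns
                // cols_per_worker adjacent columns of the current B tile.
                for (std::size_t w = 0; w < workers; ++w) {
                    const std::size_t j0 = bj * tile + w * cols_per_worker;
                    for (std::size_t i = bi * tile; i < (bi + 1) * tile; ++i) {
                        for (std::size_t k = bk * tile; k < (bk + 1) * tile; ++k) {
                            const double a = A[i * n + k];  // reused across the worker's columns
                            for (std::size_t j = j0; j < j0 + cols_per_worker; ++j) {
                                C[i * n + j] += a * B[k * n + j];
                            }
                        }
                    }
                }
            }
        }
    }
}
```

Calling `tiled_multiply(A, B, C, 512, 4, 64)` with `C` initialised to zero would reproduce the 4*4 partition with a tile width of 128 and 2 columns per worker.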

License: MIT License

C++ 100.00%
