This project benchmarks the following libraries:
To run:
(allocation)
stack build :bench-alloc && stack exec bench-alloc
(runtime)
stack build :bench-runtime && stack exec bench-runtime
Matrix-matrix multiplication
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
2.65 us |
289.0 us |
2.24 ms |
Hmatrix |
1.32 us |
55.8 us |
292.0 us |
NumHask |
714.0 us |
63.5 ms |
593.0 ms |
Massiv |
12.0 us |
205.0 us |
1.52 ms |
Massiv (Par) |
76.1 us |
220.0 us |
866.0 us |
Matrix |
12.6 us |
1.1 ms |
8.44 ms |
Naive C |
51 us |
323 us |
4.78 ms |
Repeated matrix-matrix multiplication
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
8.25 us |
852.0 us |
6.92 ms |
Hmatrix |
5.41 us |
170.0 us |
889.0 us |
NumHask |
1.46 ms |
152.0 ms |
1.42 s |
Massiv |
38.9 us |
629.0 us |
4.48 ms |
Massiv (Par) |
358.0 us |
816.0 us |
2.8 ms |
Matrix-vector multiplication
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
302.0 ns |
4.12 us |
16.1 us |
Hmatrix |
706.0 ns |
2.27 us |
11.1 us |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
3.32 us |
233.0 us |
1.7 ms |
Hmatrix |
94.0 us |
6.62 ms |
60.3 ms |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
330.0 ns |
8.46 us |
25.9 us |
Hmatrix |
24.9 ns |
24.4 ns |
17.6 ns |
NumHask |
309.0 ns |
7.29 us |
28.1 us |
Massiv |
7.29 us |
35.7 us |
122.0 us |
Matrix |
4.58 us |
130.0 us |
699.0 us |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
189.0 ns |
4.15 us |
16.8 us |
Hmatrix |
285.0 ns |
1.15 us |
4.32 us |
NumHask |
40.3 us |
1.79 ms |
9.72 ms |
Massiv |
128.0 ns |
3.3 us |
13.0 us |
Naive C |
350 ns |
12.65 us |
40.96 μs |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
26.4 ns |
19.5 ns |
19.5 ns |
Hmatrix |
1.43 us |
1.63 us |
1.7 us |
NumHask |
39.4 ns |
170.0 ns |
305.0 ns |
Massiv |
3.97 us |
5.08 us |
4.74 us |
Matrix |
40.9 ns |
167.0 ns |
310.0 ns |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
61.0 ns |
279.0 ns |
295.0 ns |
Hmatrix |
1.43 us |
1.7 us |
1.84 us |
NumHask |
221.0 ns |
1.04 us |
2.38 us |
Massiv |
4.63 us |
5.04 us |
5.04 us |
Matrix |
350.0 ns |
1.59 us |
3.09 us |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
157.0 ns |
4.75 us |
11.2 us |
Hmatrix |
2.31 us |
34.5 us |
132.0 us |
Matrix |
2.94 us |
65.9 us |
492.0 us |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
124.0 ns |
5.04 us |
11.2 us |
Hmatrix |
2.15 us |
33.7 us |
132.0 us |
Matrix-matrix multiplication
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
976 |
20,176 |
80,176 |
hmatrix |
904 |
20,936 |
80,936 |
NumHask |
1,691,432 |
179,093,816 |
1,400,273,872 |
Massiv |
5,816 |
140,216 |
560,216 |
Matrix |
18,160 |
392,288 |
1,544,056 |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
1,848 |
40,248 |
160,248 |
hmatrix |
201,192 |
9,074,048 |
67,457,120 |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
880 |
20,080 |
80,080 |
hmatrix |
64 |
64 |
64 |
NumHask |
0 |
0 |
0 |
Massiv |
872 |
20,072 |
80,072 |
Matrix |
9,840 |
239,952 |
959,664 |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
16 |
16 |
16 |
hmatrix |
232 |
232 |
232 |
NumHask |
146,800 |
2,919,552 |
11,641,752 |
Massiv |
16 |
16 |
16 |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
64 |
64 |
64 |
hmatrix |
2,128 |
2,128 |
2,128 |
NumHask |
256 |
256 |
256 |
Massiv |
144 |
464 |
864 |
Matrix |
896 |
20,112 |
80,168 |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
160 |
480 |
880 |
hmatrix |
2,128 |
2,128 |
2,128 |
NumHask |
800 |
2,720 |
5,120 |
Matrix |
1,648 |
23,744 |
87,400 |
Library |
n = 10 |
n = 50 |
n = 100 |
DLA |
1,008 |
20,528 |
80,928 |
hmatrix |
3,208 |
66,440 |
252,440 |
Matrix |
5,752 |
139,848 |
559,504 |
Relevant details:
- the implementations of the "naive C" parts can be found in
/naive
. They were compiled with -O3
- the
massiv
benchmarks use the Primitive
representation, which seems to be the fastest among what massiv offers
- the benchmarked functions from DLA are taken from the
Fast
module when available
- the
norm
function is called on n*n
vectors
- instead of relying on hackage, the project's dependencies fetch the libraries directly from github (see
stack.yaml
).
Formely included:
bed-and-breakfast
(abandoned because too slow)
matrices
(abandoned because too slow, also see kaizhang/matrices#8)
TODO:
- have a cleaner/more abstract interface for the benches