
libmir / mir-glas


[Experimental] LLVM-accelerated Generic Linear Algebra Subprograms

License: Other

blas glas linear-algebra-subprograms algebra matrix-multiplication matrix lapack simd



glas

LLVM-accelerated Generic Linear Algebra Subprograms (GLAS)

Description

GLAS is a C library written in D (Dlang). No C++ or D runtime is required, only libc, which is available everywhere.

The library provides

  1. BLAS (Basic Linear Algebra Subprograms) API.
  2. GLAS (Generic Linear Algebra Subprograms) API.

CBLAS API can be provided by linking with Netlib's CBLAS library.

dub

GLAS can be used with DMD and LDC, but in either case LDC (the LLVM D Compiler) >= 1.1.0 beta 6 must be installed in a common path.

Note performance issue #18.

GLAS can be included in a project automatically using dub (the D package manager). DUB will build GLAS and mir-cpuid with LDC.

{
   ...
   "dependencies": {
      "mir-glas": "~><current_mir-glas_version>",
      "mir-cpuid": "~><current_mir-cpuid_version>"
   },
   "lflags": ["-L$MIR_GLAS_PACKAGE_DIR", "-L$MIR_CPUID_PACKAGE_DIR"]
}

$MIR_GLAS_PACKAGE_DIR and $MIR_CPUID_PACKAGE_DIR will be replaced automatically by DUB with the appropriate directories.

Usage

mir-glas can be used like a common C library. It should be linked with mir-cpuid. A compiler, for example GCC, may require mir-cpuid to be passed after mir-glas: -lmir-glas -lmir-cpuid.
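A minimal sketch of such a link line, assuming the object file and library directory names used here (app.o, /path/to/libs) stand in for your own project's paths:

```shell
# Order matters with GCC and static archives:
# mir-cpuid must appear after mir-glas.
gcc app.o -L/path/to/libs -lmir-glas -lmir-cpuid -o app
```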

GLAS API

GLAS API is based on the new ndslice from mir-algorithm. Other languages can use a simple structure definition instead. Examples are available for C and for Dlang.

Headers

C/C++ headers are located in include/. D headers are located in source/.

There are two files:

  1. glas/fortran.h / glas/fortran.d - for Netlib's BLAS API
  2. glas/ndslice.h / glas/ndslice.d - for GLAS API

Manual Compilation

Compiler installation

LDC (LLVM D Compiler) >= 1.1.0 beta 6 is required to build the project. You may want to build LDC from source or use the 1.1.0 beta 6 release. Beta 2 generates a lot of warnings, which can be ignored. Beta 3 is not supported.

LDC binaries contain two compiler drivers: ldc2 and ldmd2. It is recommended to use ldmd2 with mir-glas.

Recent LDC packages come with the dub package manager. dub is used to build the project.

Mir CPUID

Mir CPUID provides CPU identification routines.

Download mir-cpuid

dub fetch mir-cpuid --cache=local

Change the directory

cd mir-cpuid-<current-mir-cpuid-version>/mir-cpuid

Build mir-cpuid

dub build --build=release-nobounds --compiler=ldmd2 --build-mode=singleFile --parallel --force

You may need to add --arch=x86_64 if you are on Windows.

Copy libmir-cpuid.a to your project or add its directory to the library path.

Mir GLAS

Download mir-glas

dub fetch mir-glas --cache=local

Change the directory

cd mir-glas-<current-mir-glas-version>/mir-glas

Build mir-glas

dub build --config=static --build=target-native --compiler=ldmd2 --build-mode=singleFile --parallel --force

You may need to add --arch=x86_64 if you are on Windows.

Copy libmir-glas.a to your project or add its directory to the library path.

Status

Contributions are welcome! The hardest part (GEMM) is already implemented.

  • CI testing with Netlib's BLAS test suite.
  • CI testing with Netlib's CBLAS test suite.
  • CI testing with Netlib's LAPACK test suite.
  • CI testing with Netlib's LAPACKE test suite.
  • Multi-threading
  • GPU back-end
  • Shared library support - requires only DUB configuration fixes.
  • Level 3 - matrix-matrix operations
    • GEMM - matrix matrix multiply
    • SYMM - symmetric matrix matrix multiply
    • HEMM - hermitian matrix matrix multiply
    • SYRK - symmetric rank-k update to a matrix
    • HERK - hermitian rank-k update to a matrix
    • SYR2K - symmetric rank-2k update to a matrix
    • HER2K - hermitian rank-2k update to a matrix
    • TRMM - triangular matrix matrix multiply
    • TRSM - solving triangular matrix with multiple right hand sides
  • Level 2 - matrix-vector operations
    • GEMV - matrix vector multiply
    • GBMV - banded matrix vector multiply
    • HEMV - hermitian matrix vector multiply
    • HBMV - hermitian banded matrix vector multiply
    • HPMV - hermitian packed matrix vector multiply
    • TRMV - triangular matrix vector multiply
    • TBMV - triangular banded matrix vector multiply
    • TPMV - triangular packed matrix vector multiply
    • TRSV - solving triangular matrix problems
    • TBSV - solving triangular banded matrix problems
    • TPSV - solving triangular packed matrix problems
    • GERU - performs the rank 1 operation A := alpha*x*y' + A
    • GERC - performs the rank 1 operation A := alpha*x*conjg( y' ) + A
    • HER - hermitian rank 1 operation A := alpha*x*conjg(x') + A
    • HPR - hermitian packed rank 1 operation A := alpha*x*conjg( x' ) + A
    • HER2 - hermitian rank 2 operation
    • HPR2 - hermitian packed rank 2 operation
  • Level 1 - vector-vector and scalar operations. Note: Mir already provides a generic implementation.
    • ROTG - setup Givens rotation
    • ROTMG - setup modified Givens rotation
    • ROT - apply Givens rotation
    • ROTM - apply modified Givens rotation
    • SWAP - swap x and y
    • SCAL - x = a*x. Note: requires additional optimization for complex numbers.
    • COPY - copy x into y
    • AXPY - y = a*x + y. Note: requires additional optimization for complex numbers.
    • DOT - dot product
    • DOTU - dot product. Note: requires additional optimization for complex numbers.
    • DOTC - dot product, conjugating the first vector. Note: requires additional optimization for complex numbers.
    • DSDOT - dot product with extended precision accumulation and result
    • SDSDOT - dot product with extended precision accumulation
    • NRM2 - Euclidean norm
    • ASUM - sum of absolute values
    • IAMAX - index of max abs value

Porting to a new target

Five steps

  1. Implement cpuid_init function for mir-cpuid. This function should be implemented per platform or OS. Already implemented targets are
    • x86, any OS
    • x86_64, any OS
  2. Verify that source/glas/internal/memory.d contains an implementation for the OS. Already implemented targets are
    • Posix (Linux, macOS, and others)
    • Windows
  3. Add a new configuration for register blocking to source/glas/internal/config.d. Configurations are already available for
    • x87
    • SSE2
    • AVX / AVX2
    • AVX512 (requires LLVM bug fixes).
  4. Create a Pull Request.
  5. Coordinate with LDC team in case of compiler bugs.

Questions & Answers

Why is GLAS called "Generic ..."?

  1. GLAS has a generic internal implementation, which can be easily ported to other architectures with minimal effort.
  2. The GLAS API provides more functionality compared with BLAS.
  3. It is written in Dlang using generic programming.

Why is it better than other open-source BLAS libraries such as OpenBLAS and Eigen?

  1. GLAS is faster.
  2. GLAS API is more user-friendly and does not require additional data copying.
  3. Unlike Eigen, GLAS does not require a C++ runtime.
  4. GLAS does not require platform-specific optimizations, such as Eigen's intrinsics-based micro-kernels or OpenBLAS's assembly macro-kernels.
  5. GLAS has a simple implementation, which can be easily ported and extended.

Why does GLAS not have Lazy Evaluation and Aliasing like Eigen?

GLAS is a lower-level library than Eigen; for example, GLAS could serve as an Eigen BLAS back-end in the future. Lazy evaluation and aliasing can be easily implemented in D. Explicit composition of operations can be done using mir.ndslice.algorithm and the multidimensional map from mir.ndslice.topology, which is a generic way to perform any lazy operations you want.

mir-glas's People

Contributors

9il, e-y-e, marenz, n8sh, schveiguy, thewilsonator, wilzbach


mir-glas's Issues

Code optimization questions/recommendations (SIMD support, LLVM version)

Hello, I am planning to use mir-glas for a project that needs to process a huge amount of data quickly. Binary compatibility across multiple systems is not an issue, so we are free to use SIMD instructions. How is SIMD implemented in mir-glas? Do I need to enable anything special? And while I'm at it, which LLVM version is recommended for the best performance? Anything else I should keep in mind?

Thank you,
Filipe

dub not building mir-glas with dependencies

I'm on Windows 7 64bit with dmd 2.74.0, ldc 1.3.0 beta2, dub 1.3.0, and Visual Studio 2017.

The dub header to the mir-glas readme.md suggests that one can include a mir-glas project in a dub project essentially automatically. However, I have not had much success with this.

I created a new dub project with the following dub.json

{
	"name": "testing_mir_glas",
	"authors": [
		"jmh530"
	],
	"dependencies": {
		"mir-algorithm": ">=0.6.6",
		"mir-glas": ">=0.2.3",
		"mir-cpuid": ">=0.5.2"
	},
	"dflags-ldc2": ["-mcpu=native"],
	"lflags": ["-L$MIR_GLAS_PACKAGE_DIR", "-L$MIR_CPUID_PACKAGE_DIR"],
	"description": "A minimal D application.",
	"license": "proprietary"
}

and source\app.d (was going to add mir-glas stuff later)

import std.stdio;
import mir.ndslice.algorithm : sliced;

void main()
{
	auto x = [1, 2, 3, 4, 5].sliced;
	writeln("done.");
}

I tried compiling with dub build, dub build --compiler=ldmd2, and dub build --compiler=ldc2. The result is basically the same

C:\ProgrammingFiles\DFiles\testing\testing_mir_glas>dub build --compiler=ldmd2
The determined compiler type "ldc" doesn't match the expected type "dmd". This w
ill probably result in build errors.
Performing "debug" build using ldmd2 for x86_64.
mir-algorithm 0.6.6: target for configuration "library" is up to date.
mir-cpuid 0.5.2: building configuration "library"...
testing_mir_glas ~master: building configuration "application"...
Running pre-build commands...
The determined compiler type "ldc" doesn't match the expected type "dmd". This w
ill probably result in build errors.

Neither a package description file, nor source/app.d was found in
'C:\ProgrammingFiles\DFiles\dubFolder\mir-glas-0.2.3\'
Please run DUB from the root directory of an existing package, or run
"dub init --help" to get information on creating a new package.

No valid root package found - aborting.
Command failed with exit code 2

[AMD] mir-glas is slower than OpenBLAS for DGEMM

I successfully compiled the benchmark gemm_report.d provided by mir-glas and ran it twice: once comparing with OpenBLAS and once comparing against ACML-5.3.1. As you can see from the benchmarks, mir-glas does not yield full performance for large matrices. Peak performance for my machine is about 23 GFLOPS for double precision. But ACML does not achieve full performance either, so I decided to compare with the dgemm.goto and dgemm.acml benchmark programs provided in OpenBLAS/benchmark. There ACML reaches peak performance too. Is there any overhead in calling ACML from D?

Blog/Wiki/Documentation on your approach to writing glas and gemm algorithm

Hi,

Your glas library is very interesting, and the benchmark for your gemm is impressive. I think users and potential contributors would benefit greatly from a series of blogs or documentation explaining the design approach you have taken for the whole library and for the gemm algorithm.

One of the issues about D is that the advanced methods, idioms, and techniques are not particularly well popularised as they are in C++. So I think a good set of blogs describing the techniques used in this library would greatly help the community. The same thing extends to other mir libraries but I am more interested in glas.

Thank you

2 x slower with LLVM 4.X - 5.X

LLVM 4.0, avx512f has the same issue as LLVM 4.0, broadwell.

LDC - the LLVM D compiler (1.1.1):
  based on DMD v2.071.2 and LLVM 4.0.0
  built with LDC - the LLVM D compiler (0.17.3)
  Default target: x86_64-apple-darwin16.4.0
  Host CPU: haswell
  http://dlang.org - http://wiki.dlang.org/LDC

vs

LDC - the LLVM D compiler (1.1.1):
  based on DMD v2.071.2 and LLVM 3.9.1
  built with LDC - the LLVM D compiler (1.1.1)
  Default target: x86_64-apple-darwin16.4.0
  Host CPU: haswell
  http://dlang.org - http://wiki.dlang.org/LDC

Link-Error when building gemm_report.d

I updated mir-glas and mir-cpuid as explained in README. Then compiling gemm_report with
dub build --compiler=ldmd2 -b release --single gemm_report.d gives an error:

Compiling ../../../../../.dub/packages/mir-algorithm-0.6.5/mir-algorithm/source/mir/ndslice/topology.d...
Compiling ../../../../../.dub/packages/mir-algorithm-0.6.5/mir-algorithm/source/mir/primitives.d...
Compiling ../../../../../.dub/packages/mir-algorithm-0.6.5/mir-algorithm/source/mir/timeseries.d...
Compiling ../../../../../.dub/packages/mir-algorithm-0.6.5/mir-algorithm/source/mir/utility.d...
Linking...
Linking...
.dub/build/application-release-linux.posix-x86_64-ldc_2072-6A7D442D6DF721FDCC6B5AF6994A4118/gemm_report.o: In function `_Dmain':
../../../../../.dub/packages/mir-algorithm-0.6.5/mir-algorithm/source/mir/utility.d:(.text._Dmain[_Dmain]+0x8d4): undefined reference to `cblas_sgemm'
/home/miguel/Dokumente/DLang/mir-glas-0.2.3/mir-glas//libmir-glas.a(home.miguel.Dokumente.DLang.mir-glas-0.2.3.mir-glas.source.glas.precompiled.context.d.o): In function `glas_init':
../source/glas/precompiled/context.d:(.text.glas_init[glas_init]+0x17): undefined reference to `cpuid_init'
../source/glas/precompiled/context.d:(.text.glas_init[glas_init]+0x1c): undefined reference to `cpuid_dCache'
../source/glas/precompiled/context.d:(.text.glas_init[glas_init]+0x27): undefined reference to `cpuid_uCache'
collect2: error: ld returned 1 exit status
Error: /usr/bin/gcc failed with status: 1
ldmd2 failed with exit code 1.

Error compiling gemm_report

I am trying to compile the gemm_report.d in the bench folder. But I got an error:
[mig@antergos-mig bench]$ dub build --compiler=ldmd2 -b release --single gemm_report.d
The determined compiler type "ldc" doesn't match the expected type "dmd". This will probably result in build errors.
dub.json(12): Error: Expected '}' or ',' - got '"'.

I guess I have to remove the commented out part for the OpenBLAS libs:
#!/usr/bin/env dub
/+ dub.json:
{
"name": "gemm_report",
"dependencies": {"mir-glas": {"path": "../"}, },
"libs": ["blas"],
"lflags": ["-L$MIR_GLAS_PACKAGE_DIR", "-L$MIR_CPUID_PACKAGE_DIR", "-L.."],
"dependencies": {
"cblas": ">1.0.0",
"mir-glas":{
"path": "../"
}
"mir-cpuid": "
>0.4.2",
},
"dflags-ldc": ["-mcpu=native"],
}
+/
"lflags": ["-L/opt/OpenBLAS/lib"],
"libs": ["openblas"],

// Set up your libblas to approporiate version, or just copy it to the benchmarks/glas folder.
// Note: GLAS is single thread for now.
...
