Comments (34)
from sse2.
If I understood you, you have moved some of the code base to AVX2 but are not planning on publishing that source code. But you may make an AVX version of the bmx procedure available to the public. Is that right ?
"We" is me and a dev who I've asked to help me because he has some SIMD experience. We have been using boost.simd to do some benchmarking.
The GPL could be a problem as I want to develop a commercial application (for engineering). There is no problem sharing changes that we might make to the bmx proc but the GPL would require releasing all the code it is linked with and that is problematic.
The app would run on industry server farms so managing different SIMD implementations/generations is an issue. We were thinking of using gcc intrinsics for this. One idea would be to map bmx to intrinsics. You did not want to use intrinsics ?
from sse2.
from sse2.
Only targetting x86 at this stage.
I tried compiling on Ubuntu 16.04:
cc -g -MMD -fPIC -pthread -fdiagnostics-show-option -fno-strict-aliasing -fstack-protector --param ssp-buffer-size=4 -Wall -Werror -Wextra -Wcast-align -Wcast-qual -Wformat=2 -Wformat-security -Wmissing-prototypes -Wnested-externs -Wpointer-arith -Wshadow -Wstrict-prototypes -Wunused -Wwrite-strings -Wno-attributes -Wno-cast-qual -Wno-error -Wno-unknown-pragmas -Wno-unused-parameter -O3 -I/usr/local/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I. -c -o sseutil.o sseutil.c
sseutil.c:1:18: fatal error: plat.h: No such file or directory
Is that file missing from the repo ?
from sse2.
from sse2.
from sse2.
Also missing msutil.h and sock.h
I ran make
then make test
which gives:
make test
cc -pthread -L/usr/local/lib ssebmx_t.o libsse.a tap.o bitmat.o -lstdc++ -lm -o ssebmx_t
bitmat.o: In function `bitmat_trans':
/home/propacov/shared/proto/i8051-07/src/tests/primitives/sse2/bitmat.c:80: undefined reference to `ssebmx'
/home/propacov/shared/proto/i8051-07/src/tests/primitives/sse2/bitmat.c:80: undefined reference to `ssebmx_m'
collect2: error: ld returned 1 exit status
<builtin>: recipe for target 'ssebmx_t' failed
make: *** [ssebmx_t] Error 1
from sse2.
from sse2.
No problem, it is worth the effort if we can use the code. I resolved the missing files (downloaded the 3 headers from your utils package). But ran into the compile error reported in my previous message. Can you get the bmx test running ? The GNUMakefile and rules are new to me so not so easy to quickly understand where the issue is. Thanks.
from sse2.
from sse2.
from sse2.
from sse2.
Hi,
I don't see updates to the repo, are you using attachments with these messages ? I don't think I can access those.
I'm in France. I imagine the user of our software will have AVX512 boxes. But I don't have a server farm. I plan to do testing on cloud infrastructure e.g. AWS.
from sse2.
With 256 or 512bit registers does the optimal size of the bit matrix for transposition change ?
from sse2.
from sse2.
from sse2.
If you like you could upload to https://expirebox.com/ it is very simple, no login, provides a link to the file (which gets deleted after 48hrs).
from sse2.
For bmx, does AVX provide improved instructions or is the only benefit larger registers ?
from sse2.
from sse2.
from sse2.
I have a hard time understanding why the CPU don't provide native support for a bitwise transpose, it seems such a fundamental building block. Do you see why that hasn't happened ?
The zip ran fine on my machine, I only tried ssebmx_t (I'm using an Intel Core i5 on my laptop). Thanks!
Have you tried benchmarking between clang and gcc ? I was surprised to see how much better clang-3.8 was than gcc-6.2 on some auto-vectorization test cases, seemed to make better use of the ymm/xmm registers.
Yeah lucky to be in France, so much come down to luck...
from sse2.
from sse2.
Non, néo-zélandais, beaucoup de chance la aussi!
from sse2.
This is a bit of a diverging thread but I hesitate to create new issues for questions. The bmx is 16x8 and I am wondering, if we are targeting a size of 256 x W (where W is typically less than 512). Are there changes to the algorithm that could match up with the initial row count of 256 and improve performance ? Or is it best to just break that up into 16x8 chunks. Thanks.
from sse2.
from sse2.
Nice idea with INP and OUT. I would hope that the hardware could prefetch but in any case memory will be the bottleneck. It is premature to optimise now. I will be late next week before I can do profiling and the current bmx may be plenty enough.
The application is analysing decompressed trace files from digital circuit simulation. One dimension of the matrix is time/cycles and the another dimension inputs. The matrix can be quite big (e.g. GBs).
from sse2.
from sse2.
Could __builtin_prefetch be a big help with that ? If the gather/scatter work on a block that fits in L1...
I should probably mention that we are looking to transpose blocks (kBs) not the entire matrix (potentially GBs). So the scatter can be limited.
from sse2.
from sse2.
from sse2.
Hi, thanks! Can you please upload it to github or https://expirebox.com
from sse2.
from sse2.
Hi, we ran some benchmarking and got slightly better results with code based on http://stackoverflow.com/questions/41778362/how-to-efficiently-transpose-a-2d-bit-matrix targetting a 64x64 matrix. It was surprising. 940.423 MB/s vs 747.659 MB/s and AVX2 was actually slower at 400.961 MB/s Thanks for your support!
from sse2.
from sse2.
Related Issues (3)
- No ‘intcmp’ available HOT 3
- plat.h: no such file HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sse2.