memcpy_sse's Issues
Wrong loop counters
I'm sure you'll catch this one, but the copy loop is unrolled 8 times, so each iteration needs to step the pointers by 1024 bits (128 bytes) and not 128 bits (16 bytes). Should be a nice speedup!
With this change memcpy_sse is almost twice as fast as memcpy (gcc 5.4 on Linux/Nehalem). FWIW, this CPU doesn't exhibit any performance change between x32 and x64.
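For reference, here is a minimal sketch of what an 8x unrolled SSE copy loop looks like when the pointers are stepped correctly. It is not the repository's code: the function name `copy_unrolled8` and its `blocks` parameter are made up for illustration, and it assumes both buffers are 16-byte aligned. Eight 16-byte loads cover 128 bytes, so the `__m128i` pointers advance by 8 elements per iteration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_load_si128, _mm_store_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the repository's function.
 * Copies `blocks` chunks of 128 bytes each.
 * Assumption: dst and src are both 16-byte aligned. */
static void copy_unrolled8(void *dst, const void *src, size_t blocks)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    assert(((uintptr_t)dst & 15u) == 0);
    assert(((uintptr_t)src & 15u) == 0);

    while (blocks-- > 0) {
        /* Eight 128-bit loads: 8 * 16 = 128 bytes per iteration. */
        __m128i r0 = _mm_load_si128(s + 0);
        __m128i r1 = _mm_load_si128(s + 1);
        __m128i r2 = _mm_load_si128(s + 2);
        __m128i r3 = _mm_load_si128(s + 3);
        __m128i r4 = _mm_load_si128(s + 4);
        __m128i r5 = _mm_load_si128(s + 5);
        __m128i r6 = _mm_load_si128(s + 6);
        __m128i r7 = _mm_load_si128(s + 7);

        _mm_store_si128(d + 0, r0);
        _mm_store_si128(d + 1, r1);
        _mm_store_si128(d + 2, r2);
        _mm_store_si128(d + 3, r3);
        _mm_store_si128(d + 4, r4);
        _mm_store_si128(d + 5, r5);
        _mm_store_si128(d + 6, r6);
        _mm_store_si128(d + 7, r7);

        /* Pointer arithmetic on __m128i* is in 16-byte units,
         * so += 8 moves both pointers forward by 128 bytes. */
        s += 8;
        d += 8;
    }
}
```

Stepping by only one element here would re-copy 112 of every 128 bytes on the next pass, which would explain the slowdown.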
This is not bug free.
Line 21 in 7647493
When the low 4 bits of the destination and source addresses differ, the function will use _mm_load_si128 to read unaligned memory in the later loop at line 33.
When the destination address is unaligned and size is < 16 bytes, the function will still copy 16 bytes, and if the pad is > size, the size will wrap to 2^32 - (pad - size) on 32-bit systems and to 2^64 - (pad - size) on 64-bit systems.
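Both problems can be shown with plain integer arithmetic. The sketch below is not taken from the repository; the helper names (`head_pad`, `same_alignment`, `remaining_unchecked`, `clamp_pad`) are hypothetical and only illustrate the checks that would be needed, assuming `size` is a `size_t` and the pad is the byte count up to the next 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to bring p up to the next
 * 16-byte boundary (0 if p is already aligned). */
static size_t head_pad(const void *p)
{
    return (size_t)((0 - (uintptr_t)p) & 15u);
}

/* Hypothetical helper: nonzero when dst and src share the same low 4 bits.
 * Only then does aligning dst also align src, which is what an aligned
 * load such as _mm_load_si128 requires. */
static int same_alignment(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & 15u) == 0;
}

/* The unchecked subtraction: size_t is unsigned, so when pad > size the
 * result wraps to 2^N - (pad - size), where N is the width of size_t. */
static size_t remaining_unchecked(size_t size, size_t pad)
{
    return size - pad;
}

/* One possible guard: never consume more head bytes than were requested. */
static size_t clamp_pad(size_t pad, size_t size)
{
    assert(pad < 16);
    return pad < size ? pad : size;
}
```

With `size = 5` and `pad = 12`, `remaining_unchecked` returns `SIZE_MAX - 6`, while `size - clamp_pad(pad, size)` is 0. When `same_alignment` is false, an unaligned load such as `_mm_loadu_si128` would be the safe choice for the source side.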
Can't compile with the commands provided in the readme
I tried to compile the code with the following command, as provided in the readme:
gcc -march=sse3 -O3 -m32 testmem_modified.c -o tm32
but I get this error:
cc1: error: bad value (‘sse3’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server bonnell atom silvermont slm knl knm geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 btver1 btver2 native
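As the note says, gcc's `-march=` switch expects a CPU name. The SSE3 instruction set on its own is enabled with `-msse3`, and gcc defines the `__SSE3__` macro when it is active. The snippet below is a small sketch for checking which way a build was configured. The function name `sse3_build_status` is made up for illustration, and the command in the comment is only a guess at what the readme intended.

```c
#include <stddef.h>

/* Possible replacement for the readme command (an assumption, not confirmed
 * by the project):
 *
 *   gcc -msse3 -O3 -m32 testmem_modified.c -o tm32
 *
 * A CPU name from the list above that includes SSE3, such as
 * -march=prescott or -march=native, is another option. */

/* Hypothetical helper: reports whether this translation unit was
 * compiled with SSE3 enabled. */
static const char *sse3_build_status(void)
{
#ifdef __SSE3__
    return "SSE3 enabled at compile time";
#else
    return "SSE3 not enabled at compile time";
#endif
}
```

Printing the returned string from the test program's main is a quick way to confirm that the flag took effect.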