clearlinux / clr-avx-tools
License: Apache License 2.0
Hi,
I am here to bother you now :)
I added some code to debug the likely duplicate counting. I believe I patched the right place; if I am wrong, please correct me.
I cloned the code from the git repo and ran make install.
Then I patched avxjudge.py with the following debug.patch:
/usr/share/clr-avx-tools # diff avxjudge.py avxjudge.py.patched
184a185,187
> sse_avx2_duplicate_cnt = 0
> avx2_avx512_duplicate_cnt = 0
>
191a195,196
>     global sse_avx2_duplicate_cnt
>     global avx2_avx512_duplicate_cnt
235a241,248
>     if sse_score >=0.0 and avx2_score >= 0.0:
>         sse_avx2_duplicate_cnt +=1
>         print("duplicate count for sse & avx2 ?", ins, arg,
>               sse_avx2_duplicate_cnt)
>     if avx512_score >= 0.0 and avx2_score >= 0.0:
>         avx2_avx512_duplicate_cnt +=1
>         print("duplicate count for avx2 & avx512 ?", ins, arg,
>               avx2_avx512_duplicate_cnt)
262a276,277
>     print("File duplicate count of sse&avx2", sse_avx2_duplicate_cnt,
>           ", duplicate count of avx2&avx512", avx2_avx512_duplicate_cnt);
Scoring /usr/lib64/avxstatus/openblas_clr/libopenblas_skylakexp-r0.3.3.so with the patched avxjudge.py.patched gives the following output:
...
duplicate count for sse & avx2 ? vfmadd132sd 0x15473f(%rip),%xmm3,%xmm1 66799
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm1 66800
duplicate count for avx2 & avx512 ? vextractf64x4 $0x1,%zmm0,%ymm0 11497
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm0 66801
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm1 66802
duplicate count for avx2 & avx512 ? vextractf64x4 $0x1,%zmm0,%ymm0 11498
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm0 66803
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm1 66804
duplicate count for avx2 & avx512 ? vextractf64x4 $0x1,%zmm0,%ymm0 11499
duplicate count for sse & avx2 ? vextractf64x2 $0x1,%ymm0,%xmm0 66805
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm1 66806
duplicate count for avx2 & avx512 ? vextractf32x8 $0x1,%zmm0,%ymm0 11500
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm0 66807
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm1 66808
duplicate count for avx2 & avx512 ? vextractf32x8 $0x1,%zmm0,%ymm0 11501
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm0 66809
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm1 66810
duplicate count for avx2 & avx512 ? vextractf32x8 $0x1,%zmm0,%ymm0 11502
duplicate count for sse & avx2 ? vextractf128 $0x1,%ymm0,%xmm0 66811
duplicate count for sse & avx2 ? vfmadd231ss 0x436(%rip),%xmm1,%xmm0 66812
duplicate count for sse & avx2 ? vfmadd231sd 0x5e7(%rip),%xmm1,%xmm0 66813
Top SSE functions by instruction count
sgetrf_single@@base 87.5 %s
slaed6_@@base 86.79 %s
dlaed6_@@base 86.79 %s
slasd6_@@base 84.89 %s
dlasd6_@@base 84.89 %s
Top SSE functions by value
clarfy_@@base 2076.4
cgemm3m_oncopyr@@base 1504.61
cgemm3m_oncopyi@@base 1459.66
csymm3m_iucopyb@@base 1459.66
zgemm_kernel_b@@base 1389.3
Top AVX2 functions by instruction count
zgemm_incopy@@base 49.48 %s
zgemm_kernel_b@@base 49.24 %s
cgemm_incopy@@base 49.15 %s
zgemm_kernel_r@@base 49.11 %s
zgemm_kernel_l@@base 49.11 %s
Top AVX2 functions by value
cgemm_kernel_b@@base 3439.4
cgemm_incopy@@base 3439.4
cgemm_kernel_r@@base 3439.16
cgemm_kernel_l@@base 3439.16
zgemm_kernel_b@@base 3190.86
Top AVX512 functions by instruction count
zgemm_itcopy@@base 57.67 %s
zgemm3m_oncopyr@@base 52.59 %s
zgemm3m_oncopyi@@base 51.9 %s
zsymm3m_iucopyb@@base 51.73 %s
cgemm_otcopy@@base 43.35 %s
Top AVX512 functions by value
zgemm3m_oncopyr@@base 1952.7
cgemm3m_oncopyr@@base 1703.05
zgemm3m_oncopyi@@base 1389.69
zsymm3m_iucopyb@@base 1389.54
csymm3m_iucopyb@@base 1251.03
File total (SSE): 542451 instructions with score 126060
File total (AVX2): 114279 instructions with score 112833
File total (AVX512): 80538 instructions with score 30653
File duplicate count of sse&avx2 66813 , duplicate count of avx2&avx512 11502
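To make the numbers above easier to read, here is a minimal standalone sketch of what the debug counters measure and what the final totals would imply. It assumes, as the patch does, that a score of 0.0 or more means the instruction matched that instruction set and a negative score means no match. The names count_duplicates and overlap_share and the sample tuples are hypothetical and are not part of avxjudge.py; the totals are copied from the log.

```python
from typing import Iterable, Tuple

# Hypothetical per-instruction scores: (sse_score, avx2_score, avx512_score).
# Assumption: a negative score means "not matched for this instruction set".
Scores = Tuple[float, float, float]


def count_duplicates(scores: Iterable[Scores]) -> Tuple[int, int]:
    """Return (sse&avx2, avx2&avx512) overlap counts, mirroring the debug patch."""
    sse_avx2 = 0
    avx2_avx512 = 0
    for sse_score, avx2_score, avx512_score in scores:
        if sse_score >= 0.0 and avx2_score >= 0.0:
            sse_avx2 += 1
        if avx512_score >= 0.0 and avx2_score >= 0.0:
            avx2_avx512 += 1
    return sse_avx2, avx2_avx512


def overlap_share(duplicates: int, total: int) -> float:
    """Fraction of `total` that also matched the other instruction set."""
    return duplicates / total if total else 0.0


# Illustrative input only, not real avxjudge.py data.
sample = [
    (1.0, 1.0, -1.0),   # matches both SSE and AVX2
    (-1.0, 1.0, 1.0),   # matches both AVX2 and AVX512
    (1.0, -1.0, -1.0),  # SSE only, no overlap
]
print(count_duplicates(sample))

# Totals copied from the log above.
AVX2_TOTAL = 114279
AVX512_TOTAL = 80538
SSE_AVX2_DUPLICATES = 66813
AVX2_AVX512_DUPLICATES = 11502

print(f"AVX2 total also matching SSE: "
      f"{overlap_share(SSE_AVX2_DUPLICATES, AVX2_TOTAL):.1%}")
print(f"AVX512 total also matching AVX2: "
      f"{overlap_share(AVX2_AVX512_DUPLICATES, AVX512_TOTAL):.1%}")
```

If each overlap really is counted in both totals, roughly 58% of the AVX2 total and 14% of the AVX512 total would be shared with the neighbouring category, which is why the per-file numbers look suspicious.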
Please take a look.
The debug.patch and the output log are attached.
Thanks,
Ethan
duplicate_count.log.tar.gz
I found that the script counts the following jmp/callq instructions as AVX2
when run on libopenblas_nehalemp-r0.3.3.so with the '-d' option:
...
AVX2 instruction ? jmp 39092d <zsymm3m_olcopyi@@Base+0x1f8d>
AVX2 instruction ? jl 39092a <zsymm3m_olcopyi@@Base+0x1f8a>
AVX2 instruction ? jne 3909d8 <zsymm3m_olcopyi@@Base+0x2038>
AVX2 instruction ? jg 390a20 <zsymm3m_olcopyi@@Base+0x2080>
AVX2 instruction ? jmpq 38ef4e <zsymm3m_olcopyi@@Base+0x5ae>
AVX2 instruction ? callq 857c0 <ssymm_@plt>
AVX2 instruction ? je 390aa1 <zsymm3m_olcopyi@@Base+0x2101>
...
File total (SSE): 555391 instructions with score 208750
File total (AVX2): 3834 instructions with score 38 <--false report
File total (AVX512): 0 instructions with score 0
I will file another PR with two patches to fix it.
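Every misreported line above refers to a symbol whose name contains "symm" (zsymm3m_olcopyi, ssymm_), and "symm" contains the substring "ymm". One possible explanation, not confirmed in this thread, is that the operand text is searched for "ymm" as a plain substring, so jump and call targets match by accident. The sketch below only illustrates that hypothesis; uses_ymm_substring and uses_ymm_register are hypothetical helpers, not functions from avxjudge.py, and the two patches mentioned above may take a different approach.

```python
import re

# Assumption: AVX2 usage is signalled by a %ymmN register operand.
YMM_REGISTER = re.compile(r"%ymm\d+\b")


def uses_ymm_substring(args: str) -> bool:
    """Naive check: also fires on symbol names such as zsymm3m_olcopyi."""
    return "ymm" in args


def uses_ymm_register(args: str) -> bool:
    """Stricter check: requires an actual %ymmN register operand."""
    return YMM_REGISTER.search(args) is not None


# Operand strings taken from the output quoted in this thread.
jump_args = "39092d <zsymm3m_olcopyi@@Base+0x1f8d>"
call_args = "857c0 <ssymm_@plt>"
vector_args = "$0x1,%ymm0,%xmm1"

for args in (jump_args, call_args, vector_args):
    print(args, uses_ymm_substring(args), uses_ymm_register(args))
```

Matching on the register syntax rather than on a bare substring keeps symbol names out of the decision; filtering on the mnemonic (jmp, callq, and so on) would be another option.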