amd / zendnn

License: Other

Makefile 0.13% C 3.19% C++ 94.91% Shell 0.31% CMake 1.23% Batchfile 0.09% Assembly 0.13%


zendnn's Issues

`libblis-mt.so.3` error while installing torch

Hi,

I get this error when installing the torch wheel provided with ZenDNN.

To check the installed version of PT:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/miniconda3/lib/python3.9/site-packages/torch/__init__.py", line 202, in <module>
    from torch._C import *  # noqa: F403
ImportError: libblis-mt.so.3: cannot open shared object file: No such file or directory

I am using an Ubuntu 20.04 Docker container in which I have installed Miniconda.

Background

Output of lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          48
On-line CPU(s) list:             0-47
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       1
NUMA node(s):                    4
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           8
Model name:                      AMD Ryzen Threadripper 2970WX 24-Core Processor
Stepping:                        2
Frequency boost:                 enabled
CPU MHz:                         2194.713
CPU max MHz:                     3000.0000
CPU min MHz:                     2200.0000
BogoMIPS:                        5987.96
Virtualization:                  AMD-V
L1d cache:                       768 KiB
L1i cache:                       1.5 MiB
L2 cache:                        12 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-5,24-29
NUMA node1 CPU(s):               12-17,36-41
NUMA node2 CPU(s):               6-11,30-35
NUMA node3 CPU(s):               18-23,42-47
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prc
                                 tl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user point
                                 er sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STI
                                 BP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
                                 mca cmov pat pse36 clflush mmx fxsr sse sse2 ht sysca
                                 ll nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc 
                                 rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm a
                                 perfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1
                                  sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_l
                                 m cmp_legacy svm extapic cr8_legacy abm sse4a misalig
                                 nsse 3dnowprefetch osvw skinit wdt tce topoext perfct
                                 r_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pst
                                 ate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep
                                  bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsav
                                 ec xgetbv1 xsaves clzero irperf xsaveerptr arat npt l
                                 brv svm_lock nrip_save tsc_scale vmcb_clean flushbyas
                                 id decodeassists pausefilter pfthreshold avic v_vmsav
                                 e_vmload vgif overflow_recov succor smca

I have extracted PT_v1.9.0_ZenDNN_v3.2_Python_v3.9.zip, aocl-blis-linux-aocc-3.1.0.tar.gz and aocc-compiler-3.2.0.tar into the /home/deps folder and exported the required environment variables:

export ZENDNN_LOG_OPTS=ALL:0
export OMP_NUM_THREADS=48
export OMP_WAIT_POLICY=ACTIVE
export OMP_PROC_BIND=FALSE
export OMP_DYNAMIC=FALSE
export ZENDNN_GIT_ROOT=/home/deps/pyzendnn/PT_v1.9.0_ZenDNN_v3.2_Python_v3.9_2021-12-03T10/ZenDNN
export ZENDNN_PARENT_FOLDER=/home/deps/pyzendnn/PT_v1.9.0_ZenDNN_v3.2_Python_v3.9_2021-12-03T10
export ZENDNN_AOCC_COMP_PATH=/home/deps/aocc-compiler-3.2.0
export ZENDNN_BLIS_PATH=/home/deps/amd-blis
export ZENDNN_PRIMITIVE_CACHE_CAPACITY=1024
export GOMP_CPU_AFFINITY=0-47

Output of echo $LD_LIBRARY_PATH

/home/deps/pyzendnn/PT_v1.9.0_ZenDNN_v3.2_Python_v3.9_2021-12-03T10/ZenDNN/_out/lib/:/home/deps/pyzendnn/PT_v1.9.0_ZenDNN_v3.2_Python_v3.9_2021-12-03T10/ZenDNN/external/googletest/lib:/home/deps/amd-blis/lib/:/home/deps/aocc-compiler-3.2.0/lib:/home/deps/aocc-compiler-3.2.0/lib32:/home/deps/pyzendnn/PT_v1.9.0_ZenDNN_v3.2_Python_v3.9_2021-12-03T10/ZenDNN/_out/lib/:/home/deps/pyzendnn/PT_v1.9.0_ZenDNN_v3.2_Python_v3.9_2021-12-03T10/ZenDNN/external/googletest/lib:/home/deps/amd-blis/lib/:/home/deps/aocc-compiler-3.2.0/lib:/home/deps/aocc-compiler-3.2.0/lib32:

The AMD BLIS lib is in the path, but torch doesn't seem to find it. Any idea how to resolve this? Thanks!
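
One way to narrow this down is to check which LD_LIBRARY_PATH entries actually contain a file with the exact name from the ImportError. The following is only a minimal diagnostic sketch, not a ZenDNN tool; the helper name find_shared_lib is made up for illustration, and it assumes a Linux-style colon-separated path.

```python
import os
from pathlib import Path


def find_shared_lib(lib_name, search_path):
    """Return the directories in a colon-separated path that contain lib_name.

    Hypothetical helper for illustration only; it does not reproduce the
    dynamic loader's full search order (rpath, ld.so.cache, and so on).
    """
    hits = []
    seen = set()
    for entry in search_path.split(":"):
        # Empty entries (e.g. from a trailing colon) and repeated
        # directories are skipped in this sketch.
        if not entry or entry in seen:
            continue
        seen.add(entry)
        # exists() follows symlinks, so a dangling symlink counts as missing.
        if (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    found = find_shared_lib("libblis-mt.so.3", path)
    if found:
        print("libblis-mt.so.3 found in:", ", ".join(found))
    else:
        print("libblis-mt.so.3 not found in any LD_LIBRARY_PATH entry")
```

If nothing is reported, the BLIS directory on the path may hold the library under a different file name or version suffix than the one the torch wheel asks for, which listing that directory would confirm.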

About the zenMatMulSplit split strategy

" if m/6 < n/16, there is no benifit in splitting m, as it will
//make more skinny matix sizes."

I find that when m/6 < n/16, throughput is higher without splitting, but latency is also higher than with splitting. Could you suggest how to balance throughput against latency? Is there a better split strategy?
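
For reference, the condition in the quoted comment can be written as a small predicate. This is only a sketch of the inequality as quoted, assuming a real-valued comparison of m/6 and n/16; the function name should_split_m is hypothetical and this is not the actual zenMatMulSplit code.

```python
def should_split_m(m, n):
    """Return False when m/6 < n/16, i.e. when the quoted comment says
    splitting m gives no benefit; return True otherwise.

    Hypothetical helper: only the constants 6 and 16 come from the
    quoted comment.
    """
    # Cross-multiplied so the comparison stays in integers:
    # m/6 < n/16  <=>  16*m < 6*n  (for positive sizes).
    return 16 * m >= 6 * n


if __name__ == "__main__":
    for m, n in [(96, 64), (6, 1024), (6, 16)]:
        print(f"m={m}, n={n}: split m -> {should_split_m(m, n)}")
```

Timing both branches on shapes close to the m/6 = n/16 boundary would show where the throughput and latency trade-off flips for a given thread count.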

Issue while setting up ZenDNN in conda container

Hi,

I have cloned the ZenDNN repo into a conda Docker image and built it from source by following the steps in the README.
I was able to set up ZenDNN successfully, but I get errors when running the benchmark scripts.
Please let me know whether a conda Docker image can be used for ZenDNN.

TIA,
Lavanya

Support for PyTorch 2.x?

When is this planned?

Thanks!

Suggested benchmark for DLRM

Hi,
I read your documentation, and the benchmark examples only cover CNN-related models.
Which benchmark tool do you suggest for running DLRM on an AMD platform with ZenDNN?

Thanks

Can I build PT_v1.12_ZenDNN_v4.0_source_code to obtain a PyTorch with both ZenDNN CPU capability and CUDA GPU capability at the same time?

In Instructions_to_setup_PyTorch_build.txt:
5. Create and activate a conda environment and install the following dependencies
conda install ninja pyyaml cmake cffi typing_extensions future six requests dataclasses astunparse setuptools
conda install cpuonly -c pytorch

Why should we run conda install cpuonly -c pytorch when we are actually building PyTorch from source? Also, can we build a PyTorch with both ZenDNN CPU capability and CUDA GPU capability at the same time using "PT_v1.12_ZenDNN_v4.0_source_code"?

Clarification on ZenDNN tuning for TensorFlow

We are referring to the ZenDNN tuning guide for TensorFlow models.

In section 9.2, the following settings are recommended for ResNet50:

export TF_ENABLE_ZENDNN_OPTS=0
export ZENDNN_CONV_ALGO=3
export ZENDNN_TF_CONV_ADD_FUSION_SAFE=1
export ZENDNN_TENSOR_POOL_LIMIT=512
export OMP_NUM_THREADS=96
export GOMP_CPU_AFFINITY=0-95

My question is about the TF_ENABLE_ZENDNN_OPTS=0 setting, which is meant to disable ZenDNN. Can you clarify whether ZenDNN is expected to be disabled for TensorFlow models?

Support PyTorch with Python v3.11

As ZenDNN 4.1 supports PyTorch v1.13 with Python v3.7-v3.10, I would like to know whether there are any plans to support Python v3.11.
