armcl-pipe-all's Introduction

ARMCL-PipeALL

Pipe-All is an integrated high-throughput CPU-GPU CNN inference pipeline design for ARM-based Heterogeneous Multi-Processor Systems-on-Chips (HMPSoCs).

Publication

Details of Pipe-All can be found in our paper (pre-print). Please consider citing this paper in your work if you find this implementation useful in your research.

Ehsan Aghapour, Gayathri Ananthanarayanan, and Anuj Pathania. "Integrated ARM big.Little-Mali Pipeline for High-Throughput CNN Inference." TechRxiv.

TechRxiv

Running Pipe-All

We provide Pipe-All versions of the AlexNet, GoogleNet, MobileNet, ResNet50, and SqueezeNet graphs: graph_AlexNet_all_pipe_sync, graph_GoogleNet_all_pipe_sync, graph_MobileNet_all_pipe_sync, graph_ResNet50_all_pipe_sync, and graph_SqueezeNet_all_pipe_sync, respectively.

git clone https://github.com/Ehsan-aghapour/ARMCL-PipeALL.git -b pipe-all

After compiling the source code and preparing the libraries for your platform, run the following command:

./graph_AlexNet_all_pipe_sync --threads=4 --threads2=2 --total_cores=6 --partition_point=3 --partition_point2=5 --order=G-L-B --n=60 --image=data_dir/images/ --data=data_dir/ --labels=data_dir/label.txt

--threads: Number of threads for the big CPU cluster.
--threads2: Number of threads for the little CPU cluster.
--total_cores: Total number of CPU cores.
--partition_point: The first partition point. The first split happens after the layer specified by this argument.
--partition_point2: The second partition point. The second split happens after the layer specified by this argument.
--order: The order of components in the pipeline (G: GPU, B: big CPU cluster, L: little CPU cluster). For example, the order G-B-L means the first subgraph runs on the GPU, the second subgraph on the big CPU cluster, and the third subgraph on the little CPU cluster.
--n: Number of runs. For example, 60 means running the graph for 60 frames.

The following image, data, and labels arguments should be specified if you want to run the graph on real data. If you want to run the network on dummy data (random data and images), do not specify these arguments:
--image: Directory containing image files; the graph runs on the images inside this directory.
--data: Directory containing the graph parameters.
--labels: Label file.
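For example, a dummy-data run (omitting --image, --data, and --labels) could look like the following; the partition points and order below are arbitrary illustrative choices, not recommended settings:

```shell
# Dummy-data run: without --image/--data/--labels the graph uses random
# inputs. Partition points and pipeline order are placeholder examples.
./graph_AlexNet_all_pipe_sync \
    --threads=4 --threads2=2 --total_cores=6 \
    --partition_point=3 --partition_point2=5 \
    --order=G-L-B --n=60
```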



The following sections explain how to compile and run ARMCL on Android and Linux platforms.

Compiling for Android

First, prepare a cross-compilation toolchain to build the source code on a Linux host for an Android target. The steps to download and set up the tools are as follows.

1- Download Android NDK:
https://developer.android.com/ndk/downloads

2- Create a standalone toolchain for compiling the source code for Android. Based on your platform, set --arch to arm or arm64 in the following command. $cross_compile_dir is an arbitrary directory in which the cross-compilation toolchain will be created.

$NDK/build/tools/make_standalone_toolchain.py --arch arm/arm64 --api 23 --stl gnustl --install-dir $cross_compile_dir

This command creates the cross-compilation toolchain in $cross_compile_dir.

3- Add $cross_compile_dir/bin to the PATH:
export PATH=$cross_compile_dir/bin:$PATH

4- Go to the ARMCL source directory (cd $ARMCL_source_dir) and compile it with the following command. Based on your platform, set arch to armv7a or arm64-v8a.
CXX=clang++ CC=clang scons Werror=0 debug=0 asserts=0 neon=1 opencl=1 os=android arch=armv7a/arm64-v8a -j8
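Putting the four steps together, a complete 64-bit Android build might look like the following sketch; all directory paths ($NDK, $cross_compile_dir, $ARMCL_source_dir) are placeholders you must adjust for your setup:

```shell
# End-to-end sketch for a 64-bit Android target; every path here is a
# placeholder, not a required location.
export NDK=$HOME/android-ndk
export cross_compile_dir=$HOME/android-toolchain

# Step 2: create the standalone toolchain
$NDK/build/tools/make_standalone_toolchain.py \
    --arch arm64 --api 23 --stl gnustl --install-dir $cross_compile_dir

# Step 3: put the toolchain on the PATH
export PATH=$cross_compile_dir/bin:$PATH

# Step 4: build ARMCL with clang from the toolchain
cd $ARMCL_source_dir
CXX=clang++ CC=clang scons Werror=0 debug=0 asserts=0 \
    neon=1 opencl=1 os=android arch=arm64-v8a -j8
```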

Compiling for Linux

Cross-compiling the source code on a Linux host for a Linux target requires:
gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf for a 32-bit target
gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu for a 64-bit target

Then use the following command to compile. Based on your platform, set arch to armv7a or arm64-v8a.
scons Werror=0 -j16 debug=0 asserts=0 neon=1 opencl=1 os=linux arch=armv7a/arm64-v8a
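As a sketch, assuming the 64-bit Linaro toolchain has been extracted under /opt (a placeholder location), the full Linux cross-build could be:

```shell
# Put the Linaro cross-compiler on the PATH, then build.
# /opt/... is a placeholder for wherever you extracted the toolchain.
export PATH=/opt/gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu/bin:$PATH
scons Werror=0 -j16 debug=0 asserts=0 neon=1 opencl=1 \
      os=linux arch=arm64-v8a
```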

Running on Android

For Android, the path of libOpenCL.so must be specified. First, copy this library into an arbitrary directory ($lib_dir) and set LD_LIBRARY_PATH to this directory:
cp /system/lib64/egl/libGLES_mali.so $lib_dir/libOpenCL.so
export LD_LIBRARY_PATH=$lib_dir

You can now run the binaries built in the build directory of ARMCL.
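If you build on a separate host machine, one way to copy the binaries onto the device is adb; the sequence below is a hypothetical sketch, and the device directory /data/local/tmp is a common but arbitrary choice:

```shell
# Hypothetical deployment over adb; adjust binary names and device
# paths for your setup.
adb push build/examples/graph_alexnet /data/local/tmp/
adb shell 'cp /system/lib64/egl/libGLES_mali.so /data/local/tmp/libOpenCL.so'
adb shell 'cd /data/local/tmp && LD_LIBRARY_PATH=/data/local/tmp ./graph_alexnet'
```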

For AlexNet, there is a zip file with model parameters, sample images, and a label file, so you can run this graph on real data and see the results. First, download this zip file from:
https://developer.arm.com/-/media/Arm%20Developer%20Community/Images/Tutorial%20Guide%20Diagrams%20and%20Screenshots/Machine%20Learning/Running%20AlexNet%20on%20Pi%20with%20Compute%20Library/compute_library_alexnet.zip?revision=c1a232fa-f328-451f-9bd6-250b83511e01&la=en&hash=7371AEC619F8192A9DE3E42FE6D9D18B5119E30C

Make a directory and extract the zip file:
mkdir $assets_alexnet
unzip compute_library_alexnet.zip -d $assets_alexnet

Run the AlexNet graph with this command, selecting NEON or CL to run it on the CPU or GPU respectively:
./build/examples/graph_alexnet Neon/CL $assets_alexnet $assets_alexnet/go_kart.ppm $assets_alexnet/labels.txt

Running on Linux

For Linux, in addition to libOpenCL.so, three more libraries must be copied to the target. First, copy these libraries from the ARMCL directory:
cp build/libarm_compute.so build/libarm_compute_core.so build/libarm_compute_graph.so $lib_dir
Then copy libOpenCL.so into $lib_dir and set LD_LIBRARY_PATH to that directory:
cp /system/lib64/egl/libGLES_mali.so $lib_dir/libOpenCL.so
export LD_LIBRARY_PATH=$lib_dir

You can now run the binaries built in the build directory of ARMCL.
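If the build host and the target board are separate machines, a hypothetical way to deploy the libraries and a Pipe-All binary is over ssh/scp; the host name "board" and the remote directory are placeholders:

```shell
# Hypothetical deployment to the Linux board; "board" and ~/armcl/
# are placeholders for your host alias and target directory.
scp build/libarm_compute.so build/libarm_compute_core.so \
    build/libarm_compute_graph.so \
    build/examples/graph_alexnet_all_pipe_sync board:~/armcl/
ssh board 'cd ~/armcl && LD_LIBRARY_PATH=$PWD ./graph_alexnet_all_pipe_sync --n=10'
```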

For AlexNet, there is a zip file with model parameters, sample images, and a label file, so you can run this graph on real data and see the results. First, download this zip file from:
https://developer.arm.com/-/media/Arm%20Developer%20Community/Images/Tutorial%20Guide%20Diagrams%20and%20Screenshots/Machine%20Learning/Running%20AlexNet%20on%20Pi%20with%20Compute%20Library/compute_library_alexnet.zip?revision=c1a232fa-f328-451f-9bd6-250b83511e01&la=en&hash=7371AEC619F8192A9DE3E42FE6D9D18B5119E30C

Make a directory and extract the zip file:
mkdir $assets_alexnet
unzip compute_library_alexnet.zip -d $assets_alexnet

Run the AlexNet graph with this command, selecting NEON or CL to run it on the CPU or GPU respectively:
./build/examples/graph_alexnet Neon/CL $assets_alexnet $assets_alexnet/go_kart.ppm $assets_alexnet/labels.txt





Release repository: https://github.com/arm-software/ComputeLibrary

Development repository: https://review.mlplatform.org/#/admin/projects/ml/ComputeLibrary

Please report issues here: https://github.com/ARM-software/ComputeLibrary/issues

Make sure you are using the latest version of the library before opening an issue. Thanks

Documentation (API, changelogs, build guide, contribution guide, errata, etc.) available at https://github.com/ARM-software/ComputeLibrary/wiki/Documentation.

Binaries available at https://github.com/ARM-software/ComputeLibrary/releases.

Supported Architectures/Technologies

  • Arm® CPUs:

    • Arm® Cortex®-A processor family using Arm® Neon™ technology
    • Arm® Cortex®-R processor family with Armv8-R AArch64 architecture using Arm® Neon™ technology
    • Arm® Cortex®-X1 processor using Arm® Neon™ technology
  • Arm® Mali™ GPUs:

    • Arm® Mali™-G processor family
    • Arm® Mali™-T processor family
  • x86

Supported OS

  • Android™
  • Bare Metal
  • Linux®
  • macOS®
  • Tizen™

License and Contributions

The software is provided under MIT license. Contributions to this project are accepted under the same license.

Public mailing list

For technical discussion, the ComputeLibrary project has a public mailing list: [email protected] The list is open to anyone inside or outside of Arm to self-subscribe. To subscribe, please visit the following website: https://lists.linaro.org/mailman/listinfo/acl-dev

Developer Certificate of Origin (DCO)

Before the ComputeLibrary project accepts your contribution, you need to certify its origin and give us your permission. To manage this process we use the Developer Certificate of Origin (DCO) V1.1 (https://developercertificate.org/)

To indicate that you agree to the terms of the DCO, you "sign off" your contribution by adding a line with your name and e-mail address to every git commit message:

Signed-off-by: John Doe <[email protected]>

You must use your real name; no pseudonyms or anonymous contributions are accepted.

Trademarks and Copyrights

Android is a trademark of Google LLC.

Arm, Cortex and Mali are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

Mac and macOS are trademarks of Apple Inc., registered in the U.S. and other countries.

Tizen is a registered trademark of The Linux Foundation.


armcl-pipe-all's Issues

Pipe-ALL AlexNet output formatting error when running entirely on little cluster at certain frequencies

Demonstration: (screenshot of the malformed output, 2024-01-23)

Steps to reproduce:

  • Make sure LD_LIBRARY_PATH is set correctly, then run:
echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
echo performance > /sys/devices/system/cpu/cpufreq/policy2/scaling_governor
echo 1000000 > /sys/devices/system/cpu/cpufreq/policy0/scaling_max_freq
./graph_alexnet_all_pipe_sync --threads=4  --threads2=2 --n=60 --total_cores=6 --partition_point=8 --partition_point2=8 --order="L-G-B"                                                                    

This happens at frequencies of 1 GHz and higher. It once did not occur at 1.2 GHz, but it generally occurs very consistently.

Consistently reproducible on our system.

Hardware is plugged directly into power with the supplied Anker PowerPort+ 1 power supply.

Significance of the issue:

  • It disrupts the proper operation of our parser; this cost us hours yesterday trying to "fix" a bug that was actually due to this issue.

Segmentation Fault and Runtime Errors in ResNet50 (all pipe sync)

Output of 'strings libarm_compute.so | grep arm_compute_version':

arm_compute_version=v21.02 
Build options: {'arch': 'arm64-v8a', 'opencl': '1', 'neon': '1', 'asserts': '0', 'debug': '1', 'os': 'linux', 'Werror': '0'} 
Git hash=e4cef6d16f8638331bde4d8d67a0e65ffbe4e571

Platform:

Hikey970

Operating System:

Debian 9, Linux kernel 4.9.78-147538-g244928755bbe

Problem description:

Below is a list of commands that yield a segmentation fault:

sudo LD_LIBRARY_PATH=/home/ARMCL-pipe-all/build /home/ARMCL-pipe-all/build/examples/graph_resnet50_all_pipe_sync --threads=4 --threads2=2 --total_cores=6 --partition_point=13 --partition_point2=15 --order=L-B-G --n=50

sudo LD_LIBRARY_PATH=/home/ARMCL-pipe-all/build /home/ARMCL-pipe-all/build/examples/graph_resnet50_all_pipe_sync --threads=4 --threads2=2 --total_cores=6 --partition_point=1 --partition_point2=8 --order=L-B-G --n=50

sudo LD_LIBRARY_PATH=/home/ARMCL-pipe-all/build /home/ARMCL-pipe-all/build/examples/graph_resnet50_all_pipe_sync --threads=4 --threads2=2 --total_cores=6 --partition_point=1 --partition_point2=3 --order=B-L-G --n=50

Also, these throw Runtime Error:

sudo LD_LIBRARY_PATH=/home/ARMCL-pipe-all/build /home/ARMCL-pipe-all/build/examples/graph_resnet50_all_pipe_sync --threads=4 --threads2=2 --total_cores=8 --partition_point=1 --partition_point2=6 --order=B-L-G --n=50

sudo LD_LIBRARY_PATH=/home/ARMCL-pipe-all/build /home/ARMCL-pipe-all/build/examples/graph_resnet50_all_pipe_sync --threads=4 --threads2=2 --total_cores=8 --partition_point=1 --partition_point2=13 --order=G-B-L --n=50

I have not done an exhaustive search to find all mappings causing seg faults, but those are some that definitely yield problems.

Unable to rebuild the ARM_CL and follow the instruction to reproduce the result

Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v21.02 Build options: {'Werror': '0', 'debug': '1', 'asserts': '0', 'neon': '1', 'opencl': '1', 'os': 'linux', 'arch': 'arm64-v8a'} Git hash=b'5682f000a9e6682be0cf3d2ef5289851cd933433'

Platform:
Hikey970

Operating System:
Linux

Problem description:

Hello Ehsan. First of all, thanks for open-sourcing this great project.

I tried to replicate your results. I downloaded your project with the following command (I guess the README should be updated):
git clone https://github.com/Ehsan-aghapour/ARMCL-pipe-all/
But here are some problems I faced:

  1. I used scons Werror=0 -j16 debug=0 asserts=0 neon=1 opencl=1 os=linux arch=arm64-v8a to build the ARM-CL. However, it shows the following warnings and errors:

aarch64-linux-gnu-g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
scons: *** [build/src/graph/backends/NEON/NEFunctionFactory.os] Error 4
aarch64-linux-gnu-g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-6/README.Bugs> for instructions.
scons: *** [build/src/graph/backends/CL/CLFunctionsFactory.os] Error 4
examples/graph_mobilenet_all_pipe_sync.cpp: In constructor ‘GraphMobilenetExample::GraphMobilenetExample()’:
examples/graph_mobilenet_all_pipe_sync.cpp:67:5: warning: ‘GraphMobilenetExample::input_descriptor2’ should be

  2. I tried following the "Running in Linux" part: cp /system/lib64/egl/libGLES_mali.so $lib_dir/libOpenCL.so
    But may I ask where I can find libOpenCL.so?

3. I tried to test with the following command:
./build/examples/graph_alexnet_all_pipe_sync --threads=4 --threads2=2 --total_cores=6 --partition_point=3 --partition_point2=5 --order=G-L-B --n=60

but it shows the following error; I guess this may be due to an unsuccessful build:
./build/examples/graph_alexnet_all_pipe_sync: symbol lookup error: /home/Micro_SD_shunya/hungyang/ARMCL-pipe-all/library/libarm_compute_graph.so: undefined symbol: _ZN11arm_compute7logging14LoggerRegistry3getEv

  4. Also, I followed the commands in the "Running in Linux" part, but they produce the following messages:
WARNING: Skipping invalid option 'Neon'!
WARNING: Skipping invalid option '/home/Micro_SD_shunya/hungyang/ARMCL-pipe-all/alexnet/'!
WARNING: Skipping invalid option '/home/Micro_SD_shunya/hungyang/ARMCL-pipe-all/alexnet//go_kart.ppm'!
WARNING: Skipping invalid option '/home/Micro_SD_shunya/hungyang/ARMCL-pipe-all/alexnet//labels.txt'!

Thanks again :)

Questions for clarification (mostly on multithreading, and real data inferencing)

Output of 'strings libarm_compute.so | grep arm_compute_version':
arm_compute_version=v21.02 Build options: {'arch': 'arm64-v8a', 'opencl': '1', 'neon': '1', 'asserts': '0', 'debug': '1', 'os': 'linux', 'Werror': '0'} Git hash=abc2c291bcc4a62c171b84e41e0fb0dafd393291

Platform:
Odroid N2+

Operating System:
Ubuntu 20.04

Problem description:
Hello, I would like to use this repository for my project, but before doing so I would like to raise some questions:

  1. Why did you create threads for the execution of different network parts? These are pipelined, so they are going to be scheduled in serial. So why create threads? (ref: examples/alexnet, lines 706-711)
  2. In src/runtime/SchedulerUtils.cpp, you have line 48 commented out, and the reason as you point out is "amend mistake in ARMCL". Could you please elaborate on that?
  3. In the README file, you point out that if we do not provide the --image argument, the execution will proceed with dummy data. Where is this data created? I took a glance over the graph_* files, and did not see any line that created dummy data.
  4. Is there any way I can retrieve the output of the DNN? In other words, How can I print the predictions or map the output layer to labels?
  5. The first lines of my output are:
Third graph inferencing image: 0:../../data/images/7.ppm
First graph inferencing image: 0:../../data/images/7.ppm
Second graph inferencing image: 1:../../data/images/0.ppm

I understand that multithreading can produce that "unreasonable" output. I can also see that you have used mutexes, locks, and condition variables to prevent certain code blocks from being accessed by multiple threads. However, is there any way I can verify that data is not actually accessed by a later subgraph before the earlier pipeline stages (e.g. graph 3 before graph 2 or 1)?

Thank you :)
