utcs-scea / ava
Automatic virtualization of (general) accelerators.
Home Page: https://ava.yuhc.me/
License: BSD 2-Clause "Simplified" License
The following code may be deprecated or reimplemented:
For example, the following command builds only the ava-cuda-10.2 Docker image:
~/ava/tools/docker$ make ava-cuda-10.2
Add a "debug=[True, False]" entry in the configuration file.
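A sketch of what the entry could look like, assuming the libconfig-style syntax the guestlib already uses; the entries other than `debug` are illustrative, not the file's actual contents:

```
# /etc/ava/guest.config (sketch; entries other than `debug` are illustrative)
channel = "TCP";
manager_address = "0.0.0.0:3334";

# proposed: toggle debug output in the guestlib and worker
debug = false;
```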
@photoszzt has added two options for compiling TF and ONNX specs as examples: 52a9476.
We should add more cmake options for other specs and different AvA managers.
Merge the PR at KatanaGraph/katana#113.
The legacy_manager does not shut down correctly: killing the process with SIGINT leaves zombie processes. Calling boost::asio::io_service::stop might solve this.
Example: https://github.com/KatanaGraph/katana/blob/master/.clang-tidy
https://github.com/KatanaGraph/katana/blob/master/.clang-format
Add a pre-push/pre-commit hook to check lint locally.
Add a pre-merge hook to check lint remotely.
In the current prototype, ava_async means the call returns as soon as the command is sent to the API server; the API server may not have received or executed the API yet.
In a multi-threaded scenario with inter-thread synchronization, ava_async APIs may be executed out of order, and therefore incorrectly, on the API server.
To guarantee correct execution, ava_async should preserve the ordering of these APIs between the guest library and the API server.
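One way to preserve ordering is to tag each command with a guestlib-assigned sequence number and have the API server buffer out-of-order arrivals, executing them strictly in sequence. A minimal sketch with illustrative names (not AvA's actual types):

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Sketch: the API server buffers commands that arrive out of order across
// transport threads and executes them strictly in the order the guestlib
// issued them. Names are illustrative, not AvA's actual types.
class AsyncOrderEnforcer {
 public:
  // Called when a command tagged with a guestlib-assigned sequence number
  // arrives; commands may arrive in any order.
  void OnCommand(uint64_t seq, std::function<void()> execute) {
    pending_[seq] = std::move(execute);
    // Drain every command that is now contiguous with the last executed one.
    while (!pending_.empty() && pending_.begin()->first == next_) {
      pending_.begin()->second();
      pending_.erase(pending_.begin());
      ++next_;
    }
  }

 private:
  uint64_t next_ = 0;                                  // next sequence to run
  std::map<uint64_t, std::function<void()>> pending_;  // buffered commands
};
```

With this scheme, commands arriving as 2, 0, 1 still execute as 0, 1, 2.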
Related issue: [wait for merge from ava-serverless].
The imported serializer uses the C++14 standard. It would be good to upgrade it to comply with C++17. This could also improve the performance of registering polymorphic types, which can be slower in C++14 than in C++17 due to the use of shared_timed_mutex instead of shared_mutex.
# in build directory
cmake -DAVA_ENABLE_DEBUG .
make
results in:
...
CMakeFiles/worker.dir/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp.o: In function `command_channel_socket_tcp_guest_new':
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:50: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:56: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:57: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:126: undefined reference to `guestconfig::config'
collect2: error: ld returned 1 exit status
CMakeFiles/worker.dir/build.make:328: recipe for target 'worker' failed
make[8]: *** [worker] Error 1
CMakeFiles/Makefile2:123: recipe for target 'CMakeFiles/worker.dir/all' failed
make[7]: *** [CMakeFiles/worker.dir/all] Error 2
Makefile:148: recipe for target 'all' failed
make[6]: *** [all] Error 2
CMakeFiles/cudadrv-nw.dir/build.make:130: recipe for target 'cudadrv_nw/src/cudadrv-nw-stamp/cudadrv-nw-build' failed
make[5]: *** [cudadrv_nw/src/cudadrv-nw-stamp/cudadrv-nw-build] Error 2
CMakeFiles/Makefile2:123: recipe for target 'CMakeFiles/cudadrv-nw.dir/all' failed
make[4]: *** [CMakeFiles/cudadrv-nw.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/ava-spec.dir/build.make:130: recipe for target 'ava-spec/src/ava-spec-stamp/ava-spec-build' failed
make[2]: *** [ava-spec/src/ava-spec-stamp/ava-spec-build] Error 2
CMakeFiles/Makefile2:117: recipe for target 'CMakeFiles/ava-spec.dir/all' failed
make[1]: *** [CMakeFiles/ava-spec.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2
Many assertions (assert) should be replaced with more robust checks and error messages.
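A hedged sketch of such a check: unlike assert, it survives NDEBUG builds and prints file/line context before failing. The macro name is illustrative:

```cpp
#include <cstdio>
#include <cstdlib>

// Illustrative replacement for assert(): still active in release builds and
// prints context to stderr before aborting. The name AVA_CHECK is a sketch,
// not an existing AvA macro.
#define AVA_CHECK(cond, msg)                                      \
  do {                                                            \
    if (!(cond)) {                                                \
      std::fprintf(stderr, "[ava] check failed at %s:%d: %s\n",   \
                   __FILE__, __LINE__, (msg));                    \
      std::abort();                                               \
    }                                                             \
  } while (0)
```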
Some extensions may require adding code to the worker's or guestlib's constructor or destructor.
Modifying CAvA in those cases is unacceptable; instead, we should introduce a set of new annotations to specify that code and let CAvA merge it into the generated constructor and destructor functions.
The following new annotations should be added:
ava_guestlib_[init|fini]_[prologue|epilogue](...)
ava_worker_init_epilogue(...)
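A sketch of how these annotations might look in a spec, assuming CAvA accepts a statement block as the argument; the helper functions named inside are hypothetical:

```c
/* Proposed syntax sketch; init_fatbin_handle_list() and
 * warm_up_device_context() are hypothetical helpers. */
ava_guestlib_init_prologue({
    init_fatbin_handle_list();  /* runs before the generated guestlib constructor body */
});
ava_worker_init_epilogue({
    warm_up_device_context();   /* runs after the generated worker constructor body */
});
```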
Currently, the migration destination server's IP address (and port) is hard-coded in devconf.h. Instead, this information should be provided by the AvA manager to the API server.
AvA uses and implements lots of ad-hoc socket APIs, which is bad for programmability and readability.
Those APIs should be replaced with easy-to-read, standard or well-known RPC APIs.
gRPC + Protobuf is too slow for our case.
Cap'n Proto seems a good candidate: https://capnproto.org/
It's probably worth passing the whole /etc/ava/guest.config file to the AvA manager and letting the manager parse it, so that we can extend configurations and features without touching the guestlib channel-initialization code.
Currently, the AvA guestlib parses the configuration file and sends only a few configuration entries to the AvA manager when it creates the channel.
Currently most third-party dependencies, such as libconfig and gRPC, are linked dynamically into the guestlib.
This causes an issue when we try to link libguestlib into any benchmark outside the build directory: the relative path to those dependent libraries changes.
Linking all dependencies statically would be great, but I ran into some trouble with gRPC.
I'm thinking about installing those dependencies in /usr/local/ava or /opt/ava and using absolute paths for linking all libraries.
AvA already supports the single-node multi-GPU case, where a single process can access multiple GPUs on one GPU node.
The CUDA process needs to call cudaSetDevice explicitly at runtime to choose the in-use GPU, and this feature can be leveraged to support multi-node multi-GPU.
The basic idea is to run one worker per GPU (the GPUs can be on different nodes). When the application calls cudaSetDevice, the guestlib changes the worker address dynamically, and all following CUDA APIs are forwarded to that worker. This assumes there is no inter-GPU data transfer over channels such as NVLink.
An improvement would be to serve multiple local GPUs from one worker, with the guestlib changing the worker address and forwarding cudaSetDevice(adjusted GPU ID) to that worker.
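The routing idea can be sketched as a table in the guestlib mapping each CUDA device ID to a (worker address, adjusted local GPU index) pair, consulted by the intercepted cudaSetDevice. All names below are illustrative, not AvA's actual code:

```cpp
#include <string>
#include <unordered_map>

// Illustrative sketch of guestlib-side routing for multi-node multi-GPU.
struct WorkerRoute {
  std::string address;  // worker endpoint, e.g. "gpu-node-1:4000"
  int local_gpu;        // GPU index to forward on that worker's node
};

class GuestlibRouter {
 public:
  // Routes would come from the AvA manager at channel setup.
  void AddRoute(int device, WorkerRoute route) { routes_[device] = route; }

  // Intercepted cudaSetDevice: select the worker that subsequent CUDA APIs
  // should be forwarded to; returns nullptr for an unknown device ID.
  const WorkerRoute* SetDevice(int device) {
    auto it = routes_.find(device);
    if (it == routes_.end()) return nullptr;
    active_ = &it->second;
    return active_;
  }

  const WorkerRoute* active() const { return active_; }

 private:
  std::unordered_map<int, WorkerRoute> routes_;
  const WorkerRoute* active_ = nullptr;
};
```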
See PR 131.
typedef struct {
union Algorithm {
cudnnConvolutionFwdAlgo_t convFwdAlgo;
cudnnConvolutionBwdFilterAlgo_t convBwdFilterAlgo;
cudnnConvolutionBwdDataAlgo_t convBwdDataAlgo;
cudnnRNNAlgo_t RNNAlgo;
cudnnCTCLossAlgo_t CTCLossAlgo;
} algo;
} cudnnAlgorithm_t;
The generated code is like:
cudnnAlgorithm_t *ava_self;
ava_self = (cudnnAlgorithm_t *) (&algorithm);
union Algorithm *__algorithm_a_0_algo;
__algorithm_a_0_algo = (union Algorithm *)(&(algorithm).algo);
union Algorithm *__algorithm_b_0_algo;
__algorithm_b_0_algo = (union Algorithm *)(&(__call->algorithm).algo); {
union Algorithm *ava_self;
ava_self = (union Algorithm *)(&*__algorithm_a_0_algo);
cudnnCTCLossAlgo_t *__algorithm_a_1_CTCLossAlgo;
__algorithm_a_1_CTCLossAlgo = (cudnnCTCLossAlgo_t *) (&(*__algorithm_a_0_algo).CTCLossAlgo);
cudnnCTCLossAlgo_t *__algorithm_b_1_CTCLossAlgo;
__algorithm_b_1_CTCLossAlgo = (cudnnCTCLossAlgo_t *) (&(*__algorithm_b_0_algo).CTCLossAlgo); {
*__algorithm_a_1_CTCLossAlgo = (cudnnCTCLossAlgo_t) * __algorithm_b_1_CTCLossAlgo;
*__algorithm_a_1_CTCLossAlgo = *__algorithm_b_1_CTCLossAlgo;
}
...
AvA cannot intercept several CUDA internal APIs, such as __cudaRegisterFatBinary, when the program is built with separate compilation. It may be worth investigating a solution, which may also apply to statically linked CUDA programs.
If this doesn't break the loading of guestlib, we can switch to use abseil.
When I run the TensorFlow ResNet-50 benchmark on tf_opt, a segmentation fault may occur in the __cudaPopCallConfiguration function at some point.
When the guestlib and TensorFlow link against two different versions of libprotobuf, the guestlib loaded by a TensorFlow program will fail to initialize.
A recent PR #61 removes the dependency on protobuf completely, but due to time limits I didn't get to remove the deprecated files.
Describe the bug
Building ava with -CMAKE_BUILD_TYPE=Debug inside the ava container with CUDA 10.1 does not succeed; without the debug flag it does.
To Reproduce
Create the 10.1 container, run interactive shell and compile with
./generate.py -s cudart && mkdir -p build
cd build && cmake .. -DAVA_GEN_CUDART_SPEC=ON -DAVA_MANAGER_DEMO=ON -CMAKE_BUILD_TYPE=Debug
If -CMAKE_BUILD_TYPE=Debug
is removed, it works fine.
Expected behavior
The container should be set up to compile both release and debug builds with no errors.
Error log
Determining if the strtod_l exist failed with the following output:
...
CMakeTmp/CheckSymbolExists.c:8:19: error: ‘strtod_l’ undeclared (first use in this function); did you mean ‘strtoull’?
return ((int*)(&strtod_l))[argc];
...
Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
...
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_45ece.dir/link.txt --verbose=1
/usr/bin/cc CMakeFiles/cmTC_45ece.dir/src.c.o -o cmTC_45ece
CMakeFiles/cmTC_45ece.dir/src.c.o: In function `main':
src.c:(.text+0x3e): undefined reference to `pthread_create'
src.c:(.text+0x4a): undefined reference to `pthread_detach'
src.c:(.text+0x56): undefined reference to `pthread_cancel'
src.c:(.text+0x67): undefined reference to `pthread_join'
src.c:(.text+0x7b): undefined reference to `pthread_atfork'
A single manager process takes responsibility for both spawning workers and assigning those workers to guestlibs.
This design works well in the single-node scenario, but not the multi-node case.
When GPUs are distributed among multiple GPU nodes, we need daemons spawning workers on every node and a global manager scheduling and assigning those workers to corresponding guestlibs.
The current log printing outputs to stderr, which leads to a large CPU burden and high latency. We plan to improve its performance with high-performance logging libraries such as:
https://github.com/odygrd/quill
https://github.com/PlatformLab/NanoLog
https://github.com/HardySimpson/zlog
Add --append (-A) as a temporary solution to share spec snippets.
--append file1 file2 file3 will simply concatenate these three files, in order, onto the specification being compiled.
This will work as a temporary solution to share (import) a specification between (into) other specifications.
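The intended behavior is plain concatenation, which can be sketched as follows; the function name is illustrative, not CAvA's actual code:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Illustrative sketch of --append: concatenate the extra spec snippets, in
// the order given on the command line, onto the main specification text
// before it is compiled.
std::string AppendSpecs(const std::string& main_spec,
                        const std::vector<std::string>& snippet_paths) {
  std::ostringstream out;
  out << main_spec;
  for (const auto& path : snippet_paths) {
    std::ifstream in(path);
    out << '\n' << in.rdbuf();  // snippets are appended verbatim, in order
  }
  return out.str();
}
```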
It would be nice if we had some way to test if our virtual layer is complete other than debugging additional demos. Perhaps we could put together a way to get traces on which syscalls originated where and ensure that specific syscalls are always routed through our wrappers. I'm unsure of how to automate something like that right now though.
Describe the bug
Some optional flags to the demo_manager are not used or not implemented, for example worker_pool_size.
We should define flags per manager, not in one general flags.h with unused options.
I will probably fix this soon and refer to this issue.
This is required to support batching optimization.
Different kinds of commands (sync, async, batched) need to be sent in different ways, and those ways should be described in the spec.
Letting developers write CAvA code in the spec is not ideal, but I haven't found an alternative yet.
Describe the messy code or documentation
ava_disable_native_call has no documentation for its meaning, but it is used by many APIs.
A few changes need to be made in CAvA, but most optimization-specific code should be maintained only in the spec.
Remoting the vector_add benchmark from the CUDA 10.1 samples with AvA's cudart spec fails with a missing symbol:
symbol cudaGetErrorName version libcudart.so.10.1 not defined in file libcudart.so.10.1 with link time reference
I tried to add this function to the cudart_opt.c spec and recompile:
__host__ __cudart_builtin__ const char *CUDARTAPI
cudaGetErrorName(cudaError_t error)
{
    const char *ret = ava_execute();
    ava_return_value {
        ava_out;
        ava_buffer(strlen(ret) + 1);
        ava_lifetime_static;
    }
}
The same error occurs after recompiling.
Some functions and variables, such as guestlib_tf_opt_init and fatbin_handle_list, are defined as utilities. These utilities are used only in either the guestlib or the worker, but currently have to be compiled into both.
Currently, we add dummy definitions for such functions and variables in the guestlib or worker to suppress compiler errors, but we should eventually find a better way.
New annotations will be useful to support spec-specific optimizations or extensions. Previously, we had to modify the Makefile manually to include those source files.
The new annotations should be like:
ava_worker_srcs(filenames...);
ava_guestlib_srcs(filenames...);
These annotations will be helpful for merging Galvanic specs (#5 and #6).
Describe the bug
Try initializing the cmake build system on a new clone of the repository.
To Reproduce
Try initializing the cmake build system on a new clone of the repository.
cmake -DAVA_GEN_CUDART_SPEC=On -DAVA_MANAGER_LEGACY=On ../ava
# note out of source tree build
Expected behavior
CMake completes successfully.
Error log
CMake Error at cava/CMakeLists.txt:39 (add_subdirectory):
add_subdirectory given source "cudart_nw" which is not an existing
directory.