utcs-scea / ava
Automatic virtualization of (general) accelerators.
Home Page: https://ava.yuhc.me/
License: BSD 2-Clause "Simplified" License
The following code may be deprecated or reimplemented:
For example, the following command builds only the ava-cuda-10.2 Docker image:
~/ava/tools/docker$ make ava-cuda-10.2
Add a "debug=[True, False]" entry in the configuration file.
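A sketch of what the entry could look like, assuming the libconfig-style syntax the guestlib already uses; the entries other than `debug` are illustrative, not the file's actual contents:

```
# /etc/ava/guest.config (sketch; entries other than `debug` are illustrative)
channel = "TCP";
manager_address = "0.0.0.0:3334";

# proposed: toggle debug output in the guestlib and worker
debug = false;
```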
@photoszzt has added two options for compiling TF and ONNX specs as examples: 52a9476.
We should add more cmake options for other specs and different AvA managers.
Merge the PR at KatanaGraph/katana#113.
The legacy_manager does not shut down correctly: killing the process with SIGINT leaves zombie processes. Calling boost::asio::io_service::stop might solve this.
Example: https://github.com/KatanaGraph/katana/blob/master/.clang-tidy
https://github.com/KatanaGraph/katana/blob/master/.clang-format
Add a pre-push/pre-commit hook to check lint locally.
Add a pre-merge hook to check lint remotely.
In the current prototype, ava_async means the call returns as soon as the command is sent to the API server; the API server may not have received or executed the API yet.
In a multi-threaded scenario with inter-thread synchronization, ava_async APIs may be executed out of order, and therefore incorrectly, on the API server.
To guarantee correct execution, ava_async should preserve the ordering of these APIs between the guest library and the API server.
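One way to preserve ordering is to tag each command with a guestlib-assigned sequence number and have the API server buffer out-of-order arrivals, executing them strictly in sequence. A minimal sketch with illustrative names (not AvA's actual types):

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Sketch: the API server buffers commands that arrive out of order across
// transport threads and executes them strictly in the order the guestlib
// issued them. Names are illustrative, not AvA's actual types.
class AsyncOrderEnforcer {
 public:
  // Called when a command tagged with a guestlib-assigned sequence number
  // arrives; commands may arrive in any order.
  void OnCommand(uint64_t seq, std::function<void()> execute) {
    pending_[seq] = std::move(execute);
    // Drain every command that is now contiguous with the last executed one.
    while (!pending_.empty() && pending_.begin()->first == next_) {
      pending_.begin()->second();
      pending_.erase(pending_.begin());
      ++next_;
    }
  }

 private:
  uint64_t next_ = 0;                                  // next sequence to run
  std::map<uint64_t, std::function<void()>> pending_;  // buffered commands
};
```

With this scheme, commands arriving as 2, 0, 1 still execute as 0, 1, 2.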
Related issue: [wait for merge from ava-serverless].
The imported serializer uses the C++14 standard. It would be good to upgrade it to comply with C++17. This could also improve the performance of registering polymorphic types, which can be slower in C++14 than in C++17 due to the use of shared_timed_mutex instead of shared_mutex.
# in build directory
cmake -DAVA_ENABLE_DEBUG .
make
results in:
...
CMakeFiles/worker.dir/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp.o: In function `command_channel_socket_tcp_guest_new':
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:50: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:56: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:57: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:126: undefined reference to `guestconfig::config'
collect2: error: ld returned 1 exit status
CMakeFiles/worker.dir/build.make:328: recipe for target 'worker' failed
make[8]: *** [worker] Error 1
CMakeFiles/Makefile2:123: recipe for target 'CMakeFiles/worker.dir/all' failed
make[7]: *** [CMakeFiles/worker.dir/all] Error 2
Makefile:148: recipe for target 'all' failed
make[6]: *** [all] Error 2
CMakeFiles/cudadrv-nw.dir/build.make:130: recipe for target 'cudadrv_nw/src/cudadrv-nw-stamp/cudadrv-nw-build' failed
make[5]: *** [cudadrv_nw/src/cudadrv-nw-stamp/cudadrv-nw-build] Error 2
CMakeFiles/Makefile2:123: recipe for target 'CMakeFiles/cudadrv-nw.dir/all' failed
make[4]: *** [CMakeFiles/cudadrv-nw.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/ava-spec.dir/build.make:130: recipe for target 'ava-spec/src/ava-spec-stamp/ava-spec-build' failed
make[2]: *** [ava-spec/src/ava-spec-stamp/ava-spec-build] Error 2
CMakeFiles/Makefile2:117: recipe for target 'CMakeFiles/ava-spec.dir/all' failed
make[1]: *** [CMakeFiles/ava-spec.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2
Many assertions (assert) should be replaced with more robust checks and error messages.
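A hedged sketch of such a check: unlike assert, it survives NDEBUG builds and prints file/line context before failing. The macro name is illustrative:

```cpp
#include <cstdio>
#include <cstdlib>

// Illustrative replacement for assert(): still active in release builds and
// prints context to stderr before aborting. The name AVA_CHECK is a sketch,
// not an existing AvA macro.
#define AVA_CHECK(cond, msg)                                      \
  do {                                                            \
    if (!(cond)) {                                                \
      std::fprintf(stderr, "[ava] check failed at %s:%d: %s\n",   \
                   __FILE__, __LINE__, (msg));                    \
      std::abort();                                               \
    }                                                             \
  } while (0)
```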
Some extensions may require adding code to the worker's or guestlib's constructor or destructor.
Modifying CAvA in those cases is unacceptable; instead, we should introduce a set of new annotations to specify that code and let CAvA merge it into the generated constructor and destructor functions.
The following new annotations should be added:
ava_guestlib_[init|fini]_[prologue|epilogue](...)
ava_worker_init_epilogue(...)
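A sketch of how these annotations might look in a spec, assuming CAvA accepts a statement block as the argument; the helper functions named inside are hypothetical:

```c
/* Proposed syntax sketch; init_fatbin_handle_list() and
 * warm_up_device_context() are hypothetical helpers. */
ava_guestlib_init_prologue({
    init_fatbin_handle_list();  /* runs before the generated guestlib constructor body */
});
ava_worker_init_epilogue({
    warm_up_device_context();   /* runs after the generated worker constructor body */
});
```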
Currently, the migration destination server's IP address (and port) is hard-coded in devconf.h. Instead, this information should be provided by the AvA manager to the API server.
AvA uses and implements lots of ad-hoc socket APIs, which is bad for programmability and readability.
Those APIs should be replaced with easy-to-read, standard or well-known RPC APIs.
gRPC + Protobuf is too slow for our case.
Cap'n Proto seems a good candidate: https://capnproto.org/
It's probably worth passing the whole /etc/ava/guest.config file to the AvA manager and letting the manager parse it, so that we can extend configurations and features without touching the guestlib channel-initialization code.
Currently, the AvA guestlib parses the configuration file and sends only a few configuration entries to the AvA manager when it creates the channel.
Currently most third-party dependencies, such as libconfig and gRPC, are linked dynamically into the guestlib.
This causes an issue when we try to link libguestlib into any benchmark outside the build directory: the relative path to those dependent libraries changes.
Linking all dependencies statically would be great, but I ran into some trouble with gRPC.
I'm thinking about installing those dependencies in /usr/local/ava or /opt/ava and using absolute paths for linking all libraries.
AvA already supports the single-node multi-GPU case, where a single process can access multiple GPUs on one GPU node.
The CUDA process needs to call cudaSetDevice explicitly at runtime to choose the in-use GPU, and this feature can be leveraged to support multi-node multi-GPU.
The basic idea is to run one worker per GPU (the GPUs can be on different nodes). When the application calls cudaSetDevice, the guestlib changes the worker address dynamically, and all following CUDA APIs are forwarded to that worker. This assumes there is no inter-GPU data transfer over channels such as NVLink.
An improvement would be to serve multiple local GPUs from one worker, with the guestlib changing the worker address and forwarding cudaSetDevice(adjusted GPU ID) to that worker.
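The routing idea can be sketched as a table in the guestlib mapping each CUDA device ID to a (worker address, adjusted local GPU index) pair, consulted by the intercepted cudaSetDevice. All names below are illustrative, not AvA's actual code:

```cpp
#include <string>
#include <unordered_map>

// Illustrative sketch of guestlib-side routing for multi-node multi-GPU.
struct WorkerRoute {
  std::string address;  // worker endpoint, e.g. "gpu-node-1:4000"
  int local_gpu;        // GPU index to forward on that worker's node
};

class GuestlibRouter {
 public:
  // Routes would come from the AvA manager at channel setup.
  void AddRoute(int device, WorkerRoute route) { routes_[device] = route; }

  // Intercepted cudaSetDevice: select the worker that subsequent CUDA APIs
  // should be forwarded to; returns nullptr for an unknown device ID.
  const WorkerRoute* SetDevice(int device) {
    auto it = routes_.find(device);
    if (it == routes_.end()) return nullptr;
    active_ = &it->second;
    return active_;
  }

  const WorkerRoute* active() const { return active_; }

 private:
  std::unordered_map<int, WorkerRoute> routes_;
  const WorkerRoute* active_ = nullptr;
};
```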
See PR 131.
typedef struct {
union Algorithm {
cudnnConvolutionFwdAlgo_t convFwdAlgo;
cudnnConvolutionBwdFilterAlgo_t convBwdFilterAlgo;
cudnnConvolutionBwdDataAlgo_t convBwdDataAlgo;
cudnnRNNAlgo_t RNNAlgo;
cudnnCTCLossAlgo_t CTCLossAlgo;
} algo;
} cudnnAlgorithm_t;
The generated code is like:
cudnnAlgorithm_t *ava_self;
ava_self = (cudnnAlgorithm_t *) (&algorithm);
union Algorithm *__algorithm_a_0_algo;
__algorithm_a_0_algo = (union Algorithm *)(&(algorithm).algo);
union Algorithm *__algorithm_b_0_algo;
__algorithm_b_0_algo = (union Algorithm *)(&(__call->algorithm).algo); {
union Algorithm *ava_self;
ava_self = (union Algorithm *)(&*__algorithm_a_0_algo);
cudnnCTCLossAlgo_t *__algorithm_a_1_CTCLossAlgo;
__algorithm_a_1_CTCLossAlgo = (cudnnCTCLossAlgo_t *) (&(*__algorithm_a_0_algo).CTCLossAlgo);
cudnnCTCLossAlgo_t *__algorithm_b_1_CTCLossAlgo;
__algorithm_b_1_CTCLossAlgo = (cudnnCTCLossAlgo_t *) (&(*__algorithm_b_0_algo).CTCLossAlgo); {
*__algorithm_a_1_CTCLossAlgo = (cudnnCTCLossAlgo_t) * __algorithm_b_1_CTCLossAlgo;
*__algorithm_a_1_CTCLossAlgo = *__algorithm_b_1_CTCLossAlgo;
}
...
AvA cannot intercept several CUDA internal APIs, such as __cudaRegisterFatBinary, when the program is built with separate compilation. It may be worth investigating a solution, which may also apply to statically linked CUDA programs.
If this doesn't break the loading of guestlib, we can switch to use abseil.
When I run the TensorFlow ResNet-50 benchmark on tf_opt, a segmentation fault may occur in the __cudaPopCallConfiguration function at some point.
When the guestlib and TensorFlow link against two different versions of libprotobuf, the guestlib loaded by a TensorFlow program will fail to initialize.
A recent PR #61 removes the dependency on protobuf completely, but due to time limits I didn't get to remove the deprecated files.
Describe the bug
Building ava with -CMAKE_BUILD_TYPE=Debug inside the ava container with CUDA 10.1 does not succeed; without the debug flag it does.
To Reproduce
Create the 10.1 container, run interactive shell and compile with
./generate.py -s cudart && mkdir -p build
cd build && cmake .. -DAVA_GEN_CUDART_SPEC=ON -DAVA_MANAGER_DEMO=ON -CMAKE_BUILD_TYPE=Debug
If -CMAKE_BUILD_TYPE=Debug
is removed, it works fine.
Expected behavior
The container should be set up to compile both release and debug builds with no errors.
Error log
Determining if the strtod_l exist failed with the following output:
...
CMakeTmp/CheckSymbolExists.c:8:19: error: ‘strtod_l’ undeclared (first use in this function); did you mean ‘strtoull’?
return ((int*)(&strtod_l))[argc];
...
Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
...
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_45ece.dir/link.txt --verbose=1
/usr/bin/cc CMakeFiles/cmTC_45ece.dir/src.c.o -o cmTC_45ece
CMakeFiles/cmTC_45ece.dir/src.c.o: In function `main':
src.c:(.text+0x3e): undefined reference to `pthread_create'
src.c:(.text+0x4a): undefined reference to `pthread_detach'
src.c:(.text+0x56): undefined reference to `pthread_cancel'
src.c:(.text+0x67): undefined reference to `pthread_join'
src.c:(.text+0x7b): undefined reference to `pthread_atfork'
A single manager process takes responsibility for both spawning workers and assigning those workers to guestlibs.
This design works well in the single-node scenario, but not the multi-node case.
When GPUs are distributed among multiple GPU nodes, we need daemons spawning workers on every node and a global manager scheduling and assigning those workers to corresponding guestlibs.
The current log printing outputs to stderr, which leads to a large CPU burden and high latency. We plan to improve its performance with high-performance logging libraries such as:
https://github.com/odygrd/quill
https://github.com/PlatformLab/NanoLog
https://github.com/HardySimpson/zlog
Add --append (-A) as a temporary solution to share spec snippets.
--append file1 file2 file3 will simply concatenate these three files, in order, onto the specification being compiled.
This will work as a temporary solution to share (import) a specification between (into) other specifications.
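The intended behavior is plain concatenation, which can be sketched as follows; the function name is illustrative, not CAvA's actual code:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Illustrative sketch of --append: concatenate the extra spec snippets, in
// the order given on the command line, onto the main specification text
// before it is compiled.
std::string AppendSpecs(const std::string& main_spec,
                        const std::vector<std::string>& snippet_paths) {
  std::ostringstream out;
  out << main_spec;
  for (const auto& path : snippet_paths) {
    std::ifstream in(path);
    out << '\n' << in.rdbuf();  // snippets are appended verbatim, in order
  }
  return out.str();
}
```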
It would be nice if we had some way to test if our virtual layer is complete other than debugging additional demos. Perhaps we could put together a way to get traces on which syscalls originated where and ensure that specific syscalls are always routed through our wrappers. I'm unsure of how to automate something like that right now though.
Describe the bug
Some optional flags to the demo_manager are not used or not implemented, for example worker_pool_size.
We should define flags per manager, not in one general flags.h with unused options.
I will probably fix this soon and refer to this issue.
This is required to support batching optimization.
Different kinds of commands (sync, async, batched) need to be sent in different ways, and those ways should be described in the spec.
Letting developers write CAvA code in the spec is not ideal, but I haven't found an alternative yet.
Describe the messy code or documentation
ava_disable_native_call has no documentation for its meaning, but it is used by many APIs.
A few changes need to be made in CAvA, but most optimization-specific code should be maintained only in the spec.
Remoting the vector_add benchmark from the CUDA 10.1 samples with AvA's cudart spec fails with a missing symbol:
symbol cudaGetErrorName version libcudart.so.10.1 not defined in file libcudart.so.10.1 with link time reference
I tried to add this function to the cudart_opt.c spec and recompile:
__host__ __cudart_builtin__ const char *CUDARTAPI
cudaGetErrorName(cudaError_t error)
{
    const char *ret = ava_execute();
    ava_return_value {
        ava_out;
        ava_buffer(strlen(ret) + 1);
        ava_lifetime_static;
    }
}
The same error occurs after recompiling.
Some functions and variables, such as guestlib_tf_opt_init and fatbin_handle_list, are defined as utilities. These utilities are used only in either the guestlib or the worker, but currently have to be compiled into both.
Currently, we add dummy definitions for such functions and variables in the guestlib or worker to suppress compiler errors, but we should eventually find a better way.
New annotations will be useful to support spec-specific optimizations or extensions. Previously, we had to modify the Makefile manually to include those source files.
The new annotations should be like:
ava_worker_srcs(filenames...);
ava_guestlib_srcs(filenames...);
These annotations will be helpful for merging Galvanic specs (#5 and #6).
Describe the bug
Try initializing the cmake build system on a new clone of the repository.
To Reproduce
Try initializing the cmake build system on a new clone of the repository.
cmake -DAVA_GEN_CUDART_SPEC=On -DAVA_MANAGER_LEGACY=On ../ava
# note out of source tree build
Expected behavior
CMake completes successfully.
Error log
CMake Error at cava/CMakeLists.txt:39 (add_subdirectory):
add_subdirectory given source "cudart_nw" which is not an existing
directory.