utcs-scea / ava


Automatic virtualization of (general) accelerators.

Home Page: https://ava.yuhc.me/

License: BSD 2-Clause "Simplified" License

C 4.72% Makefile 0.03% C++ 89.65% Shell 0.50% GDB 0.01% Python 4.59% CMake 0.49% Roff 0.02%
accelerators ava compiler virtualization

ava's People

Contributors

aakshintala, doranjuna, hfingler, photoszzt, vancemiller, yuhc


ava's Issues

Change all AvA component library files to C++

  • Command channel
  • Endpoint library
  • Shadow thread pool
  • Migration channel

The following code may be deprecated or reimplemented:

  • Hypervisor channel
  • Murmur3
  • Zero copy
  • Shared memory channel

Use boost::asio::io_service::stop

The legacy_manager does not shut down correctly. Killing the process with SIGINT leaves zombie processes. Using boost::asio::io_service::stop might solve this.

Rewrite a few components in C++

  • AvA manager
  • API server
  • Common libraries
  • Generated code (we should have different backends for generating C, C++, and kernel code)
  • CAvA

Preserve ordering of ava_async APIs between multiple threads

In the current prototype, ava_async means the API returns right after the call is sent to the API server; the API server may not have received or executed it yet.
In a multi-threaded scenario with inter-thread synchronization, ava_async APIs may be executed on the API server in the wrong order.
To guarantee correct execution, ava_async should preserve the ordering of these APIs between the guest library and the API server.

Related issue: [wait for merge from ava-serverless].

Update serializer to use C++17 standard

The imported serializer uses the C++14 standard. It would be good to upgrade it to comply with C++17.
This can also improve the performance of registering polymorphic types, which can be slower in C++14 because it must use shared_timed_mutex instead of shared_mutex.

AVA_ENABLE_DEBUG doesn't work

# in build directory
cmake -DAVA_ENABLE_DEBUG .
make

results in:

...
CMakeFiles/worker.dir/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp.o: In function `command_channel_socket_tcp_guest_new':
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:50: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:56: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:57: undefined reference to `guestconfig::config'
/home/vance/Documents/workspace/ava/common/cmd_channel_socket_tcp.cpp:126: undefined reference to `guestconfig::config'
collect2: error: ld returned 1 exit status
CMakeFiles/worker.dir/build.make:328: recipe for target 'worker' failed
make[8]: *** [worker] Error 1
CMakeFiles/Makefile2:123: recipe for target 'CMakeFiles/worker.dir/all' failed
make[7]: *** [CMakeFiles/worker.dir/all] Error 2
Makefile:148: recipe for target 'all' failed
make[6]: *** [all] Error 2
CMakeFiles/cudadrv-nw.dir/build.make:130: recipe for target 'cudadrv_nw/src/cudadrv-nw-stamp/cudadrv-nw-build' failed
make[5]: *** [cudadrv_nw/src/cudadrv-nw-stamp/cudadrv-nw-build] Error 2
CMakeFiles/Makefile2:123: recipe for target 'CMakeFiles/cudadrv-nw.dir/all' failed
make[4]: *** [CMakeFiles/cudadrv-nw.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make[3]: *** [all] Error 2
CMakeFiles/ava-spec.dir/build.make:130: recipe for target 'ava-spec/src/ava-spec-stamp/ava-spec-build' failed
make[2]: *** [ava-spec/src/ava-spec-stamp/ava-spec-build] Error 2
CMakeFiles/Makefile2:117: recipe for target 'CMakeFiles/ava-spec.dir/all' failed
make[1]: *** [CMakeFiles/ava-spec.dir/all] Error 2
Makefile:102: recipe for target 'all' failed
make: *** [all] Error 2

Describe guestlib and worker construction and destruction in the spec

Some extensions may require adding code to the worker's or guestlib's constructor or destructor functions.
Modifying CAvA for each such case is unacceptable; instead, we should introduce a set of new annotations to specify that code and let CAvA merge it into the generated constructor and destructor functions.

The following new annotations should be added:

  • ava_guestlib_[init|fini]_[prologue|epilogue](...)
  • ava_worker_init_epilogue(...)

Pass whole guest config to manager

It's probably worth passing the whole /etc/ava/guest.config file to the AvA manager and letting the manager parse the configuration file, so that we can extend configurations and features without touching the guestlib channel initialization code.

Currently, the AvA guestlib parses the configuration file and sends a few configuration entries to the AvA manager when it creates the channel.

Linking problem with third-party dependencies

Currently most third-party dependencies such as libconfig and gRPC are linked dynamically to guestlib.
This causes an issue when we try to link libguestlib into any benchmark outside the build directory: the relative paths to those dependent libraries no longer resolve.

Linking all dependencies statically would be great, but I ran into trouble with gRPC.
I'm considering installing those dependencies in /usr/local/ava or /opt/ava and using absolute paths when linking all libraries.
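The absolute-path idea can be expressed as an RPATH baked in at install time. A hedged CMake fragment (the prefix is the one floated above; variable names are standard CMake, but this is a sketch, not AvA's build configuration):

```cmake
# Install dependencies and libguestlib under a fixed prefix, and embed an
# absolute RPATH so benchmarks link from any directory.
set(CMAKE_INSTALL_PREFIX /usr/local/ava)
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")
# Use the install RPATH even for binaries in the build tree.
set(CMAKE_BUILD_WITH_INSTALL_RPATH ON)
```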

Support multiple GPUs on multiple nodes

AvA already supports the single-node multi-GPU case, where a single process can access multiple GPUs on one GPU node.
The CUDA process calls cudaSetDevice explicitly to choose the GPU in use at runtime, and this behavior can be leveraged to support multi-node multi-GPU.

The basic idea is to run a worker on a GPU (which can be on different GPU nodes). When the application calls cudaSetDevice, guestlib changes the address of the worker dynamically and all following CUDA APIs will be forwarded to that worker. This assumes that there is no inter-GPU data transfer via channels like NVLink.

An improvement would be to use multiple local GPUs in a worker; the guestlib then changes the worker address and forwards cudaSetDevice(adjusted GPU ID) to that worker.

Fix code generation for union inside struct

See PR 131.

typedef struct {
    union Algorithm {
        cudnnConvolutionFwdAlgo_t convFwdAlgo;
        cudnnConvolutionBwdFilterAlgo_t convBwdFilterAlgo;
        cudnnConvolutionBwdDataAlgo_t convBwdDataAlgo;
        cudnnRNNAlgo_t RNNAlgo;
        cudnnCTCLossAlgo_t CTCLossAlgo;
    } algo;
} cudnnAlgorithm_t;

The generated code looks like this:

cudnnAlgorithm_t *ava_self;
ava_self = (cudnnAlgorithm_t *) (&algorithm);
union Algorithm *__algorithm_a_0_algo;
__algorithm_a_0_algo = (union Algorithm *)(&(algorithm).algo);
union Algorithm *__algorithm_b_0_algo;
__algorithm_b_0_algo = (union Algorithm *)(&(__call->algorithm).algo); {
    union Algorithm *ava_self;
    ava_self = (union Algorithm *)(&*__algorithm_a_0_algo);
    cudnnCTCLossAlgo_t *__algorithm_a_1_CTCLossAlgo;
    __algorithm_a_1_CTCLossAlgo = (cudnnCTCLossAlgo_t *) (&(*__algorithm_a_0_algo).CTCLossAlgo);
    cudnnCTCLossAlgo_t *__algorithm_b_1_CTCLossAlgo;
    __algorithm_b_1_CTCLossAlgo = (cudnnCTCLossAlgo_t *) (&(*__algorithm_b_0_algo).CTCLossAlgo); {
        *__algorithm_a_1_CTCLossAlgo = (cudnnCTCLossAlgo_t) * __algorithm_b_1_CTCLossAlgo;
        *__algorithm_a_1_CTCLossAlgo = *__algorithm_b_1_CTCLossAlgo;
    }
...

Completely get rid of protobuf

When the guestlib and TensorFlow link against two different versions of libprotobuf, the guestlib loaded by a TensorFlow program will fail to initialize.

A recent PR #61 removes the dependency on protobuf completely, but due to time limits I didn't get to remove the deprecated files.

[BUG] cmake of cudart with debug mode fails on the ava container.

Describe the bug
Building ava with -CMAKE_BUILD_TYPE=Debug inside the ava container with CUDA 10.1 does not succeed. Without the debug flag it does.

To Reproduce
Create the 10.1 container, run interactive shell and compile with

./generate.py -s cudart && mkdir -p build
cd build && cmake .. -DAVA_GEN_CUDART_SPEC=ON -DAVA_MANAGER_DEMO=ON -CMAKE_BUILD_TYPE=Debug

If -CMAKE_BUILD_TYPE=Debug is removed, it works fine.

Expected behavior
The container should be set up to compile both release and debug with no errors.

Error log

Determining if the strtod_l exist failed with the following output:
...
CMakeTmp/CheckSymbolExists.c:8:19: error: ‘strtod_l’ undeclared (first use in this function); did you mean ‘strtoull’?
   return ((int*)(&strtod_l))[argc];

...

Performing C SOURCE FILE Test CMAKE_HAVE_LIBC_PTHREAD failed with the following output:
...
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_45ece.dir/link.txt --verbose=1
/usr/bin/cc CMakeFiles/cmTC_45ece.dir/src.c.o -o cmTC_45ece 
CMakeFiles/cmTC_45ece.dir/src.c.o: In function `main':
src.c:(.text+0x3e): undefined reference to `pthread_create'
src.c:(.text+0x4a): undefined reference to `pthread_detach'
src.c:(.text+0x56): undefined reference to `pthread_cancel'
src.c:(.text+0x67): undefined reference to `pthread_join'
src.c:(.text+0x7b): undefined reference to `pthread_atfork'

Split worker spawn daemon and worker manager

A single manager process is responsible for both spawning workers and assigning those workers to guestlibs.
This design works well in the single-node scenario, but not the multi-node case.
When GPUs are distributed among multiple GPU nodes, we need daemons spawning workers on every node and a global manager scheduling and assigning those workers to corresponding guestlibs.

Add "--append" option to nwcc [temporarily]

Add --append (-A) as a temporary way to share spec snippets.
--append file1 file2 file3 will simply concatenate these three files, in order, onto the specification being compiled.

This will work as a temporary solution to share (import) a specification between (into) other specifications.

Testing Completeness of Virtualization?

It would be nice if we had some way to test if our virtual layer is complete other than debugging additional demos. Perhaps we could put together a way to get traces on which syscalls originated where and ensure that specific syscalls are always routed through our wrappers. I'm unsure of how to automate something like that right now though.

Missing functions in cudart spec?

Remoting the vector_add benchmark from the CUDA 10.1 samples with AvA's cudart spec seems to hit a missing function.

symbol cudaGetErrorName version libcudart.so.10.1 not defined in file libcudart.so.10.1 with link time reference

I tried adding this function to the cudart_opt.c spec and recompiling:

__host__ __cudart_builtin__ const char* CUDARTAPI
cudaGetErrorName(cudaError_t error)
{
    const char *ret = ava_execute();
    ava_return_value {
        ava_out; ava_buffer(strlen(ret) + 1);
        ava_lifetime_static;
    }
}

The same error persists.

Linking issue in utility functions

Some functions and variables, such as guestlib_tf_opt_init and fatbin_handle_list, are defined for utility functions. Those utility functions are used only in either the guestlib or the worker, but currently must be compiled into both.
For now we add dummy definitions for such functions and variables in the guestlib or worker to suppress the linker errors, but we should eventually find a better way.

Add annotations to include worker-/guestlib- specific files

They will be useful for supporting spec-specific optimizations or extensions. Previously we had to modify the Makefile manually to include those source files.

The new annotations should be like:

ava_worker_srcs(filenames...);
ava_guestlib_srcs(filenames...);

These annotations will be helpful for merging Galvanic specs (#5 and #6).

[BUG] Generator CMakeLists.txt assumes build artifacts already exist

Describe the bug
Try initializing the cmake build system on a new clone of the repository.

To Reproduce
Try initializing the cmake build system on a new clone of the repository.

cmake -DAVA_GEN_CUDART_SPEC=On -DAVA_MANAGER_LEGACY=On ../ava
# note out of source tree build

Expected behavior
Cmake completes successfully.

Error log

CMake Error at cava/CMakeLists.txt:39 (add_subdirectory):
  add_subdirectory given source "cudart_nw" which is not an existing
  directory.
