Comments (8)
Prior art
One responsibility of the `Backend` class is to provide a common hardware abstraction layer interface. Each supported target architecture provides a subclass of `Backend` that implements virtual functions which specialize compilation and execution to the particular target.
Old-school re-targetable compilers used `#ifdef`s sprinkled all over their source code to specialize the compiler for a target architecture. The compiler could only be built for one architecture at a time. This meant that unit tests would only test the architecture you built, and you couldn't even detect syntax errors if they happened to be invisible in your current configuration. This made continuous integration very painful.
LLVM can be re-targeted at runtime. It uses a `Target` class which has virtual functions that provide further information about a target architecture. LLVM's target configuration system is more complex than Glow needs.
The Cretonne code generator uses a `TargetIsa` trait which encapsulates both the target architecture and any compiler flags and settings that can affect the code generation. This means that the generated code is a pure function of the input IR and the `TargetIsa` instance. No secret command-line flags can affect the code generation and cause hard-to-reproduce bugs.
Common to the LLVM and Cretonne designs is that their `Target` and `TargetIsa` objects are constant once they have been created. All the virtual methods are declared as `const`. This means that one target object can be reused for multiple (concurrent) compilations, and the state related to target configuration is clearly separated.
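The const-target pattern can be sketched like this (a hypothetical class, not LLVM's or Cretonne's actual API; the name and fields are illustrative):

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: a target description whose methods are all const,
// so one instance can safely be shared by concurrent compilations.
class TargetIsa {
public:
  TargetIsa(std::string name, bool hasSimd)
      : name_(std::move(name)), hasSimd_(hasSimd) {}

  // All queries are const: no compilation may mutate the target.
  const std::string &name() const { return name_; }
  bool hasSimd() const { return hasSimd_; }

private:
  const std::string name_;
  const bool hasSimd_;
};
```

Because the object is immutable after construction, every setting that affects code generation must be passed in up front, which is exactly what makes the output a pure function of the IR plus the target object.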
from glow.
Question: Should we consider dynamic backend registration to be in scope for the Backend interface refactoring?
Currently, we need to statically enumerate all the backends in `Backend.h`, in the `BackendKind` enum, and we have a `createBackend` method that creates new backend instances based on this information. This approach results in tight coupling and requires rebuilding the whole project whenever a new backend is added. It also makes integrating new backends more complicated.
Maybe we should switch to a registry model, where backends register themselves (e.g. their name and the backend object responsible for managing new instances) and can then be created by calling something like `createBackend(backendName)`?
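A minimal sketch of such a registry model, assuming a hypothetical `Backend` base class and free functions `registerBackend` / `createBackend` (none of this is Glow's actual API):

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class standing in for Glow's Backend.
class Backend {
public:
  virtual ~Backend() = default;
  virtual std::string getName() const = 0;
};

using BackendFactory = std::function<std::unique_ptr<Backend>()>;

// Meyers singleton so the registry is initialized before first use,
// regardless of static-initialization order across translation units.
std::map<std::string, BackendFactory> &backendRegistry() {
  static std::map<std::string, BackendFactory> registry;
  return registry;
}

void registerBackend(const std::string &name, BackendFactory factory) {
  backendRegistry()[name] = std::move(factory);
}

std::unique_ptr<Backend> createBackend(const std::string &name) {
  auto it = backendRegistry().find(name);
  if (it == backendRegistry().end())
    return nullptr; // unknown backend name
  return it->second();
}

// An example backend registering itself from a global constructor in its
// own translation unit, so the core needs no static BackendKind enum.
class InterpreterBackend : public Backend {
public:
  std::string getName() const override { return "Interpreter"; }
};

static const bool interpreterRegistered = [] {
  registerBackend("Interpreter",
                  [] { return std::make_unique<InterpreterBackend>(); });
  return true;
}();
```

With this scheme, adding a backend only requires linking in its translation unit; the core project is not rebuilt.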
from glow.
As a first step, we can split the `Backend` state into three parts:
- The hardware abstraction layer is a class with a number of const virtual functions. It contains no state other than configuration data, and it references no mutable state.
- Temporary data used during compilation.
- State representing a compiled function.
This first step does not address the issues with multithreading that the use cases bring up. A second refactoring step can handle this by distinguishing between a compiled function and a bound function which has fixed input and output locations.
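The three-way split above can be sketched as follows (hypothetical class names and signatures; `DummyBackend` and its trivial `compile()` exist only for illustration):

```cpp
#include <memory>
#include <utility>
#include <vector>

// Part 3: the durable result of a compilation.
class CompiledFunction {
public:
  explicit CompiledFunction(std::vector<int> code) : code_(std::move(code)) {}
  const std::vector<int> &code() const { return code_; }

private:
  const std::vector<int> code_; // stands in for generated code
};

// Part 1: the hardware abstraction layer. Only const virtual functions
// and configuration data, so one instance can serve many compilations.
class Backend {
public:
  virtual ~Backend() = default;
  virtual std::unique_ptr<CompiledFunction> compile() const = 0;
};

class DummyBackend : public Backend {
public:
  std::unique_ptr<CompiledFunction> compile() const override {
    // Part 2: temporary compilation data lives only inside compile()
    // and is dropped (or moved into the result) when it returns.
    std::vector<int> scratch = {1, 2, 3};
    return std::make_unique<CompiledFunction>(std::move(scratch));
  }
};
```

Keeping `compile()` const is what makes the first part shareable; the second part never escapes the call, and only the third part outlives it.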
from glow.
Phase 2: Execution environment
The initial incarnation of `CompiledFunction` above contains both the result of compilation and the state needed during execution, such as the input/output variables in the module. This means that a compiled function can't be run in two threads concurrently, for example.
We can separate these two kinds of data such that a single compilation can be reused for concurrent execution:
- `CompiledFunction` owns the compiled code and possibly some constant data.
- `BoundFunction` has bindings to specific input/output variable instances, and it owns memory buffers that are mutated during execution, such as internal activations.
There can be multiple `BoundFunction` instances associated with a single `CompiledFunction` instance. This enables concurrent execution, whether on multiple threads or multiple hardware accelerators.
We can add a `CompiledFunction::bind(...)` method which returns a `std::unique_ptr<BoundFunction>`. Compare to the `onnxSetGraphIO` function; cc @rdzhabarov. Then move `CompiledFunction::execute()` to `BoundFunction::execute()`.
An unresolved issue is how we handle multiple hardware devices. It seems that a `BoundFunction` should also be associated with a specific device.
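A sketch of the `bind()` split, assuming a toy elementwise computation as the "compiled code" (only `bind()` and `execute()` are named in the discussion; everything else here is hypothetical):

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class BoundFunction;

// Shared, immutable compilation result: code and constant weights.
class CompiledFunction {
public:
  explicit CompiledFunction(std::vector<float> weights)
      : weights_(std::move(weights)) {}

  // Each call creates an independent execution context, so multiple
  // BoundFunction instances can run concurrently over one compilation.
  std::unique_ptr<BoundFunction> bind(float *input, float *output);

  const std::vector<float> &weights() const { return weights_; }

private:
  const std::vector<float> weights_;
};

class BoundFunction {
public:
  BoundFunction(const CompiledFunction &cf, float *in, float *out)
      : cf_(cf), in_(in), out_(out), activations_(cf.weights().size()) {}

  // Only per-binding state (activations_, in_, out_) is mutated here;
  // the shared CompiledFunction is read-only.
  void execute() {
    for (size_t i = 0; i < activations_.size(); ++i) {
      activations_[i] = in_[i] * cf_.weights()[i];
      out_[i] = activations_[i];
    }
  }

private:
  const CompiledFunction &cf_;
  float *in_;
  float *out_;
  std::vector<float> activations_;
};

std::unique_ptr<BoundFunction> CompiledFunction::bind(float *input,
                                                      float *output) {
  return std::make_unique<BoundFunction>(*this, input, output);
}
```

Two threads can each call `bind()` with their own buffers and then call `execute()` concurrently without synchronization, since they share nothing mutable.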
from glow.
Use cases
To inform the design, it's useful to look at a couple of use cases for Glow. These all assume that Glow is used as a JIT compiler, i.e. we're not concerned with cross-compilation or saving compiled models to disk.
Parallel CPU inference
Running parallel inferences with a single fixed graph on a multicore CPU:
- Compile graph once.
- Execute compiled graph on many threads.
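The compile-once, execute-everywhere pattern can be sketched with standard threads (the `CompiledFunction` here is a toy stand-in: shared read-only weights plus an `execute()` that touches only caller-owned data):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Toy stand-in for a compiled graph: shared, read-only after compilation.
struct CompiledFunction {
  std::vector<int> weights;

  // const and side-effect free on shared state, so safe to call from
  // many threads at once.
  int execute(const std::vector<int> &input) const {
    int acc = 0;
    for (size_t i = 0; i < weights.size(); ++i)
      acc += weights[i] * input[i];
    return acc;
  }
};

// Compile once, then execute on numThreads threads, each with its own
// input buffer; only the CompiledFunction is shared.
int runParallel(const CompiledFunction &cf, int numThreads) {
  std::atomic<int> total{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&cf, &total, t] {
      std::vector<int> input(cf.weights.size(), t + 1); // per-thread data
      total += cf.execute(input);
    });
  }
  for (auto &w : workers)
    w.join();
  return total.load();
}
```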
Parallel inference on NUMA CPU
Same as above, but running on a multi-socket NUMA server:
- Compile graph once.
- Make separate copy of weights for each socket.
- Execute compiled graph on many threads per socket, using the weights local to the socket.
Pipelined inference on NUMA CPU
To save memory, we want to avoid multiple copies of the weights. Instead, we partition the graph and distribute it among the sockets. That way, each socket holds part of the weights.
- Partition graph into multiple functions.
- Compile each function once.
- Set up multiple pipelines with semaphores, one per core in each socket.
- Run multiple inference jobs through the pipelines in parallel.
Pipelined inference on two GPUs
Say the weights of our graph are too big to fit on one GPU's high-bandwidth memory, but we have two GPUs. We want to partition the graph into two parts that each fit on one GPU.
- Partition graph into two functions.
- Compile each function once.
- Set up single pipeline transferring the intermediate output from one GPU to the next one.
- Run inference jobs through the pipeline. Two jobs can be active at the same time.
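The two-stage pipeline can be sketched with a blocking queue standing in for the GPU-to-GPU transfer (the "devices" here are plain functions; names and the trivial arithmetic are illustrative):

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Simple FIFO handoff between pipeline stages; stands in for the
// intermediate-tensor transfer between the two GPUs.
template <typename T> class BlockingQueue {
public:
  void push(T v) {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push(std::move(v));
    }
    cv_.notify_one();
  }
  T pop() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    T v = std::move(q_.front());
    q_.pop();
    return v;
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
};

// Stage 1 runs on its own thread ("GPU 0"), stage 2 on the caller's
// thread ("GPU 1"); while stage 2 processes job N, stage 1 can already
// work on job N+1, so two jobs are active at the same time.
std::vector<int> runPipeline(const std::vector<int> &jobs) {
  BlockingQueue<int> intermediate;
  std::vector<int> results;
  std::thread stage1([&] {
    for (int job : jobs)
      intermediate.push(job * 2); // first half of the partitioned graph
  });
  for (size_t i = 0; i < jobs.size(); ++i)
    results.push_back(intermediate.pop() + 1); // second half
  stage1.join();
  return results;
}
```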
Multiple graphs on single GPU
We want to run inference on multiple different graphs with low latency. The weights for all the graphs fit in the HBM of a single GPU.
- Compile each graph once.
- Copy code and weights for all graphs to the GPU's HBM.
- Run different types of inference jobs without needing to copy weights or code. Only input/output data is copied.
Observations
We want the `Backend` design to be compatible with these kinds of use cases. This doesn't necessarily mean that all the backends can do all these things, but the design shouldn't prevent them.
- The parallel CPU use case suggests that we need to distinguish between shared constant data (i.e., weights) and dynamic per-run data (inputs, outputs, and activations). If we want to reuse one compilation on multiple threads, dynamic data needs to be relocatable or stack-based.
- The NUMA use case suggests that constant data also needs to be relocatable in some cases.
- The GPU use cases need to separate copying weights and code to the device from executing a single inference job.
- We don't want to hang on to temporary compiler data structures like the graph and IR after compilation has finished. That wastes memory, and the IR can be large in some cases.
This all suggests that maybe variables should not belong to the Module along with the compiler IR. Such a change is not in scope for this issue, but it is worth keeping in mind when designing the Backend interface.
from glow.
I think the remaining refactoring can be part of the runtime project. They'll understand the requirements better.
from glow.
Good idea. Should `registerBackend` be called from a global constructor?
from glow.
Is this completed? Are we leaving it for the larger runtime project?
from glow.