Comments (6)
I thought about the problem and came up with an even better potential solution!
- Each operation would receive owned input tensors, allowing it to reuse their data storage or buffers for better performance. However, operations would also need to handle shared data structures.
- `Tensor` would no longer implement `Clone`, but would have to implement `share` instead. This would introduce a new method for creating shared references to tensor data.
```rust
pub trait TensorOps<B> {
    // ...
    fn share<const D: usize>(tensor: &mut B::TensorPrimitive<D>) -> B::TensorPrimitive<D>;
}
```
Backends can implement this with a simple clone if they want, but they can also switch the data store to a shared one.
```rust
struct MyTensorPrimitive {
    // ...
    storage: MyTensorStorage,
}

enum MyTensorStorage {
    Owned(Storage),
    Shared(Arc<Storage>),
}
```
The `share` implementation modifies the inner storage by moving it into an immutable `Arc` reference. This gives backends more flexibility to reuse existing buffers without increasing the number of functions they need to implement.
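As an illustration, here is a minimal sketch of how a backend could implement such a `share` on the storage enum above. The `Storage` type here is a hypothetical stand-in for a real backend buffer, and the exact move-out dance is just one way to promote owned storage to shared storage:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a backend's data buffer.
#[derive(Clone, Debug)]
struct Storage(Vec<f32>);

enum TensorStorage {
    Owned(Storage),
    Shared(Arc<Storage>),
}

impl TensorStorage {
    /// Promote owned storage to shared storage and return a second handle to it.
    /// After the call, both `self` and the returned value point at the same
    /// immutable buffer behind an `Arc`.
    fn share(&mut self) -> TensorStorage {
        // Temporarily replace `self` with a placeholder so the owned buffer
        // can be moved out by value.
        let current = std::mem::replace(self, TensorStorage::Owned(Storage(Vec::new())));
        let arc = match current {
            TensorStorage::Owned(storage) => Arc::new(storage),
            TensorStorage::Shared(arc) => arc,
        };
        *self = TensorStorage::Shared(arc.clone());
        TensorStorage::Shared(arc)
    }
}
```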
I don't see any drawback to this solution. It does not increase the size of the API in the `Backend` trait or the `Tensor` struct, does not require graph analysis for performance improvements, and even allows for partial mutability in the API (the left-hand-side tensor may be shared while the right-hand-side tensor is not, opening up even more optimization opportunities). It also provides room for better documentation, since we can attach custom documentation to the `share` method but not to the `Clone` trait.
from burn.
Thanks for the proposal. I'd like to highlight the pros and cons of having mutable operations.
Pros:
- Potentially increases performance, particularly during inference rather than training, since tensors often need to be reused in the backward pass during training, which requires an immutable API or frequent cloning.
Cons:
- Increases the size of the backend API.
- May increase the userland `Tensor` API.
- Decreases the developer experience by requiring users to choose between the mutable and immutable versions of an operation.
I have two potential solutions in mind:
- Allow backends to implement mutable operations (with default implementations provided), but do not expose them in the userland `Tensor` API. Instead, a lazy decorator backend could analyze the computational graph and use these mutable operations internally. One potential issue with this approach is that the decorator backend would need to handle dynamic partial graphs, which may make it difficult to know for certain whether a tensor will never be used again.
- Another way to allow mutating tensors in the backend is to change the API so that each operation takes ownership of its input tensors. Each backend could then handle reusing tensor data in the `Clone` implementation of its tensor primitive. This solution is simpler, but it's not clear how we could give backends more information to help them decide when to share storage versus reuse and modify it.
Maybe both solutions could be combined in a way that simplifies the decorator backend's graph analysis, using explicit clone calls to provide lifetime information.
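The second, ownership-based solution could be sketched as follows. All names here are hypothetical; `Arc::make_mut` is used as one possible way for a backend to decide between in-place reuse and copying, based on whether any other handle to the buffer exists:

```rust
use std::sync::Arc;

// Hypothetical tensor primitive: cloning shares the underlying buffer via
// `Arc` instead of copying the data.
#[derive(Clone)]
struct Primitive {
    data: Arc<Vec<f32>>,
}

// An operation that takes ownership of its input. If no other handle to the
// buffer exists, it is mutated in place; otherwise `Arc::make_mut` copies
// the data first, leaving other handles untouched.
fn add_scalar(mut tensor: Primitive, rhs: f32) -> Primitive {
    let data = Arc::make_mut(&mut tensor.data);
    for x in data.iter_mut() {
        *x += rhs;
    }
    tensor
}
```

With this design, an explicit `clone()` call is exactly the signal that storage must be preserved, which is the lifetime information mentioned above.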
+1 on improving inference performance.
I came across the `clone_from` method, which could be memory-efficient: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#method.clone_from
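A small sketch of why `clone_from` can help: for `Vec`, it reuses the destination's existing allocation when it is large enough, instead of allocating a fresh buffer the way `*dst = src.clone()` would:

```rust
// Copy `src` into `dst`, reusing `dst`'s existing allocation when possible.
// `Vec`'s specialized `clone_from` truncates and refills the destination
// rather than dropping it and allocating anew.
fn refresh(dst: &mut Vec<f32>, src: &Vec<f32>) {
    dst.clone_from(src);
}
```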
@nathanielsimard You worked on this. Is this ticket complete?
Yes, it's completed.