Comments (6)
I thought about the problem and came up with an even better potential solution!
- Each operation would receive owned input tensors, allowing it to reuse their data storage or buffers for better performance. However, operations would also need to handle shared data structures.
- `Tensor` would no longer implement `Clone`, but would have to implement `share` instead. This would introduce a new method for creating shared references to tensor data.
```rust
pub trait TensorOps<B> {
    // ...
    fn share<const D: usize>(tensor: &mut B::TensorPrimitive<D>) -> B::TensorPrimitive<D>;
}
```
Backends can implement this with a simple clone if they want, but they can also switch the data store to a shared one.
```rust
struct MyTensorPrimitive {
    // ...
    storage: MyTensorStorage,
}

enum MyTensorStorage {
    Owned(Storage),
    Shared(Arc<Storage>),
}
```
The `share` implementation modifies the inner storage by moving it into an immutable `Arc` reference. This gives backends more flexibility to reuse existing buffers without increasing the number of functions they need to implement.
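As an illustration, here is a minimal sketch of how a backend could implement such a `share` on the storage enum above. The `Storage` type here is a hypothetical stand-in for a real backend buffer, and the exact move-out dance is just one way to promote owned storage to shared storage:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a backend's data buffer.
#[derive(Clone, Debug)]
struct Storage(Vec<f32>);

enum TensorStorage {
    Owned(Storage),
    Shared(Arc<Storage>),
}

impl TensorStorage {
    /// Promote owned storage to shared storage and return a second handle to it.
    /// After the call, both `self` and the returned value point at the same
    /// immutable buffer behind an `Arc`.
    fn share(&mut self) -> TensorStorage {
        // Temporarily replace `self` with a placeholder so the owned buffer
        // can be moved out by value.
        let current = std::mem::replace(self, TensorStorage::Owned(Storage(Vec::new())));
        let arc = match current {
            TensorStorage::Owned(storage) => Arc::new(storage),
            TensorStorage::Shared(arc) => arc,
        };
        *self = TensorStorage::Shared(arc.clone());
        TensorStorage::Shared(arc)
    }
}
```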
I don't see any drawback to this solution. It does not increase the size of the API in the `Backend` trait or the `Tensor` struct, does not require graph analysis for performance improvements, and even allows for partial mutability in the API (the left-hand-side tensor may be shared while the right-hand-side tensor is not, opening up even more optimization opportunities). It also provides room for better documentation, since we can attach custom documentation to the `share` method but not to the `Clone` trait.
from burn.
Thanks for the proposal. I'd like to highlight the pros and cons of having mutable operations.
Pros:
- Potentially increases performance, particularly during inference rather than training, since tensors often need to be reused in the backward pass during training, which requires an immutable API or frequent cloning.
Cons:
- Increases the size of the backend API.
- May increase the userland `Tensor` API.
- Decreases the developer experience by requiring users to choose between the mutable and immutable versions of an operation.
I have two potential solutions in mind:
- Allow backends to implement mutable operations (with default implementations provided), but do not expose them in the userland `Tensor` API. Instead, a lazy decorator backend could analyze the computational graph and use these mutable operations internally. One potential issue with this approach is that the decorator backend would need to handle dynamic partial graphs, which may make it difficult to know for certain whether a tensor will never be used again.
- Another way to allow mutating tensors in the backend is to change the API so that each operation takes ownership of its input tensors. Each backend could then handle reusing tensor data in the `Clone` implementation of its tensor primitive. This solution is simpler, but it's not clear how we could give backends more information to help them decide when to share storage versus reuse and modify it.
Maybe both solutions could be combined in a way that simplifies the decorator backend's graph analysis, using explicit clone calls to provide lifetime information.
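The second, ownership-based solution could be sketched as follows. All names here are hypothetical; `Arc::make_mut` is used as one possible way for a backend to decide between in-place reuse and copying, based on whether any other handle to the buffer exists:

```rust
use std::sync::Arc;

// Hypothetical tensor primitive: cloning shares the underlying buffer via
// `Arc` instead of copying the data.
#[derive(Clone)]
struct Primitive {
    data: Arc<Vec<f32>>,
}

// An operation that takes ownership of its input. If no other handle to the
// buffer exists, it is mutated in place; otherwise `Arc::make_mut` copies
// the data first, leaving other handles untouched.
fn add_scalar(mut tensor: Primitive, rhs: f32) -> Primitive {
    let data = Arc::make_mut(&mut tensor.data);
    for x in data.iter_mut() {
        *x += rhs;
    }
    tensor
}
```

With this design, an explicit `clone()` call is exactly the signal that storage must be preserved, which is the lifetime information mentioned above.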
+1 on improving inference performance.
I came across the `clone_from` method, which could be memory-efficient: https://doc.rust-lang.org/nightly/core/clone/trait.Clone.html#method.clone_from
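A small sketch of why `clone_from` can help: for `Vec`, it reuses the destination's existing allocation when it is large enough, instead of allocating a fresh buffer the way `*dst = src.clone()` would:

```rust
// Copy `src` into `dst`, reusing `dst`'s existing allocation when possible.
// `Vec`'s specialized `clone_from` truncates and refills the destination
// rather than dropping it and allocating anew.
fn refresh(dst: &mut Vec<f32>, src: &Vec<f32>) {
    dst.clone_from(src);
}
```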
@nathanielsimard You worked on this. Is this ticket complete?
Yes, it's completed.