Comments (7)
Before I get into the internals of the Mamba implementation, this may be a useful thread.
The following snippet keeps a check on memory:
for ix in range(2):
    inp = torch.randint(0, 128, (2, 512)).to('cuda')
    out = model(inp, use_cache=False)
    del out
    del inp
print(torch.cuda.max_memory_allocated())

for ix in range(2):
    inp = torch.randint(0, 128, (2, 512)).to('cuda')
    out = model(inp, use_cache=False)
    del out
    del inp
print(torch.cuda.max_memory_allocated())
and I get the following output:
513157120
513157120
- After only loading the modules (imports), torch.cuda.max_memory_allocated() = 0 (expected).
- After loading the model to RAM, torch.cuda.max_memory_allocated() = 0 (expected).
- After model = model.to('cuda'), torch.cuda.max_memory_allocated() = 5529600 (~5.5 MB).
- After a forward pass with the dummy batch above, the result is 513157120 (~0.5 GB).
- Even with the del statements, the usage remains the same.
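As an aside (my addition, not from the thread): torch.cuda.max_memory_allocated() reports the peak since process start unless the counter is reset, so two identical prints are expected even if the second loop allocates nothing new. To measure each pass separately, you can reset the peak counter between passes - a minimal sketch, assuming any model callable on CUDA tensors:

```python
import torch

def peak_bytes_per_pass(model, inp, passes=2):
    """Peak CUDA allocation of each forward pass, measured separately.

    torch.cuda.max_memory_allocated() is cumulative, so we reset the
    peak-memory counter before every pass to get per-pass numbers.
    """
    peaks = []
    for _ in range(passes):
        torch.cuda.reset_peak_memory_stats()
        out = model(inp)
        del out
        peaks.append(torch.cuda.max_memory_allocated())
    return peaks

if torch.cuda.is_available():
    net = torch.nn.Linear(512, 512).cuda()
    x = torch.randn(2, 512, device='cuda')
    print(peak_bytes_per_pass(net, x))
```

With per-pass numbers, a genuine leak (growing peaks) is distinguishable from a constant-but-large forward-pass footprint like the one reported here.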
from transformers.
The issue doesn't seem to be one of those described in the thread. Rather than growing gradually each epoch, memory spikes during the forward pass, and the spike amplitude appears consistent across epochs.
I've updated the Kaggle notebook - please check Version 6. It now includes a torch CUDA memory profile image, as well as memory-usage prints in different setups, backing my suspicion that the problem is in the model itself.
I'm gently reminding you about this issue in case it has been forgotten.
I agree, there is something going on with the forward pass itself. The del hack at least prevents linear growth across multiple forward passes (but within a single pass it still uses an unreasonable amount of memory). I will take a look at this later in the week (limited bandwidth).
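For what it's worth, a quick way to check how much of a within-pass spike is autograd bookkeeping (activations kept for backward) is to compare the same forward pass with gradients on and off - a sketch using a toy model, not the Mamba model itself:

```python
import torch

def forward_peak(model, inp, grad):
    """Peak CUDA bytes for one forward pass, with autograd on or off."""
    torch.cuda.reset_peak_memory_stats()
    ctx = torch.enable_grad() if grad else torch.no_grad()
    with ctx:
        out = model(inp)
    del out
    return torch.cuda.max_memory_allocated()

if torch.cuda.is_available():
    net = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.GELU(),
        torch.nn.Linear(2048, 512),
    ).cuda()
    x = torch.randn(8, 512, device='cuda')
    # With grad enabled, intermediate activations are retained for the
    # backward pass, so the peak is typically higher than under no_grad.
    print(forward_peak(net, x, grad=True))
    print(forward_peak(net, x, grad=False))
```

If the spike persists under torch.no_grad(), it points at the forward computation itself (e.g. materialized intermediate states) rather than autograd.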
@famishedrover @ArthurZucker @younesbelkada @amyeroberts @koayon
I've encountered the same issue as described above. I tried using the transformers Mamba implementation instead of state-spaces/mamba. If necessary I can provide example code. Please fix this issue, I would really appreciate it.
You would need to provide a reproducer. If you try the original state-space model and have the kernels installed, the HF model should not really change much.
You should make sure you are testing equivalent things: gradient or not, fast path or not, use_cache or not.
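A reproducer along those lines could sweep the settings listed here and record the peak for each combination - a sketch, assuming a model whose forward accepts use_cache (the fast/slow path depends on whether the fused kernels are installed, so it isn't toggled in code):

```python
import itertools
import torch

def profile_configs(model, inp):
    """Peak forward-pass CUDA memory for each (gradient, use_cache) combo."""
    results = {}
    for grad, cache in itertools.product([True, False], repeat=2):
        torch.cuda.reset_peak_memory_stats()
        ctx = torch.enable_grad() if grad else torch.no_grad()
        with ctx:
            out = model(inp, use_cache=cache)
        del out
        results[(grad, cache)] = torch.cuda.max_memory_allocated()
    return results
```

Running this for both the transformers model and the original state-spaces one, with identical inputs and dtypes, would make the comparison concrete.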
@v4ndi Please provide the reproducer, it would be very helpful.