Comments (5)
Also, when I use the default script for episodic training, RAM usage increases dramatically during training: the process uses about 100 GB of RAM after roughly 300 iterations. I don't know if this is reasonable.
Hi,
Thanks for raising this issue. Let me look into both problems and get back to you ASAP.
Update:
- Could you try again and let me know if the RAM problem is solved?
- As for the ResNet structure, there is indeed some discrepancy in the literature between resnet18 (implemented in my code) and the custom ResNet-12 used in several few-shot works. I will add the latter architecture soon.
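For reference, the ResNet-12 used in much of the few-shot literature (e.g. TADAM, MetaOptNet) stacks 4 residual blocks of 3 convolutions each. Below is a minimal PyTorch sketch of that commonly used structure; the widths, activations, and pooling are the usual choices from those papers, not code from this repo:

import torch
import torch.nn as nn

class Block(nn.Module):
    # One residual block: three 3x3 convs + 1x1 shortcut, then 2x2 max-pool.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.act = nn.LeakyReLU(0.1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        return self.pool(self.act(self.convs(x) + self.shortcut(x)))

class ResNet12(nn.Module):
    # 4 blocks x 3 convs = 12 conv layers; widths as in TADAM/MetaOptNet.
    def __init__(self, widths=(64, 160, 320, 640)):
        super().__init__()
        chans = (3,) + tuple(widths)
        self.blocks = nn.Sequential(
            *[Block(i, o) for i, o in zip(chans[:-1], chans[1:])])

    def forward(self, x):
        return self.blocks(x).mean(dim=(2, 3))  # global average pooling

For example, ResNet12()(torch.randn(2, 3, 84, 84)) returns a (2, 640) feature tensor, versus the 512-dimensional output of a torchvision resnet18 backbone.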
Hi, thank you for your reply!
I modified the training script based on your new version to skip the model forward and backward passes, only iterating the dataloader and printing the memory usage, as follows:
import psutil

# tqdm_bar and args come from the surrounding training script
for i, data in enumerate(tqdm_bar):
    if i >= args.num_updates:
        break
    print("PERCENTAGE RAM USED", psutil.virtual_memory().percent)
    continue  # skip the forward/backward pass that followed here
In my trial, the percentage of used memory keeps increasing. I suspect there is a memory leak somewhere in the tfrecord reading path, but I cannot pin it down.
My PyTorch version is 1.9.0, with CUDA 11.1. Maybe you can run my code and see whether you can reproduce the problem.
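If it helps to narrow this down, Python's built-in tracemalloc can report which allocation sites grow over time. A minimal sketch, reusing the tqdm_bar and args names from the snippet above; note that tracemalloc only sees Python-level allocations, so a leak inside a C extension (e.g. a tfrecord reader) would not appear here:

import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for i, data in enumerate(tqdm_bar):  # tqdm_bar/args as in the snippet above
    if i >= args.num_updates:
        break
    if i and i % 100 == 0:
        # show the five allocation sites that grew most since the baseline
        snapshot = tracemalloc.take_snapshot()
        for stat in snapshot.compare_to(baseline, "lineno")[:5]:
            print(stat)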
I had tried my new code before pushing, and I had no memory leak: when choosing my PyTorch loader, the RAM capped at 16.5 GB. Can you please confirm that, when running my original code with:
bash scripts/train.sh protonet resnet18 ilsvrc_2012
you don't have any leak? Thanks.
I re-cloned the repo and trained protonet with your original code. After 1400 iterations, 23 GB of RAM were used. When I train the model with 4 GPUs (by modifying the gpu configuration in base.yaml), about 80 GB of RAM are used at 1100 iterations. And the usage keeps increasing slowly in both cases.
I assume the RAM usage is correlated with the number of GPUs (since DDP is used) and with the size of an episode. In that case, when episodes are large, which is exactly the situation in Meta-Dataset, where the largest support set can contain 500 images, and when I want to use multiple GPUs, the code may use an incredibly large amount of RAM. I wonder if there is any solution to this problem.
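To see why RAM would scale with the number of GPUs, here is a rough back-of-the-envelope estimate. Every number below is an illustrative assumption; in particular, the per-class shuffle buffers are a guess at how the tfrecord pipeline might behave, not a measurement from this repo:

# All values are assumed, illustrative numbers, not measurements.
world_size = 4         # DDP processes, each with its own dataloader pipeline
num_workers = 4        # loader workers per process (assumed)
num_classes = 712      # ILSVRC-2012 train classes in Meta-Dataset
buffer_examples = 100  # examples per per-class shuffle buffer (assumed)
example_bytes = 40 * 1024  # ~40 KB per encoded image (assumed)

per_worker = num_classes * buffer_examples * example_bytes
total = world_size * num_workers * per_worker
print(f"~{per_worker / 2**30:.1f} GB per worker, "
      f"~{total / 2**30:.1f} GB total if every worker keeps its own buffers")

Under these assumptions the shuffle buffers alone reach the tens of GB once each DDP process and each worker holds its own copies, which would match the order of magnitude reported above.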
Related Issues (20)
- CrossTransformers implementation
- no bash file
- No dataset_spec file found in directory
- How to run the code correctly?
- Learn2Learn support?
- where is make_index_files.sh
- Too many unexpected Errors.
- Training the fine-tuned base line with standard supervised learning with union/concatenation of labels
- Sampling from episodic loader gives error - "Key image doesn't exist (select from [])!"
- version of tensorflow-gpu being used?
- how long does make index take?
- What are tricks to speed up training for SL and MAML?
- How can we use all classes all the time for episodic training even if the number of examples is small?
- Main feature differences between pytorch-meta-dataset and original meta-dataset?
- how to compute data set size form tfrecrods for mds (within python)?
- Unexpected behavior from min_examples_in_class
- shuffle buffer issue?
- Meta-batch size hard coded to 1
- Feature Request: Ideally the episodes generated would be repeatable for a specified seed.