Comments (5)
Also, when I use the default script for episodic training, RAM usage increases dramatically during training: the process uses about 100 GB of RAM after roughly 300 iterations. I don't know if this is reasonable.
Hi,
Thanks for raising this issue. Let me look into both problems and get back to you ASAP.
Update:
- Could you try again and let me know if the RAM problem is solved?
- As for the ResNet structure, there is indeed some discrepancy in the literature between resnet18 (implemented in my code) and the custom ResNet-12 used in several few-shot works. I will add the latter architecture soon.
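For reference, the ResNet-12 used in much of the few-shot literature (e.g. TADAM, MetaOptNet) stacks 4 residual blocks of 3 convolutions each. Below is a minimal PyTorch sketch of that commonly used structure; the widths, activations, and pooling are the usual choices from those papers, not code from this repo:

import torch
import torch.nn as nn

class Block(nn.Module):
    # One residual block: three 3x3 convs + 1x1 shortcut, then 2x2 max-pool.
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.act = nn.LeakyReLU(0.1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        return self.pool(self.act(self.convs(x) + self.shortcut(x)))

class ResNet12(nn.Module):
    # 4 blocks x 3 convs = 12 conv layers; widths as in TADAM/MetaOptNet.
    def __init__(self, widths=(64, 160, 320, 640)):
        super().__init__()
        chans = (3,) + tuple(widths)
        self.blocks = nn.Sequential(
            *[Block(i, o) for i, o in zip(chans[:-1], chans[1:])])

    def forward(self, x):
        return self.blocks(x).mean(dim=(2, 3))  # global average pooling

For example, ResNet12()(torch.randn(2, 3, 84, 84)) returns a (2, 640) feature tensor, versus the 512-dimensional output of a torchvision resnet18 backbone.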
Hi, thank you for your reply!
I modified the training script based on your new version to skip the model forward and backward passes, only iterating the dataloader and printing the memory usage, as follows:
import psutil

# tqdm_bar and args come from the surrounding training script
for i, data in enumerate(tqdm_bar):
    if i >= args.num_updates:
        break
    print("PERCENTAGE RAM USED", psutil.virtual_memory().percent)
    continue  # skip the forward/backward pass that followed here
In my trial, the percentage of used memory keeps increasing. I suspect there is a memory leak somewhere in the tfrecord reading path, but I cannot pin it down.
My PyTorch version is 1.9.0, with CUDA 11.1. Maybe you can run my code and see whether you can reproduce the problem.
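If it helps to narrow this down, Python's built-in tracemalloc can report which allocation sites grow over time. A minimal sketch, reusing the tqdm_bar and args names from the snippet above; note that tracemalloc only sees Python-level allocations, so a leak inside a C extension (e.g. a tfrecord reader) would not appear here:

import tracemalloc

tracemalloc.start()
baseline = tracemalloc.take_snapshot()

for i, data in enumerate(tqdm_bar):  # tqdm_bar/args as in the snippet above
    if i >= args.num_updates:
        break
    if i and i % 100 == 0:
        # show the five allocation sites that grew most since the baseline
        snapshot = tracemalloc.take_snapshot()
        for stat in snapshot.compare_to(baseline, "lineno")[:5]:
            print(stat)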
I had tried my new code before pushing, and I had no memory leak: when choosing my PyTorch loader, the RAM capped at 16.5 GB. Can you please confirm that, when running my original code with:
bash scripts/train.sh protonet resnet18 ilsvrc_2012
you don't have any leak? Thanks.
I re-cloned the repo and trained protonet with your original code. After 1400 iterations, 23 GB of RAM were used. When I train the model with 4 GPUs (by modifying the gpu configuration in base.yaml), about 80 GB of RAM are used at 1100 iterations. And the usage keeps increasing slowly in both cases.
I assume the RAM usage is correlated with the number of GPUs (since DDP is used) and with the size of an episode. In that case, when episodes are large, which is exactly the situation in Meta-Dataset, where the largest support set can contain 500 images, and when I want to use multiple GPUs, the code may use an incredibly large amount of RAM. I wonder if there is any solution to this problem.
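To see why RAM would scale with the number of GPUs, here is a rough back-of-the-envelope estimate. Every number below is an illustrative assumption; in particular, the per-class shuffle buffers are a guess at how the tfrecord pipeline might behave, not a measurement from this repo:

# All values are assumed, illustrative numbers, not measurements.
world_size = 4         # DDP processes, each with its own dataloader pipeline
num_workers = 4        # loader workers per process (assumed)
num_classes = 712      # ILSVRC-2012 train classes in Meta-Dataset
buffer_examples = 100  # examples per per-class shuffle buffer (assumed)
example_bytes = 40 * 1024  # ~40 KB per encoded image (assumed)

per_worker = num_classes * buffer_examples * example_bytes
total = world_size * num_workers * per_worker
print(f"~{per_worker / 2**30:.1f} GB per worker, "
      f"~{total / 2**30:.1f} GB total if every worker keeps its own buffers")

Under these assumptions the shuffle buffers alone reach the tens of GB once each DDP process and each worker holds its own copies, which would match the order of magnitude reported above.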
Related Issues (20)
- CrossTransformers implementation
- no bash file
- No dataset_spec file found in directory
- How to run the code correctly?
- Learn2Learn support?
- where is make_index_files.sh
- Too many unexpected Errors.
- Training the fine-tuned base line with standard supervised learning with union/concatenation of labels
- Sampling from episodic loader gives error - "Key image doesn't exist (select from [])!"
- version of tensorflow-gpu being used?
- how long does make index take?
- What are tricks to speed up training for SL and MAML?
- How can we use all classes all the time for episodic training even if the number of examples is small?
- Main feature differences between pytorch-meta-dataset and original meta-dataset?
- how to compute data set size form tfrecrods for mds (within python)?
- Unexpected behavior from min_examples_in_class
- shuffle buffer issue?
- Meta-batch size hard coded to 1
- Feature Request: Ideally the episodes generated would be repeatable for a specified seed.