valeoai / maskgit-pytorch
License: MIT License
Hi @llvictorll, thanks for your nice reproduction. When I evaluated the provided checkpoints with the following command:
torchrun --standalone --nnodes=1 --nproc_per_node=1 main.py --bsize 128 --data-folder imagenet --vit-folder pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth --vqgan-folder pretrained_maskgit/VQGAN/ --writer-log logs --num_workers 16 --img-size 256 --epoch 301 --resume --test-only
Size of model autoencoder: 72.142M
Acquired codebook size: 1024
load ckpt from: pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth
Size of model vit: 174.161M
Evaluation with hyper-parameter ->
scheduler: arccos, number of step: 8, softmax temperature: 1.0, cfg weight: 3, gumbel temperature: 4.5
{'Eval/fid_conditional': 7.655000113633889, 'Eval/inception_score_conditional': 228.72691345214844, 'Eval/precision_conditional': 0.8194600000000002, 'Eval/recall_conditional': 0.5016600000000001, 'Eval/density_conditional': 1.2358733333333334, 'Eval/coverage_conditional': 0.8560800000000001}
The FID result shown above (7.655) is worse than the 6.80 you reported. Could you please help figure out where this gap comes from? Thanks.
Has anyone successfully run the code?
First of all, thank you for providing such great codes and materials. I was also struggling to reproduce MaskGIT, so it has been a tremendous help.
I noticed an implementation detail that was not mentioned in the report: the warm-up of the CFG weight during sampling.
Maskgit-pytorch/Trainer/vit.py
Line 372 in 3e5d3eb
Here's another minor point: would it be more in line with the intended behavior if the weight calculation were modified as follows?
_w = w * (indice / (len(scheduler)-1))
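For reference, a minimal sketch of the linear warm-up this suggestion describes (hedged: `w` and `scheduler` follow the repo's usage, but the helper name and standalone form here are illustrative):

```python
# Illustrative sketch of a linear CFG-weight warm-up during sampling.
# `w` is the final guidance weight; one weight is produced per sampling step.
def cfg_warmup(w, num_steps):
    """Guidance weight per step, ramping linearly from 0 at the first
    step to the full weight w at the last step."""
    return [w * (step / (num_steps - 1)) for step in range(num_steps)]

weights = cfg_warmup(w=3.0, num_steps=8)
# weights[0] == 0.0 (no guidance at the first step), weights[-1] == 3.0
```

With the original `indice / len(scheduler)` form, the final step would use `w * (len - 1) / len` rather than the full weight, which is the discrepancy the suggestion addresses.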
Hi, I'd like to ask a question about the loss computation.
This repository (and the original repository?) computes the cross-entropy loss over all ground-truth tokens.
This implies that the model also learns to predict 'known (unmasked)' tokens, which are relatively easy to estimate.
As a result, training may exhibit a strong bias towards the known tokens.
Maskgit-pytorch/Trainer/vit.py
Line 191 in d2ba643
I think there is another option: masking the known positions in the target tokens, which forces the model to predict only the unknown (masked) tokens (the same approach taken in the following repository).
https://github.com/lucidrains/muse-maskgit-pytorch/blob/main/muse_maskgit_pytorch/muse_maskgit_pytorch.py#L680
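As a sketch of that alternative (hedged: tensor names and shapes here are illustrative, not the repo's), the known positions can be excluded from the loss via `ignore_index`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative shapes only: logits (B, N, V), target (B, N).
# V = 1025 to account for the mask token row.
B, N, V = 2, 16, 1025
logits = torch.randn(B, N, V)
target = torch.randint(0, 1024, (B, N))
mask = torch.rand(B, N) < 0.5  # True where the token was masked (unknown)

# Replace known (unmasked) targets with ignore_index so that only the
# masked positions contribute to the cross-entropy loss.
masked_target = target.masked_fill(~mask, -100)
loss = F.cross_entropy(logits.reshape(-1, V), masked_target.reshape(-1),
                       ignore_index=-100)
```

This averages the loss over the masked positions only, matching the behavior of the muse-maskgit-pytorch code linked above.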
I'd like to know if you have any insights on these two points.
Thank you for considering my request!
Best regards,
Yukara
Hi. Thanks for the great work. I have two questions.
Can you please clarify what the second `1` is used for? `codebook_size` is 1024, so its indices are between [0, 1023]. The first `1` in the code is for the mask token, which is 1024. `nclass` is 1000 for ImageNet. I do not understand the purpose of increasing the `nn.Embedding` with another `1`.
Link to code
In the following code, why is `self.codebook_size+1` used instead of `self.codebook_size`? What is the purpose of the additional token when we then compute the cross-entropy loss?
Link to code
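For context on the first `1` (hedged: variable names and the embedding width below are illustrative, not the repo's exact code), the usual reason for sizing the table past `codebook_size` is to reserve one extra row for the `[MASK]` token id:

```python
import torch
import torch.nn as nn

codebook_size = 1024           # VQGAN code indices are 0..1023
mask_token_id = codebook_size  # id 1024 denotes a masked position

# One extra row so that mask_token_id indexes a valid embedding.
tok_emb = nn.Embedding(codebook_size + 1, 768)
masked_input = torch.full((1, 16), mask_token_id)
out = tok_emb(masked_input)    # valid only because the table has 1025 rows
```

Without that extra row, feeding a fully masked sequence would index out of bounds.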
@llvictorll How can I use my own dataset to train maskgit?
Comparing with the original implementation, might the following Dropout layer not be needed?
(Even though Dropout is applied in this position in the original BERT implementation.)
Maskgit-pytorch/Network/transformer.py
Line 41 in d2ba643
(original): https://github.com/google-research/maskgit/blob/main/maskgit/nets/maskgit_transformer.py#L78
Hey @llvictorll and team,
Really appreciate your reproducing the model and open-sourcing it! It's really helpful for the community. I want to better understand the training and fine-tuning strategy mentioned in Sec. 2 of the tech report. Does that mean the first-stage training is at 256×256 and the second-stage fine-tuning is at 512×512?
It would be very helpful if you can kindly explain it more.
I want to use my own dataset for training. Do I need to retrain the VQGAN? If so, I see that the VQGAN training code seems to be missing the discriminator; how do I train the VQGAN?
Hello, thank you very much for your great work.
I have a question about the intermediate results. I am trying to reproduce the results in the video domain, but training is very difficult: the loss does not drop significantly, and the model keeps producing images that contain only a single color. I just want to ask: is this also true for the intermediate results in MaskGIT?
Really looking forward to your reply, and thanks in advance.
Hi, thanks for your wonderful work!
I notice that MaskGIT uses RandomResizedCrop for data augmentation. But I find that in the code you adopt cropping and flipping, and comment out the normalization (even though I think the VQGAN was trained with normalization).
Maskgit-pytorch/Trainer/trainer.py
Line 80 in b0b2b3c
Thanks in advance!
Thank you for your great work! I have some questions I would like to ask you, if you don't mind.
data_folder="/datasets_local/ImageNet/"
vit_folder="./pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth"
vqgan_folder="./pretrained_maskgit/VQGAN/"
writer_log="./logs/"
num_worker=16
bsize=64
python main.py --bsize ${bsize} --data-folder "${data_folder}" --vit-folder "${vit_folder}" --vqgan-folder "${vqgan_folder}" --writer-log "${writer_log}" --num_workers ${num_worker} --img-size 256 --epoch 301 --resume
torchrun --standalone --nnodes=1 --nproc_per_node=gpu main.py --bsize ${bsize} --data-folder "${data_folder}" --vit-folder "${vit_folder}" --vqgan-folder "${vqgan_folder}" --writer-log "${writer_log}" --num_workers ${num_worker} --img-size 256 --epoch 301 --resume
If I want to train MaskGIT with custom data, what changes do I need to make to this code? I've already trained my own VQGAN.
Hey there,
when I load the model and optimizer state dicts from a checkpoint and try to resume training, the training loss suddenly spikes, undoing much of the progress made in the previous run. After a while it goes down again, but training is set back by a large margin.
Would you, by any chance, know what causes this behavior?
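One common cause of such spikes is resuming with a fresh optimizer, which discards Adam's moment estimates. A hedged sketch with a toy model (illustrative names, not the repo's checkpoint format) of saving and restoring both states:

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step so the optimizer accumulates Adam moment buffers.
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
opt.step()  # Adam now holds exp_avg / exp_avg_sq for each parameter

# Save BOTH state dicts; restoring only the model weights resets the
# moment estimates, which can cause exactly this kind of loss spike.
buf = io.BytesIO()
torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, buf)

buf.seek(0)
ckpt = torch.load(buf)
model.load_state_dict(ckpt["model"])
opt.load_state_dict(ckpt["opt"])  # restores the moment buffers too
```

The learning-rate schedule position is another state worth restoring for the same reason.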
Thanks a lot in advance!
Hello,
in vit.py, on line 381, there is
logit = self.vit(code.clone(), labels, drop_label=~drop)
When debugging, I found that `drop` is `Tensor([True, True, ...])`, so it is turned into `Tensor([False, False, ...])`, meaning the labels are never dropped.
I'm wondering whether this is working as expected, since a CFG of 0 usually means that the label is ignored, right?
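To illustrate the concern (hedged: this is a generic sketch of how `drop_label` typically replaces class labels in CFG, not the repo's exact API), negating an all-`True` `drop` leaves every label conditioned:

```python
import torch

nclass = 1000
null_label = nclass  # an extra embedding row often serves as the "null" class

labels = torch.tensor([3, 7, 42, 999])
drop = torch.ones(4, dtype=torch.bool)  # all True, as observed while debugging

drop_label = ~drop                      # all False after the negation
effective = torch.where(drop_label,
                        torch.full_like(labels, null_label),
                        labels)
# effective equals labels: the class conditioning survives, which is
# exactly the behavior the question raises.
```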