valeoai / maskgit-pytorch
License: MIT License
Hi @llvictorll, thanks for your nice reproduction. When I evaluated the provided checkpoints with the following command:
torchrun --standalone --nnodes=1 --nproc_per_node=1 main.py --bsize 128 --data-folder imagenet --vit-folder pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth --vqgan-folder pretrained_maskgit/VQGAN/ --writer-log logs --num_workers 16 --img-size 256 --epoch 301 --resume --test-only
Size of model autoencoder: 72.142M
Acquired codebook size: 1024
load ckpt from: pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth
Size of model vit: 174.161M
Evaluation with hyper-parameter ->
scheduler: arccos, number of step: 8, softmax temperature: 1.0, cfg weight: 3, gumbel temperature: 4.5
{'Eval/fid_conditional': 7.655000113633889, 'Eval/inception_score_conditional': 228.72691345214844, 'Eval/precision_conditional': 0.8194600000000002, 'Eval/recall_conditional': 0.5016600000000001, 'Eval/density_conditional': 1.2358733333333334, 'Eval/coverage_conditional': 0.8560800000000001}
The FID result shown above (7.655) is worse than the 6.80 you reported. Could you please help figure out where this gap comes from? Thanks.
Has anyone successfully run the code?
First of all, thank you for providing such great codes and materials. I was also struggling to reproduce MaskGIT, so it has been a tremendous help.
I noticed an implementation detail that was not mentioned in the report: the warm-up of the CFG weight during sampling.
Maskgit-pytorch/Trainer/vit.py
Line 372 in 3e5d3eb
Here's another minor point: would it be more in line with the intended behavior if the weight calculation were modified as follows?
_w = w * (indice / (len(scheduler)-1))
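For reference, a minimal sketch of the linear warm-up this suggestion describes (hedged: `w` and `scheduler` follow the repo's usage, but the helper name and standalone form here are illustrative):

```python
# Illustrative sketch of a linear CFG-weight warm-up during sampling.
# `w` is the final guidance weight; one weight is produced per sampling step.
def cfg_warmup(w, num_steps):
    """Guidance weight per step, ramping linearly from 0 at the first
    step to the full weight w at the last step."""
    return [w * (step / (num_steps - 1)) for step in range(num_steps)]

weights = cfg_warmup(w=3.0, num_steps=8)
# weights[0] == 0.0 (no guidance at the first step), weights[-1] == 3.0
```

With the original `indice / len(scheduler)` form, the final step would use `w * (len - 1) / len` rather than the full weight, which is the discrepancy the suggestion addresses.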
Hi, I'd like to ask a question about the loss computation.
This repository (and the original repository?) computes the cross-entropy loss over all ground-truth tokens.
This implies that the model also learns to predict 'known (unmasked)' tokens, which are relatively easy to estimate.
As a result, training may exhibit a strong bias towards the known tokens.
Maskgit-pytorch/Trainer/vit.py
Line 191 in d2ba643
I think there is another option: masking the known positions in the target tokens, which forces the model to predict only the unknown (masked) tokens (the same approach taken in the following repository).
https://github.com/lucidrains/muse-maskgit-pytorch/blob/main/muse_maskgit_pytorch/muse_maskgit_pytorch.py#L680
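As a sketch of that alternative (hedged: tensor names and shapes here are illustrative, not the repo's), the known positions can be excluded from the loss via `ignore_index`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Illustrative shapes only: logits (B, N, V), target (B, N).
# V = 1025 to account for the mask token row.
B, N, V = 2, 16, 1025
logits = torch.randn(B, N, V)
target = torch.randint(0, 1024, (B, N))
mask = torch.rand(B, N) < 0.5  # True where the token was masked (unknown)

# Replace known (unmasked) targets with ignore_index so that only the
# masked positions contribute to the cross-entropy loss.
masked_target = target.masked_fill(~mask, -100)
loss = F.cross_entropy(logits.reshape(-1, V), masked_target.reshape(-1),
                       ignore_index=-100)
```

This averages the loss over the masked positions only, matching the behavior of the muse-maskgit-pytorch code linked above.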
I'd like to know if you have any insights on these two points.
Thank you for considering my request!
Best regards,
Yukara
Hi. Thanks for the great work. I have two questions.
Can you please clarify what the second `1` is used for? `codebook_size` is 1024, so its indices are between [0, 1023]. The first `1` in the code is for the mask token, which is 1024. `nclass` is 1000 for ImageNet. I do not understand the purpose of increasing the `nn.Embedding` with another `1`.
Link to code
In the following code, why is `self.codebook_size+1` used instead of `self.codebook_size`? What is the purpose of the additional token when we then compute the cross-entropy loss?
Link to code
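For context on the first `1` (hedged: variable names and the embedding width below are illustrative, not the repo's exact code), the usual reason for sizing the table past `codebook_size` is to reserve one extra row for the `[MASK]` token id:

```python
import torch
import torch.nn as nn

codebook_size = 1024           # VQGAN code indices are 0..1023
mask_token_id = codebook_size  # id 1024 denotes a masked position

# One extra row so that mask_token_id indexes a valid embedding.
tok_emb = nn.Embedding(codebook_size + 1, 768)
masked_input = torch.full((1, 16), mask_token_id)
out = tok_emb(masked_input)    # valid only because the table has 1025 rows
```

Without that extra row, feeding a fully masked sequence would index out of bounds.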
@llvictorll How can I use my own dataset to train maskgit?
Comparing with the original implementation, might the following Dropout layer not be needed?
(Even though Dropout is applied in this position in the original BERT implementation.)
Maskgit-pytorch/Network/transformer.py
Line 41 in d2ba643
(original): https://github.com/google-research/maskgit/blob/main/maskgit/nets/maskgit_transformer.py#L78
Hey @llvictorll and team,
Really appreciate your reproducing the model and open-sourcing it! It's really helpful for the community. I want to better understand the training and fine-tuning strategy mentioned in Sec. 2 of the tech report. Does that mean the first-stage training is at 256×256 and the second-stage fine-tuning is at 512×512?
It would be very helpful if you can kindly explain it more.
I want to use my own dataset for training. Do I need to retrain the VQGAN? If so, I see that the VQGAN training code seems to be missing the discriminator; how do I train the VQGAN?
Hello, thank you very much for your great work.
I have a question about the intermediate results. I am trying to reproduce the results in the video domain, but training is very difficult: the loss does not drop significantly, and the model keeps producing images that contain only a single color. I just want to ask: is this also true for the intermediate results in MaskGIT?
Really looking forward to your reply, and thanks in advance.
Hi, thanks for your wonderful work!
I notice that MaskGIT uses RandomResizedCrop for data augmentation. But I find that in the code you adopt cropping and flipping, and comment out the normalization (even though I think the VQGAN was trained with normalization).
Maskgit-pytorch/Trainer/trainer.py
Line 80 in b0b2b3c
Thanks in advance!
Thank you for your great work! I have some questions I would like to ask you, if you don't mind.
data_folder="/datasets_local/ImageNet/"
vit_folder="./pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth"
vqgan_folder="./pretrained_maskgit/VQGAN/"
writer_log="./logs/"
num_worker=16
bsize=64
python main.py --bsize ${bsize} --data-folder "${data_folder}" --vit-folder "${vit_folder}" --vqgan-folder "${vqgan_folder}" --writer-log "${writer_log}" --num_workers ${num_worker} --img-size 256 --epoch 301 --resume
torchrun --standalone --nnodes=1 --nproc_per_node=gpu main.py --bsize ${bsize} --data-folder "${data_folder}" --vit-folder "${vit_folder}" --vqgan-folder "${vqgan_folder}" --writer-log "${writer_log}" --num_workers ${num_worker} --img-size 256 --epoch 301 --resume
If I want to train MaskGIT with custom data, what changes do I need to make to this code? I've already trained my own VQGAN.
Hey there,
when I load the model and optimizer state dicts from a checkpoint and try to resume training, the training loss suddenly spikes, undoing much of the progress made in the previous run. After a while it goes down again, but training is set back by a large margin.
Would you, by any chance, know what causes this behavior?
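One common cause of such spikes is resuming with a fresh optimizer, which discards Adam's moment estimates. A hedged sketch with a toy model (illustrative names, not the repo's checkpoint format) of saving and restoring both states:

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step so the optimizer accumulates Adam moment buffers.
loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()
opt.step()  # Adam now holds exp_avg / exp_avg_sq for each parameter

# Save BOTH state dicts; restoring only the model weights resets the
# moment estimates, which can cause exactly this kind of loss spike.
buf = io.BytesIO()
torch.save({"model": model.state_dict(), "opt": opt.state_dict()}, buf)

buf.seek(0)
ckpt = torch.load(buf)
model.load_state_dict(ckpt["model"])
opt.load_state_dict(ckpt["opt"])  # restores the moment buffers too
```

The learning-rate schedule position is another state worth restoring for the same reason.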
Thanks a lot in advance!
Hello,
in vit.py, on line 381, there is
logit = self.vit(code.clone(), labels, drop_label=~drop)
When debugging, I found that `drop` is `Tensor([True, True, ...])`, so it is turned into `Tensor([False, False, ...])`, meaning the labels are never dropped.
I'm wondering whether this is working as expected, since a CFG of 0 usually means that the label is ignored, right?
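To illustrate the concern (hedged: this is a generic sketch of how `drop_label` typically replaces class labels in CFG, not the repo's exact API), negating an all-`True` `drop` leaves every label conditioned:

```python
import torch

nclass = 1000
null_label = nclass  # an extra embedding row often serves as the "null" class

labels = torch.tensor([3, 7, 42, 999])
drop = torch.ones(4, dtype=torch.bool)  # all True, as observed while debugging

drop_label = ~drop                      # all False after the negation
effective = torch.where(drop_label,
                        torch.full_like(labels, null_label),
                        labels)
# effective equals labels: the class conditioning survives, which is
# exactly the behavior the question raises.
```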