valeoai / maskgit-pytorch
License: MIT License
Hi @llvictorll, thanks for your nice reproduction. When I evaluated the checkpoints provided, with the following command
torchrun --standalone --nnodes=1 --nproc_per_node=1 main.py --bsize 128 --data-folder imagenet --vit-folder pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth --vqgan-folder pretrained_maskgit/VQGAN/ --writer-log logs --num_workers 16 --img-size 256 --epoch 301 --resume --test-only
Size of model autoencoder: 72.142M
Acquired codebook size: 1024
load ckpt from: pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth
Size of model vit: 174.161M
Evaluation with hyper-parameter ->
scheduler: arccos, number of step: 8, softmax temperature: 1.0, cfg weight: 3, gumbel temperature: 4.5
{'Eval/fid_conditional': 7.655000113633889, 'Eval/inception_score_conditional': 228.72691345214844, 'Eval/precision_conditional': 0.8194600000000002, 'Eval/recall_conditional': 0.5016600000000001, 'Eval/density_conditional': 1.2358733333333334, 'Eval/coverage_conditional': 0.8560800000000001}
The FID result (7.655) is worse than the reported 6.80, as shown above. Could you please help figure out where this gap comes from? Thanks.
Hi, I'd like to ask a question about the loss computation.
This repository (and the original repository?) computes the cross-entropy loss over all ground-truth tokens.
This implies that the model also learns to predict the 'known' (unmasked) tokens, which are relatively easy to estimate.
As a result, training may exhibit a strong bias towards the known tokens.
Maskgit-pytorch/Trainer/vit.py
Line 191 in d2ba643
I think there is another option: masking out the known positions in the target tokens, which forces the model to predict only the unknown (masked) tokens (the same approach as taken in the following repository).
https://github.com/lucidrains/muse-maskgit-pytorch/blob/main/muse_maskgit_pytorch/muse_maskgit_pytorch.py#L680
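For reference, a minimal sketch of that masked-target variant, assuming logits of shape (B, N, V), targets of shape (B, N), and a boolean mask that is True at masked positions (all names here are hypothetical, not the repository's):

```python
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, target, mask, ignore_index=-100):
    # Overwrite known (unmasked) positions with ignore_index so they
    # contribute nothing to the loss; only masked tokens are supervised.
    target = target.masked_fill(~mask, ignore_index)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target.reshape(-1),
        ignore_index=ignore_index,
    )
```

Since `F.cross_entropy` averages only over non-ignored targets, this is equivalent to computing the loss on the masked positions alone.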
I'd like to know if you have any insights on this point.
Thank you for considering my request!
Best regards,
Yukara
Hello, thank you very much for your great work.
I have a question about intermediate results. I am trying to reproduce the results in the video domain, but training is difficult: the loss does not drop significantly, and the model keeps producing images that contain only a single color. Is this also true for the intermediate results of MaskGIT?
Really looking forward to your reply, and thanks.
First of all, thank you for providing such great codes and materials. I was also struggling to reproduce MaskGIT, so it has been a tremendous help.
I noticed an implementation that was not mentioned in the report, which is the warm-up of CFG weight during sampling.
Maskgit-pytorch/Trainer/vit.py
Line 372 in 3e5d3eb
Here's another minor point, but wouldn't the weight calculation be more in line with the intended processing if modified as follows?
_w = w * (indice / (len(scheduler)-1))
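To illustrate the suggestion, a minimal sketch (assuming the existing code divides by `len(scheduler)` rather than `len(scheduler) - 1`; this is my reading of the issue, not the repository's code):

```python
def cfg_warmup(w, indice, num_steps):
    # Linear warm-up of the CFG weight that reaches exactly w at the
    # final step (indice = num_steps - 1); dividing by num_steps
    # instead would top out below w.
    return w * (indice / (num_steps - 1))
```

For example, with 8 steps and w = 3, this form gives 0.0 at step 0 and exactly 3.0 at step 7, whereas dividing by `num_steps` would peak at 2.625.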
Hello,
in the vit.py, on line 381, there is
logit = self.vit(code.clone(), labels, drop_label=~drop)
When debugging, I found that drop is Tensor([True, True, ...]), so ~drop is Tensor([False, False, ...]), meaning the labels are not dropped.
I'm wondering whether this is working as expected, since a CFG weight of 0 usually means that the label is ignored, right?
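For context, a hedged sketch of how the two CFG passes are often combined at sampling time, under the `(1 + w) * cond - w * uncond` convention; whether this repository follows exactly this convention is an assumption, and `drop_label=True` is taken to mean "replace the class embedding with the null embedding":

```python
import torch

def cfg_logits(vit, code, labels, w):
    # One pass with labels kept (conditional) and one with labels
    # dropped (unconditional), blended by the guidance weight w.
    drop = torch.ones(code.size(0), dtype=torch.bool)
    logit_cond = vit(code, labels, drop_label=~drop)   # labels kept
    logit_uncond = vit(code, labels, drop_label=drop)  # labels dropped
    return (1 + w) * logit_cond - w * logit_uncond
```

Under this convention, w = 0 reduces to the purely conditional logits, so keeping the labels at w = 0 would be consistent; the label is only ignored in a purely unconditional pass.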
I want to train on my own dataset. Do I need to retrain the VQGAN? If so, the VQGAN training code seems to be missing the discriminator; how do I train the VQGAN?
Has anyone successfully run the code?
Thank you for your great work. I have some questions I would like to ask you, if you don't mind.
data_folder="/datasets_local/ImageNet/"
vit_folder="./pretrained_maskgit/MaskGIT/MaskGIT_ImageNet_256.pth"
vqgan_folder="./pretrained_maskgit/VQGAN/"
writer_log="./logs/"
num_worker=16
bsize=64
python main.py --bsize ${bsize} --data-folder "${data_folder}" --vit-folder "${vit_folder}" --vqgan-folder "${vqgan_folder}" --writer-log "${writer_log}" --num_workers ${num_worker} --img-size 256 --epoch 301 --resume
torchrun --standalone --nnodes=1 --nproc_per_node=gpu main.py --bsize ${bsize} --data-folder "${data_folder}" --vit-folder "${vit_folder}" --vqgan-folder "${vqgan_folder}" --writer-log "${writer_log}" --num_workers ${num_worker} --img-size 256 --epoch 301 --resume
If I want to train MaskGIT with custom data, what changes do I need to make to this code? I've already trained my own VQGAN.
Hey @llvictorll and team,
Really appreciate you reproducing this and open-sourcing it! It's really helpful for the community. I want to further understand the training and fine-tuning strategy mentioned in the tech report, Sec. 2. Does that mean the first-stage training is at 256×256 and the second-stage fine-tuning is at 512×512?
It would be very helpful if you can kindly explain it more.
@llvictorll How can I use my own dataset to train maskgit?
According to the original implementation, the following Dropout layer may not be needed (even though Dropout is applied at this position in the original BERT implementation):
Maskgit-pytorch/Network/transformer.py
Line 41 in d2ba643
(original): https://github.com/google-research/maskgit/blob/main/maskgit/nets/maskgit_transformer.py#L78