Comments (20)

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio First, thanks for your contributions to this code. Your understanding is correct. However, the vocabulary file differs slightly from the vocab.bpe.32000 released with the WMT En-De corpora in its artificial tokens, namely <PAD>, <S>, </S>, and <UNK>. These tokens are used when preparing the training data. You only need to add these four tokens manually at the beginning of vocab.bpe.32000. Checking the log file from our experiments again, I find that the discriminator reaches an accuracy of 0.7 after 2 epochs of training.
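For reference, a minimal sketch of prepending the tokens in Python. The token spellings (<PAD>, <UNK>, <S>, </S>) follow this thread, but their order is an assumption, so verify it against the id conventions in the repo's data-preparation code:

    # Prepend the four artificial tokens to the released BPE vocabulary.
    # Assumed order: <PAD>, <UNK>, <S>, </S>; check this against the repo.
    SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<S>", "</S>"]

    with open("vocab.bpe.32000", encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]

    # Output name matches the vocab file referenced in the config later
    # in this thread.
    with open("vocab.bpe.32000.e", "w", encoding="utf-8") as f:
        for w in SPECIAL_TOKENS + words:
            f.write(w + "\n")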

jeicy07 commented on May 27, 2024

In my project, when running the discriminator pretraining, my loss also falls into the 0-2 range, but the accuracy always oscillates around 0.5. I wonder what kinds of problems might cause this? Thanks.

kellymarchisio commented on May 27, 2024

@jeicy07 In my project, the silly cause of this behaviour was that my pickled dictionary was built incorrectly. I fixed it to ensure it was a string-to-int mapping of word:id. While it was broken, my entire src/trg/neg matrices were filled with 1s (the UNK id). It sounds like the behaviour you observe is symptomatic of indistinguishable matrices. Try logging the final matrices you feed into the discriminator to see whether anything looks unusual, then backtrack from there.
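For anyone hitting the same issue, a minimal sketch of building the pickled word:id dictionary. The special-token ids here are assumptions (only UNK = 1 is implied by the all-1s matrices above), so check them against the repo's conventions:

    import pickle

    # Assumed special-token ids; only <UNK> = 1 is implied by the symptom above.
    word2id = {"<PAD>": 0, "<UNK>": 1, "<S>": 2, "</S>": 3}

    with open("vocab.bpe.32000.e", encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:
                word2id.setdefault(word, len(word2id))

    # A plain dict of str -> int, as described above.
    with open("vocab.bpe.32000.e.pkl", "wb") as f:
        pickle.dump(word2id, f)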

jeicy07 commented on May 27, 2024

Thanks, I converted my pickled dictionary into a proper dict, and it works now!

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Thanks very much for your response. After fixing some errors, I also reach an accuracy of 0.70 after 2 epochs. After how many epochs did you reach 0.82/0.95? I am still training (~epoch 4), but performance remains ~0.70.

I notice, though, that the accuracy bounces around quite significantly, as seen here:

  • testing the accuracy on the evaluation sets when epoch 1, samples 3400000
  • the total accuracy in evaluation is 0.714286
  • testing the accuracy on the evaluation sets when epoch 1, samples 3410000
  • the total accuracy in evaluation is 0.708494
  • testing the accuracy on the evaluation sets when epoch 1, samples 3420000
  • the total accuracy in evaluation is 0.722008
  • testing the accuracy on the evaluation sets when epoch 1, samples 3430000
  • the total accuracy in evaluation is 0.735521
  • testing the accuracy on the evaluation sets when epoch 1, samples 3440000
  • the total accuracy in evaluation is 0.700772

Is this expected, or a bug?

I also notice that loss alternates between very high values and lower values at the beginning of training:

  • epoch 0, samples 100, loss 8.257384, accuracy 0.510000 BatchTime 21.912569
  • epoch 0, samples 200, loss 109.485771, accuracy 0.500000 BatchTime 1.326053
  • epoch 0, samples 300, loss 26.595387, accuracy 0.500000 BatchTime 1.261572
  • epoch 0, samples 400, loss 109.342659, accuracy 0.500000 BatchTime 1.244808
  • epoch 0, samples 500, loss 24.455862, accuracy 0.500000 BatchTime 1.179417
  • epoch 0, samples 600, loss 101.686081, accuracy 0.500000 BatchTime 1.192621

Is this also expected, and what might cause this behavior? I would expect the loss to decrease monotonically.

Thanks very much for releasing this code base - I've enjoyed working with it.

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio I am sorry for the late response. Your loss is strange in that it swings between very high and very low values. In our experiments, the loss decreased smoothly. Have you shuffled your training data?

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Yes, the training data is being shuffled. Now on epoch 5, the model has begun to overfit. The peak was ~0.71-0.72 in earlier epochs. The config I'm using is below. Does anything look amiss here?

    src_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/generate_data/vocab.bpe.32000.e'
    dst_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/generate_data/vocab.bpe.32000.e'
    src_vocab_size: 32000
    dst_vocab_size: 32000
    hidden_units: 512
    scale_embedding: True
    attention_dropout_rate: 0.0
    residual_dropout_rate: 0.1
    num_blocks: 6
    num_heads: 8
    binding_embedding: False
    train:
      logdir: '/local/scratch/kvm23/angec_final/yang-gan/experience/ende-4.5mil-test/dis_pretrain/4'
      dis_src_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/vocab.bpe.32000.e.pkl'
      dis_dst_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/vocab.bpe.32000.e.pkl'
      dis_max_epoches: 10
      dis_dispFreq: 1
      dis_saveFreq: 100
      dis_devFreq: 100
      dis_batch_size: 100
      dis_saveto: '/local/scratch/kvm23/angec_final/yang-gan/models/ende-4.5mil-test/4/disc_pretrain'
      dis_reshuffle: True
      dis_gpu_device: 'gpu-0'
      dis_max_len: 50
      dis_positive_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.de.shuf.1mil-chop60'
      dis_negative_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/negative_predictions.txt'
      dis_source_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.en.shuf.1mil-chop60'
      dis_dev_positive_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.de.shuf.300.dev'
      dis_dev_negative_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/negative_predictions.dev.txt'
      dis_dev_source_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.en.shuf.300.dev'
      dis_dev_log: '/local/scratch/kvm23/angec_final/yang-gan/experience/ende-4.5mil-test/dis_pretrain/4/dev_log-trial2'
      dis_reload: True
      dis_clip_c: 1.0
      dis_dim_word: 512
      dis_optimizer: 'rmsprop'
      dis_scope: 'discnn'

Training accuracy is now 0.75-0.90 per batch, but dev accuracy stays at 0.61-0.71, where it was in epoch 2, except that in epoch 2 the performance was more consistent.

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio There is no obvious error in your configuration.

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Thanks for taking a look. To verify: should the files for dis_positive_data, dis_negative_data, etc. contain regular sentences like:

  • This is a sentence .

Or do I have to pad the text files referenced in the config so they look like:

  • <S> This is a sentence . </S> <PAD> <PAD> <PAD>...

I believe I've tried both, but your verification would be helpful.

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio You do not need to add the padding to the files manually. The code will do it automatically.
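For intuition, roughly what "automatically" amounts to at batch time. This is a sketch under assumed conventions (the token spellings from this thread, truncation at dis_max_len), not the repo's actual code:

    # Hypothetical helper illustrating batch-time wrapping and padding.
    def pad_batch(sentences, word2id, max_len=50):
        batch = []
        for sent in sentences:
            toks = ["<S>"] + sent.split() + ["</S>"]
            ids = [word2id.get(t, word2id["<UNK>"]) for t in toks][:max_len]
            ids += [word2id["<PAD>"]] * (max_len - len(ids))
            batch.append(ids)
        return batch

So plain "This is a sentence ." files are enough; the wrapping and padding happen at this stage.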

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Thank you for the clarification. According to your paper, I won't be able to reproduce the GAN training unless the discriminator reaches 82% accuracy. A few quick related questions:

  • Are there any other parameters (perhaps not mentioned in the paper) necessary to get the discriminator to 82-95% performance?
  • Did you use pretrained word embeddings?
  • What initialisation did you use, and what tuning was needed to reach higher accuracy?
  • How many epochs did it take to get to 95% accuracy?
  • Can you think of a reason why my loss bounces around so much at the beginning of training? (Could this signify a bug in the code?)

ZhenYangIACAS commented on May 27, 2024

I am sure that all of the parameters that have a substantial effect on translation performance are described in detail in our paper. We did not use pre-trained word embeddings. You can find the initialization method in our code. I remember that when we used the Transformer as the generator, it was hard to push the accuracy above 90%. As for your problem, it looks like a bug may exist, but I am not sure.

ashwanitanwar commented on May 27, 2024

@kellymarchisio How did you compute dev accuracy for the discriminator? I am using the Transformer as the generator. The code for computing dev accuracy is commented out in cnn_discriminator.py; I used that code, but it reports several validation-accuracy values and they vary a lot.

luckper commented on May 27, 2024

@kellymarchisio Hi, can you show me samples of the data files from your config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, and so on?
Thanks.

luckper commented on May 27, 2024

@ZhenYangIACAS Hi, I see many data files in config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, dis_dev_positive_data, and so on. Can you tell me what this data means?
What data do I need to prepare to run the code successfully?
Thanks!

ZhenYangIACAS commented on May 27, 2024

@luckper dis_positive_data is the positive training data for the discriminator, and dis_negative_data is the negative training data. dis_dev_positive_data is the corresponding development data, and so on. To understand these files, I suggest you read gan_train.py. Some files you must prepare beforehand; others are generated automatically. I realize that the number of files is a little confusing for users; we will restructure the code when we have free time.

luckper commented on May 27, 2024

@ZhenYangIACAS OK, following your suggestion, I read gan_train.py. However, I still have some questions. First, where do dis_dev_positive_data, dis_dev_negative_data, and dis_dev_source_data come from? And how do they differ from dis_positive_data, dis_negative_data, and dis_source_data?
Thanks!

ZhenYangIACAS commented on May 27, 2024

@luckper It is easy to build the development sets. We simply sampled 200 sentences at random from dis_positive_data to get dis_dev_positive_data, and similarly obtained the corresponding dis_dev_negative_data and dis_dev_source_data (see the sketch below).
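A minimal sketch of that sampling, keeping the source, positive, and negative files line-aligned; the file names are placeholders for the dis_* paths in your config:

    import random

    def read_lines(path):
        with open(path, encoding="utf-8") as f:
            return f.readlines()

    # Placeholder names; substitute the paths from your config.
    src = read_lines("dis_source_data.txt")
    pos = read_lines("dis_positive_data.txt")
    neg = read_lines("dis_negative_data.txt")

    # Sample the same 200 line indices from all three files so the
    # dev sets stay parallel.
    random.seed(0)
    idx = sorted(random.sample(range(len(src)), 200))
    for lines, out in [(src, "dis_dev_source_data.txt"),
                       (pos, "dis_dev_positive_data.txt"),
                       (neg, "dis_dev_negative_data.txt")]:
        with open(out, "w", encoding="utf-8") as f:
            f.writelines(lines[i] for i in idx)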

luckper commented on May 27, 2024

@ZhenYangIACAS Hi, I ran generate_sample.sh, but an error occurred:

    Instructions for updating:
    Use argmax instead
    using rmsprop for g_loss
    Traceback (most recent call last):
      File "generate_samples.py", line 60, in <module>
        generate_samples(config)
      File "generate_samples.py", line 32, in generate_samples
        optimizer=config.train.optimizer)
      File "/home/xxx/Downloads/ZKY-GAN_NMT/NMT_GAN-master/model.py", line 119, in build_generate
        optimizer=tf.train.RMSPropOptimizer(self.config.generator.learning_rate)
      File "/home/xxx/Downloads/ZKY-GAN_NMT/NMT_GAN-master/utils.py", line 19, in __getattr__
        if type(self[item]) is dict:
    KeyError: 'generator'

What is the reason? The log file records the following:

    Instructions for updating:
    Use argmax instead
    INFO:root:using rmsprop for g_loss
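From the traceback alone, the failing access is config.generator.learning_rate: the loaded config has no generator key, so __getattr__ in utils.py raises KeyError. A quick sanity check, assuming the config is plain YAML (the file name below is a placeholder):

    import yaml

    # Placeholder name; use whatever config generate_samples.py is given.
    with open("config_generate_samples.yaml") as f:
        config = yaml.safe_load(f)

    # The traceback implies a 'generator' section (with a learning_rate)
    # is expected but missing.
    assert "generator" in config, "add a generator: section to the config"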

alwaysprep commented on May 27, 2024

@ZhenYangIACAS @luckper Have you solved the "KeyError: 'generator'" error?
