Comments (20)

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio First, thanks for your contributions to this code. Your understanding is correct. However, the vocabulary file differs slightly from the vocab.bpe.32000 released with the WMT En-De corpora in its artificial tokens, namely <PAD>, <S>, </S>, and <UNK>. These tokens are used when preparing the training data. You only need to add these four tokens manually at the beginning of vocab.bpe.32000. Checking the log file from our experiments again, I find that the discriminator reaches an accuracy of 0.7 after 2 epochs of training.
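For reference, a minimal sketch of prepending the tokens in Python. The token spellings (<PAD>, <UNK>, <S>, </S>) follow this thread, but their order is an assumption, so verify it against the id conventions in the repo's data-preparation code:

    # Prepend the four artificial tokens to the released BPE vocabulary.
    # Assumed order: <PAD>, <UNK>, <S>, </S>; check this against the repo.
    SPECIAL_TOKENS = ["<PAD>", "<UNK>", "<S>", "</S>"]

    with open("vocab.bpe.32000", encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]

    # Output name matches the vocab file referenced in the config later
    # in this thread.
    with open("vocab.bpe.32000.e", "w", encoding="utf-8") as f:
        for w in SPECIAL_TOKENS + words:
            f.write(w + "\n")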

jeicy07 commented on May 27, 2024

In my project, when running the discriminator pretraining, my loss also falls into the 0-2 range, but the accuracy always oscillates around 0.5. I wonder what kinds of problems might cause this? Thanks.

kellymarchisio commented on May 27, 2024

@jeicy07 In my project, the silly cause of this behaviour was that my pickled dictionary was built incorrectly. I fixed it to ensure it was a string-to-int mapping of word:id. While it was broken, my entire src/trg/neg matrices were filled with 1s (the UNK id). It sounds like the behaviour you observe is symptomatic of indistinguishable matrices. Try logging the final matrices you feed into the discriminator to see whether anything looks unusual, then backtrack from there.
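For anyone hitting the same issue, a minimal sketch of building the pickled word:id dictionary. The special-token ids here are assumptions (only UNK = 1 is implied by the all-1s matrices above), so check them against the repo's conventions:

    import pickle

    # Assumed special-token ids; only <UNK> = 1 is implied by the symptom above.
    word2id = {"<PAD>": 0, "<UNK>": 1, "<S>": 2, "</S>": 3}

    with open("vocab.bpe.32000.e", encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:
                word2id.setdefault(word, len(word2id))

    # A plain dict of str -> int, as described above.
    with open("vocab.bpe.32000.e.pkl", "wb") as f:
        pickle.dump(word2id, f)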

jeicy07 commented on May 27, 2024

Thanks, I converted my pickled dictionary into a proper dict, and it works now!

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Thanks very much for your response. After fixing some errors, I also reach an accuracy of 0.70 after 2 epochs. After how many epochs did you reach 0.82/0.95? I am still training (~epoch 4), but performance remains ~0.70.

I notice, though, that the accuracy bounces around quite significantly, as seen here:

  • testing the accuracy on the evaluation sets when epoch 1, samples 3400000
  • the total accuracy in evaluation is 0.714286
  • testing the accuracy on the evaluation sets when epoch 1, samples 3410000
  • the total accuracy in evaluation is 0.708494
  • testing the accuracy on the evaluation sets when epoch 1, samples 3420000
  • the total accuracy in evaluation is 0.722008
  • testing the accuracy on the evaluation sets when epoch 1, samples 3430000
  • the total accuracy in evaluation is 0.735521
  • testing the accuracy on the evaluation sets when epoch 1, samples 3440000
  • the total accuracy in evaluation is 0.700772

Is this expected, or a bug?

I also notice that loss alternates between very high values and lower values at the beginning of training:

  • epoch 0, samples 100, loss 8.257384, accuracy 0.510000 BatchTime 21.912569
  • epoch 0, samples 200, loss 109.485771, accuracy 0.500000 BatchTime 1.326053
  • epoch 0, samples 300, loss 26.595387, accuracy 0.500000 BatchTime 1.261572
  • epoch 0, samples 400, loss 109.342659, accuracy 0.500000 BatchTime 1.244808
  • epoch 0, samples 500, loss 24.455862, accuracy 0.500000 BatchTime 1.179417
  • epoch 0, samples 600, loss 101.686081, accuracy 0.500000 BatchTime 1.192621

Is this also expected, and what might cause this behavior? I would expect the loss to decrease monotonically.

Thanks very much for releasing this code base - I've enjoyed working with it.

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio I am sorry for the late response. Your loss is strange in that it swings between very high and very low values. In our experiments, the loss decreased smoothly. Have you shuffled your training data?

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Yes, the training data is being shuffled. Now on epoch 5, the model has begun to overfit. The peak was ~0.71-0.72 in earlier epochs. The config I'm using is below. Does anything look amiss here?

    src_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/generate_data/vocab.bpe.32000.e'
    dst_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/generate_data/vocab.bpe.32000.e'
    src_vocab_size: 32000
    dst_vocab_size: 32000
    hidden_units: 512
    scale_embedding: True
    attention_dropout_rate: 0.0
    residual_dropout_rate: 0.1
    num_blocks: 6
    num_heads: 8
    binding_embedding: False
    train:
      logdir: '/local/scratch/kvm23/angec_final/yang-gan/experience/ende-4.5mil-test/dis_pretrain/4'
      dis_src_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/vocab.bpe.32000.e.pkl'
      dis_dst_vocab: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/vocab.bpe.32000.e.pkl'
      dis_max_epoches: 10
      dis_dispFreq: 1
      dis_saveFreq: 100
      dis_devFreq: 100
      dis_batch_size: 100
      dis_saveto: '/local/scratch/kvm23/angec_final/yang-gan/models/ende-4.5mil-test/4/disc_pretrain'
      dis_reshuffle: True
      dis_gpu_device: 'gpu-0'
      dis_max_len: 50
      dis_positive_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.de.shuf.1mil-chop60'
      dis_negative_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/negative_predictions.txt'
      dis_source_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.en.shuf.1mil-chop60'
      dis_dev_positive_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.de.shuf.300.dev'
      dis_dev_negative_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/negative_predictions.dev.txt'
      dis_dev_source_data: '/local/scratch/kvm23/angec_final/yang-gan/model_data/data_gan_ende_4.5mil_test/dis_pretrain/4/train.tok.clean.bpe.32000.en.shuf.300.dev'
      dis_dev_log: '/local/scratch/kvm23/angec_final/yang-gan/experience/ende-4.5mil-test/dis_pretrain/4/dev_log-trial2'
      dis_reload: True
      dis_clip_c: 1.0
      dis_dim_word: 512
      dis_optimizer: 'rmsprop'
      dis_scope: 'discnn'

Training accuracy is now 0.75-0.90 per batch, but dev accuracy stays at 0.61-0.71, where it was in epoch 2, except that in epoch 2 the performance was more consistent.

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio There is no obvious error in your configuration.

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Thanks for taking a look. To verify: should the files for dis_positive_data, dis_negative_data, etc. contain regular sentences like:

  • This is a sentence .

Or do I have to pad the text files referenced in the config so they look like:

  • <S> This is a sentence . </S> <PAD> <PAD> <PAD>...

I believe I've tried both, but your verification would be helpful.

ZhenYangIACAS commented on May 27, 2024

@kellymarchisio You do not need to add the padding to the files manually. The code will do it automatically.
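For intuition, roughly what "automatically" amounts to at batch time. This is a sketch under assumed conventions (the token spellings from this thread, truncation at dis_max_len), not the repo's actual code:

    # Hypothetical helper illustrating batch-time wrapping and padding.
    def pad_batch(sentences, word2id, max_len=50):
        batch = []
        for sent in sentences:
            toks = ["<S>"] + sent.split() + ["</S>"]
            ids = [word2id.get(t, word2id["<UNK>"]) for t in toks][:max_len]
            ids += [word2id["<PAD>"]] * (max_len - len(ids))
            batch.append(ids)
        return batch

So plain "This is a sentence ." files are enough; the wrapping and padding happen at this stage.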

kellymarchisio commented on May 27, 2024

@ZhenYangIACAS Thank you for the clarification. According to your paper, I won't be able to reproduce the GAN training unless the discriminator reaches 82% accuracy. A few quick related questions:

  • Are there any other parameters (perhaps not mentioned in the paper) necessary to get the discriminator to 82-95% performance?
  • Did you use pretrained word embeddings?
  • What initialisation did you use, and what tuning was needed to reach higher accuracy?
  • How many epochs did it take to get to 95% accuracy?
  • Can you think of a reason why my loss bounces around so much at the beginning of training? (Could this signify a bug in the code?)

ZhenYangIACAS commented on May 27, 2024

I am sure that all of the parameters that have a substantial effect on translation performance are described in detail in our paper. We did not use pre-trained word embeddings. You can find the initialization method in our code. I remember that when we used the Transformer as the generator, it was hard to push the accuracy above 90%. As for your problem, it looks like a bug may exist, but I am not sure.

ashwanitanwar commented on May 27, 2024

@kellymarchisio How did you compute dev accuracy for the discriminator? I am using the Transformer as the generator. The code for computing dev accuracy is commented out in cnn_discriminator.py; I used that code, but it reports several validation-accuracy values and they vary a lot.

luckper commented on May 27, 2024

@kellymarchisio Hi, can you show me samples of the data files from your config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, and so on?
Thanks.

luckper commented on May 27, 2024

@ZhenYangIACAS Hi, I see many data files in config_discriminator_pretrain.yaml, for example dis_positive_data, dis_negative_data, dis_dev_positive_data, and so on. Can you tell me what this data means?
What data do I need to prepare to run the code successfully?
Thanks!

ZhenYangIACAS commented on May 27, 2024

@luckper dis_positive_data is the positive training data for the discriminator, and dis_negative_data is the negative training data. dis_dev_positive_data is the corresponding development data, and so on. To understand these files, I suggest you read gan_train.py. Some files you must prepare beforehand; others are generated automatically. I realize that the number of files is a little confusing for users; we will restructure the code when we have free time.

luckper commented on May 27, 2024

@ZhenYangIACAS OK, following your suggestion, I read gan_train.py. However, I still have some questions. First, where do dis_dev_positive_data, dis_dev_negative_data, and dis_dev_source_data come from? And how do they differ from dis_positive_data, dis_negative_data, and dis_source_data?
Thanks!

ZhenYangIACAS commented on May 27, 2024

@luckper It is easy to build the development sets. We simply sampled 200 sentences at random from dis_positive_data to get dis_dev_positive_data, and similarly obtained the corresponding dis_dev_negative_data and dis_dev_source_data (see the sketch below).
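A minimal sketch of that sampling, keeping the source, positive, and negative files line-aligned; the file names are placeholders for the dis_* paths in your config:

    import random

    def read_lines(path):
        with open(path, encoding="utf-8") as f:
            return f.readlines()

    # Placeholder names; substitute the paths from your config.
    src = read_lines("dis_source_data.txt")
    pos = read_lines("dis_positive_data.txt")
    neg = read_lines("dis_negative_data.txt")

    # Sample the same 200 line indices from all three files so the
    # dev sets stay parallel.
    random.seed(0)
    idx = sorted(random.sample(range(len(src)), 200))
    for lines, out in [(src, "dis_dev_source_data.txt"),
                       (pos, "dis_dev_positive_data.txt"),
                       (neg, "dis_dev_negative_data.txt")]:
        with open(out, "w", encoding="utf-8") as f:
            f.writelines(lines[i] for i in idx)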

luckper commented on May 27, 2024

@ZhenYangIACAS Hi, I ran generate_sample.sh, but an error occurred:

    Instructions for updating:
    Use argmax instead
    using rmsprop for g_loss
    Traceback (most recent call last):
      File "generate_samples.py", line 60, in <module>
        generate_samples(config)
      File "generate_samples.py", line 32, in generate_samples
        optimizer=config.train.optimizer)
      File "/home/xxx/Downloads/ZKY-GAN_NMT/NMT_GAN-master/model.py", line 119, in build_generate
        optimizer=tf.train.RMSPropOptimizer(self.config.generator.learning_rate)
      File "/home/xxx/Downloads/ZKY-GAN_NMT/NMT_GAN-master/utils.py", line 19, in __getattr__
        if type(self[item]) is dict:
    KeyError: 'generator'

What is the reason? The log file records the following:

    Instructions for updating:
    Use argmax instead
    INFO:root:using rmsprop for g_loss
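From the traceback alone, the failing access is config.generator.learning_rate: the loaded config has no generator key, so __getattr__ in utils.py raises KeyError. A quick sanity check, assuming the config is plain YAML (the file name below is a placeholder):

    import yaml

    # Placeholder name; use whatever config generate_samples.py is given.
    with open("config_generate_samples.yaml") as f:
        config = yaml.safe_load(f)

    # The traceback implies a 'generator' section (with a learning_rate)
    # is expected but missing.
    assert "generator" in config, "add a generator: section to the config"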

alwaysprep commented on May 27, 2024

@ZhenYangIACAS @luckper Have you solved the "KeyError: 'generator'" error?
