Domain adaptation for NMT
This is the second version (V2) of DA4NMT:
- I add an entropy loss, which the common encoder tries to maximize.
- The negative cross-entropy loss updates only the parameters of the discriminator.
- I train the network using an adversarial strategy.
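
The two losses in the list above can be sketched in plain Python. This is only an illustration under assumptions, not the actual DA4NMT code: it assumes a discriminator that outputs one logit per domain, it takes the discriminator objective to be the usual cross-entropy (negative log-likelihood) on the true domain label, and all function names (`softmax`, `entropy`, `discriminator_loss`, `encoder_adversarial_loss`) are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw discriminator logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def discriminator_loss(domain_logits, domain_label):
    """Cross-entropy on the true domain label.

    Assumption: this is the loss whose gradient is applied only to the
    discriminator parameters.
    """
    probs = softmax(domain_logits)
    return -math.log(probs[domain_label])


def encoder_adversarial_loss(domain_logits):
    """Negative entropy of the discriminator's prediction.

    Minimizing this value is the same as maximizing the entropy, which
    pushes the common encoder toward features the discriminator cannot
    assign to a domain. Assumption: only the encoder is updated with it.
    """
    return -entropy(softmax(domain_logits))


if __name__ == "__main__":
    confident = [5.0, -5.0]  # discriminator is sure about the domain
    confused = [0.0, 0.0]    # discriminator cannot tell the domains apart
    print("encoder loss (confident):", encoder_adversarial_loss(confident))
    print("encoder loss (confused): ", encoder_adversarial_loss(confused))
    print("discriminator loss:      ", discriminator_loss(confident, 0))
```

In a real training loop these quantities would be computed with an automatic-differentiation framework, alternating between a discriminator step and an encoder step so that each loss only touches its own parameter group. The sketch just shows that the encoder loss is lowest when the discriminator's prediction is uniform.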