ictnlp / moe-waitk

Code for EMNLP 2021 oral paper "Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy"

License: MIT License

Python 97.06% C++ 0.62% Cuda 1.42% Cython 0.41% Perl 0.21% Shell 0.13% Lua 0.16%
machine-translation simultaneous-translation

moe-waitk's People

Contributors

vily1998

moe-waitk's Issues

Translation omission while generating with two-stage training model

Hi @Vily1998, thank you for your great work. I ran into a few issues while trying your code. Could you please help me dig into them?

  1. The "Equal-Weight MoE Wait-k" model works well, and the results are promising. With "MoEWait-k + FT", however, the trained model tends to omit content when generating long sentences (especially ones with multiple sub-sentences), for both the en-zh and ja-zh language pairs. I originally thought the cause was the timing of the switch to fine-tuning, but I can't find a description of that in your paper. I tried several times, and the omissions consistently show up in the fine-tuned model.
  2. When using "fairseq-generate" for batch generation, I find that translation is quite slow for long sentences. Digging into your code, I found that "MoEWaitkMultiheadAttention" does not use the incremental_state cache. Could the k and v values be cached, or at least partially cached? I don't know whether a cache mechanism conflicts with your model architecture; if it does, I'll stop trying.

Looking forward to your reply.
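
To make the caching question in point 2 concrete, here is a minimal pure-Python sketch of key/value caching for causal self-attention during step-by-step decoding. It is not code from this repository: the projection matrices `W_Q`, `W_K`, `W_V` and the helper names are hypothetical toy values, and whether the same trick carries over to "MoEWaitkMultiheadAttention" (in particular to attention over a source prefix that grows during wait-k decoding) is not established here.

```python
import math

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

# Hypothetical toy projection matrices (illustrative values only).
W_Q = [[1.0, 0.0], [0.0, 1.0]]
W_K = [[0.5, 0.1], [0.2, 0.3]]
W_V = [[1.0, -1.0], [0.5, 0.5]]

def decode_without_cache(inputs):
    """Recompute K and V for the whole prefix at every decoding step."""
    outputs, projections = [], 0
    for t in range(len(inputs)):
        prefix = inputs[: t + 1]
        keys = [matvec(W_K, x) for x in prefix]
        values = [matvec(W_V, x) for x in prefix]
        projections += 2 * len(prefix)
        outputs.append(attend(matvec(W_Q, inputs[t]), keys, values))
    return outputs, projections

def decode_with_cache(inputs):
    """Project each token once and keep its K and V for later steps."""
    keys, values = [], []
    outputs, projections = [], 0
    for x in inputs:
        keys.append(matvec(W_K, x))
        values.append(matvec(W_V, x))
        projections += 2
        outputs.append(attend(matvec(W_Q, x), keys, values))
    return outputs, projections

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
full_out, full_cost = decode_without_cache(inputs)
cached_out, cached_cost = decode_with_cache(inputs)
print("projections without cache:", full_cost)
print("projections with cache:", cached_cost)
```

Both functions produce the same outputs; the cached variant performs a number of K/V projections that grows linearly with the sequence length rather than quadratically, which is the usual motivation for an incremental-state cache.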

hydra.errors.ConfigCompositionException: Could not override 'common_eval.path'.

I followed the README's guidance and executed the training command:

expert_lagging=1,3,5,7,9,11,13,15
python train.py  --ddp-backend=no_c10d ${data} --arch transformer --share-all-embeddings \
 --optimizer adam \
 --adam-betas '(0.9, 0.98)' \
 --clip-norm 0.0 \
 --lr 5e-4 \
 --lr-scheduler inverse_sqrt \
 --warmup-init-lr 1e-07 \
 --warmup-updates 4000 \
 --dropout 0.3 \
 --criterion label_smoothed_cross_entropy \
 --reset-dataloader --reset-lr-scheduler --reset-optimizer \
 --label-smoothing 0.1 \
 --encoder-attention-heads 8 \
 --decoder-attention-heads 8 \
 --left-pad-source False \
 --fp16 \
 --equal-weight \
 --expert-lagging ${expert_lagging} \
 --save-dir ${modelfile}/$Prefix \
 --max-tokens 4096 --update-freq 1 
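
For context on the `expert_lagging` list in the command above, here is a small sketch of the standard wait-k read schedule, under which the decoder may read min(t + k - 1, source length) source tokens when emitting target token t (1-indexed). The function names are hypothetical, and the mapping of one lagging value to each of the eight attention heads is an assumption based on the eight values and `--decoder-attention-heads 8`; it is not taken from the repository's code.

```python
def visible_source_len(t, k, src_len):
    """Source tokens readable when emitting target token t (1-indexed) under wait-k."""
    return min(t + k - 1, src_len)

# Lagging values copied from the command above.
# Assumption: one value per attention head (expert).
expert_lagging = [1, 3, 5, 7, 9, 11, 13, 15]

def visibility_per_head(t, src_len, laggings=expert_lagging):
    """Visible source length for each head at target step t."""
    return [visible_source_len(t, k, src_len) for k in laggings]

for step in (1, 3):
    print(step, visibility_per_head(step, src_len=10))
```

With a 10-token source, the heads with larger k see the full source almost immediately, while the k=1 head stays close to the decoding front.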

Unfortunately, training failed with: hydra.errors.ConfigCompositionException: Could not override 'common_eval.path'.
I thought it was a Hydra version problem and reinstalled it, but it still did not work.

I don't know whether this is a bug. I tried to fix it but did not succeed. Could you help take a look at it? @Vily1998

The full error log is shown below.

anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/experimental/initialize.py:36: UserWarning: hydra.experimental.initialize() is no longer experimental. Use hydra.initialize()
message="hydra.experimental.initialize() is no longer experimental."
anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/experimental/compose.py:19: UserWarning: hydra.experimental.compose() is no longer experimental. Use hydra.compose()
message="hydra.experimental.compose() is no longer experimental."
anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/core/default_element.py:126: UserWarning: In 'config': Usage of deprecated keyword in package header '# @package _group_'.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_package_header for more information
See {url} for more information"""
anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'config': Defaults list is missing _self_. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
warnings.warn(msg, UserWarning)
anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/core/default_element.py:126: UserWarning: In 'lr_scheduler/inverse_sqrt': Usage of deprecated keyword in package header '# @package _group_'.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_package_header for more information
See {url} for more information"""
anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/core/default_element.py:126: UserWarning: In 'optimizer/adam': Usage of deprecated keyword in package header '# @package _group_'.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_package_header for more information
See {url} for more information"""
anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/core/default_element.py:126: UserWarning: In 'task/language_modeling': Usage of deprecated keyword in package header '# @package _group_'.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/changes_to_package_header for more information
See {url} for more information"""
Traceback (most recent call last):
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/_internal/config_loader_impl.py", line 378, in apply_overrides_to_config
OmegaConf.update(cfg, key, value, merge=True)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/omegaconf.py", line 725, in update
root[key] = {}
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 311, in __setitem__
key=key, value=value, type_override=ConfigKeyError, cause=e
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/base.py", line 196, in _format_and_raise
type_override=type_override,
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/_utils.py", line 741, in format_and_raise
_raise(ex, cause)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/_utils.py", line 719, in _raise
raise ex.with_traceback(sys.exc_info()[2]) # set end OC_CAUSE=1 for full backtrace
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 308, in __setitem__
self.__set_impl(key=key, value=value)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 318, in __set_impl
self._set_item_impl(key, value)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/basecontainer.py", line 511, in _set_item_impl
self._validate_set(key, value)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 180, in _validate_set
target = self._get_node(key) if key is not None else self
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 465, in _get_node
self._validate_get(key)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/dictconfig.py", line 167, in _validate_get
key=key, value=value, cause=ConfigAttributeError(msg)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/base.py", line 196, in _format_and_raise
type_override=type_override,
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/_utils.py", line 821, in format_and_raise
_raise(ex, cause)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/omegaconf/_utils.py", line 719, in _raise
raise ex.with_traceback(sys.exc_info()[2]) # set end OC_CAUSE=1 for full backtrace
omegaconf.errors.ConfigKeyError: Key 'common_eval' is not in struct
full_key: common_eval
object_type=dict

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "train.py", line 14, in <module>
cli_main()
File "Moe-waitk/MoE-Waitk/fairseq_cli/train.py", line 395, in cli_main
cfg = convert_namespace_to_omegaconf(args)
File "Moe-waitk/MoE-Waitk/fairseq/dataclass/utils.py", line 285, in convert_namespace_to_omegaconf
composed_cfg = compose(cfg_name, overrides=overrides, strict=False)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/experimental/compose.py", line 26, in compose
strict=strict,
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/compose.py", line 38, in compose
with_log_configuration=False,
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/_internal/hydra.py", line 563, in compose_config
from_shell=from_shell,
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/_internal/config_loader_impl.py", line 145, in load_configuration
from_shell=from_shell,
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/_internal/config_loader_impl.py", line 262, in _load_configuration_impl
ConfigLoaderImpl._apply_overrides_to_config(config_overrides, cfg)
File "anaconda3/envs/torch_moewaitk/lib/python3.6/site-packages/hydra/_internal/config_loader_impl.py", line 383, in _apply_overrides_to_config
) from ex
hydra.errors.ConfigCompositionException: Could not override 'common_eval.path'.
To append to your config use +common_eval.path=null
