martnquesada / sockeye-noise

This project forked from yunsukim86/sockeye-noise

Denoising Autoencoder in Sockeye

License: Apache License 2.0

Python 99.10% Dockerfile 0.07% Shell 0.49% CSS 0.05% JavaScript 0.28%

Denoising Autoencoder in Sockeye

This version of Sockeye contains code to train a denoising autoencoder for sequences. It supports the following types of artificial noise on the source side:

  • Insertion of frequent tokens
  • Deletion of tokens
  • Permutation of tokens with a limited distance
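The three noise types listed above can be illustrated with a minimal Python sketch. This is not the code from this repository: all function names, the order in which the noise types are applied, the fallback for fully deleted sentences, and the random-offset sorting trick used for the bounded permutation are assumptions made for illustration only.

```python
import random
from collections import Counter
from typing import List, Sequence


def most_frequent_tokens(corpus: Sequence[Sequence[str]], size: int) -> List[str]:
    """Return the `size` most frequent tokens of a tokenized corpus."""
    counts = Counter(tok for sentence in corpus for tok in sentence)
    return [tok for tok, _ in counts.most_common(size)]


def delete_tokens(tokens: Sequence[str], prob: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability `prob`."""
    kept = [tok for tok in tokens if rng.random() >= prob]
    # Assumption: never return an empty sentence; fall back to the first token.
    return kept if kept else list(tokens[:1])


def insert_tokens(tokens: Sequence[str], prob: float, vocab: Sequence[str],
                  rng: random.Random) -> List[str]:
    """Before each token, insert a random frequent token with probability `prob`."""
    noisy: List[str] = []
    for tok in tokens:
        if vocab and rng.random() < prob:
            noisy.append(rng.choice(vocab))
        noisy.append(tok)
    return noisy


def permute_tokens(tokens: Sequence[str], max_distance: int,
                   rng: random.Random) -> List[str]:
    """Shuffle tokens so that no token moves more than `max_distance` positions.

    Each index i gets the sort key i + U[0, max_distance + 1); sorting by
    these keys bounds the displacement of every token by `max_distance`.
    """
    keys = [i + rng.random() * (max_distance + 1) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=keys.__getitem__)
    return [tokens[i] for i in order]


def add_noise(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
              deletion: float = 0.1, insertion: float = 0.1,
              permutation: int = 3) -> List[str]:
    """Apply deletion, insertion, then permutation (this order is an assumption)."""
    noisy = delete_tokens(tokens, deletion, rng)
    noisy = insert_tokens(noisy, insertion, vocab, rng)
    return permute_tokens(noisy, permutation, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "sat", "on", "the", "rug"]]
    vocab = most_frequent_tokens(corpus, 2)
    print(add_noise(corpus[0], vocab, rng))
```

In this sketch the clean sentence serves as the target and the output of `add_noise` as the source, which mirrors the idea of a denoising autoencoder: the model learns to reconstruct the original sequence from its corrupted version.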

If you use this code, please cite:

If you are looking for the integration of a language model into cross-lingual word embeddings, please see wbw-lm.

Installation

After cloning the repository from git, run:

> pip install -r requirements/requirements.txt
> pip install .

If you want to run on a GPU, make sure your version of Apache MXNet Incubating contains the GPU bindings. Depending on your version of CUDA, you can do this by running the following:

> pip install -r requirements/requirements.gpu-cu${CUDA_VERSION}.txt
> pip install .

where ${CUDA_VERSION} can be 75 (7.5), 80 (8.0), 90 (9.0), or 91 (9.1).
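The mapping from CUDA version to requirements file can be sketched in Python as follows. The helper name `gpu_requirements_file` and the `SUPPORTED_CUDA` table are hypothetical and not part of Sockeye; the file name pattern and the four supported versions are taken from the text above.

```python
# Hypothetical helper: maps a CUDA version string to the matching
# GPU requirements file named in the installation instructions.
SUPPORTED_CUDA = {"7.5": "75", "8.0": "80", "9.0": "90", "9.1": "91"}


def gpu_requirements_file(cuda_version: str) -> str:
    """Return the requirements file path for a supported CUDA version."""
    try:
        suffix = SUPPORTED_CUDA[cuda_version]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_CUDA))
        raise ValueError(
            f"Unsupported CUDA version {cuda_version!r}; expected one of: {supported}"
        ) from None
    return f"requirements/requirements.gpu-cu{suffix}.txt"


if __name__ == "__main__":
    print(gpu_requirements_file("9.0"))
```

The returned path is what would be passed to `pip install -r` in place of the `${CUDA_VERSION}` template.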

Usage

To train a denoising autoencoder, turn on --source-noise-train and set the detailed noise options (--source-noise-insertion, --source-noise-insertion-vocab, --source-noise-deletion, --source-noise-permutation). Use the same training data for both the source and target sides, and likewise the same validation data for both sides. Optionally, you can also switch on --source-noise-validation to evaluate your models on a noisy validation set during training. Example:

> python -m sockeye.train -s {training_data} \
                          -t {training_data} \
                          -vs {validation_data} \
                          -vt {validation_data} \
                          --source-noise-train \
                          --source-noise-permutation 3 \
                          --source-noise-deletion 0.1 \
                          --source-noise-insertion 0.1 \
                          --source-noise-insertion-vocab 50 \
                          .... (other options)

Denoising with a trained model can be done with the sockeye.translate module, in the same way as translating an input sentence. All other modules provided by Sockeye can also be used with a denoising autoencoder, e.g. sharding the training data (sockeye.prepare_data) or model averaging (sockeye.average). Please refer to the Sockeye documentation for details.
