alexxnica / incubator-joshua

This project forked from apache/joshua

Mirror of Apache Joshua (Incubating)

License: Apache License 2.0

Welcome to Apache Joshua (Incubating)

Joshua is a statistical machine translation toolkit for both phrase-based (new in version 6.0) and syntax-based decoding. It can be run with pre-built language packs available for download, and can also be used to build models for new language pairs. Among the many features of Joshua are:

  • Support for both phrase-based and syntax-based decoding models
  • Translation of weighted input lattices
  • Thrax: a Hadoop-based, scalable grammar extractor
  • A sparse feature architecture supporting an arbitrary number of features

The latest release of Joshua is always linked directly from the home page.

New in 6.X

Joshua 6.X includes the following new features:

  • A fast phrase-based decoder with the ability to read Moses phrase tables
  • Large speed improvements compared to the previous syntax-based decoder
  • Special input handling
  • A host of bugfixes and stability improvements

Quick start

Joshua requires Java JDK 1.8 or later.
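
The version requirement can be checked before building. The following is a minimal sketch, not part of Joshua: `java_version_ok` is a hypothetical helper that takes a version string such as `1.8.0_292` or `11.0.2` (the caller is assumed to have extracted it from the output of `java -version`) and succeeds when the major version is 8 or higher. It assumes the string starts with plain dotted numbers and does not handle suffixes such as `9-ea`.

```shell
# Hypothetical helper, not shipped with Joshua.
# Succeeds (exit status 0) when the given Java version string is 1.8 or newer.
java_version_ok() {
    major=${1%%.*}              # text before the first dot, e.g. "1" or "11"
    if [ "$major" = "1" ]; then
        rest=${1#*.}            # legacy scheme "1.8.0_292": drop the leading "1."
        major=${rest%%.*}       # the real major version is the second field
    fi
    [ "$major" -ge 8 ]
}
```

For example, `java_version_ok 1.8.0_292` and `java_version_ok 11.0.2` succeed, while `java_version_ok 1.7.0_80` fails.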

Running the decoder in any form requires setting a few basic environment variables: $JAVA_HOME, $JOSHUA, and, for certain optional portions of the model-training pipeline, $MOSES.

export JAVA_HOME=/path/to/java  # maybe /usr/java/home
export JOSHUA=/path/to/joshua
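
To confirm that the required variables are set before going further, a small check along these lines may help. This is a sketch only: `check_joshua_env` is a hypothetical helper, not something shipped with Joshua, and it deliberately ignores $MOSES because that variable is optional.

```shell
# Hypothetical helper, not shipped with Joshua.
# Prints one line per required variable that is unset or empty.
# Returns 0 when both are set, 1 otherwise.
check_joshua_env() {
    missing=0
    for name in JAVA_HOME JOSHUA; do
        eval "value=\${$name:-}"    # look up the variable whose name is in $name
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Calling `check_joshua_env` in the shell where you plan to run the decoder reports anything that still needs to be exported.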

You might also find it helpful to set these:

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

Then, compile Joshua by typing:

cd $JOSHUA
mvn clean package

You also need to download and compile KenLM and Thrax:

bash ./download-deps.sh

The basic method for invoking the decoder looks like this:

cat SOURCE | $JOSHUA/bin/joshua-decoder -m MEM -c CONFIG OPTIONS > OUTPUT
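
The invocation above can be wrapped in a small function that checks its inputs first. This is a sketch under stated assumptions: `decode_file` is a hypothetical name, the default memory value `4g` is an assumption about the format accepted by `-m`, and no decoder options beyond `-m` and `-c` are passed.

```shell
# Hypothetical wrapper around the decoder invocation; not part of Joshua.
# Usage: decode_file SOURCE CONFIG OUTPUT [MEM]
decode_file() {
    src=$1
    config=$2
    out=$3
    mem=${4:-4g}    # assumed default; adjust to your model and machine

    if [ ! -r "$src" ]; then
        echo "cannot read source file: $src" >&2
        return 1
    fi
    if [ ! -r "$config" ]; then
        echo "cannot read config file: $config" >&2
        return 1
    fi

    # Same pipeline as above, with the source file fed on standard input.
    "$JOSHUA/bin/joshua-decoder" -m "$mem" -c "$config" < "$src" > "$out"
}
```

A call such as `decode_file input.txt joshua.config output.txt` (all three file names are placeholders) then behaves like the one-line form, but fails early with a message if the source or config file is missing.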

Some example usage scenarios and scripts can be found in the examples/ directory.

Development With Eclipse

If you plan to work on the decoder, we suggest you use Eclipse. You can generate the Eclipse project files by typing:

mvn eclipse:eclipse

Working with "language packs"

Joshua includes a number of "language packs", which are pre-built models that allow you to use the translation system as a black box, without worrying too much about how machine translation works. You can browse the models available for download on the Joshua website.

Building new models

Joshua includes a pipeline script that allows you to build new models, provided you have training data. This pipeline can be run (more or less) by invoking a single command, which handles data preparation, alignment, phrase-table or grammar construction, and tuning of the model parameters. See the documentation for a walkthrough and more information about the many available options.

License

Joshua is licensed and released under the permissive Apache License v2.0, a copy of which ships with the Joshua source code.
