stanfordnlp / phrasal

A large-scale statistical machine translation system written in Java.

License: GNU General Public License v3.0

phrasal's Introduction

Phrasal: A statistical machine translation system

Phrasal is licensed under the GPL (v3+). For details, please see the file LICENSE.txt in the root directory of this software package.

Installation

We use Gradle to build Phrasal. Gradle will install all dependencies. You need Gradle version 2.1+. If you are on OS X, the easiest way to get Gradle is to install Homebrew and then to type brew install gradle.

Linux / Mac OS X

These instructions assume you are using the bash shell, which is usually the default shell.

Switch to the root of the Phrasal repository and execute: gradle installDist
Set PHRASAL_HOME: export PHRASAL_HOME=`pwd`
Set CLASSPATH: export CLASSPATH=$PHRASAL_HOME/build/install/phrasal/lib/*
(Optional) Build Eclipse project files by executing: gradle eclipse.
(Optional, requires g++, JDK) Build the KenLM loader: gradle compileKenLM.
(Optional, requires g++, JDK, and Boost) Build the KenLM language model estimation tools: gradle compileKenLMtools.
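
The two environment variables in the steps above can also be set from a small helper. This is only a sketch: `phrasal_env` is a hypothetical function name, not something shipped with Phrasal, and it assumes `gradle installDist` has already been run in the repository root.

```shell
# Hypothetical helper (not part of Phrasal): export the variables from the steps above.
phrasal_env() {
  # $1: root of the Phrasal checkout; defaults to the current directory.
  PHRASAL_HOME="${1:-$(pwd)}"
  # Keep the trailing /* literal: the JVM itself expands it to the jars in that directory.
  CLASSPATH="$PHRASAL_HOME/build/install/phrasal/lib/*"
  export PHRASAL_HOME CLASSPATH
}
```

Calling `phrasal_env` with no argument from the repository root should be equivalent to the two `export` lines above.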

Windows

Follow the Linux instructions above. Then be sure to execute gradle startupScripts to generate a .bat file.

Citation

If you use Phrasal for research, then please cite the following paper:

@inproceedings{Green2014,
 author = {Spence Green and Daniel Cer and Christopher D. Manning},
 title = {Phrasal: A Toolkit for New Directions in Statistical Machine Translation},
 booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
 year = {2014}
}

Documentation / User Guide

See the user guide for complete installation and configuration instructions. The guide also contains a tutorial for building an MT system from raw text.

Support

We have 3 mailing lists for Phrasal, all of which are shared with other Stanford JavaNLP tools (with the exclusion of the parser).

Each address is at @lists.stanford.edu:

java-nlp-user -- This is the best list to post to in order to ask questions, make announcements, or for discussion among JavaNLP users. You have to subscribe to be able to use it. Join the list via this webpage or by emailing [email protected]. (Leave the subject and message body empty.) You can also look at the list archives.

java-nlp-announce -- This list will be used only to announce new versions of Stanford JavaNLP tools, so it will be very low volume (expect 1-3 messages a year). Join the list via this webpage or by emailing [email protected]. (Leave the subject and message body empty.)

java-nlp-support -- This list goes only to the software maintainers. It's a good address for licensing questions, etc. For general use and support questions, please join and use java-nlp-user. You cannot join java-nlp-support, but you can mail questions to [email protected].

phrasal's Issues

[gradle] How to build src-extra ?

I need to run web-service.sh, which needs the edu.stanford.nlp.mt.service.PhrasalService class.
That class is not built into build/libs/phrasal-3.6.0.jar when I run gradle installDist.

I see that src-extra is added as a source set in build.gradle:

// Configure build targets
sourceSets {
  main {
    java.srcDirs = ['src/' ]
    resources.srcDirs = ['resources/']
  }
  test {
    java.srcDirs = ['test/']
    resources.srcDirs = ['test-resources/','src-cc']
  }
  extra {
    java.srcDirs = ['src-extra/']
    resources.srcDirs = ['resources/']
  }
}

However, I am new to Gradle.
What is the command to build with src-extra included?
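
Gradle's Java plugin creates one compile task per source set: `compileJava` for `main`, and `compile<Name>Java` for any other set. Under that convention, the `extra` source set declared above would be compiled by `gradle compileExtraJava`. The sketch below only derives the task name. `compile_task_for` is a hypothetical helper, and whether the compiled classes end up in the installed jar depends on parts of build.gradle that are not shown here.

```shell
# Hypothetical helper: map a Gradle source set name to its Java compile task name.
compile_task_for() {
  case "$1" in
    # The main source set is the special case: its task has no set name in it.
    main) echo "compileJava" ;;
    *)
      # Capitalize the first letter of the source set name (POSIX-only tools).
      first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
      rest=$(printf '%s' "$1" | cut -c2-)
      echo "compile${first}${rest}Java"
      ;;
  esac
}
```

For example, `gradle "$(compile_task_for extra)"` runs the compile task for the `extra` source set.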

Rewrite n-best procedure with A* search

build src-extra failed

I would like to use web-service.sh, so I tried to compile the web service with gradle compileExtraJava.
Here is the full log from Gradle:


> Configure project : 
The Task.leftShift(Closure) method has been deprecated and is scheduled to be removed in Gradle 5.0. Please use Task.doLast(Action) instead.
        at build_61jfp5nokcncjri2p7rblqs0e.run(/home/phrasal/build.gradle:96)
        (Run with --stacktrace to get the full stack trace of this deprecation warning.)

> Task :compileExtraJava FAILED
/home/phrasal/src-extra/edu/stanford/nlp/mt/service/handlers/RuleQueryRequestHandler.java:99: error: method getRules in interface TranslationModel<TK,FV> cannot be applied to given types;
            .getRules(source, inputProperties, null, qId.incrementAndGet(), scorer);
            ^
  required: Sequence<IString>,InputProperties,int,Scorer<String>
  found: Sequence<IString>,InputProperties,<null>,int,Scorer<String>
  reason: actual and formal argument lists differ in length
  where TK,FV are type-variables:
    TK extends Object declared in interface TranslationModel
    FV extends Object declared in interface TranslationModel
/home/phrasal/src-extra/edu/stanford/nlp/mt/service/handlers/RuleQueryRequestHandler.java:100: error: incompatible types: boolean cannot be converted to int
        RuleGrid<IString,String> ruleGrid = new RuleGrid<IString,String>(ruleList, source, true);
                                                                                           ^
/home/phrasal/src-extra/edu/stanford/nlp/mt/service/handlers/RuleQueryRequestHandler.java:104: error: cannot find symbol
        Sequence<IString> queryString = Sequences.concatenate(sourceContext, source);
                                                 ^
  symbol:   method concatenate(Sequence<IString>,Sequence<IString>)
  location: class Sequences
/home/phrasal/src-extra/edu/stanford/nlp/mt/service/handlers/RuleQueryRequestHandler.java:106: error: method getRules in interface TranslationModel<TK,FV> cannot be applied to given types;
            .getRules(queryString, inputProperties, null, qId.incrementAndGet(), scorer);
            ^
  required: Sequence<IString>,InputProperties,int,Scorer<String>
  found: Sequence<IString>,InputProperties,<null>,int,Scorer<String>
  reason: actual and formal argument lists differ in length
  where TK,FV are type-variables:
    TK extends Object declared in interface TranslationModel
    FV extends Object declared in interface TranslationModel
/home/phrasal/src-extra/edu/stanford/nlp/mt/service/handlers/RuleQueryRequestHandler.java:107: error: incompatible types: boolean cannot be converted to int
        RuleGrid<IString,String> ruleGrid = new RuleGrid<IString,String>(ruleList, queryString, true);
                                                                                                ^
/home/phrasal/src-extra/edu/stanford/nlp/mt/service/handlers/RuleQueryRequestHandler.java:134: error: cannot find symbol
          target = Sequences.concatenate(bestLeftContext.abstractRule.target, target);
                            ^
  symbol:   method concatenate(Sequence<IString>,Sequence<IString>)
  location: class Sequences
/home/phrasal/src-extra/edu/stanford/nlp/mt/tools/TranslationModelComparator.java:57: error: constructor TranslationModelFeaturizer in class TranslationModelFeaturizer cannot be applied to given types;
    RuleFeaturizer<IString,String> feat = new TranslationModelFeaturizer(6);
                                          ^
  required: no arguments
  found: int
  reason: actual and formal argument lists differ in length
/home/phrasal/src-extra/edu/stanford/nlp/mt/tools/TranslationModelComparator.java:65: error: cannot find symbol
      RuleGrid<IString,String> dynRules = dynTM.getRuleGrid(source, null, null, sourceId, scorer);
                                               ^
  symbol:   method getRuleGrid(Sequence<IString>,<null>,<null>,int,Scorer<String>)
  location: variable dynTM of type TranslationModel<IString,String>
/home/phrasal/src-extra/edu/stanford/nlp/mt/tools/TranslationModelComparator.java:66: error: cannot find symbol
      RuleGrid<IString,String> compRules = compiledTM.getRuleGrid(source, null, null, sourceId, scorerComp);
                                                     ^
  symbol:   method getRuleGrid(Sequence<IString>,<null>,<null>,int,Scorer<String>)
  location: variable compiledTM of type TranslationModel<IString,String>
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
9 errors


FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileExtraJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

BUILD FAILED in 1s
5 actionable tasks: 1 executed, 4 up-to-date

Cannot change dependencies of configuration ':compile'

After removing "<<" in line 96, I get the following issue:

I also came across a build failure on Mac with the following message:

You can delete the "<<" symbol on line 96, this will solve the "leftShift()" problem.

Originally posted by @victorcai101 in #35 (comment)
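
A note on the `<<` fix: the deprecation warning quoted in the logs on this page names `Task.doLast(Action)` as the replacement. In Gradle, a task closure without `<<` is a configuration block that runs at configuration time, so simply deleting `<<` changes when the code runs. That may be related to the "Cannot change dependencies" report above, though this is not confirmed here. A minimal sketch for locating the affected declarations follows; `find_leftshift_tasks` is a hypothetical helper name.

```shell
# Hypothetical helper: list task declarations that still use the << shorthand.
#
# Deprecated form ("example" is an illustrative task name):
#   task example << { println 'hi' }
# Replacement named by the deprecation warning:
#   task example { doLast { println 'hi' } }
find_leftshift_tasks() {
  # $1: path to a Gradle build file; defaults to build.gradle.
  # Prints matching lines prefixed with their line numbers.
  grep -nE '^[[:space:]]*task[[:space:]].*<<' "${1:-build.gradle}"
}
```

Each reported line is a candidate for wrapping its body in `doLast { ... }` rather than removing the `<<` alone.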

Lattice generation

Phrasal can consume the PLF lattice format. But how do we get this lattice format from a list of possible sentences?

Need help setting up in Eclipse

I want to run this project in Eclipse. I read the instructions but did not understand them. It would be great if someone could help me run this project in Eclipse.

kenlm State length mis-match

I replaced the bundled KenLM with the newest version (cloned from its GitHub repository) and built it with gradle compileKenLM.

When I run step 2 with KenLM, .online.stdout shows:


Done loading phrase table: /data/20171214/config/dev.tables/phrase-table.gz (mem used: 71 MiB time: 0.253 s)
Longest foreign phrase: 5
Loading extended Moses Lexical Reordering Table: dev.tables/lo-hier.msd2-bidirectional-fe.gz
Done loading reordering table: dev.tables/lo-hier.msd2-bidirectional-fe.gz (mem used: 71 MiB time: 0.137s)
Hierarchical reordering model:
Distinguish between left and right discontinuous: true
Use containment orientation: false
Forward orientation: hierarchical
Backward orientation: hierarchical
Non-NPLM /data/trained_model/kenlm/20171124/20171124_lm_train_data.bin
[ERROR] 2017-12-14 16:46:32.184 [pool-2-thread-1] KenLMState - State length mis-match: 1 vs. 205
[ERROR] 2017-12-14 16:46:32.184 [pool-2-thread-3] KenLMState - State length mis-match: 1 vs. 205
[ERROR] 2017-12-14 16:46:32.184 [pool-2-thread-4] KenLMState - State length mis-match: 1 vs. 205
[ERROR] 2017-12-14 16:46:32.184 [pool-2-thread-2] KenLMState - State length mis-match: 1 vs. 205
java.lang.RuntimeException: Bad state length returned from KenLM query
	at edu.stanford.nlp.mt.lm.KenLMState.<init>(KenLMState.java:39)
	at edu.stanford.nlp.mt.lm.KenLanguageModel.score(KenLanguageModel.java:167)
	at edu.stanford.nlp.mt.decoder.feat.base.NGramLanguageModelFeaturizer.ruleFeaturize(NGramLanguageModelFeaturizer.java:162)
	at edu.stanford.nlp.mt.decoder.feat.FeatureExtractor.ruleFeaturize(FeatureExtractor.java:196)
	at edu.stanford.nlp.mt.tm.ConcreteRule.<init>(ConcreteRule.java:91)
	at edu.stanford.nlp.mt.tm.AbstractPhraseGenerator.getRules(AbstractPhraseGenerator.java:60)
	at edu.stanford.nlp.mt.tm.CombinedTranslationModel.getRules(CombinedTranslationModel.java:201)
	at edu.stanford.nlp.mt.decoder.AbstractBeamInferer.getRules(AbstractBeamInferer.java:115)
	at edu.stanford.nlp.mt.decoder.CubePruningDecoder.decode(CubePruningDecoder.java:130)
	at edu.stanford.nlp.mt.decoder.AbstractBeamInferer.nbest(AbstractBeamInferer.java:193)
	at edu.stanford.nlp.mt.decoder.AbstractBeamInferer.nbest(AbstractBeamInferer.java:95)
	at edu.stanford.nlp.mt.Phrasal.decode(Phrasal.java:1425)
	at edu.stanford.nlp.mt.tune.OnlineTuner$GradientProcessor.process(OnlineTuner.java:493)
	at edu.stanford.nlp.mt.tune.OnlineTuner$GradientProcessor.process(OnlineTuner.java:450)
	at edu.stanford.nlp.util.concurrent.MulticoreWrapper$CallableJob.call(MulticoreWrapper.java:255)
	at edu.stanford.nlp.util.concurrent.MulticoreWrapper$CallableJob.call(MulticoreWrapper.java:236)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Remove streams and lambdas from main decoder loop

Streams and lambdas are slow in the main decoder loop.

TranslationModelComparator bug

Change:
List<ConcreteRule<IString,String>> dynRules = dynTM.getRules(source, null, sourceId, scorer);
List<ConcreteRule<IString,String>> compRules = compiledTM.getRules(source, null, sourceId, scorerComp);
To:
List<ConcreteRule<IString,String>> dynRulesList = dynTM.getRules(source, null, sourceId, scorer);
List<ConcreteRule<IString,String>> compRulesList = compiledTM.getRules(source, null, sourceId, scorerComp);
RuleGrid<IString,String> dynRules = new RuleGrid<IString,String>(dynRulesList, source);
RuleGrid<IString,String> compRules = new RuleGrid<IString,String>(compRulesList, source);

Is it possible to rewrite a sentence without losing its meaning using Phrasal?

gradle installDist error

I am running Ubuntu 14.04.

When I run gradle installDist, I get the following error:

FAILURE: Build failed with an exception.

Where:
Build file '/pathToDirectory/phrasal-master/build.gradle' line: 151
What went wrong:
A problem occurred evaluating root project 'phrasal-master'.

Could not find method jcenter() for arguments [] on repository container.

Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 4.773 secs

Gradle reports success despite compilation actually failing

gradle compileKenLM was failing, but Gradle's "BUILD SUCCESSFUL" output led me to think it was working:

I solved this on my Ubuntu-based system by exporting JAVA_HOME prior to building: export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

This then resulted in a clean compileKenLM build with no errors.
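
Because the build can report success even when the native compile fails, it may help to check JAVA_HOME before running `gradle compileKenLM`. The sketch below assumes the KenLM loader is compiled against the JDK's JNI headers, which the README's JDK requirement suggests but does not state. `check_java_home` is a hypothetical helper name.

```shell
# Hypothetical pre-flight check: is JAVA_HOME set and does it look like a JDK?
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  # A JDK ships the JNI headers under include/; a bare JRE does not.
  if [ ! -f "$JAVA_HOME/include/jni.h" ]; then
    echo "no JNI headers under $JAVA_HOME/include (is this a JDK?)" >&2
    return 1
  fi
}
```

One way to use it: `check_java_home && gradle compileKenLM`, so the build is only attempted when the check passes.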

How to debug: Tuning step (step 2 in phrasal.sh) running too long (~10 hours) but empty .binwts file

Step 2 ran for more than 10 hours, so I cancelled it before it finished. It did not produce a usable .binwts file. The only .binwts file is the example file I copied, which is still empty. I did not see any other .binwts file.

The process used all CPU cores and about 7 GB of RAM (almost all I have) until I cancelled it. The logs below are all that was produced; there is no .binwts output.

CONFIG

.vars

#
# Online parameter tuning with phrasal-train-tune.sh
#

# General parameters
#
HOST=`hostname -s`
MEM=7g
JAVA_OPTS="-server -ea -Xmx${MEM} -Xms${MEM} -XX:+UseParallelGC -XX:+UseParallelOldGC"
DECODER_OPTS="-Djava.library.path=/home/me/phrasal.ver/src-cc"

# Set if you want to receive an email when a run completes.
# Assumes that the 'mail' unix program is installed and
# configured on your system.
[email protected]

# Resource locations
#
REFDIR=/data/refdir
CORPUSDIR=/data/corpusdir
CORPUS_SRC=${CORPUSDIR}/train.src.filt.gz
CORPUS_TGT=${CORPUSDIR}/train.dest.filt.gz
CORPUS_EF=${CORPUSDIR}/dest_src.A3.final.merge
CORPUS_FE=${CORPUSDIR}/src_dest.A3.final.merge



# Directory for reporting system.
#REPORTING_DIR=
#RESULTS_FILE=$REPORTING_DIR/results.html

#
# Phrase extraction parameters
#

# Mandatory extraction set format. See Usage of mt.train.PhraseExtract
# for the several different extraction set formats
EXTRACT_SET="-fCorpus $CORPUS_SRC -eCorpus $CORPUS_TGT -feAlign $CORPUS_FE -efAlign $CORPUS_EF -symmetrization grow-diag"
THREADS_EXTRACT=8
MAX_PHRASE_LEN=5
# DEBUG_PROPERTY=true
# DETAILED_DEBUG_PROPERTY=true
OTHER_EXTRACT_OPTS="-phiFilter 1e-4 -maxELen $MAX_PHRASE_LEN"

# Feature extractors
EXTRACTORS=edu.stanford.nlp.mt.train.MosesPharoahFeatureExtractor=phrase-table.gz:edu.stanford.nlp.mt.train.CountFeatureExtractor=phrase-table.gz:edu.stanford.nlp.mt.train.LexicalReorderingFeatureExtractor=lo-hier.msd2-bidirectional-fe.gz
EXTRACTOR_OPTS=""

# Lexicalized re-ordering models
LO_ARGS="-hierarchicalOrientationModel true -orientationModelType msd2-bidirectional-fe"

# Online tuning parameters
TUNE_MODE=online
TUNE_SET_NAME=dev_data
TUNE_SET=$CORPUSDIR/$TUNE_SET_NAME.dest
TUNE_REF=$REFDIR/$TUNE_SET_NAME/ref0
INITIAL_WTS=20171212.binwts
TUNE_NBEST=100

#Options to pass directly to OnlineTuner
METRIC=bleu-smooth
# default
# ONLINE_OPTS="-e 8 -ef 20 -b 20 -uw -m $METRIC -o pro-sgd -of 1,5000,50,0.5,Infinity,0.02,adagradl1f,0.1"
ONLINE_OPTS="-e 1 -ef 10 -b 20 -uw -m $METRIC -o pro-sgd -of 1,5000,50,0.5,Infinity,0.02,adagradl1f,0.1"



# Decoding parameters for dev/test set
DECODE_SET_NAME=test_data
DECODE_SET=$CORPUSDIR/$DECODE_SET_NAME.dest
NBEST=1

.ini

# Example Phrasal ini file
# These options are described by the usage statement
# that is shown on the command line (use the "-help" option).
#
# phrasal.sh will modify this template depending on the steps
# selected to run.
#

# phrasal.sh replaces the token SETID with the
# dev or test set name.
[ttable-file]
SETID.tables/phrase-table.gz

# The 'kenlm:' enables the KenLM loader. Remove the
# prefix for the standard Java ARPA loader.
[lmodel-file]
/data/kenlm.arpa

[ttable-limit]
20

[distortion-limit]
5


# The dense Moses feature set is loaded by default.
# Also load the hierarchical re-ordering model of Galley and Manning (2008)
[reordering-model]
hierarchical
SETID.tables/lo-hier.msd2-bidirectional-fe.gz
msd2-bidirectional-fe
hierarchical
hierarchical
bin

# Number of decoding threads
[threads]
3
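
The comment in the .ini above says that a `kenlm:` prefix on the `[lmodel-file]` value selects the KenLM loader, and that a bare path selects the standard Java ARPA loader. The sketch below reports which of the two a given .ini file would select. `lm_loader` is a hypothetical helper, and it assumes the value is the first non-comment, non-empty line after the section header, as in the example above.

```shell
# Hypothetical helper: report which language model loader an .ini file selects.
lm_loader() {
  # $1: path to a Phrasal .ini file.
  # Take the first non-empty, non-comment line after the [lmodel-file] header.
  value=$(awk '
    /^\[lmodel-file\]/ { in_section = 1; next }
    in_section && /^\[/ { exit }
    in_section && !/^#/ && NF { print; exit }
  ' "$1")
  case "$value" in
    kenlm:*) echo "kenlm ${value#kenlm:}" ;;
    "")      echo "none" ;;
    *)       echo "java-arpa $value" ;;
  esac
}
```

For the .ini shown above, this would report the Java ARPA loader, since `/data/kenlm.arpa` has no `kenlm:` prefix.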

LOG

.online.stdout log

Done loading phrase table: /data/dev_data.tables/phrase-table.gz (mem used: 465 MiB time: 0.737 s)
Longest foreign phrase: 5
Loading extended Moses Lexical Reordering Table: dev_data.tables/lo-hier.msd2-bidirectional-fe.gz
Done loading reordering table: dev_data.tables/lo-hier.msd2-bidirectional-fe.gz (mem used: 573 MiB time: 0.716s)
Hierarchical reordering model:
Distinguish between left and right discontinuous: true
Use containment orientation: false
Forward orientation: hierarchical
Backward orientation: hierarchical
Reading 262144 1-grams...
Reading 8388608 2-grams...
Reading 67108864 3-grams...
Reading 134217728 4-grams...
Reading 134217728 5-grams...
Done loading arpa lm: /data/kenlm.arpa (order: 5) (mem used: 4595 MiB time: 172.626 s)

phrasal.log

[INFO ] 2017-12-13 16:37:07.993 [main] OnlineTuner - Phrasal Online Tuner
[INFO ] 2017-12-13 16:37:08.143 [main] OnlineTuner - Options:  /data/dev_data.src /data/refdir/dev_data/ref0 dev_data.20171212baseline.ini 20171212.binwts b 20 e 1 ef 10 m bleu-smooth n dev_data.20171212baseline o pro-sgd of 1,5000,50,0.5,Infinity,0.02,adagradl1f,0.1 uw true
[INFO ] 2017-12-13 16:37:08.174 [main] Phrasal - Number of threads: 8
[INFO ] 2017-12-13 16:37:08.174 [main] Phrasal - Phrase table rule query limit: 20
[INFO ] 2017-12-13 16:37:08.174 [main] Phrasal - Translation model options []
[INFO ] 2017-12-13 16:37:08.949 [main] Phrasal - Translation model mode: static
[INFO ] 2017-12-13 16:37:09.684 [main] Phrasal - Language model: /data/kenlm.arpa

Missing RuleGrid Constructor

RuleGrid<IString,String> ruleGrid = new RuleGrid<IString,String>(ruleList, queryString, true);
These parameters are not in any of the constructors provided.

Bug in README.md

The first step of the Linux install instructions is:

Switch to the root of the Phrasal repository and execute: gradle installDist

But there is no such target in build.gradle.

error in gradle compileKenLM

My spec

$ gradle --version

------------------------------------------------------------
Gradle 4.3.1
------------------------------------------------------------

Build time:   2017-11-08 08:59:45 UTC
Revision:     e4f4804807ef7c2829da51877861ff06e07e006d

Groovy:       2.4.12
Ant:          Apache Ant(TM) version 1.9.6 compiled on June 29 2015
JVM:          1.8.0_151 (Oracle Corporation 25.151-b12)
OS:           Linux 4.10.0-40-generic amd64

$ java -version
openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.17.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

$ javac -version
javac 1.8.0_151

I am trying to build the KenLM loader (with the latest commit, be69585, on the master branch):

$ gradle compileKenLM
Starting a Gradle Daemon, 1 incompatible Daemon could not be reused, use --status for details

> Configure project : 
The Task.leftShift(Closure) method has been deprecated and is scheduled to be removed in Gradle 5.0. Please use Task.doLast(Action) instead.
        at build_8pp33np0p9752ezmkum94m96i.run(/home/cpu11453local/workspace/study/phrasal/build.gradle:96)
        (Run with --stacktrace to get the full stack trace of this deprecation warning.)

> Task :compileKenLM 
You must use ./bjam if you want language model estimation, filtering, or support for compressed files (.gz, .bz2, .xz)
Compiling with g++ -DNDEBUG -O3 -fPIC -DHAVE_ZLIB -I. -O3 -DNDEBUG -DKENLM_MAX_ORDER=7
In file included from /usr/include/c++/6/stdlib.h:36:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/6/include/mm_malloc.h:27,
                 from /usr/lib/gcc/x86_64-linux-gnu/6/include/xmmintrin.h:34,
                 from /usr/lib/gcc/x86_64-linux-gnu/6/include/emmintrin.h:31,
                 from util/integer_to_string.cc:72:
/usr/include/c++/6/cstdlib:124:11: error: ‘::div_t’ has not been declared
   using ::div_t;
           ^~~~~
/usr/include/c++/6/cstdlib:125:11: error: ‘::ldiv_t’ has not been declared
   using ::ldiv_t;
           ^~~~~~
/usr/include/c++/6/cstdlib:127:11: error: ‘::abort’ has not been declared
   using ::abort;
           ^~~~~
/usr/include/c++/6/cstdlib:128:11: error: ‘::abs’ has not been declared
   using ::abs;
           ^~~
/usr/include/c++/6/cstdlib:129:11: error: ‘::atexit’ has not been declared
   using ::atexit;
           ^~~~~~
/usr/include/c++/6/cstdlib:132:11: error: ‘::at_quick_exit’ has not been declared
   using ::at_quick_exit;
           ^~~~~~~~~~~~~
/usr/include/c++/6/cstdlib:135:11: error: ‘::atof’ has not been declared
   using ::atof;
           ^~~~
/usr/include/c++/6/cstdlib:136:11: error: ‘::atoi’ has not been declared
   using ::atoi;
           ^~~~
/usr/include/c++/6/cstdlib:137:11: error: ‘::atol’ has not been declared
   using ::atol;
           ^~~~
/usr/include/c++/6/cstdlib:138:11: error: ‘::bsearch’ has not been declared
   using ::bsearch;
           ^~~~~~~
/usr/include/c++/6/cstdlib:139:11: error: ‘::calloc’ has not been declared
   using ::calloc;
           ^~~~~~
/usr/include/c++/6/cstdlib:140:11: error: ‘::div’ has not been declared
   using ::div;
           ^~~
/usr/include/c++/6/cstdlib:141:11: error: ‘::exit’ has not been declared
   using ::exit;
           ^~~~
/usr/include/c++/6/cstdlib:142:11: error: ‘::free’ has not been declared
   using ::free;
           ^~~~
/usr/include/c++/6/cstdlib:143:11: error: ‘::getenv’ has not been declared
   using ::getenv;
           ^~~~~~
/usr/include/c++/6/cstdlib:144:11: error: ‘::labs’ has not been declared
   using ::labs;
           ^~~~
/usr/include/c++/6/cstdlib:145:11: error: ‘::ldiv’ has not been declared
   using ::ldiv;
           ^~~~
/usr/include/c++/6/cstdlib:146:11: error: ‘::malloc’ has not been declared
   using ::malloc;
           ^~~~~~
/usr/include/c++/6/cstdlib:148:11: error: ‘::mblen’ has not been declared
   using ::mblen;
           ^~~~~
/usr/include/c++/6/cstdlib:149:11: error: ‘::mbstowcs’ has not been declared
   using ::mbstowcs;
           ^~~~~~~~
/usr/include/c++/6/cstdlib:150:11: error: ‘::mbtowc’ has not been declared
   using ::mbtowc;
           ^~~~~~
/usr/include/c++/6/cstdlib:152:11: error: ‘::qsort’ has not been declared
   using ::qsort;
           ^~~~~
/usr/include/c++/6/cstdlib:155:11: error: ‘::quick_exit’ has not been declared
   using ::quick_exit;
           ^~~~~~~~~~
/usr/include/c++/6/cstdlib:158:11: error: ‘::rand’ has not been declared
   using ::rand;
           ^~~~
/usr/include/c++/6/cstdlib:159:11: error: ‘::realloc’ has not been declared
   using ::realloc;
           ^~~~~~~
/usr/include/c++/6/cstdlib:160:11: error: ‘::srand’ has not been declared
   using ::srand;
           ^~~~~
/usr/include/c++/6/cstdlib:161:11: error: ‘::strtod’ has not been declared
   using ::strtod;
           ^~~~~~
/usr/include/c++/6/cstdlib:162:11: error: ‘::strtol’ has not been declared
   using ::strtol;
           ^~~~~~
/usr/include/c++/6/cstdlib:163:11: error: ‘::strtoul’ has not been declared
   using ::strtoul;
           ^~~~~~~
/usr/include/c++/6/cstdlib:164:11: error: ‘::system’ has not been declared
   using ::system;
           ^~~~~~
/usr/include/c++/6/cstdlib:166:11: error: ‘::wcstombs’ has not been declared
   using ::wcstombs;
           ^~~~~~~~
/usr/include/c++/6/cstdlib:167:11: error: ‘::wctomb’ has not been declared
   using ::wctomb;
           ^~~~~~
/usr/include/c++/6/cstdlib:220:11: error: ‘::lldiv_t’ has not been declared
   using ::lldiv_t;
           ^~~~~~~
/usr/include/c++/6/cstdlib:226:11: error: ‘::_Exit’ has not been declared
   using ::_Exit;
           ^~~~~
/usr/include/c++/6/cstdlib:230:11: error: ‘::llabs’ has not been declared
   using ::llabs;
           ^~~~~
/usr/include/c++/6/cstdlib:236:11: error: ‘::lldiv’ has not been declared
   using ::lldiv;
           ^~~~~
/usr/include/c++/6/cstdlib:247:11: error: ‘::atoll’ has not been declared
   using ::atoll;
           ^~~~~
/usr/include/c++/6/cstdlib:248:11: error: ‘::strtoll’ has not been declared
   using ::strtoll;
           ^~~~~~~
/usr/include/c++/6/cstdlib:249:11: error: ‘::strtoull’ has not been declared
   using ::strtoull;
           ^~~~~~~~
/usr/include/c++/6/cstdlib:251:11: error: ‘::strtof’ has not been declared
   using ::strtof;
           ^~~~~~
/usr/include/c++/6/cstdlib:252:11: error: ‘::strtold’ has not been declared
   using ::strtold;
           ^~~~~~~
/usr/include/c++/6/cstdlib:260:22: error: ‘__gnu_cxx::lldiv_t’ has not been declared
   using ::__gnu_cxx::lldiv_t;
                      ^~~~~~~
/usr/include/c++/6/cstdlib:262:22: error: ‘__gnu_cxx::_Exit’ has not been declared
   using ::__gnu_cxx::_Exit;
                      ^~~~~
/usr/include/c++/6/cstdlib:264:22: error: ‘__gnu_cxx::llabs’ has not been declared
   using ::__gnu_cxx::llabs;
                      ^~~~~
/usr/include/c++/6/cstdlib:265:22: error: ‘__gnu_cxx::div’ has not been declared
   using ::__gnu_cxx::div;
                      ^~~
/usr/include/c++/6/cstdlib:266:22: error: ‘__gnu_cxx::lldiv’ has not been declared
   using ::__gnu_cxx::lldiv;
                      ^~~~~
/usr/include/c++/6/cstdlib:268:22: error: ‘__gnu_cxx::atoll’ has not been declared
   using ::__gnu_cxx::atoll;
                      ^~~~~
/usr/include/c++/6/cstdlib:269:22: error: ‘__gnu_cxx::strtof’ has not been declared
   using ::__gnu_cxx::strtof;
                      ^~~~~~
/usr/include/c++/6/cstdlib:270:22: error: ‘__gnu_cxx::strtoll’ has not been declared
   using ::__gnu_cxx::strtoll;
                      ^~~~~~~
/usr/include/c++/6/cstdlib:271:22: error: ‘__gnu_cxx::strtoull’ has not been declared
   using ::__gnu_cxx::strtoull;
                      ^~~~~~~~
/usr/include/c++/6/cstdlib:272:22: error: ‘__gnu_cxx::strtold’ has not been declared
   using ::__gnu_cxx::strtold;
                      ^~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/6/include/mm_malloc.h:27:0,
                 from /usr/lib/gcc/x86_64-linux-gnu/6/include/xmmintrin.h:34,
                 from /usr/lib/gcc/x86_64-linux-gnu/6/include/emmintrin.h:31,
                 from util/integer_to_string.cc:72:
/usr/include/c++/6/stdlib.h:38:12: error: ‘util::std::abort’ has not been declared
 using std::abort;
            ^~~~~
/usr/include/c++/6/stdlib.h:39:12: error: ‘util::std::atexit’ has not been declared
 using std::atexit;
            ^~~~~~
/usr/include/c++/6/stdlib.h:40:12: error: ‘util::std::exit’ has not been declared
 using std::exit;
            ^~~~
/usr/include/c++/6/stdlib.h:43:14: error: ‘util::std::at_quick_exit’ has not been declared
   using std::at_quick_exit;
              ^~~~~~~~~~~~~
/usr/include/c++/6/stdlib.h:46:14: error: ‘util::std::quick_exit’ has not been declared
   using std::quick_exit;
              ^~~~~~~~~~
/usr/include/c++/6/stdlib.h:51:12: error: ‘util::std::div_t’ has not been declared
 using std::div_t;
            ^~~~~
/usr/include/c++/6/stdlib.h:52:12: error: ‘util::std::ldiv_t’ has not been declared
 using std::ldiv_t;
            ^~~~~~
/usr/include/c++/6/stdlib.h:55:12: error: ‘util::std::atof’ has not been declared
 using std::atof;
            ^~~~
/usr/include/c++/6/stdlib.h:56:12: error: ‘util::std::atoi’ has not been declared
 using std::atoi;
            ^~~~
/usr/include/c++/6/stdlib.h:57:12: error: ‘util::std::atol’ has not been declared
 using std::atol;
            ^~~~
/usr/include/c++/6/stdlib.h:58:12: error: ‘util::std::bsearch’ has not been declared
 using std::bsearch;
            ^~~~~~~
/usr/include/c++/6/stdlib.h:59:12: error: ‘util::std::calloc’ has not been declared
 using std::calloc;
            ^~~~~~
/usr/include/c++/6/stdlib.h:61:12: error: ‘util::std::free’ has not been declared
 using std::free;
            ^~~~
/usr/include/c++/6/stdlib.h:62:12: error: ‘util::std::getenv’ has not been declared
 using std::getenv;
            ^~~~~~
/usr/include/c++/6/stdlib.h:63:12: error: ‘util::std::labs’ has not been declared
 using std::labs;
            ^~~~
/usr/include/c++/6/stdlib.h:64:12: error: ‘util::std::ldiv’ has not been declared
 using std::ldiv;
            ^~~~
/usr/include/c++/6/stdlib.h:65:12: error: ‘util::std::malloc’ has not been declared
 using std::malloc;
            ^~~~~~
/usr/include/c++/6/stdlib.h:67:12: error: ‘util::std::mblen’ has not been declared
 using std::mblen;
            ^~~~~
/usr/include/c++/6/stdlib.h:68:12: error: ‘util::std::mbstowcs’ has not been declared
 using std::mbstowcs;
            ^~~~~~~~
/usr/include/c++/6/stdlib.h:69:12: error: ‘util::std::mbtowc’ has not been declared
 using std::mbtowc;
            ^~~~~~
/usr/include/c++/6/stdlib.h:71:12: error: ‘util::std::qsort’ has not been declared
 using std::qsort;
            ^~~~~
/usr/include/c++/6/stdlib.h:72:12: error: ‘util::std::rand’ has not been declared
 using std::rand;
            ^~~~
/usr/include/c++/6/stdlib.h:73:12: error: ‘util::std::realloc’ has not been declared
 using std::realloc;
            ^~~~~~~
/usr/include/c++/6/stdlib.h:74:12: error: ‘util::std::srand’ has not been declared
 using std::srand;
            ^~~~~
/usr/include/c++/6/stdlib.h:75:12: error: ‘util::std::strtod’ has not been declared
 using std::strtod;
            ^~~~~~
/usr/include/c++/6/stdlib.h:76:12: error: ‘util::std::strtol’ has not been declared
 using std::strtol;
            ^~~~~~
/usr/include/c++/6/stdlib.h:77:12: error: ‘util::std::strtoul’ has not been declared
 using std::strtoul;
            ^~~~~~~
/usr/include/c++/6/stdlib.h:78:12: error: ‘util::std::system’ has not been declared
 using std::system;
            ^~~~~~
/usr/include/c++/6/stdlib.h:80:12: error: ‘util::std::wcstombs’ has not been declared
 using std::wcstombs;
            ^~~~~~~~
/usr/include/c++/6/stdlib.h:81:12: error: ‘util::std::wctomb’ has not been declared
 using std::wctomb;
            ^~~~~~
g++: error: kenlm/lm/*.o: No such file or directory

Sequences.concatenate(bestLeftContext.abstractRule.target, target); does not exist

This function does not exist in the current build, but it is called in RuleQueryRequestHandler.java.

java.lang.Boolean cannot be cast to java.lang.String -- Phrasal.java:1213

Hi all, thanks for the awesome work!

I was trying to run the web service, which I managed to do after reverting 759bb65. The problem stems from this line, where the property is set as a Boolean. Is there some other reason you changed it, or is there a different way to run the web service?

The full error:

java.lang.ClassCastException: java.lang.Boolean cannot be cast to java.lang.String
        at edu.stanford.nlp.mt.Phrasal.decode(Phrasal.java:1213) ~[phrasal-3.4.1.jar:3.4.1]
        at edu.stanford.nlp.mt.service.handlers.TranslationRequestHandler$DecoderService.process(TranslationRequestHandler.java:177) [phrasal-3.4.1.jar:3.4.1]
        at edu.stanford.nlp.mt.service.handlers.TranslationRequestHandler$DecoderService.process(TranslationRequestHandler.java:117) [phrasal-3.4.1.jar:3.4.1]
        at edu.stanford.nlp.util.concurrent.MulticoreWrapper$CallableJob.call(MulticoreWrapper.java:249) [stanford-corenlp-3.5.2.jar:3.5.2]
        at edu.stanford.nlp.util.concurrent.MulticoreWrapper$CallableJob.call(MulticoreWrapper.java:230) [stanford-corenlp-3.5.2.jar:3.5.2]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_45]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]

My test query, which works once I revert 759bb65 (I built a Spanish-to-English system):

http://127.0.0.1:8017/x?tReq={"src":"ES", "tgt":"EN", "text": "el parlamento de ucrania", "tgtPrefix":"ukraine", "n": 6}
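
For anyone hitting the same exception, here is a minimal sketch of how this kind of failure can arise. It assumes the decoder reads a property value with a direct `(String)` cast; I have not confirmed that this is what Phrasal.java:1213 does. The class name, the key `example-flag`, and both helper methods are hypothetical and are not part of Phrasal.

```java
import java.util.Properties;

public class BooleanPropertyCastDemo {
    // Hypothetical key, used only for illustration; not taken from Phrasal.
    static final String KEY = "example-flag";

    // Reads the raw value and casts it to String.
    // Throws ClassCastException if the stored value is a Boolean.
    static String readUnsafe(Properties props) {
        return (String) props.get(KEY);
    }

    // Converts whatever was stored to text first, then parses it,
    // so both Boolean.TRUE and the string "true" are accepted.
    static boolean readTolerant(Properties props) {
        Object value = props.get(KEY);
        return value != null && Boolean.parseBoolean(String.valueOf(value));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Storing a Boolean object instead of the string "true".
        props.put(KEY, Boolean.TRUE);

        try {
            readUnsafe(props);
        } catch (ClassCastException e) {
            System.out.println("unsafe read failed: " + e.getMessage());
        }

        System.out.println("tolerant read: " + readTolerant(props));
    }
}
```

If that assumption holds, either storing the value as a string or reading it tolerantly would avoid the cast failure without reverting the whole commit.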

[gradle] Task.leftShift(Closure) method has been deprecated

My current Gradle version:


------------------------------------------------------------
Gradle 4.3.1
------------------------------------------------------------

Build time:   2017-11-08 08:59:45 UTC
Revision:     e4f4804807ef7c2829da51877861ff06e07e006d

Groovy:       2.4.12
Ant:          Apache Ant(TM) version 1.9.6 compiled on June 29 2015
JVM:          1.8.0_151 (Oracle Corporation 25.151-b12)
OS:           Linux 4.10.0-42-generic amd64

When I build with

gradle compileKenLMtools

I get this message:

> Configure project : 
The Task.leftShift(Closure) method has been deprecated and is scheduled to be removed in Gradle 5.0. Please use Task.doLast(Action) instead.
        at build_8pp33np0p9752ezmkum94m96i.run(/home/cpu11453local/workspace/study/phrasal/build.gradle:96)
        (Run with --stacktrace to get the full stack trace of this deprecation warning.)