
daormar / thot


Thot toolkit for statistical machine translation

Home Page: https://daormar.github.io/thot/

License: GNU Lesser General Public License v3.0

C 1.31% C++ 80.70% Shell 13.35% Python 2.62% Makefile 1.31% M4 0.71% Awk 0.01%
natural-language-processing machine-learning machine-translation artificial-intelligence pattern-recognition statistics statistical-machine-translation tokenizer detokenizer recaser

thot's People

Contributors

ahara, daormar, jesus-g-rubio, mfomicheva

thot's Issues

awk: illegal statement at source line 1

Hi, I'm new here and I have a problem with training.

I followed the manual to train a translator on the toy_corpus provided by this project.
However, when I execute these commands:

src_train_corpus=${PREFIX}/share/thot/toy_corpus/sp_tok_lc.train
trg_train_corpus=${PREFIX}/share/thot/toy_corpus/en_tok_lc.train
thot_tm_train -s ${src_train_corpus} -t ${trg_train_corpus} -o tm_outdir

I get the same error many times:

awk: syntax error at source line 1
context is
{print >>> substr($1,length($1)-3)== <<<
awk: illegal statement at source line 1
awk: illegal statement at source line 1
/usr/local/bin//thot_pbs_gen_batch_sw_model: line 346: [: -eq: unary operator expected

I ignored those errors and kept executing the commands in the manual.
In the end I still get a set of output files, but I can't tell whether they are correct.
Am I doing anything wrong? Any response is appreciated.
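
A minimal sketch of one plausible reading of the messages above, not a confirmed diagnosis: some awk implementations reject an unparenthesized comparison after print, and `[: -eq: unary operator expected` is what the shell reports when an unquoted variable in a numeric test is empty. The helper names last4_equals and is_zero and the ".txt" suffix are invented for illustration; they do not come from thot_pbs_gen_batch_sw_model.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Thot.

# Print 1 if the last four characters of $1 equal $2, else 0.
# The comparison is wrapped in parentheses so that awk implementations
# with a strict print grammar accept it as well.
last4_equals() {
    printf '%s\n' "$1" |
        awk -v want="$2" '{ print (substr($1, length($1) - 3) == want) }'
}

# Return success if $1 is numerically zero; an empty value counts as zero.
# Quoting plus a default keeps `[` from seeing a missing operand.
is_zero() {
    [ "${1:-0}" -eq 0 ]
}

result=$(last4_equals "corpus.txt" ".txt")
if is_zero "$result"; then
    echo "suffix differs"
else
    echo "suffix matches"
fi
```

If the parenthesized form runs cleanly with the local awk while the original one-liner does not, installing gawk and putting it first in PATH may be worth trying before re-running the training.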

missing files

Hi,

I am trying to use thot_tokenize and noticed the following issues:

  1. error in the tok_sent.encode function
  2. no thot_smt_preproc module

Also, when I run thot_tm_train -s ${src_train_corpus} -t ${trg_train_corpus} -o tm,
only a folder is created; there are no descriptor files in it.

thot_tm_train -s ${src_train_corpus} -t ${trg_train_corpus} -o tm_outdir error

Hello, I am currently having issues with
thot_tm_train -s ${src_train_corpus} -t ${trg_train_corpus} -o tm_outdir
It keeps giving me this:
cat: /home/oluwasegun/thot_pbs_gen_batch_sw_model_sdir_3814_3821/models_per_chunk/__proc_n2.log: No such file or directory
cat: /home/oluwasegun/thot_pbs_gen_batch_sw_model_sdir_3814_3821/models_per_chunk/__proc_n3.log: No such file or directory
cat: /home/oluwasegun/thot_pbs_gen_batch_sw_model_sdir_3814_3821/models_per_chunk/__proc_n4.log: No such file or directory
cat: /home/oluwasegun/thot_pbs_gen_batch_sw_model_sdir_3814_3821/models_per_chunk/__proc_n5.log: No such file or directory
cat: /home/oluwasegun/thot_pbs_gen_batch_sw_model_sdir_3814_3821/curr_tables/generate_final_model.log: No such file or directory
Error during the execution of thot_pbs_gen_batch_sw_model (proc_chunk)
File /home/oluwasegun/tm_outdir/main/src_trg_swm.genswm_err contains information for error diagnosing

Any help?
Thanks ;)
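
The last message points at an error file, so a first diagnostic step is to read it. The sketch below assumes nothing about Thot beyond the path printed above; show_err_log is a hypothetical helper, not a Thot command.

```shell
#!/bin/sh
# Hypothetical helper, not a Thot command: print the last lines of an
# error log, or report that the log is missing or empty.
show_err_log() {
    log="$1"
    if [ -s "$log" ]; then
        tail -n 20 "$log"
    else
        echo "no error log at $log" >&2
        return 1
    fi
}

# Path taken from the error message above, relative to the home directory;
# adjust it to the local output directory.
show_err_log "$HOME/tm_outdir/main/src_trg_swm.genswm_err" || true
```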

Installation make error

Hi, I'm having an issue running the make command on installation. Seems to be an ar command with no file inputs somewhere. Running on macOS Sierra. Output is below:
$ make
/Library/Developer/CommandLineTools/usr/bin/make all-recursive
Making all in src
Making all in nlp_common
make[3]: Nothing to be done for `all'.
Making all in incr_models
make[3]: Nothing to be done for `all'.
Making all in sw_models
make[3]: Nothing to be done for `all'.
Making all in phrase_models
make[3]: Nothing to be done for `all'.
Making all in smt_preproc
make[3]: Nothing to be done for `all'.
Making all in error_correction
make[3]: Nothing to be done for `all'.
Making all in downhill_simplex
make[3]: Nothing to be done for `all'.
Making all in stack_dec
make[3]: Nothing to be done for `all'.
Making all in exper
make[3]: Nothing to be done for `all'.
Making all in testing
make[3]: Nothing to be done for `all'.
Making all in hat_trie
make[3]: Nothing to be done for `all'.
/bin/sh ../libtool --tag=CC --mode=link gcc -W -Wno-deprecated -I./nlp_common -I./incr_models -I./sw_models -I./phrase_models -I./smt_preproc -I./error_correction -I./downhill_simplex -I./stack_dec -I./hat_trie -DTHOT_MASTER_INI_PATH="/usr/local/share/thot/ini_files/master.ini" -DTHOT_LIBDIR="/usr/local/lib" -Ino/src/include -g -O2 -g -Wall -O2 -o libhattrie.la -rpath /usr/local/lib -lm
libtool: link: gcc -dynamiclib -Wl,-undefined -Wl,dynamic_lookup -o .libs/libhattrie.0.dylib -lm -g -O2 -g -O2 -install_name /usr/local/lib/libhattrie.0.dylib -compatibility_version 1 -current_version 1.0 -Wl,-single_module
libtool: link: (cd ".libs" && rm -f "libhattrie.dylib" && ln -s "libhattrie.0.dylib" "libhattrie.dylib")
libtool: link: ar cru .libs/libhattrie.a
ar: no archive members specified
usage: ar -d [-TLsv] archive file ...
ar -m [-TLsv] archive file ...
ar -m [-abiTLsv] position archive file ...
ar -p [-TLsv] archive [file ...]
ar -q [-cTLsv] archive file ...
ar -r [-cuTLsv] archive file ...
ar -r [-abciuTLsv] position archive file ...
ar -t [-TLsv] archive [file ...]
ar -x [-ouTLsv] archive [file ...]
make[3]: *** [libhattrie.la] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Add support for an incremental alignment model that models fertility

Currently, Thot has support for IBM1, IBM2, and HMM models, none of which models fertility. Are there any plans to support IBM3, IBM4, IBM5, or one of the extensions to HMM that models/simulates fertility? Obviously, the IBM models 3-5 are complex and might be difficult to support. Some of the fertility extensions to HMM seem simpler and would improve accuracy.
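
For context, a short sketch of what "modeling fertility" means, written in the usual IBM-model notation rather than taken from Thot's code. The symbols are assumptions of this sketch: f_1^m is the source sentence, e_0^l the target sentence with e_0 the empty word, a_1^m the alignment, and t the lexical translation table.

```latex
% IBM Model 1: no term depends on how many source words align to e_i
P(f_1^m \mid e_0^l) = \frac{\epsilon}{(l+1)^m}
    \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

% Fertility of target word e_i under an alignment a_1^m
\phi_i = \left| \{\, j : a_j = i \,\} \right|

% Models 3-5 add an explicit fertility distribution n(\phi_i \mid e_i)
% for each target word, which Models 1, 2 and the HMM model lack.
```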

macos high sierra installation

Hello,
While trying to install on High Sierra, I got this error at the make step:

/bin/sh ../libtool --tag=CXX --mode=link g++ -W -Wno-deprecated -I./nlp_common -I./incr_models -I./sw_models -I./phrase_models -I./smt_preproc -I./error_correction -I./downhill_simplex -I./stack_dec -I./hat_trie -I./picojson -DTHOT_MASTER_INI_PATH="/Users/bilge/Desktop/thot/share/thot/ini_files/master.ini" -DTHOT_LIBDIR="/Users/bilge/Desktop/thot/lib" -Ino/src/include -g -Wall -Wno-deprecated -O2 -std=c++11 libthot.la -o thot_lm_perp incr_models/thot_lm_perp.o -lgmp -lz -lpthread -ldl -lm
libtool: error: cannot find the library 'libthot.la' or unhandled argument 'libthot.la'
make[3]: *** [thot_lm_perp] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Any idea how to solve this?
Thanks

Installation on Ubuntu18, Python2 vs Python3 print statements

After installing the Thot package on a freshly installed Ubuntu 18, 'make installcheck' threw a syntax error while performing 'Tuning log-linear model weights': file 'thot_get_nblist_segm_info', line 99, print line.encode('utf-8'). I modified the print statements in all files containing Python code in the source directory to comply with Python 3. After that I ran the installation again, and 'make installcheck' showed that all tests passed.
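
A heuristic sketch for locating the files that need this change, assuming a checked-out source tree. find_py2_prints is a hypothetical helper; it only flags lines where print is followed by whitespace and something other than an opening parenthesis, so it can miss cases or report false positives.

```shell
#!/bin/sh
# Heuristic sketch, not part of Thot: list files under a directory that
# appear to contain Python 2 style print statements, i.e. "print" followed
# by whitespace and then a character other than "(".
find_py2_prints() {
    grep -rlE '^[[:space:]]*print[[:space:]]+[^([:space:]]' "$1"
}

# Example call (the directory name is an assumption):
#   find_py2_prints ./utils || echo "no Python 2 print statements found"
```

Each reported file still needs a manual look, since a statement such as print line.encode('utf-8') becomes print(line.encode('utf-8')) and may then print a bytes representation under Python 3.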

Windows support.

I am opening this ticket to port Thot to native Windows, since it could be a useful tool in the Win32 landscape. Please assign this to me.
The port:

  1. Use CMake to build.
  2. Replace POSIX-specific code with independent C++11/C++14 code or Boost.

Cannot get the tm_desc file after using "thot_tm_train"

My command is: thot_tm_train -s /usr/local/share/thot/toy_corpus/train.sp -t /usr/local/share/thot/toy_corpus/train.en -o /home/gtct/yuchao/mt/tm_outdir/
I just get a "main" directory with nothing in it. I cannot find tm_desc anywhere.
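
A small sketch for checking a training output directory, assuming the descriptor is written as tm_desc at the top level of the directory passed to -o and that error files end in _err, as in the src_trg_swm.genswm_err file named in an earlier report. check_tm_outdir is a hypothetical helper, not a Thot command.

```shell
#!/bin/sh
# Hypothetical check, not part of Thot. Assumes the descriptor is written
# as <outdir>/tm_desc and that error files are named *_err.
check_tm_outdir() {
    outdir="${1%/}"    # drop a trailing slash, if any
    if [ -f "$outdir/tm_desc" ]; then
        echo "found $outdir/tm_desc"
    else
        echo "tm_desc missing in $outdir" >&2
        # List error files left behind so they can be inspected.
        find "$outdir" -type f -name '*_err' 2>/dev/null
        return 1
    fi
}

# Example call:
#   check_tm_outdir /home/gtct/yuchao/mt/tm_outdir/
```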
