
roxana-lafuente / mttt


Machine Translation Training Tool (MTTT): Machine translation made easy for human translators!

License: GNU General Public License v3.0

Python 63.63% Perl 0.49% Java 4.94% Makefile 0.27% CSS 23.75% Shell 0.03% HTML 6.83% Batchfile 0.05%
machine-translation moses human-translators portable python

mttt's Introduction

Machine Translation Training Tool (MTTT) -- Gtk-based version

Machine translation made easy for human translators!

MTTT is a post-editing suite, currently under development, that aims to improve translators' experience with machine translation tools such as MOSES. It provides the user with a graphical user interface to:

  • Work with the moses machine translation pipeline.
  • Apply evaluation metrics such as BLEU.
  • Post-edit the obtained machine translation.

Features

  • Portable (Windows / Linux)
  • Friendly Graphical User Interface (GUI) for MOSES.
  • Use the MOSES machine translation tool, post-edit its output and run evaluation metrics.
Corpus preparation tab

Screenshot1

Training tab

Screenshot2

Dependencies

Source code

About Linux
  • You should link /bin/sh to /bin/bash and not to /bin/dash. To do that:
    • Check the link:
     ls -l /bin/sh
    
    • If /bin/sh is a link to /bin/dash, change it to /bin/bash.
     sudo mv /bin/sh /bin/sh.orig
     sudo ln -s /bin/bash /bin/sh
    

This is necessary for the shell redirections used by the MOSES commands.
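The link check above can also be done from Python before launching the tool. This is only a minimal sketch: the helper name `points_to_bash` is hypothetical and is not part of MTTT.

```python
import os


def points_to_bash(path="/bin/sh"):
    """Return True when `path` resolves (through symlinks) to a file named bash.

    Hypothetical helper, not part of MTTT.
    """
    real_target = os.path.realpath(path)  # follow every symlink in the chain
    return os.path.basename(real_target) == "bash"


if __name__ == "__main__":
    if not points_to_bash():
        print("/bin/sh does not point to bash; see the steps above.")
```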

On Ubuntu
  • MOSES (Install with "--with-mm" and "--install-scripts" flags)

  • To install its dependencies run

     python ubuntu_install.py
    
On Windows using Cygwin
  • MOSES (Install with "--with-mm" and "--install-scripts" flags)

  • To install Cygwin and its dependencies run

     python cygwin_install.py
    
On Windows

Binaries (portable)

More details on this soon!

Status

  • Under development. Everything currently works, but we need a better GUI design and more robustness (strict error handling).

How to use

Source code

On Linux

Simply install all dependencies and run:

python main.py
On Windows

Run LXDE or any other X window environment from Cygwin. From inside that environment run:

python main.py

Binaries (portable)

More details on this soon!

Contributors

  • Paula Estrella
  • Roxana Lafuente <roxana.lafuente at gmail dot com>
  • Miguel Lemos

We welcome new contributions! If you would like to be part of the team, create a new pull request and contact Paula or Roxana to let us know. If it is merged into the project you will be added as a contributor.



mttt's Issues

Saving with no changes makes the time_per_segment statistics calculation blow up

Solution 1: Never show the save button if no changes have been made. When clicked, the save button not only saves but also calculates the statistics to see which ones can be shown. Insertion and deletion stats can be calculated on empty changes, but the same cannot be said about the time_per_segment stats: to calculate them, the code tries to access the last key of the self.source_log dictionary, in other words the last modification. When no changes have been added to the log, keys() is empty and no value can be accessed, so the calculation blows up.

Solution 2: See if self.source_log.keys() is not empty before accessing it.
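Solution 2 could look roughly like the sketch below. The function name `last_modification_key` is hypothetical, and the sketch assumes the log keys are sortable (for example timestamps) so that the largest key is the last modification; the actual layout of self.source_log may differ.

```python
def last_modification_key(source_log):
    """Return the most recent key of the log, or None when nothing was logged.

    Assumes keys are sortable (e.g. timestamps); hypothetical helper.
    """
    if not source_log:  # an empty dict means no changes were made
        return None
    return max(source_log.keys())


# The caller can then skip the time_per_segment statistics entirely:
log = {}
if last_modification_key(log) is None:
    print("No changes logged; skipping time_per_segment.")
```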

Issue with GIZA 64bits

When using the 64-bit GIZA, you need to add the options -mgiza -mgiza-cpus 2 to the command moses-64bit/scripts/training/train-model.perl. Otherwise it won't find the path for the translation model.
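If the training command is assembled as a list of arguments, the extra options can be appended in one place. This is a sketch under that assumption; `with_mgiza` is a hypothetical helper and the way MTTT actually builds the command may differ.

```python
def with_mgiza(train_command, cpus=2):
    """Return a copy of the command with the mgiza options appended.

    Hypothetical helper; `train_command` is assumed to be a list of arguments.
    """
    return list(train_command) + ["-mgiza", "-mgiza-cpus", str(cpus)]


base = ["perl", "moses-64bit/scripts/training/train-model.perl"]
print(" ".join(with_mgiza(base)))
```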

Assertion Error in Machine Translation tab

When the machine translation tab is used incorrectly, that is, by selecting a file and asking for it to be translated straight away, the following error shows:


Traceback (most recent call last):
  File "main.py", line 774, in _machine_translation
    adapt_path_for_cygwin(self.is_windows, self.output_text.get_text()) + "/train/model/moses.ini",
  File "/home/migue/TTT/constants.py", line 39, in adapt_path_for_cygwin
    assert len(directory) > 0
AssertionError

Possible solutions:
Show a friendlier error message.
Do not allow the user to ask for a translation if it is going to cause an error.
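Both solutions come down to checking the output directory before the path is built. The sketch below illustrates the idea only: `moses_ini_path` is a hypothetical helper, and it leaves out the Cygwin path adaptation done by adapt_path_for_cygwin.

```python
def moses_ini_path(output_dir):
    """Return the expected moses.ini path, or None when no directory is set.

    Hypothetical helper; the real code also adapts the path for Cygwin.
    """
    directory = output_dir.strip()
    if not directory:  # nothing selected yet: let the caller warn the user
        return None
    return directory + "/train/model/moses.ini"


path = moses_ini_path("")
if path is None:
    print("Please choose an output directory before translating.")
```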

The statistics can only be seen after saving, else the button is unresponsive

The original idea was to have the statistics menu pop up only when saving. Now that it is always visible after the first save, its uncooperativeness when reacting to user input is evident: it won't show anything until something is saved once more.

To be fair to the button, its behavior follows the idea that no dated, obsolete statistics should ever reach the user's eye. With that in mind, I believe the first solution would be to unlock the button for the user and have it save the information itself when used.

Why sometimes the deletions and insertions differ

Say I have a log where I saved a ton of u's in a segment:
screenshot from 2017-01-25 23-13-52

When I come back in a different TTT session and add another "u" in the middle of the u's, the statistics do not show any insertion made to the segment.

Closing the program sometimes generates the following error

Traceback (most recent call last):
  File "main.py", line 1119, in final_responsabilities
    self.PostEditing.saveChangedFromPostEditing()
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 319, in saveChangedFromPostEditing
    self.show_the_available_stats()
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 245, in show_the_available_stats
    insertions = self.calculate_insertions_per_segment()[0]
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 165, in calculate_insertions_per_segment
    percentaje_spent_by_segment=self.tables["translation_table"].calculate_insertions_or_deletions_percentajes(False)
  File "/home/migue/Desktop/TTT_GTK/table.py", line 401, in calculate_insertions_or_deletions_percentajes
    modified_segments = map(str.strip, modified_segments)
TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'

The main question is this: sure, saving unsaved changes is an important "final responsibility" of the program. But how on earth is it important, or even relevant, to calculate the statistics during closing?

Of course this rant is against code of my own, thus the complete lack of tact shown in the criticism. Will be fixed soon.
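Independently of where the statistics are calculated, the TypeError itself comes from calling the unbound str.strip on unicode objects. One possible fix, shown here only as a sketch, is to call strip() on each item so the method is looked up on the object's own type; `strip_segments` is a hypothetical name.

```python
def strip_segments(modified_segments):
    """Strip surrounding whitespace from every segment.

    Calling s.strip() works for any string type, unlike map(str.strip, ...),
    which on Python 2 rejects unicode objects. Hypothetical helper.
    """
    return [s.strip() for s in modified_segments]


print(strip_segments([u"  first segment ", u"second segment\n"]))
```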

Machine translation tab error

It is not performing the machine translation.

Info:
Traceback (most recent call last):
File "main.py", line 765, in _machine_translation
self._has_empty_last_line(in_file):
File "main.py", line 753, in _has_empty_last_line
last_line_is_empty = "\n" in (f.readlines()[-1])
IndexError: list index out of range
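The IndexError is consistent with an empty input file, where readlines() returns an empty list. A minimal sketch of a guarded check follows; treating an empty file as having no trailing newline is an assumption, and this standalone function only approximates the _has_empty_last_line method.

```python
def has_empty_last_line(path):
    """Return True when the last line of the file ends with a newline.

    An empty file has no last line; this sketch returns False for it
    instead of raising IndexError (assumed behaviour).
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines:  # empty file: nothing to index
        return False
    return "\n" in lines[-1]
```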

In "Evaluation" tab, need to check if files exist

When the path to source text and/or reference text does not exist, the program breaks.

Traceback:
Traceback (most recent call last):
File "main.py", line 944, in _evaluate
self.evaluation_reference.get_text())
File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: '5e/corpus/translate.en'

Traceback (most recent call last):
File "main.py", line 944, in _evaluate
self.evaluation_reference.get_text())
File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: '/home/rlafuente/cor5ource.en'
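A check before calling os.stat would avoid the crash. The sketch below is one way to do it; `missing_files` is a hypothetical helper, and how the GUI reports the problem is left open. Note that os.path.isfile also returns False for an empty string.

```python
import os


def missing_files(*paths):
    """Return the given paths that do not point to an existing regular file.

    Hypothetical helper; an empty string counts as missing.
    """
    return [p for p in paths if not os.path.isfile(p)]


problems = missing_files("5e/corpus/translate.en")
if problems:
    print("Cannot evaluate, file(s) not found: " + ", ".join(repr(p) for p in problems))
```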

In "Evaluation" tab, start evaluation exception

When you do not choose a file but select metrics (WER, BLEU, PER, HTER, BLEU3GRAM, BLEU4GRAM, GTM) and click on the "Start Evaluation" button, an exception is raised.
Info:

Traceback (most recent call last):
File "main.py", line 941, in _evaluate
self.evaluation_reference.get_text())
File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: ''
