roxana-lafuente / mttt

Machine Translation Training Tool (MTTT): Machine translation made easy for human translators!

License: GNU General Public License v3.0

Python 63.63% Perl 0.49% Java 4.94% Makefile 0.27% CSS 23.75% Shell 0.03% HTML 6.83% Batchfile 0.05%

machine-translation moses human-translators portable python

mttt's Introduction

Machine Translation Training Tool (MTTT) -- Gtk-based version

Machine translation made easy for human translators!

MTTT is a post-editing suite, still under development, that aims to improve translators' experience with machine translation tools such as Moses. It provides the user with a graphical user interface to:

Work with the moses machine translation pipeline.
Apply evaluation metrics such as BLEU.
Post-edit the obtained machine translation.

Features

Portable (Windows / Linux)
Friendly Graphical User Interface (GUI) for MOSES.
Use machine translation tool MOSES, post-edit the output and run evaluation metrics.

Corpus preparation tab

Training tab

Dependencies

Source code

About Linux

You should link /bin/sh to /bin/bash, not to /bin/dash. To do that:
- Check the link:
```
 ls -l /bin/sh
```
- If /bin/sh is a link to /bin/dash, change it to /bin/bash.
```
 sudo mv /bin/sh /bin/sh.orig
 sudo ln -s /bin/bash /bin/sh
```

This is necessary because MOSES commands rely on bash-style redirection.
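
The link check can also be scripted. This is a minimal sketch, not part of MTTT; `resolve_shell` and `is_bash` are hypothetical helper names introduced here for illustration:

```python
import os


def resolve_shell(path="/bin/sh"):
    """Return the real file that `path` points to after following symlinks."""
    return os.path.realpath(path)


def is_bash(path="/bin/sh"):
    """True if `path` ultimately resolves to a file named bash."""
    return os.path.basename(resolve_shell(path)) == "bash"


if __name__ == "__main__":
    print("/bin/sh ->", resolve_shell())
    if not is_bash():
        print("Warning: /bin/sh is not bash; MOSES redirections may fail.")
```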

On Ubuntu

MOSES (Install with "--with-mm" and "--install-scripts" flags)
To install its dependencies run
```
 python ubuntu_install.py
```

On Windows using Cygwin

MOSES (Install with "--with-mm" and "--install-scripts" flags)
To install Cygwin and its dependencies run
```
 python cygwin_install.py
```

On Windows

MOSES (Install with "--with-mm" and "--install-scripts" flags)
The following installer is recommended: https://sourceforge.net/projects/pygobjectwin32/files/pygi-aio-3.18.2_rev10-setup_84c21bc2679ff32e73de38cbaa6ef6d30c628ae5.exe/download
- visual installation guide:

Binaries (portable)

More details on this soon!

Status

Under development. Everything currently works, but the GUI design needs improvement and the code needs more robustness (strict error handling).

How to use

Source code

On Linux

Simply install all dependencies and run:

python main.py

On Windows

Run LXDE or any other X window environment from Cygwin. From inside that environment, run:

python main.py

Binaries (portable)

More details on this soon!

Contributors

Paula Estrella
Roxana Lafuente <roxana.lafuente at gmail dot com>
Miguel Lemos

We welcome new contributions! If you would like to be part of the team, create a new pull request and contact Paula or Roxana to let us know. If it is merged into the project you will be added as a contributor.

Please check out our other contributions:

MTTT PyQt-based version: https://github.com/PaulaEstrella/TTT_PyQT
MTTT Web-based version POC: https://github.com/miguelemosreverte/TTT_web

mttt's People

Contributors

Stargazers

Watchers

Forkers

paulaestrella miguelemosreverte lulzzz

mttt's Issues

The directory dialogs, when canceled, change the directory anyway

Steps to reproduce:
1. Select a directory as normal, one that is not the default.
2. Reopen the dialog and close it.
The default directory has now replaced the one you selected in the first step.
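
One way to avoid this is to update the stored directory only when the dialog was actually confirmed. This is a minimal sketch of that rule, not the project's code; `directory_after_dialog` is a hypothetical helper, and it assumes the caller computes `accepted` from the dialog response (in PyGObject, `response == Gtk.ResponseType.OK`):

```python
def directory_after_dialog(current_dir, accepted, chosen_dir):
    """Return the directory to keep once the dialog has closed.

    current_dir: the directory selected before the dialog was opened.
    accepted:    True only if the user confirmed the dialog (assumption:
                 the caller derives this from the dialog response).
    chosen_dir:  whatever the dialog reports as its folder, possibly the
                 default one when the user canceled.
    """
    if accepted and chosen_dir:
        return chosen_dir
    # Canceled or closed: keep the previous selection untouched.
    return current_dir
```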

When a modified segment gets completely erased, it is ignored by the deletion stats

When saving with no changes, the time_per_segment statistics calculation blows up

Solution 1: Never show the save button if no changes have been made. When clicked, the save button not only saves but also calculates the statistics to see which ones can be shown. Insertion and deletion stats can be calculated on empty changes, but the time_per_segment stats cannot: calculating them requires the last key of the self.source_log dictionary, that is, the last modification. When no changes have been added to the log, keys() is empty and the access blows up.

Solution 2: Check that self.source_log.keys() is not empty before accessing it.
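
Solution 2 could look roughly like the sketch below. It is an illustration only: `last_modification_key` is a hypothetical helper, and it assumes the keys of the log are sortable (for example timestamps), which the issue does not state.

```python
def last_modification_key(source_log):
    """Return the most recent key in the log, or None if nothing was logged.

    Assumption: keys are sortable values such as timestamps, so the
    largest key is the last modification.
    """
    if not source_log:
        # No changes were recorded, so there is nothing to measure.
        return None
    return max(source_log.keys())


# The caller can then skip the time_per_segment statistic
# instead of raising an exception.
key = last_modification_key({})
if key is None:
    print("No changes logged; skipping time_per_segment.")
```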

Issue with GIZA 64bits

When using 64-bit GIZA, you need to add the options -mgiza -mgiza-cpus 2 to the command moses-64bit/scripts/training/train-model.perl. Otherwise it won't find the path for the translation model.

Assertion Error in Machine Translation tab

When the machine translation tab is used incorrectly, that is, by selecting a file and immediately asking for it to be translated, the following error shows:


Traceback (most recent call last):
  File "main.py", line 774, in _machine_translation
    adapt_path_for_cygwin(self.is_windows, self.output_text.get_text()) + "/train/model/moses.ini",
  File "/home/migue/TTT/constants.py", line 39, in adapt_path_for_cygwin
    assert len(directory) > 0
AssertionError

Possible solutions:
Maybe a friendlier error message could be shown?
Do not allow the user to request a translation if it is going to cause an error.
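
Both ideas amount to validating the output directory before the translation starts, rather than relying on the assertion. This is a minimal sketch under that assumption; `validate_output_directory` and the message text are hypothetical, not taken from the project:

```python
def validate_output_directory(directory):
    """Return a user-facing error message, or None if the value is usable."""
    if directory is None or not directory.strip():
        return "Please select an output directory before translating."
    return None


# The GUI could show the message in a dialog instead of crashing.
error = validate_output_directory("")
if error is not None:
    print(error)
```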

The statistics can only be seen after saving; otherwise the button is unresponsive

The original idea was for the statistics menu to pop up only when saving. Now that it is always visible after the first save, its unresponsiveness to user input is evident: it shows nothing until something is saved once more.

To be fair to the button, its behavior follows the idea that no outdated statistics should ever reach the user's eye. With that in mind, I believe the first solution would be to unlock the button for the user and have it save the information itself when used.

In "Post editing" tab, after modifying a part, color is not persistent.

I modified the text and it changed to blue, but after moving to another segment it goes back to white.

Why the deletions and insertions sometimes differ

Say I have a log where I saved a ton of u's in a segment:

When I come back in a different TTT session and add another "u" in the middle of the u's, the statistics do not show any insertion made to the segment.

Program should start on "Corpus Preparation" tab

Program should start on "Corpus Preparation" tab; instead, it starts on the "Machine Translation" tab.

Closing the program sometimes generates the following error

Traceback (most recent call last):
  File "main.py", line 1119, in final_responsabilities
    self.PostEditing.saveChangedFromPostEditing()
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 319, in saveChangedFromPostEditing
    self.show_the_available_stats()
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 245, in show_the_available_stats
    insertions = self.calculate_insertions_per_segment()[0]
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 165, in calculate_insertions_per_segment
    percentaje_spent_by_segment=self.tables["translation_table"].calculate_insertions_or_deletions_percentajes(False)
  File "/home/migue/Desktop/TTT_GTK/table.py", line 401, in calculate_insertions_or_deletions_percentajes
    modified_segments = map(str.strip, modified_segments)
TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'

The main question is this: sure, saving unsaved changes is an important "final responsibility" of the program. But how on earth is it important, or even relevant, to calculate the statistics during closing?

Of course, this rant is against my own code, hence the complete lack of tact in the criticism. Will be fixed soon.
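
Independently of when the statistics are calculated, the TypeError itself comes from `map(str.strip, ...)`: on Python 2, `str.strip` only accepts `str` objects, so it fails as soon as a segment is `unicode`. Calling the method on each item avoids tying the call to one string type. This is a minimal sketch, with `strip_segments` as a hypothetical helper name:

```python
def strip_segments(segments):
    """Strip surrounding whitespace from every segment.

    Calling segment.strip() dispatches on the item's own type, so it works
    for both byte strings and unicode strings on Python 2, and for str on
    Python 3.
    """
    return [segment.strip() for segment in segments]


modified_segments = strip_segments([u"  first segment ", "second segment\n"])
```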

In the evaluation script, a crash occurs when it tries to save the output

It seems to be trying to use the output directory as a filename, and failing in the process.
Adding a proper filename should do the trick, i.e.: evaluation_output_filename = output_directory + "/evaluation_output.txt"
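
The same idea can be written with `os.path.join`, which handles a trailing separator in the directory. This is a sketch only; `evaluation_output_path` is a hypothetical helper, and the default filename is the one suggested above:

```python
import os


def evaluation_output_path(output_directory, filename="evaluation_output.txt"):
    """Build the path of the evaluation output file inside output_directory."""
    return os.path.join(output_directory, filename)


evaluation_output_filename = evaluation_output_path("results")
```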

In "Evaluation" tab, output directory expects a file but is asking for a directory.

Which one is correct? Does it need a directory or a file?

When performing an MT, the output is saved in the /home dir

Instead, it should be saved in the output directory.

>& (bash only) can be changed to 2>&1 so that it works in dash
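
For context, in bash `cmd >& file` sends both stdout and stderr to `file`; the portable spelling is `cmd > file 2>&1`. An alternative that avoids shell redirection altogether is to let Python merge the streams. This is a minimal sketch, not MTTT's actual code; `run_and_log` is a hypothetical helper:

```python
import subprocess


def run_and_log(command, log_path):
    """Run `command` (a list of arguments) without a shell.

    Both stdout and stderr are written to `log_path`, which is what
    `cmd > log 2>&1` does in a POSIX shell. Returns the exit code.
    """
    with open(log_path, "w") as log:
        return subprocess.call(command, stdout=log, stderr=subprocess.STDOUT)
```

Because no shell is involved, the result does not depend on whether /bin/sh is bash or dash.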

Machine translation tab error

It is not performing the machine translation.

Info:
Traceback (most recent call last):
  File "main.py", line 765, in _machine_translation
    self._has_empty_last_line(in_file):
  File "main.py", line 753, in _has_empty_last_line
    last_line_is_empty = "\n" in (f.readlines()[-1])
IndexError: list index out of range
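
The IndexError means `f.readlines()` returned an empty list, which happens when the input file is empty. This is a minimal sketch of a guarded version; it is a standalone function rather than the method in main.py, and returning False for an empty file is an assumption about the desired behavior:

```python
def has_empty_last_line(path):
    """True if the last line of the file ends with a newline.

    An empty file has no lines at all, so it is reported as False
    instead of raising IndexError (assumed behavior).
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        return False
    return "\n" in lines[-1]
```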

In "Evaluation" tab, problem with HTER and GTM

When you choose two identical files (source text and reference), HTER and GTM are empty. Should it show an error message or a value?

In "Evaluation" tab, need to check if files exist

When the path to source text and/or reference text does not exist, the program breaks.

Traceback:
Traceback (most recent call last):
  File "main.py", line 944, in _evaluate
    self.evaluation_reference.get_text())
  File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
    key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
  File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
    stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: '5e/corpus/translate.en'

Traceback (most recent call last):
  File "main.py", line 944, in _evaluate
    self.evaluation_reference.get_text())
  File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
    key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
  File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
    stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: '/home/rlafuente/cor5ource.en'
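
The Spanish OSError message means "No such file or directory". A check before calling `os.stat` would let the GUI report the problem instead of breaking. This is a minimal sketch; `missing_files` is a hypothetical helper, not a function from evaluation.py:

```python
import os


def missing_files(*paths):
    """Return the paths that are empty or do not point to an existing file."""
    return [path for path in paths if not path or not os.path.isfile(path)]


# Example: both arguments are reported, so the evaluation can be skipped
# and a message shown to the user.
problems = missing_files("", "/no/such/dir/translate.en")
if problems:
    print("Cannot evaluate, missing file(s): %r" % (problems,))
```

The same check also covers the case where no file was chosen and the path is an empty string.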

The source and target on the HTML statistics are always the same

In "Post Editing" tab, problem with term search.

Term search is not working on Ubuntu. Pressing Enter does nothing.

After creating a model in "Machine Translation" tab, it goes back to the "Corpus Preparation" tab.

After creating a model in "Machine Translation" tab, it goes back to the "Corpus Preparation" tab. However, it should stay on the "Machine Translation" tab.

In "Evaluation" tab, problem with HTER and GTM

Neither metric is defined. I think this was working before.

Output:

HTER.....
GTM.....

In "Evaluation" tab, start evaluation exception

This happens when you do not choose a file but select metrics (WER, BLEU, PER, HTER, BLEU3GRAM, BLEU4GRAM, GTM) and click the "Start Evaluation" button.
Info:

Traceback (most recent call last):
  File "main.py", line 941, in _evaluate
    self.evaluation_reference.get_text())
  File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
    key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
  File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
    stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: ''

Moses

Hdhhdhshhs
