
lizezhonglaile / alignman


This project forked from steinst/alignman


Tool for manual word alignment of parallel sentences.

License: Apache License 2.0

Python 100.00%


AlignMan

This is a tool for manual word alignment of parallel sentences.

Requirements

  • spaCy
  • tokenizer (Miðeind's Icelandic tokenizer)

Usage

The workflow has three steps: first, create a database containing the sentences to align; then, use the graphical tool to do the alignments; finally, export the alignments from the database in one of the supported formats.

Create new database

python3 primeDB.py --input-file examples/sentences.txt

The input file should contain one sentence pair per line, with the two sentences separated by a tab. The sentences should ideally be tokenized:

What can I order for you ?    Hvað get ég pantað handa ykkur ?
I 'll have one of those .    Ég þigg einn svoleiðis .
What are you worrying about ?    Hvaða áhyggjur eru þetta ?

An example file is included: examples/sentences.txt.
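Before running primeDB.py it can help to check that an input file really has this shape. The following is a minimal sketch, not part of AlignMan: the function name read_sentence_pairs is hypothetical, and it assumes UTF-8 encoding, one tab-separated pair per line, and already-tokenized text (so each side is simply split on whitespace).

```python
def read_sentence_pairs(path):
    """Read a tab-separated parallel file into (source_tokens, target_tokens) pairs.

    Hypothetical helper: assumes UTF-8, one sentence pair per line, and
    pre-tokenized sentences, so each side is split on whitespace.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated fields, got {len(fields)}"
                )
            source, target = fields
            pairs.append((source.split(), target.split()))
    return pairs
```

Raising on a line with the wrong number of fields surfaces a missing or extra tab immediately, instead of letting a malformed pair end up in the database.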

Untokenized sentences can also be used by adding the --tokenize flag. Currently only English and Icelandic are supported: the first sentence of each pair is assumed to be English and is tokenized with spaCy, and the second is assumed to be Icelandic and is tokenized with Miðeind's tokenizer.
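To illustrate what this step does to a line, here is a hedged sketch of the same idea. It is not AlignMan's code: it assumes a blank spaCy English pipeline (spacy.blank("en")) and Miðeind's split_into_sentences, and the helper names tokenize_en, tokenize_is and tokenize_pair are hypothetical. If either library is missing, it falls back to a simple regex tokenizer, which is a stand-in swapped in here and not what AlignMan uses.

```python
import re

# Stand-in used only when spaCy / Miðeind's tokenizer are not installed.
# It separates words from punctuation with a regex; it is NOT AlignMan's method.
_FALLBACK = re.compile(r"\w+|[^\w\s]")


def _fallback_tokenize(text):
    return _FALLBACK.findall(text)


try:
    import spacy

    _nlp_en = spacy.blank("en")  # tokenizer-only English pipeline, no model download

    def tokenize_en(text):
        return [tok.text for tok in _nlp_en(text)]
except ImportError:
    tokenize_en = _fallback_tokenize

try:
    from tokenizer import split_into_sentences  # Miðeind's Icelandic tokenizer

    def tokenize_is(text):
        # split_into_sentences yields one string per sentence, tokens separated by spaces
        return " ".join(split_into_sentences(text)).split()
except ImportError:
    tokenize_is = _fallback_tokenize


def tokenize_pair(line):
    """Tokenize one 'English<TAB>Icelandic' line into the pre-tokenized input format."""
    english, icelandic = line.rstrip("\r\n").split("\t")
    return " ".join(tokenize_en(english)) + "\t" + " ".join(tokenize_is(icelandic))
```

For example, tokenize_pair("I have one of those.\tÉg þigg einn svoleiðis.") produces the same space-separated form as the example sentences above, with the final periods split off as separate tokens.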

The --db-name option (for example --db-name dbname.db) changes the name of the output database from the default, alignments.db.
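Whatever name is chosen, it can be useful to peek inside the database that primeDB.py wrote. The README does not state the database engine; the sketch below assumes the .db file is SQLite (an assumption, not confirmed by the source), and the function name list_tables is hypothetical. It opens the file read-only so that a mistyped name raises an error instead of silently creating an empty database.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path="alignments.db"):
    """Return the table names in an AlignMan database (assumed to be SQLite)."""
    # Read-only URI: a missing file raises sqlite3.OperationalError
    # instead of creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```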

Manual Word Alignment

python3 align.py

Running the script without any parameters starts the tool for a single user. A second user can be selected with the --user parameter. The two users align the sentences independently, and once both have finished a sentence, its alignments are classified as sure or possible. Alignments that both users mark as 1-to-1 are tagged sure. All other alignments, that is, those created by only one user and those that one or both users create as 1-to-many, many-to-1 or many-to-many, are tagged possible.
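The sure/possible rule can be expressed as a small set computation. This is a minimal sketch of one reading of the rule, not AlignMan's implementation: each user's alignment for a sentence is assumed to be a set of (source_index, target_index) links, and a link counts as 1-to-1 for a user when neither its source word nor its target word takes part in any other link by that user. The function names are hypothetical.

```python
from collections import Counter


def one_to_one_links(links):
    """Links (i, j) whose source word i and target word j each occur in exactly one link."""
    src_count = Counter(i for i, _ in links)
    tgt_count = Counter(j for _, j in links)
    return {(i, j) for i, j in links if src_count[i] == 1 and tgt_count[j] == 1}


def classify(links_a, links_b):
    """Split two users' alignments of one sentence into (sure, possible) sets."""
    links_a, links_b = set(links_a), set(links_b)
    # Sure: created by both users, and 1-to-1 in both users' alignments.
    sure = one_to_one_links(links_a) & one_to_one_links(links_b)
    # Possible: every other link that at least one user created.
    possible = (links_a | links_b) - sure
    return sure, possible
```

For example, if user A links source word 2 to target words 2 and 3 (a 1-to-many link) while user B links it only to target word 2, then (2, 2) is tagged possible even though both users created it, because it is not 1-to-1 for user A.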

Export manual alignments

python3 export_alignments.py --alignments

The alignments can be exported in two formats, selected with the --alignment-format parameter:

| Format | Description |
|---|---|
| classic (default) | Alignments are exported as ... |
| pharaoh | Alignments are exported as ... |

Examples of both export formats are available in the examples folder.

The script can also export the aligned sentences with the --sentences flag. For other options, run python3 export_alignments.py -h.
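Pharaoh is commonly understood as one line per sentence pair with space-separated source-target index pairs such as 0-0 1-2, using 0-based indices. The sketch below assumes AlignMan's pharaoh output follows that common convention and that exported sentences are whitespace-tokenized; check the files in the examples folder to confirm. The helper names to_pharaoh, from_pharaoh and aligned_words are hypothetical, and the alignment in the usage example is made up for illustration.

```python
def to_pharaoh(links):
    """Format (source_index, target_index) links as a Pharaoh line, e.g. '0-0 1-2'."""
    return " ".join(f"{i}-{j}" for i, j in sorted(links))


def from_pharaoh(line):
    """Parse a Pharaoh line such as '0-0 1-2' into a set of (source, target) pairs."""
    links = set()
    for item in line.split():
        i, j = item.split("-")
        links.add((int(i), int(j)))
    return links


def aligned_words(source, target, line):
    """Pair up the words of a tokenized sentence pair using a Pharaoh line (0-based)."""
    src, tgt = source.split(), target.split()
    return [(src[i], tgt[j]) for i, j in sorted(from_pharaoh(line))]
```

For instance, aligned_words("I 'll have one of those .", "Ég þigg einn svoleiðis .", "0-0 2-1 3-2 5-3 6-4") pairs "I" with "Ég", "have" with "þigg", and so on, leaving unaligned words such as "'ll" and "of" out of the result.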

Citation

If you use AlignMan for published research, please cite the paper:

@inproceedings{combalign-nodalida2021,
  author    = {Steingrímsson, Steinþór  and  Loftsson, Hrafn  and  Way, Andy},
  title     = {CombAlign: a Tool for Obtaining High-Quality Word Alignments},
  booktitle = {Proceedings of the 23rd Nordic Conference on Computational Linguistics},
  month     = {June},
  year      = {2021},
  address   = {Online},
  publisher = {Link{\"o}ping University Electronic Press},
}

License

Copyright (C) 2021, Steinþór Steingrímsson

Licensed under the terms of the Apache License, version 2.0. A full copy of the license can be found in LICENSE.

Contributors

lizezhonglaile, steinst
