GithubHelp home page GithubHelp logo

alignertools's Introduction

alignertools

Scripts for preparing data to be used by the Prosodylab-Aligner

Usage

This is a package of scripts that can be used to prepare existing data for use in the Prosodylab-Aligner. It contains the following kinds of scripts:

  • relabel_clean.py
  • rename.py
  • prep_lab_files.py
  • fix_lab.py
  • get_[language]_dict.py

Relabel & Clean

The relabel_clean.py script unites many useful transcription cleaning functions into a simple, iterative script. The user can choose to "relabel" corrupt .lab files (or create new ones) by taking the original tab-delimited experiment file and generating a new set of .lab files for every .wav file in a directory. The user can also opt to "clean" .lab files with the wrong transcription standards for the supported dictionaries, or generate a new orthography-based dictionary for use in the Prosodylab-Aligner. Old .lab files are stored in a new directory that is named by the user. All resulting .lab files will be encoded using UTF-8.

$ python relabel.py

On startup, the user will be prompted to select the function they want to use. If the user selects the "relabel" function, they will have to provide the following information:

  • The name of the tab-delimited experiment file or files that will provide the new .lab text - these can be dragged and dropped into the Terminal window
  • The id columns of the experiment files, separated by an underscore - the default is "experiment_item_condition"
  • The directory where the .wav files are stored. It is assumed that all .wav files are stored in the same directory. This can be dragged and dropped into the Terminal window.
  • The id columns of the sound files, separated by an underscore - the default is "experiment_participant_item_condition"
  • The name of the directory in which to put any old .lab files - the default is "0_old_labfile_relabel/"

The "clean" function requires the following:

  • The language to be used for .lab file cleaning
  • The directory where the .wav files are stored. It is assumed that all .wav files are stored in the same directory. This can be dragged and dropped into the Terminal window.
  • The name of the directory in which to put the old .lab files - the default is "0_old_labfile_clean/"

The "dictionary" function requires the following:

  • The directory where the .lab files are stored. This can be dragged and dropped into the Terminal window.

Prep Lab Files

The prep_lab_files.py script takes as input any .TextGrid or .eaf (ELAN) file and returns a set of .lab files based on the transcriptions in that file. Ideally for use in conjunction with chop.praat, offered as part of the praatscripts package (to appear). An orthography-based dictionary will also be generated for use in the Prosodylab-Aligner.

$ python prep_lab_files.py

On startup, the user will be prompted to select from the following options:

  • The file format (.TextGrid or .eaf)
  • The encoding format of the file (for use with .TextGrid files only)
  • The annotation tier from which to make the .lab files and dictionary
  • The name of the directory in which to put the old files

Fix Lab Files

The fix_lab.py script ...

Rename

The rename.py script acts as a find-and-replace editor for the names of files of a given directory. Old files are stored in a new directory that is named by the user.

$ python rename.py

On startup, the user will be prompted to provide the following information:

  • The directory where all of the files are stored. This can be dragged and dropped into the terminal window.
  • The name of the directory in which to put old files - the default is "0_old_file_rename/"
  • The problematic string in the file names that needs to be replaced
  • The new string that will replace the old string

Get Dictionary

There is a get_[language]_dict.py script for the each of the supported languages, except English, since support for this language is included in the Prosodylab-Aligner.

$ python get_[language]_dict.py

In addition, a basic dictionary for each supported language is provided as part of the alignertools package. The French dictionary comes from Lexique v. 3.8.0 and will be downloaded to the user's computer automatically when running the script. The Germany dictionary is from ... and will need to be downloaded prior to running the get_german_dict.py script.

alignertools's People

Contributors

yaseminmb avatar erin-daphne-olson avatar mchlwgnr avatar

Watchers

Symon Jory Stevens-Guille avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.