SEDES

Metrical position in Greek hexameter.

See https://sasansom.github.io/sedes/ for web-based visualizations produced by this system.

See the tag tapa-version to reproduce the figures from the TAPA article, "Sedes as Style in Greek Hexameter: A Computational Approach."

See support here for the DHQ article, "SEDES: Metrical Position in Greek Hexameter."

Setup

SEDES depends on the Classical Language Toolkit (CLTK) for lemmatization. First, install CLTK in a virtual environment:

python3 -m venv venv
source venv/bin/activate
pip3 install -U pip setuptools wheel
pip3 install cltk bs4 lxml

Next, install the grc_models_cltk corpus:

python3 -c 'from cltk.data.fetch import FetchCorpus; FetchCorpus("grc").import_corpus("grc_models_cltk")'

The corpus is stored in a cltk_data subdirectory of your home directory. The authors have used commit 94c04ac of the grc_models_cltk corpus.

You need to do the steps above only once. Thereafter, each time you start a new shell, run only the single command

source venv/bin/activate
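
To confirm the setup from inside the activated environment, a small check along the following lines may help. It is only a sketch: the function names are hypothetical, and it tests just for an importable cltk package and for the cltk_data directory in your home directory, not for the specific grc_models_cltk commit.

```python
import importlib.util
from pathlib import Path


def cltk_installed():
    """Return True if the cltk package can be found by this interpreter."""
    return importlib.util.find_spec("cltk") is not None


def cltk_data_dir():
    """Return the expected location of CLTK data: ~/cltk_data."""
    return Path.home() / "cltk_data"


if __name__ == "__main__":
    print("cltk importable:", cltk_installed())
    print("cltk_data present:", cltk_data_dir().is_dir())
```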

Programs

The "src" subdirectory contains a tei2csv program that processes a TEI-encoded XML document as downloaded from Perseus and produces a CSV file that annotates every word with its line number and sedes. For example:

./src/tei2csv "Il." corpus/iliad.xml > corpus/iliad.csv

The expectancy program annotates one or more CSV files produced by tei2csv with expectancy statistics for each word. For example:

./src/expectancy corpus/*.csv > expectancy.all.csv

The tei2html program produces an HTML representation of a TEI-encoded XML document, with visual highlighting of word expectancy. If you put the HTML file in the sedes-web directory, it will have access to locally installed web fonts for Greek.

./src/tei2html corpus/iliad.xml expectancy.all.csv > sedes-web/iliad.html

The join-expectancy program takes a work-specific CSV file (as produced by tei2csv) and augments it with lemma/sedes expectancy numbers.

./src/join-expectancy corpus/iliad.csv expectancy.all.csv > iliad-expectancy.csv

The "src/hexameter" subdirectory contains a Python module that we use for metrical analysis. It is by Hope Ranker and comes from https://github.com/epilanthanomai/hexameter.

Corpus

The "corpus" subdirectory contains selected TEI-encoded XML texts downloaded from Perseus. These are suitable for input to tei2csv and tei2html.

Getting started

If you have GNU Make installed, you can analyze all the texts in the corpus using the command

make -j4

The above command runs tei2csv, expectancy, and tei2html to produce HTML visualizations in the sedes-web directory, along with intermediate files. The -j4 option lets make run up to four jobs in parallel.

If you do not have GNU Make, the script make.sh runs the same commands as make would:

./make.sh
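
For a single work, the same three steps can be sketched in Python using the command lines shown under Programs. This is an illustration, not the contents of the Makefile or make.sh: the function names are hypothetical, and because it passes only one CSV file to expectancy, the statistics will differ from a full build, which feeds every CSV file in the corpus to that program.

```python
import subprocess


def pipeline_commands(work_abbrev, stem, corpus_dir="corpus", web_dir="sedes-web"):
    """Return (argv, output_path) pairs for one work, in dependency order.

    work_abbrev is the label passed to tei2csv (for example "Il.");
    stem is the file name without extension (for example "iliad").
    """
    xml_path = f"{corpus_dir}/{stem}.xml"
    csv_path = f"{corpus_dir}/{stem}.csv"
    expectancy_path = "expectancy.all.csv"
    return [
        (["./src/tei2csv", work_abbrev, xml_path], csv_path),
        (["./src/expectancy", csv_path], expectancy_path),
        (["./src/tei2html", xml_path, expectancy_path], f"{web_dir}/{stem}.html"),
    ]


def run_pipeline(work_abbrev, stem):
    """Run each step, redirecting its standard output to the target file."""
    for argv, out_path in pipeline_commands(work_abbrev, stem):
        with open(out_path, "w", encoding="utf-8") as out:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(argv, stdout=out, check=True)
```

Calling run_pipeline("Il.", "iliad") from the repository root, with the virtual environment active, would then write corpus/iliad.csv, expectancy.all.csv, and sedes-web/iliad.html.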

Data format

The output of tei2csv is CSV that may be imported into a spreadsheet or further processed by another program.
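
As a starting point for further processing, the following sketch reads the first few rows of such a file with Python's csv module. The helper name head is hypothetical, and the sketch makes no assumption about column names or about whether the first row is a header; it simply returns each row as a list of strings.

```python
import csv
from itertools import islice


def head(path, n=5):
    """Return the first n rows of a CSV file as lists of strings."""
    # newline="" is what the csv module recommends when opening files;
    # utf-8 matches the encoding of the Greek text in the output.
    with open(path, newline="", encoding="utf-8") as f:
        return list(islice(csv.reader(f), n))


if __name__ == "__main__":
    for row in head("corpus/iliad.csv"):
        print(row)
```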

Greek text is represented as UTF-8 Unicode text. Characters are stored in decomposed form using Normalization Form D (NFD); this means that diacritics are separate combining characters. For example, the word ἀοιδή is stored as the sequence of characters

U+03B1 GREEK SMALL LETTER ALPHA
U+0313 COMBINING COMMA ABOVE
U+03BF GREEK SMALL LETTER OMICRON
U+03B9 GREEK SMALL LETTER IOTA
U+03B4 GREEK SMALL LETTER DELTA
U+03B7 GREEK SMALL LETTER ETA
U+0301 COMBINING ACUTE ACCENT

After UTF-8 encoding, this sequence is \xce\xb1\xcc\x93\xce\xbf\xce\xb9\xce\xb4\xce\xb7\xcc\x81.
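
Because the data is in NFD, a search string typed in precomposed form will not match it byte for byte until it has been normalized as well. The sketch below uses Python's standard unicodedata module to decompose the word and to compare strings regardless of their input form; the helper names to_nfd and same_word are illustrative and not part of SEDES.

```python
import unicodedata


def to_nfd(text):
    """Return text in Unicode Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


def same_word(a, b):
    """Compare two strings after normalizing both to NFD."""
    return to_nfd(a) == to_nfd(b)


# The precomposed spelling of the example word, written with escapes
# so that the starting form is unambiguous.
composed = "\u1f00\u03bf\u03b9\u03b4\u03ae"
word = to_nfd(composed)

# List each code point of the decomposed form with its Unicode name.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Show the UTF-8 bytes of the decomposed form.
print(word.encode("utf-8"))
```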

The characters that mark long and short metrical values are respectively U+2013 EN DASH and U+23D1 METRICAL BREVE.
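
The two marks can be told apart programmatically by code point. The sketch below turns a string of marks into a list of labels; it illustrates only the two characters named above, and it makes no assumption about how the marks are arranged within the CSV output. The names LONG, SHORT, and parse_scansion are hypothetical.

```python
LONG = "\u2013"   # EN DASH, marks a long metrical value
SHORT = "\u23d1"  # METRICAL BREVE, marks a short metrical value


def parse_scansion(marks):
    """Map each metrical mark to "long" or "short", skipping other characters."""
    names = {LONG: "long", SHORT: "short"}
    return [names[ch] for ch in marks if ch in names]


# A dactyl is one long followed by two shorts.
dactyl = LONG + SHORT + SHORT
print(parse_scansion(dactyl))
```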

Contributors

nickdgardner, sasansom
