alatius / latin-macronizer

Script to automatically mark long vowels in Latin texts. Also optionally performs conversion of u to v and i to j.

License: GNU General Public License v3.0

Python 99.87% Shell 0.13%

latin-macronizer's People

mk270 kylepjohnson cltk simon-will johnlawrenceaspden classicist lucianonooijen bblumenfelder dstodolny alijc userprogrammer alex-lee nkprasad12 dmforall

latin-macronizer's Issues

Suggestions

Thanks so much for this great program. If you're open to suggestions, here are a couple:

Make a PPA archive for easy installation and upgrades on Ubuntu-based distros.
Consider working with the developers of espeak-ng for text-to-speech output of macronized texts. Espeak-ng requires texts to have macrons, and there are issues with the Latin voices that need a developer who understands Latin.

Ambiguous form not marked as ambiguous

Thanks for making this! It's fantastic.

I found an interesting edge case for you. The word was "Matris". The macronizer marked it as "Matrīs" when the context makes it clear that it should be "Mātris". I was perplexed until I realized that the capital M leads the macronizer to identify it as the dative/ablative of "Matra" rather than the genitive of "māter" that just happens to be capitalized.
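One way to surface this kind of ambiguity would be to look up a capitalized token under both its original and its lowercased spelling and keep every candidate. The sketch below only illustrates the idea: the `LEXICON` dictionary and the `candidates` helper are hypothetical and are not part of the macronizer's actual code.

```python
# Hypothetical toy lexicon: unmarked form -> possible macronized forms.
LEXICON = {
    "Matris": ["Matrīs"],  # dative/ablative of the proper noun Matra
    "matris": ["mātris"],  # genitive of māter
}


def candidates(token):
    """Collect macronized candidates for the token as written and lowercased."""
    found = list(LEXICON.get(token, []))
    lowered = token.lower()
    if lowered != token:
        for form in LEXICON.get(lowered, []):
            # Restore the initial capital on the lowercase reading.
            restored = form[0].upper() + form[1:]
            if restored not in found:
                found.append(restored)
    return found


# Both readings are returned, so the form can be flagged as ambiguous.
print(candidates("Matris"))
```

With more than one candidate returned, the capitalized form could be marked as ambiguous instead of silently resolving to the proper noun.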

Bug in New Version?

Hi,

The latest version does not work for me.

[Wed Jul 07 12:51:59.107649 2021] [cgi:error] [pid 14342] [client ::1:51462] AH01215: Traceback (most recent call last):: /var/www/html/macronize.py, referer: http://localhost/macronize.py
[Wed Jul 07 12:51:59.107771 2021] [cgi:error] [pid 14342] [client ::1:51462] AH01215: File "/var/www/html/macronize.py", line 288, in <module>: /var/www/html/macronize.py, referer: http://localhost/macronize.py
[Wed Jul 07 12:51:59.107913 2021] [cgi:error] [pid 14342] [client ::1:51462] AH01215: print(create_html_page(scriptname, texttomacronize, domacronize, alsomaius, scan, performitoj, performutov, doevaluate)): /var/www/html/macronize.py, referer: http://localhost/macronize.py
[Wed Jul 07 12:51:59.108002 2021] [cgi:error] [pid 14342] [client ::1:51462] AH01215: File "/var/www/html/macronize.py", line 44, in create_html_page: /var/www/html/macronize.py, referer: http://localhost/macronize.py
[Wed Jul 07 12:51:59.108100 2021] [cgi:error] [pid 14342] [client ::1:51462] AH01215: texttomacronize = unicodedata.normalize('NFC', texttomacronize).replace('\r', ''): /var/www/html/macronize.py, referer: http://localhost/macronize.py
[Wed Jul 07 12:51:59.108169 2021] [cgi:error] [pid 14342] [client ::1:51462] AH01215: TypeError: normalize() argument 2 must be unicode, not str: /var/www/html/macronize.py, referer: http://localhost/macronize.py
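The wording "must be unicode, not str" is what Python 2 prints, which suggests the CGI script is being run by a Python 2 interpreter and receives the form text as a byte string. A minimal sketch of a workaround, assuming the input is UTF-8, is to decode before normalizing. The helper names `to_text` and `clean_input` are hypothetical and do not appear in macronize.py.

```python
import unicodedata


def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding byte strings first."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def clean_input(raw):
    """Normalize to NFC and drop carriage returns, as line 44 intends."""
    return unicodedata.normalize("NFC", to_text(raw)).replace("\r", "")


print(clean_input(b"puella\r\npuer"))
```

On Python 2, `bytes` is an alias for `str`, so the same check should cover both interpreters.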

The macronizer is not actually compatible with Python 3

Commit 919fddb introduced Python 3 compatibility, but it was broken again by adding a string prefixed with ur (unicode and raw) in macronizer.py. I propose importing unicode_literals from __future__ and removing the u prefix everywhere. I will send a pull request shortly.
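To illustrate the problem: the combined `ur` prefix is valid only in Python 2, and Python 3 rejects it as a syntax error. The sketch below checks this with the built-in `compile` and shows the proposed replacement, a plain raw string. The `compiles` helper and the `WORD` pattern are illustrative names, not code from macronizer.py.

```python
import re


def compiles(source):
    """Return True if the snippet is valid syntax for the running interpreter."""
    try:
        compile(source, "<snippet>", "eval")
        return True
    except SyntaxError:
        return False


# On Python 3 the first call returns False and the second returns True.
print(compiles("ur'\\w+'"))
print(compiles("r'\\w+'"))

# With `from __future__ import unicode_literals` at the top of a module,
# r'...' is already a unicode string on Python 2, so the u can be dropped.
WORD = re.compile(r"\w+", re.UNICODE)
print(WORD.findall("arma virumque canō"))
```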

Error: Something went wrong with the tagging.

Hi,

I was trying to upgrade my local installation. python macronize.py --test worked fine, but when I copied everything to /usr/local and ran it on "Rosa" I got

Error: Something went wrong with the tagging.

Any idea why?

Installation

Hi,

First, thank you for a very useful tool. I successfully installed it on my machine. However, I had to move latin-macronizer from my user space to /usr/local and recursively change ownership to root.root for it to work. Maybe there are more elegant solutions, although I like it this way now.

Cheers!

macronizing the genitive of "unus nauta" adjectives

I'm working with your macronizer to produce xml versions of Vergil and Ovid's hexameters with scansion. The macronizer is a really great piece of work - thanks! (I tried all 3 backends for cltk's macronizer, with limited success). I'll probably be able to give more feedback later, but for now I note that it's consistently macronizing the i in the genitive of adjectives like unus and ipse (result is ūnīus, ipsīus, should be ūnius, ipsius). Let me know if you'd like more examples.

fluctus

Macronizer gives flūctus. I believe it should be fluctus, since the u in fluō is short. Thanks!

Debian: Could not connect to the PostgreSQL database.

Hi, I'm trying to get the code to run on my Debian box, and I'm diligently following the setup instructions. It all seems to work fine, but when I get to:

python macronize.py --initialize

I get:

john@dell-3537$ python macronize.py --initialize
Could not connect to the PostgreSQL database.

I imagine this is just something funny about how PostgreSQL is set up on Debian, but I don't suppose you know the fix?

ūsque

usque -> usque, but there are those that say:

https://www.reddit.com/r/latin/comments/czm2x7/usque_or_%C5%ABsque/

that it should be ūsque.

I would be quite happy to pay attention to such things and submit pull requests, if you would like!

Problem with latest release

Hi,

I made sure I have the latest version.

After going through
./train-rftagger.sh
python macronize.py --initialize
python macronize.py --test
Traceback (most recent call last):
File "macronize.py", line 314, in <module>
macronizer.settext(texttomacronize)
File "/Users/lionel/latin-macronizer/macronizer.py", line 1054, in settext
self.tokenization.addlemmas(self.wordlist)
File "/Users/lionel/latin-macronizer/macronizer.py", line 501, in addlemmas
from lemmas import lemma_frequency, word_lemma_freq, wordform_to_corpus_lemmas
ImportError: No module named lemmas

Any idea? In general, what is the right procedure for an update? Thanks!

Question about macrons.txt

Hi. I would like to know how the words are organised in this file. I have noticed that the first column holds the non-macronized words and the last column holds the macronized version. Furthermore, the second column shows information about the case, number, and type of the word. What are the possible combinations of letters in this column, and what do they mean? Finally, what is the purpose of the third column, and what is the license of this file?
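For anyone exploring the file, here is a minimal parsing sketch based only on the column layout described in this question. It assumes the four columns are separated by whitespace, which the question does not confirm. The `Entry` field names are hypothetical labels, and the sample line is invented for illustration, not copied from macrons.txt.

```python
from collections import namedtuple

# Field names are hypothetical labels for the four columns described above.
Entry = namedtuple("Entry", "plain tag third macronized")


def parse_line(line):
    """Split one whitespace-separated line into its four columns."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 columns, got %d" % len(fields))
    return Entry(*fields)


# Invented sample line, for illustration only; not copied from macrons.txt.
entry = parse_line("amatur TAG third amātur")
print(entry.plain, entry.macronized)
```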

Donate button not functioning well

Thought I would let you know: this took me 2-3 attempts to get working.

Word-initial s in poetry.

Note to self, the following fails to scan:
Sed quid ago? Tene alta silentia rumpere spero,
nil multos te annos cum mihi scripse sciam?

Allow option of apex in place of macron

Hi there, some Latinists prefer apex / acute accents in place of macrons, either as a more “authentic” or aesthetic choice. Could you provide an option to output these?
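In the meantime, macronized output can be converted after the fact. The sketch below uses only the standard `unicodedata` module: it decomposes the text, swaps each combining macron (U+0304) for a combining acute accent (U+0301), and recomposes. The function name `macron_to_apex` is hypothetical and is not an option of the macronizer itself.

```python
import unicodedata

MACRON = "\u0304"  # combining macron
ACUTE = "\u0301"   # combining acute accent, used here as the apex


def macron_to_apex(text):
    """Replace every macron with an acute accent, keeping the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(MACRON, ACUTE))


print(macron_to_apex("amātur"))
```

Note that this also leaves the rest of the text in NFC form, which may differ from the input if it was not already normalized.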

Morpheus build errors except on Ubuntu 20

Hi,
in the context of a wider research project, I'm trying to wrap the functionality of this excellent macronizer in a very simple web API, and package it into a Docker image. This would allow any developer to compose a stack where this simple API gets called by upper-layer services, thus providing a maybe sub-optimal, but certainly easy and flexible way of integrating it, whatever the programming language and environment. Of course, should I get it working, it will be open source and publicly available.

So, my first step was following the instructions to set up the macronizer on a Linux machine. To start clean, I created a new Ubuntu VM and followed the instructions. Yet, I found out that the morpheus make command seems to work only on Ubuntu 20.04. There, I could follow the instructions to the end, and then write and test the API successfully. Yet, when I tried with Ubuntu 21 and 22, and with a Debian GNU/Linux 11 based distribution, I got the same compile error every time at morpheus/src make:

...
a - mkend.o
a - nextsufftab.o
a - retrends.o
a - stor.o
ranlib gkends.a
gcc -o buildword expwordmain.o acccompos.o checkforbreath.o contract.o countendtables.o endindex.o euphend.o expendtable.o fixeta.o getcurrend.o indexendtables.o lcontr.o merge.o mkend.o nextsufftab.o retrends.o stor.o  gkends.a ../morphlib/morphlib.a ../greeklib/greeklib.a
/usr/bin/ld: indexendtables.o:(.bss+0x78): multiple definition of `endlines'; countendtables.o:(.bss+0x78): first defined here
/usr/bin/ld: indexendtables.o:(.bss+0x0): multiple definition of `Gstr'; countendtables.o:(.bss+0x0): first defined here
collect2: error: ld returned 1 exit status
make[1]: *** [makefile:42: buildword] Error 1
make[1]: Leaving directory '/usr/local/macronizer/latin-macronizer/morpheus/src/gkends'
make: *** [makefile:4: all] Error 2

Building a Docker image implies having PostgreSQL available, so I'd typically start from the official PostgreSQL Docker image. Yet, if I work with bash inside a container based on this image (in turn based on Debian GNU/Linux 11), I get the exact same compilation errors at make for morpheus that I got on a full-fledged Ubuntu VM (when its version is different from 20.04). In all my tests, I managed to get this compiled only on a full Ubuntu 20.04 distribution, which of course is not what we start with when building a Docker image.
So, do you know of any environmental constraints for your instructions, or am I just missing something more obvious?
Thank you!

sīquidem?

siquidem -> siquidem, which seems wrong or at least debatable?

L&S have: http://www.perseus.tufts.edu/hopper/morph?l=siquidem&la=la#lexicon

Macroniser destroys formatting

When using the web page (http://www.alatius.com/macronizer/), if you put in text with formatting:

puella
puer
amatur

then the macronised display looks right:

puella
puer
amātur

but when you copy the text back, either using the button or just selecting the text, all the newlines go missing and you get:

puella puer amātur

I can't see any way to recover the formatting whilst keeping the macrons.

(I'm using debian and firefox in case that matters)

P.S. utterly love macroniser, thank you!

"dimovit" macronized as "dimōvit"

"dimovit" is macronized as "dimōvit", but shouldn't it be "dīmōvit"? (L&S, Gaffiot, LaNe give it as such.)

All variants are analysed as having a short "i" it seems:

dimoveō
dimovēre
dimōvī
dimōvit

(I stumbled across this because it didn't know how to scan inque foco tepidum cinerem dimovit et ignes as dactylic hexameter )

Morpheus Compiling

Functions declared to return an integer value wrongly end with return; rather than return 0;.

Return statements were corrected and line 98 of smk2beta.c was changed to if( *s == L'�' && ! fromsmk ) {, but two errors still occurred.

Here is the error log when the original source is run on Ubuntu 14.04: https://gist.github.com/kylepjohnson/4700056226790716dd79

Add macrons to EPUB books

This individual has done a lot of connected work with Russian (Russian has an identical problem around marking stress on words).

https://github.com/Vuizur/add-stress-to-epub

Converting a whole directory of EPUBs is pretty nice. The dictionaries and OCR are helpful too. His thesis on this topic (PDF) is pretty interesting. It even uses GPT-3, which is smart!

Anyway this is a note for me or someone else who might want to wrap some stuff around this program to make it work conveniently with epubs.
