The amazing Murre (genitive Murren) will normalize non-standard Finnish (puhekieli) to standard Finnish (kirjakieli). This repository is maintained by Mika Hämäläinen.
This library is designed for Python 3 and it may not work on Python 2.
pip3 install murre
python3 -m murre.download
To normalize Finnish, all you need to do is run:
from murre import normalize_sentence
normalize_sentence("mä syön paljo karkkii")
>> minä syön paljon karkkia
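When checking what the normalizer actually did to a sentence, it can help to list only the words that changed. The sketch below is not part of Murre: `changed_words` is a hypothetical helper that aligns the input and output word sequences with Python's standard `difflib.SequenceMatcher`. It assumes whitespace tokenization, and it works just as well on the output of `dialectalize_sentence`.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_words(source: str, output: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (before, after) pairs for words that differ."""
    src, out = source.split(), output.split()
    changes: List[Tuple[str, str]] = []
    matcher = SequenceMatcher(a=src, b=out, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged words are skipped
        if tag == "replace" and i2 - i1 == j2 - j1:
            # Same number of words on both sides: pair them one to one.
            changes.extend(zip(src[i1:i2], out[j1:j2]))
        else:
            # Insertions, deletions or uneven replacements: report whole spans.
            changes.append((" ".join(src[i1:i2]), " ".join(out[j1:j2])))
    return changes


if __name__ == "__main__":
    # Sentence pair taken from the normalize_sentence example above.
    print(changed_words("mä syön paljo karkkii", "minä syön paljon karkkia"))
    # prints [('mä', 'minä'), ('paljo', 'paljon'), ('karkkii', 'karkkia')]
```

Aligning on whole words rather than characters keeps the report readable; an insertion or deletion shows up as a pair with an empty side.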
To use the same chunk-level BRNN model as described in the normalization paper (Partanen et al., 2019), pass wnut19_model=True. Note, however, that this model might only work on Linux.
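"Chunk level" means the model does not see a whole sentence at once but short runs of consecutive words, written as character sequences. The sketch below only illustrates that kind of preprocessing and is not Murre's code: the chunk size of three words and the `_` word-boundary symbol are assumptions, and `to_chunks` / `from_chunks` are hypothetical names.

```python
from typing import List

SPACE = "_"  # assumed word-boundary symbol; not taken from Murre's code


def to_chunks(sentence: str, chunk_size: int = 3) -> List[str]:
    """Split a sentence into chunks of `chunk_size` words, one symbol per character."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    words = sentence.split()
    chunks = []
    for start in range(0, len(words), chunk_size):
        group = SPACE.join(words[start:start + chunk_size])
        # A character-level model reads space-separated symbols, so split every character.
        chunks.append(" ".join(group))
    return chunks


def from_chunks(chunks: List[str]) -> str:
    """Undo to_chunks: glue the characters back and restore spaces between words."""
    words = []
    for chunk in chunks:
        joined = chunk.replace(" ", "")
        words.extend(w for w in joined.split(SPACE) if w)
    return " ".join(words)


if __name__ == "__main__":
    print(to_chunks("mä syön paljo karkkii"))
    # prints ['m ä _ s y ö n _ p a l j o', 'k a r k k i i']
```

Input that already contains `_` would not round-trip with this marker; a real pipeline would pick a symbol that never occurs in the data.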
You can normalize multiple sentences at the same time by running:
from murre import normalize_sentences
sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
normalize_sentences(sents)
>> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
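For running text rather than a ready-made list, one option is to split the text into sentences first and hand the whole list to normalize_sentences in a single call. This is a minimal sketch: the regex splitter is deliberately naive (it will break on abbreviations), how Murre treats punctuation is not documented here, and `split_sentences` / `normalize_text` are hypothetical helpers. The `normalizer` parameter exists only so the sketch can be tried without Murre installed.

```python
import re
from typing import Callable, List, Optional

Normalizer = Callable[[List[str]], List[str]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def normalize_text(text: str, normalizer: Optional[Normalizer] = None) -> List[str]:
    """Split `text` into sentences and normalize them all in one batch call."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    if normalizer is None:
        # Default to Murre's batch function; imported lazily so the helpers work without it.
        from murre import normalize_sentences as normalizer
    return normalizer(sentences)


if __name__ == "__main__":
    print(split_sentences("mä syön paljo karkkii. kyl on hölömöö!"))
    # prints ['mä syön paljo karkkii.', 'kyl on hölömöö!']
```

Batching through one normalize_sentences call, rather than calling normalize_sentence in a loop, mirrors the multi-sentence example above.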
Murre can also convert standard Finnish into different dialects. All you need to do is run:
from murre import dialectalize_sentence
dialectalize_sentence("kodin takana on koira", "Inkerinsuomalaismurteet")
>> 'kojin takan on koira'
Or for multiple sentences:
from murre import dialectalize_sentences
sents = ["kissa syö karkkia", "kädellä on perhonen", "kettu juoksee sutta karkuun"]
dialectalize_sentences(sents, 'Kainuu')
>> ['kissa syöpi karkkia', 'käellä om perhonej', 'kettu juoksee sutta karkuu']
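To compare how the same sentences come out in several dialects, you can call dialectalize_sentences once per dialect and collect the results. This is a sketch around the call shown above; `dialect_table` and `print_table` are hypothetical helpers, and the optional `dialectalizer` argument is there only so the code can be tried with a stand-in function when Murre is not installed.

```python
from typing import Callable, Dict, List, Optional, Sequence

Dialectalizer = Callable[[List[str], str], List[str]]


def dialect_table(sentences: Sequence[str], dialects: Sequence[str],
                  dialectalizer: Optional[Dialectalizer] = None) -> Dict[str, List[str]]:
    """Render the same standard Finnish sentences in several dialects, one batch per dialect."""
    if dialectalizer is None:
        # Default to Murre's batch function, imported lazily.
        from murre import dialectalize_sentences as dialectalizer
    return {dialect: dialectalizer(list(sentences), dialect) for dialect in dialects}


def print_table(sentences: Sequence[str], table: Dict[str, List[str]]) -> None:
    """Print each input sentence followed by its rendering in every dialect."""
    for i, sentence in enumerate(sentences):
        print(sentence)
        for dialect, outputs in table.items():
            print(f"  {dialect:<25} {outputs[i]}")


if __name__ == "__main__":
    sents = ["kissa syö karkkia", "kädellä on perhonen"]
    # Uses Murre itself; requires the package and its downloaded models.
    print_table(sents, dialect_table(sents, ["Kainuu", "Etelä-Savo"]))
```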
The list of available dialects can be obtained by:
from murre import supported_dialects
supported_dialects()
>> ['Pohjois-Satakunta', 'Keski-Karjala', 'Kainuu', 'Etelä-Pohjanmaa', 'Etelä-Satakunta', 'Pohjois-Savo', 'Pohjois-Karjala', 'Keski-Pohjanmaa', 'Kaakkois-Häme', 'PohjoinenKeski-Suomi', 'Pohjois-Pohjanmaa', 'PohjoinenVarsinais-Suomi', 'Etelä-Karjala', 'Länsi-Uusimaa', 'Inkerinsuomalaismurteet', 'LäntinenKeski-Suomi', 'Länsi-Satakunta', 'Etelä-Savo', 'Länsipohja', 'Pohjois-Häme', 'EteläinenKeski-Suomi', 'Etelä-Häme', 'Peräpohjola']
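Some names in this list are written without spaces (e.g. 'PohjoinenKeski-Suomi'), and 'Etelä-Häme' appears to be stored with decomposed ä characters, so typed input may not match exactly. The sketch below is a hypothetical helper, not part of Murre: it compares names after Unicode NFC normalization, case folding and removal of spaces and hyphens, falls back to `difflib.get_close_matches` for small typos, and always returns the exact string from the supported list so it can be passed straight to the dialect functions.

```python
import difflib
import unicodedata
from typing import Optional, Sequence


def _key(name: str) -> str:
    """Comparison key: NFC-normalized, case-folded, without spaces or hyphens."""
    return unicodedata.normalize("NFC", name).casefold().replace(" ", "").replace("-", "")


def resolve_dialect(user_input: str, supported: Optional[Sequence[str]] = None) -> str:
    """Map loosely typed input such as 'pohjoinen keski-suomi' to an exact supported name."""
    if supported is None:
        from murre import supported_dialects
        supported = supported_dialects()
    by_key = {_key(name): name for name in supported}
    key = _key(user_input)
    if key in by_key:
        return by_key[key]
    # Tolerate small typos such as 'Kainu' by fuzzy matching on the normalized keys.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=0.8)
    if close:
        return by_key[close[0]]
    raise ValueError(f"Unknown dialect {user_input!r}")
```

For example, resolve_dialect("pohjoinen keski-suomi") would return 'PohjoinenKeski-Suomi' from the list above. The 0.8 cutoff is an arbitrary choice; raise it if near-miss matches turn out to be wrong.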
Normalization
Partanen, N., Hämäläinen, M., & Alnajjar, K. (2019). Dialect Text Normalization to Normative Standard Finnish. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT).
Dialect generation
Hämäläinen, M., Partanen, N., Alnajjar, K., Rueter, J., & Poibeau, T. (2020). Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity. In Proceedings of the 11th International Conference on Computational Creativity (pp. 204-211).
The data used in the paper describing dialect generation has been published on Zenodo.