This project forked from mikahama/murre

The amazing 🐕 will normalize non-standard Finnish and dialectalize standard Finnish!

License: Apache License 2.0

Python 100.00%

murre's Introduction

đŸ¶ Murre 🐕

The amazing Murre (genitive Murren 🐕) will normalize non-standard Finnish (puhekieli) to standard Finnish (kirjakieli). This repository is maintained by Mika Hämäläinen.

Installation

This library is designed for Python 3 and it may not work on Python 2.

pip3 install murre
python3 -m murre.download

Normalize

To normalize Finnish, all you need to do is to run:

from murre import normalize_sentence

normalize_sentence("mä syön paljo karkkii")
>> minä syön paljon karkkia

To use the same chunk-level BRNN model as described in the paper, pass wnut19_model=True. Note that this model might only work on Linux.
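
Because the paper's model might only work on Linux, one option is to enable it only when the current platform reports itself as Linux. The following is a minimal sketch under that assumption; `wnut19_kwargs` is a hypothetical helper and is not part of murre.

```python
import platform


def wnut19_kwargs(system=None):
    """Build keyword arguments that enable the paper's model on Linux only.

    Hypothetical helper, not part of murre. `system` defaults to the value
    reported by platform.system(), e.g. "Linux", "Darwin" or "Windows".
    """
    if system is None:
        system = platform.system()
    return {"wnut19_model": system == "Linux"}


# Intended use (requires murre and its downloaded models):
#   from murre import normalize_sentence
#   normalize_sentence("mä syön paljo karkkii", **wnut19_kwargs())
```

On other platforms the helper returns wnut19_model=False, so the call falls back to the default model.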

You can normalize multiple sentences at the same time by running:

from murre import normalize_sentences

sents = ["kissa syö karkkii", "jok laulaa tuol puole", "en tiiä oikee et kuka se o", "kyl on hölömöö"]
normalize_sentences(sents)
>> ['kissa syö karkkia', 'joka laulaa tuolla puolen', 'en tiedä oikein että kuka se on', 'kyllä on hölmöä']
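
For a larger collection of sentences, it may be convenient to feed the normalizer in fixed-size batches. The sketch below is an assumption about how one might do that, not something murre provides: `normalize_in_batches` and `batch_size` are hypothetical names, and the normalizer is passed in as a plain callable so that the helper also works with a stand-in function.

```python
from typing import Callable, Iterable, List


def normalize_in_batches(
    sentences: Iterable[str],
    normalize_fn: Callable[[List[str]], List[str]],
    batch_size: int = 32,
) -> List[str]:
    """Apply `normalize_fn` to `sentences` in batches, preserving order.

    Hypothetical helper, not part of murre. `normalize_fn` is expected to
    take a list of sentences and return a list of the same length.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(sentences)
    results: List[str] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(normalize_fn(batch))
    return results


# Intended use (requires murre and its downloaded models):
#   from murre import normalize_sentences
#   normalize_in_batches(sents, normalize_sentences, batch_size=2)
```

Passing the normalizer as an argument keeps the helper independent of murre, which makes it easy to try out with a simple placeholder function first.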

Generate

Murre can also generate different dialects. All you need to do is to run:

from murre import dialectalize_sentence
dialectalize_sentence("kodin takana on koira", "Inkerinsuomalaismurteet")
>> 'kojin takan on koira'

Or for multiple sentences:

from murre import dialectalize_sentences
sents = ["kissa syö karkkia", "kädellä on perhonen", "kettu juoksee sutta karkuun"]
dialectalize_sentences(sents,'Kainuu')
>> ['kissa syöpi karkkia', 'käellä om perhonej', 'kettu juoksee sutta karkuu']

The list of available dialects can be obtained by:

from murre import supported_dialects
supported_dialects()
>> ['Pohjois-Satakunta', 'Keski-Karjala', 'Kainuu', 'Etelä-Pohjanmaa', 'Etelä-Satakunta', 'Pohjois-Savo', 'Pohjois-Karjala', 'Keski-Pohjanmaa', 'Kaakkois-Häme', 'PohjoinenKeski-Suomi', 'Pohjois-Pohjanmaa', 'PohjoinenVarsinais-Suomi', 'Etelä-Karjala', 'Länsi-Uusimaa', 'Inkerinsuomalaismurteet', 'LäntinenKeski-Suomi', 'Länsi-Satakunta', 'Etelä-Savo', 'Länsipohja', 'Pohjois-Häme', 'EteläinenKeski-Suomi', 'Etelä-Häme', 'Peräpohjola']
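
Several dialect names are written without a space (for example 'PohjoinenKeski-Suomi'), so a typo is easy to make. A small check against the supported list before dialectalizing can catch that early. This is a sketch only: `check_dialect` is a hypothetical helper, not part of murre, and it takes the list of names as an argument rather than calling murre itself.

```python
import difflib
from typing import Sequence


def check_dialect(name: str, supported: Sequence[str]) -> str:
    """Return `name` if it is in `supported`, else raise ValueError.

    Hypothetical helper, not part of murre. The error message lists up to
    three similar names found with difflib.get_close_matches.
    """
    if name in supported:
        return name
    suggestions = difflib.get_close_matches(name, supported, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"Unsupported dialect {name!r}.{hint}")


# Intended use (requires murre and its downloaded models):
#   from murre import dialectalize_sentence, supported_dialects
#   dialect = check_dialect("Kainuu", supported_dialects())
#   dialectalize_sentence("kodin takana on koira", dialect)
```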

Cite

Normalization

Niko Partanen, Mika Hämäläinen, and Khalid Alnajjar. 2019. Dialect Text Normalization to Normative Standard Finnish. In the Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT).

Dialect generation

Hämäläinen, M., Partanen, N., Alnajjar, K., Rueter, J. & Poibeau, T. (2020). Automatic Dialect Adaptation in Finnish and its Effect on Perceived Creativity. In Proceedings of the 11th International Conference on Computational Creativity, pp. 204-211.

Data

The data used in the paper describing dialect generation has been published on Zenodo.
