hb20007 / greek-dialect-classifier

Classifier that identifies Greek text as Cypriot Greek or Standard Modern Greek

License: MIT License

Jupyter Notebook 100.00%
greek cypriot classification classifier n-grams dialect dialect-identification dialects language-identification language-classification

Binder

Greek Dialect Classifier

Putting an end to “It's all Greek to me.”

This is a classifier that identifies Greek text as Cypriot Greek (CG) or Standard Modern Greek (SMG).

For more information, you can read my thesis: A Classifier to Distinguish Between Cypriot Greek and Standard Modern Greek.
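
As a rough illustration of how character n-grams can separate two closely related varieties, here is a minimal sketch of a naive Bayes classifier over character trigrams. It is not the code from the notebooks or the thesis: the `char_ngrams` helper, the `NgramNaiveBayes` class, the add-one smoothing, and the toy training strings are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}       # label -> Counter of n-grams seen for that label
        self.totals = {}       # label -> total number of n-grams for that label
        self.docs = Counter()  # label -> number of training texts
        self.vocab = set()     # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.counts.items():
            # Log prior plus smoothed log likelihood of each n-gram in the input.
            score = math.log(self.docs[label] / n_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counts[gram] + 1) / (self.totals[label] + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy strings for illustration only; they are not taken from the corpus.
model = NgramNaiveBayes(n=3).fit(["τζαι εγώ", "και εγώ"], ["CG", "SMG"])
prediction = model.predict("τζαι")
```

A real model would be trained on the labeled corpus described below rather than on two short strings.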

1. Notebooks

Index of Jupyter Notebooks
1. Obtaining CG and SMG tweets
Code used to collect tweets
2. Data Analysis
Analyzing the corpus
3. Building the Classifier
Building the CG-SMG classifier

2. The corpus

The corpus can be found in the Data directory. I collected it myself and labeled it as CG or SMG by placing the text in separate files.

Index of files in corpus
CG Facebook
CG text collected from Facebook posts and comments
CG Twitter
CG text collected from tweets
CG Other
CG text collected from forum posts and from comments on blogs and news articles
SMG Facebook
SMG text collected from Facebook posts and comments
SMG Twitter
SMG text collected from tweets
SMG Other
SMG text collected from forum posts and from comments on blogs and news articles

Feel free to use the corpus or a subset of it in any kind of project as long as you provide a link to this repository.
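
Because the labels are encoded by which file a text lives in, the corpus can be turned into (text, label) pairs by reading each file and deriving the label from its name. The sketch below assumes that file names start with `CG` or `SMG` and that each non-empty line is one sample; neither detail is confirmed by this README, and `load_corpus` is a hypothetical helper.

```python
from pathlib import Path


def load_corpus(data_dir):
    """Return (text, label) pairs, one per non-empty line of each labeled file."""
    samples = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file():
            continue
        # Assumption: the file name begins with the dialect label ("CG..." or "SMG...").
        name = path.name.upper()
        if name.startswith("SMG"):
            label = "SMG"
        elif name.startswith("CG"):
            label = "CG"
        else:
            continue  # skip files that carry no recognizable label
        # Assumption: one sample per line, UTF-8 encoded.
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line:
                samples.append((line, label))
    return samples
```

Calling `load_corpus("Data")` from the repository root would then yield a list that can be split into training and test sets.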

3. Instructions

To run the code, either clone the repository and run the Jupyter notebooks locally, or click the Binder badge at the top of this README to run the notebooks on a remote server. If you choose the latter option, you still need to call nltk.download() to download the required NLTK modules.
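
Calling `nltk.download()` with no arguments opens NLTK's interactive downloader; passing a resource name fetches that resource directly, which is more convenient in a remote session. The sketch below wraps that call in a hypothetical helper, `download_nltk_resources`. The resource name `punkt` in the usage comment is an assumption, since this README does not list which modules the notebooks need.

```python
def download_nltk_resources(names, download=None):
    """Download each named NLTK resource and report whether it succeeded.

    `download` defaults to nltk.download; it is a parameter only so the
    helper can be exercised without network access.
    """
    if download is None:
        import nltk  # imported lazily so the helper can be defined without NLTK installed
        download = nltk.download
    # nltk.download returns True on success and False on failure.
    return {name: bool(download(name)) for name in names}


# Example call (requires NLTK and network access); "punkt" is an assumed resource name:
# download_nltk_resources(["punkt"])
```

Check the import cells at the top of each notebook to see which resources are actually required.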

4. Trying the classifier

If you only want to run the classifier on your own text, go to the last section of 3. Building the Classifier.

5. Meta

H. Z. Sababa — hb20007 — [email protected]

Distributed under the MIT license. See LICENSE for more information.
