awesome-resources-for-indic-nlp's Introduction

Awesome Resources for IndicNLP

Common Resources

OPUS, the open parallel corpus

A Dravidian Etymological Dictionary

Byte Pair Encoding - Pretrained for 275 languages

FastText word vectors for 157 languages

Indian Language Technology Proliferation and Deployment Center

Center For Indian Language Technology - CFILT FB page

Indian Institute of Language Studies (IILS)

Central Institute of Indian Languages

OpenSLR Speech datasets

Research Papers

Survey: Natural Language Parsing for Indian Languages

Language Specific

Malayalam

mlmorph - Malayalam Morphological Analyzer using Finite State Transducer

Tamil

Datasets

Datasets of Tamil text

Other projects

Open Tamil - Suite of tools for operating on Tamil text

Tokenizer, Language model and Classifier for Tamil language by Ravi Annaswamy

Scrapers

  1. Tamil Etymological Dictionary
  2. Newspaper Crawlers

ML models

Text Classification model in PyTorch: It can easily be applied to other datasets. The linked repository also contains a dataset of film reviews in Tamil.

Bengali

Bangla2Vec

Bengali News Classification

NLP for Bengali

  • Contains Wikipedia Articles Dataset (72,374 articles) and scripts which were used to scrape Wikipedia and clean that dataset
  • Contains Language Model with Perplexity ~41
  • Contains Bengali News Classification Model with 94% accuracy
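For readers unfamiliar with the perplexity figure quoted above, here is a minimal sketch of the standard definition: the exponential of the average negative log-probability the model assigns to each token of held-out text. The function name and the sample probabilities are hypothetical, and the linked repository may compute the metric differently (for example, over a different tokenization).

```python
import math


def perplexity(token_probs):
    """Return exp(mean negative log-probability) over the given tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical case: a model that gives every token probability 1/41
# is as uncertain as a uniform choice among 41 tokens.
uniform_probs = [1 / 41] * 10
print(round(perplexity(uniform_probs), 2))
```

Read this way, a perplexity of about 41 means the model is, on average, about as uncertain as if it were choosing uniformly among 41 tokens at each step. Lower is better.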

Scrapers

Bengali News Channel Scraper

Telugu

Telugu-NLP - Contains NLP tools developed for Telugu

Research Papers and Data

Research Papers in Bengali NLP

Collection of Repositories

| Language | Repository | Perplexity of language model | Wikipedia articles dataset | Classification accuracy (%) | Classification kappa score |
|---|---|---|---|---|---|
| Hindi | NLP for Hindi | ~36 | 55,000 articles | ~79 (News Classification) | ~30 (Movie Review Classification) |
| Punjabi | NLP for Punjabi | ~13 | 44,000 articles | ~89 (News Classification) | ~60 (News Classification) |
| Sanskrit | NLP for Sanskrit | ~6 | 22,273 articles | ~70 (Shloka Classification) | ~56 (Shloka Classification) |
| Gujarati | NLP for Gujarati | ~34 | 31,913 articles | ~91 (News Classification) | ~85 (News Classification) |
| Kannada | NLP for Kannada | ~70 | 32,997 articles | ~94 (News Classification) | ~90 (News Classification) |
| Malayalam | NLP for Malyalam | ~26 | 12,388 articles | ~94 (News Classification) | ~91 (News Classification) |
| Nepali | NLP for Nepali | ~32 | 38,757 articles | ~97 (News Classification) | ~96 (News Classification) |
| Odia | NLP for Odia | ~27 | 17,781 articles | ~95 (News Classification) | ~92 (News Classification) |
| Marathi | NLP for Marathi | ~18 | 85,537 articles | ~91 (News Classification) | ~84 (News Classification) |
| Bengali | NLP for Bengali | ~41 | 72,374 articles | ~94 (News Classification) | ~92 (News Classification) |
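
The kappa column above is easier to interpret with the formula in hand. The sketch below assumes the scores are Cohen's kappa (reported multiplied by 100), which the source does not state explicitly. Kappa measures agreement between predicted and true labels after subtracting the agreement expected by chance. The function name and sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    # Fraction of items where prediction and truth agree (plain accuracy).
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both lists use a single label")
    return (observed - expected) / (1 - expected)


# Hypothetical labels: accuracy is 0.75, chance agreement is 0.5.
y_true = ["sports", "sports", "politics", "politics"]
y_pred = ["sports", "politics", "politics", "politics"]
print(cohen_kappa(y_true, y_pred))
```

Because kappa discounts chance agreement, it can sit well below accuracy when classes are imbalanced or the task has few labels, so the two columns are not directly comparable.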

awesome-resources-for-indic-nlp's People

Contributors

adamshamsudeen, goru001, soham96, vanangamudi

awesome-resources-for-indic-nlp's Issues

Need to add more details about each link

Just having links is not informative enough. We need to add more details about what the link contains.

I will take up this task for the Bengali and Common resources.
