
ai4bharat / indic-glossaries


Collection of datasets for glossaries in Indian languages

Home Page: https://tools.ai4bharat.org/#/indic-glossary-explorer

License: MIT License


Indic Glossary Datasets & Workflows


An umbrella portal for all the glossary creation workflows and datasets



Indic-Glossaries is an open-source umbrella portal that exposes the workflows for creating Indic glossaries, along with the created and curated glossary datasets.

Datasets

The datasets are licensed under CC BY 4.0.

(All of the collected glossary datasets have been submitted to the ULCA platform. See https://bhashini.gov.in/ulca for further details.)

Breakdown by Collection Source

| Collection Source | Glossary Corpus Count | Download link |
|---|---|---|
| IndoWordNet | 1,724,816 | download |
| Bharatavani | 866,423 | download |
| CSTT | 319,420 | download |
| OSF | 212,951 | download |
| NLPC - Univ of Moratuwa | 19,273 | download |
| Anuvaad | 1,690 | download |
| NCF-NCERT | 272 | download |
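
To see how much each source contributes, the counts above can be tallied in a few lines of Python. This is only an illustrative sketch: the numbers are copied by hand from the table, and the names `source_counts` and `source_shares` are made up for this example rather than taken from the repository.

```python
# Counts copied by hand from the "Breakdown by Collection Source" table.
source_counts = {
    "IndoWordNet": 1_724_816,
    "Bharatavani": 866_423,
    "CSTT": 319_420,
    "OSF": 212_951,
    "NLPC - Univ of Moratuwa": 19_273,
    "Anuvaad": 1_690,
    "NCF-NCERT": 272,
}


def source_shares(counts):
    """Return (total, {source: percentage of the total})."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = source_shares(source_counts)

# Print sources from largest to smallest contribution.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<26}{source_counts[name]:>12,}{pct:>8.2f}%")
print(f"{'Total':<26}{total:>12,}")
```

Keeping the counts in a plain dictionary makes it easy to update the figures when the table changes.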

Breakdown by Language Pair

| Language Pair | Glossary Corpus Count |
|---|---|
| English-Assamese | 75,256 |
| English-Bengali | 102,855 |
| English-Bodo | 158,457 |
| English-Dogri | 8,624 |
| English-Goan Konkani | 87,300 |
| English-Gujarati | 168,343 |
| English-Hindi | 852,324 |
| English-Kannada | 156,777 |
| English-Kashmiri | 69,673 |
| English-Maithili | 8,297 |
| English-Malayalam | 98,092 |
| English-Manipuri | 7,183 |
| English-Marathi | 106,787 |
| English-Nepali | 65,795 |
| English-Odia | 130,801 |
| English-Punjabi | 152,081 |
| English-Sanskrit | 130,042 |
| English-Sindhi | 4,797 |
| English-Tamil | 316,976 |
| English-Telugu | 92,839 |
| English-Urdu | 79,253 |
| Hindi-Tamil | 113 |
| Hindi-Telugu | 13,340 |
| Hindi-Urdu | 3,133 |
| Hindi-English | 130,422 |
| Malayalam-English | 9,269 |
| Sanskrit-Hindi | 111,155 |
| Tamil-Hindi | 4,868 |
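
The pair counts above can also be grouped by the first language in each pair. The sketch below assumes, without confirmation from the README, that the first language in a label such as `English-Hindi` is the source side. The counts are copied by hand from the table, and `pair_counts` and `totals_by_source_language` are hypothetical names used only for this example.

```python
from collections import defaultdict

# Counts copied by hand from the "Breakdown by Language Pair" table.
pair_counts = {
    "English-Assamese": 75_256,
    "English-Bengali": 102_855,
    "English-Bodo": 158_457,
    "English-Dogri": 8_624,
    "English-Goan Konkani": 87_300,
    "English-Gujarati": 168_343,
    "English-Hindi": 852_324,
    "English-Kannada": 156_777,
    "English-Kashmiri": 69_673,
    "English-Maithili": 8_297,
    "English-Malayalam": 98_092,
    "English-Manipuri": 7_183,
    "English-Marathi": 106_787,
    "English-Nepali": 65_795,
    "English-Odia": 130_801,
    "English-Punjabi": 152_081,
    "English-Sanskrit": 130_042,
    "English-Sindhi": 4_797,
    "English-Tamil": 316_976,
    "English-Telugu": 92_839,
    "English-Urdu": 79_253,
    "Hindi-Tamil": 113,
    "Hindi-Telugu": 13_340,
    "Hindi-Urdu": 3_133,
    "Hindi-English": 130_422,
    "Malayalam-English": 9_269,
    "Sanskrit-Hindi": 111_155,
    "Tamil-Hindi": 4_868,
}


def totals_by_source_language(counts):
    """Sum the counts per first language of each 'A-B' pair label."""
    totals = defaultdict(int)
    for pair, n in counts.items():
        # Split on the first hyphen only, so "Goan Konkani" stays intact.
        source, _target = pair.split("-", 1)
        totals[source] += n
    return dict(totals)


by_source = totals_by_source_language(pair_counts)
for lang, n in sorted(by_source.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:<12}{n:>12,}")
```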

Breakdown by Domain

| Domain | Glossary Corpus Count |
|---|---|
| general | 2,209,449 |
| economy | 21,762 |
| technology | 26,360 |
| education | 351,387 |
| geography | 92,169 |
| legal | 46,870 |
| financial | 17,163 |
| automobile | 6,303 |
| healthcare | 278,436 |
| national-security-and-defence | 30,570 |
| agriculture | 4,807 |
| parliamentary | 22,180 |
| history | 23,955 |
| news | 7,303 |
| lifestyle | 3,506 |
| entertainment | 1,143 |
| philosophy | 1,489 |

Goal

The goal is to build high-quality glossary datasets for Indian languages across domains such as General, Legal, Education, Healthcare, Automobile, and News.

Read more about the Glossary Explorer at https://glossary.ai4bharat.org

Communication Forum

For information, help, or discussion, use the project's discussion forum: https://github.com/AI4Bharat/Indic-Glossaries/discussions

Code of Conduct

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to [email protected].

Contributors

aravinth
