
ubc-nlp / afrolid

AfroLID is a powerful neural toolkit for African language identification, covering 517 African languages.

Home Page: https://demos.dlnlp.ai/afrolid/

License: Apache License 2.0

Python 100.00%
african-languages deep-learning language-identification nlp toolkit dlnlp natural-langauge-processing ubc-nlp

afrolid's Introduction

AfroLID

AfroLID is a neural language identification (LID) toolkit for 517 African languages and varieties. AfroLID exploits a multi-domain web dataset manually curated from across 14 language families that use five orthographic systems. AfroLID is described in the paper "AfroLID: A Neural Language Identification Tool for African Languages".


Requirements

  • Download AfroLID model:
    wget https://demos.dlnlp.ai/afrolid/afrolid_model.tar.gz
    tar -xf afrolid_model.tar.gz
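
For environments without `wget` or `tar` (for example some notebooks or Windows shells), the same two steps can be sketched in Python using only the standard library. The helper names `download_model` and `extract_model` are illustrative and are not part of AfroLID; the URL is the one given above.

```python
import tarfile
import urllib.request
from pathlib import Path

MODEL_URL = "https://demos.dlnlp.ai/afrolid/afrolid_model.tar.gz"  # URL from the README


def download_model(url: str = MODEL_URL, dest_dir: str = ".") -> Path:
    """Download the archive into dest_dir and return its local path (hypothetical helper)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download if the file is already present
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract_model(archive: Path, dest_dir: str = ".") -> list:
    """Extract a tar archive into dest_dir and return the member names (hypothetical helper)."""
    with tarfile.open(str(archive), "r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safer extraction filters.
            tar.extractall(path=dest_dir, filter="data")
        else:
            tar.extractall(path=dest_dir)
    return names
```

Calling `extract_model(download_model())` would mirror the `wget` and `tar -xf` commands; the directory layout inside the archive is not documented here, so inspect the returned member names before pointing AfroLID at a model path.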

Installation

  • To install AfroLID directly using pip:
    pip install -U afrolid
  • To install AfroLID directly from the GitHub repo using pip:
    pip install -U git+https://github.com/UBC-NLP/afrolid.git
  • To install AfroLID and develop locally:
    git clone https://github.com/UBC-NLP/afrolid.git
    cd afrolid
    pip install .
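
After any of the install routes above, a quick way to confirm the package is visible to the current interpreter is a standard-library check like the sketch below. It assumes the import name and the distribution name are both `afrolid` (the pip name comes from the commands above; the import name is an assumption). The helper names are illustrative.

```python
import importlib.util
from importlib import metadata


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(dist_name: str):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "afrolid" as the import name is an assumption based on the pip package name.
print("afrolid importable:", is_installed("afrolid"))
print("afrolid version:", installed_version("afrolid"))
```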

Getting Started

The full documentation contains instructions for getting started, identifying languages using different methods, and integrating AfroLID with your code, and provides more examples.

Colab Examples

(1) Integrate AfroLID with your python code

Content:
  • Install AfroLID
  • Download AfroLID's model
  • Initialize the AfroLID object
  • Get language prediction(s)
  • Integrate with Pandas
Colab link: colab
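
As a rough illustration of the "Get language prediction(s)" step, the sketch below flattens a prediction dictionary into sorted rows. The shape of the result (a dict keyed by ISO code, each value holding `score`, `name`, and `script`) is taken from the issue report further down this page. `StubClassifier` is a hypothetical stand-in so the example runs without the model; it is not the real AfroLID class, and its scores are invented.

```python
from typing import Dict, List, Tuple


class StubClassifier:
    """Hypothetical stand-in for an AfroLID classifier object (not the real class)."""

    def classify(self, text: str, max_outputs: int = 1) -> Dict[str, dict]:
        # Invented values, only to mimic the assumed output shape.
        ranked = [
            ("swh", {"score": 0.9, "name": "Swahili", "script": "Latin"}),
            ("zul", {"score": 0.1, "name": "Zulu", "script": "Latin"}),
        ]
        return dict(ranked[:max_outputs])


def top_predictions(cl, text: str, max_outputs: int = 3) -> List[Tuple[str, float, str, str]]:
    """Return (iso, score, name, script) rows, highest score first."""
    predictions = cl.classify(text, max_outputs=max_outputs)
    rows = [
        (iso, info["score"], info["name"], info["script"])
        for iso, info in predictions.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for row in top_predictions(StubClassifier(), "Habari yako?", max_outputs=2):
    print(row)
```

With the real toolkit, `StubClassifier()` would be replaced by the classifier object created as described in the full documentation and the Colab notebook.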

(2) Command Line Interface

Command: afrolid_cli
Content:
  • Usage and Arguments
  • Examples
Colab link: colab

Supported languages

Please refer to the list of supported languages.

License

afrolid(-py) is Apache-2.0 licensed. The license applies to the pre-trained models as well.

Citation

If you use the AfroLID toolkit or the pre-trained models in a scientific publication, or if you find the resources in this repository useful, please cite our paper as follows (to be updated):

@inproceedings{adebara2022afrolid,
  title={AfroLID: A Neural Language Identification Tool for African Languages},
  author={Adebara, Ife and Elmadany, AbdelRahim and Abdul-Mageed, Muhammad and Inciarte, Alcides Alcoba},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  month={December},
  year={2022}
}

Acknowledgments

We gratefully acknowledge support from Canada Research Chairs (CRC), the Natural Sciences and Engineering Research Council of Canada (NSERC; RGPIN-2018-04267), the Social Sciences and Humanities Research Council of Canada (SSHRC; 435-2018-0576; 895-2020-1004; 895-2021-1008), Canadian Foundation for Innovation (CFI; 37771), Digital Research Alliance of Canada, UBC ARC-Sockeye, Advanced Micro Devices, Inc. (AMD), and Google. Any opinions, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of CRC, NSERC, SSHRC, CFI, CC, AMD, Google, or UBC ARC-Sockeye.

afrolid's People

Contributors

elmadany, fenimi

afrolid's Issues

RuntimeError: Mask Type should be defined

Description

Hi, I get this error when I run the AfroLID example that integrates it with pandas. Troubleshooting advice suggests the main cause could be a data type incompatibility between the input and output data, but I've made sure the input data is of string type. Could it be because AfroLID doesn't recognise the language of the input? I'm using South African tweets that are sometimes a mix of English and other local languages. However, I get the same error when working with purely Swahili tweets. Please help.

Screenshots

Error

Files

Integrate AfroLID with Python code

To Reproduce

import pandas as pd
from tqdm import tqdm

tqdm.pandas()  # registers progress_apply on pandas objects

# `filename` is the path to the CSV file; `cl` is the AfroLID classifier object created earlier.
df = pd.read_csv(filename)

def get_afrolid_prediction(text):
    predictions = cl.classify(text, max_outputs=1)
    # Return the first (and only) prediction as a flat tuple.
    for lang in predictions:
        return lang, predictions[lang]['score'], predictions[lang]['name'], predictions[lang]['script']

df['predict_iso'], df['predict_score'], df['predict_name'], df['predict_script'] = zip(*df['Text'].progress_apply(get_afrolid_prediction))
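
One thing worth ruling out in the snippet above is that the function returns `None` when no prediction comes back, and that missing values in the `Text` column arrive as float NaN rather than strings; either case breaks the `zip(*...)` unpacking. The sketch below is a defensive wrapper that always returns a four-item tuple and records any `RuntimeError` instead of aborting the whole column. It does not address the cause of the "Mask Type should be defined" error itself, which is not diagnosed in this report. `StubOk` and `StubFailing` are hypothetical stand-ins for the classifier so the example runs on its own.

```python
EMPTY = (None, None, None, None)


def safe_prediction(cl, text, errors=None):
    """Return (iso, score, name, script), or a tuple of Nones when no prediction is available."""
    # Missing values (float NaN in pandas) and blank strings are skipped.
    if not isinstance(text, str) or not text.strip():
        return EMPTY
    try:
        predictions = cl.classify(text, max_outputs=1)
    except RuntimeError as exc:
        # Record the failure so it can be inspected later instead of stopping the run.
        if errors is not None:
            errors.append((text, str(exc)))
        return EMPTY
    for lang, info in predictions.items():
        return lang, info["score"], info["name"], info["script"]
    return EMPTY  # the classifier returned no predictions


class StubOk:
    """Hypothetical classifier that returns one invented prediction."""

    def classify(self, text, max_outputs=1):
        return {"swh": {"score": 0.9, "name": "Swahili", "script": "Latin"}}


class StubFailing:
    """Hypothetical classifier that reproduces the reported exception type."""

    def classify(self, text, max_outputs=1):
        raise RuntimeError("Mask Type should be defined")


failures = []
print(safe_prediction(StubOk(), "habari yako"))
print(safe_prediction(StubFailing(), "habari yako", failures))
print(failures)
```

In the pandas snippet, `df['Text'].progress_apply(lambda t: safe_prediction(cl, t, failures))` could take the place of `get_afrolid_prediction`. If every row ends up in `failures`, the problem lies in the model or environment setup rather than in the input data.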
