sagorbrur / bendeep

PyTorch-based deep learning solution for Bengali NLP tasks

Home Page: https://bendeep.readthedocs.io

License: MIT License

BENDeep

BENDeep is a PyTorch-based deep learning solution for Bengali NLP tasks such as Bengali translation and Bengali sentiment analysis.

Installation

pip install bendeep

Dependency

  • pytorch 1.5.0+

Pretrained Model

API

Sentiment Analysis

Analyzing Sentiment

This sentiment analysis model is a GRU-based RNN trained on the Socian sentiment dataset (4,000 sentences) for 150 epochs, reaching a loss of 0.073.

from bendeep import sentiment
model_path = "senti_trained.pt"
vocab_path = "vocab.txt"
text = "রোহিঙ্গা মুসলমানদের দুর্ভোগের অন্ত নেই।জলে কুমির ডাংগায় বাঘ।আজকে দুটি ঘটনা আমাকে ভীষণ ব্যতিত করেছে।নিরবে কিছুক্ষন অশ্রু বিসর্জন দিয়ে মনটাকে হাল্কা করার ব্যর্থ প্রয়াস চালিয়েছি।"

sentiment.analyze(model_path, vocab_path, text)
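
Both files have to exist locally before the call. A minimal sketch of a pre-flight check is shown here; the helper name missing_files is hypothetical and not part of bendeep, and the file names are simply the ones from the example above.

```python
from pathlib import Path


def missing_files(*paths):
    """Return the given paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    model_path = "senti_trained.pt"
    vocab_path = "vocab.txt"
    missing = missing_files(model_path, vocab_path)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        # Imported only when both files are present, so the check itself
        # runs even where bendeep is not installed.
        from bendeep import sentiment
        sentiment.analyze(model_path, vocab_path, "তোমাকে খুব সুন্দর লাগছে।")
```

If either file is missing, it can be produced by training a model as described in the next section.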

Training Sentiment Model

To train this model you need a CSV file with two columns: review, which holds the text, and sentiment, which holds 0 or 1, where 1 means positive and 0 means negative sentiment.

Example:

,review,sentiment
0,তোমাকে খুব সুন্দর লাগছে।,1
1,আজকের আবহাওয়া খুব খারাপ।,0

from bendeep import sentiment
data_path = "sentiment_data.csv"
sentiment.train(data_path)
# You can also pass these parameters:
# sentiment.train(data_path, batch_size=64, epochs=100, model_name="trained.pt")

After training completes successfully, the model is saved as trained.pt and the vocabulary file as vocab.txt.
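
The CSV layout described above could be produced with Python's standard csv module. This is only a sketch: write_sentiment_csv is a hypothetical helper, not a bendeep function, and the unnamed leading index column is an assumption copied from the example file rather than a documented requirement.

```python
import csv


def write_sentiment_csv(rows, path):
    """Write (review, sentiment) pairs in the layout of the example file.

    The first column is an unnamed running index, as in the example.
    """
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["", "review", "sentiment"])
        for index, (review, label) in enumerate(rows):
            # Labels are restricted to 0 (negative) and 1 (positive).
            if label not in (0, 1):
                raise ValueError(f"sentiment must be 0 or 1, got {label!r}")
            writer.writerow([index, review, label])


if __name__ == "__main__":
    rows = [
        ("তোমাকে খুব সুন্দর লাগছে।", 1),
        ("আজকের আবহাওয়া খুব খারাপ।", 0),
    ]
    write_sentiment_csv(rows, "sentiment_data.csv")
```

The resulting sentiment_data.csv can then be passed to sentiment.train as data_path.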

Machine Translation

Translate Bengali to English

This model is a seq2seq model with attention, trained on this dataset with a reported loss of 0.0.

from bendeep import translation
from bendeep.translation import EncoderRNN
from bendeep.translation import AttnDecoderRNN

data_path = "data/translation/eng-ben.txt"
encoder = "models/translation/encoder.pt"
decoder = "models/translation/decoder.pt"
input_sentence = "আমার শীত করছে।"
translation.bn2en(data_path, encoder, decoder, input_sentence)
# output
# > আমার শীত করছে ।
# = i feel cold .

Training Translation Model

To train the translation model you need a dataset in .txt format with tab-separated input and target sentences, one pair per line.

Example:

I eat rice.	আমি ভাত খাই।
He goes to school.	সে বিদ্যালয়ে যায়।

from bendeep import translation
from bendeep.translation import EncoderRNN
from bendeep.translation import AttnDecoderRNN

data_path = "data/translation/eng-ben.txt"
translation.training(data_path, iteration=75000)

After training completes successfully, the encoder and decoder models are saved as encoder.pt and decoder.pt. Some random evaluation results are also displayed.
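
A tab-separated dataset of the kind described above could be written and sanity-checked as follows. This is a sketch under stated assumptions: write_pairs and read_pairs are hypothetical helpers, not part of bendeep, and the one-pair-per-line layout is inferred from the example.

```python
def write_pairs(pairs, path):
    """Write (input, target) sentence pairs, one tab-separated pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for source, target in pairs:
            # A tab or newline inside a sentence would corrupt the layout.
            if any(ch in text for text in (source, target) for ch in "\t\n"):
                raise ValueError("sentences must not contain tabs or newlines")
            f.write(f"{source}\t{target}\n")


def read_pairs(path):
    """Read the file back, raising if a line does not have exactly two fields."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 2:
                raise ValueError(f"line {number}: expected 2 fields, got {len(fields)}")
            pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    examples = [
        ("I eat rice.", "আমি ভাত খাই।"),
        ("He goes to school.", "সে বিদ্যালয়ে যায়।"),
    ]
    write_pairs(examples, "eng-ben.txt")
    print(len(read_pairs("eng-ben.txt")), "pairs written")
```

Running read_pairs on an existing file before training is a cheap way to catch lines where the tab was replaced by spaces.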

bendeep's Issues

where to get vocab.txt?

I was trying this:

from bendeep import sentiment
model_path = "senti_trained.pt"
vocab_path = "vocab.txt"
text = "রোহিঙ্গা মুসলমানদের দুর্ভোগের অন্ত নেই।জলে কুমির ডাংগায় বাঘ।আজকে দুটি ঘটনা আমাকে ভীষণ ব্যতিত করেছে।নিরবে কিছুক্ষন অশ্রু বিসর্জন দিয়ে মনটাকে হাল্কা করার ব্যর্থ প্রয়াস চালিয়েছি।"
sentiment.analyze(model_path, vocab_path, text)

Can you share the vocab.txt with us? I just wanted to test.

Getting error while running on colab

I am following the Machine Translation documentation and trying to run the code on Colab, but I am getting the following error:

Error:

Explanation:
Sentence contains out of vocabulary word. That means sentence has a word which not exist in trained data
/usr/local/lib/python3.7/dist-packages/torch/serialization.py:656: SourceChangeWarning: source code of class 'torch.nn.modules.sparse.Embedding' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/usr/local/lib/python3.7/dist-packages/torch/serialization.py:656: SourceChangeWarning: source code of class 'torch.nn.modules.rnn.GRU' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/usr/local/lib/python3.7/dist-packages/torch/serialization.py:656: SourceChangeWarning: source code of class 'torch.nn.modules.linear.Linear' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/usr/local/lib/python3.7/dist-packages/torch/serialization.py:656: SourceChangeWarning: source code of class 'torch.nn.modules.dropout.Dropout' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)

FYI, I have successfully run the "Sentiment Analysis" code on colab and got proper output.
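
The "out of vocabulary" explanation in the output above means that the input sentence contains a word that never appears in the training data. A rough way to find such words before translating is sketched below. The tokenization here (whitespace splitting after spacing out sentence-final punctuation) is an assumption and may differ from what bendeep does internally, and both function names are hypothetical.

```python
import re


def tokenize(sentence):
    """Split on whitespace after spacing out sentence-final punctuation.

    This is an assumed normalization, not necessarily the one bendeep applies.
    """
    return re.sub(r"([।.!?])", r" \1 ", sentence).split()


def out_of_vocabulary(sentence, training_sentences):
    """Return the tokens of the sentence that never occur in the training sentences."""
    vocabulary = set()
    for line in training_sentences:
        vocabulary.update(tokenize(line))
    return [token for token in tokenize(sentence) if token not in vocabulary]


if __name__ == "__main__":
    training = ["আমার শীত করছে।", "আমি ভাত খাই।"]
    print(out_of_vocabulary("আমি স্কুলে যাই।", training))
```

For a real check, training_sentences would be the Bengali side of the sentence pairs in the training file.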
