GithubHelp home page GithubHelp logo

major-project's Introduction

Image Captioning in Nepali Language.

This project aims to develop a deep learning-based system mainly focus for image captioning in Nepali languages. The system will take an input image and generate paragraph caption in Nepali Language. The project leverages state-of-the-art deep learning models and techniques to achieve accurate and meaningful results.

Abstract

The advent of deep neural networks has made the image captioning task more feasible. It is a method of generating text by analyzing the different parts of an image. A lot of tasks related to this have been done in the English language while very little effort is put into this task in other languages, particularly in Nepali language. It is an even harder task to carry out research in the Nepali language because of its difficult grammatical structure and vast language domain. Further, the little work done in the Nepali language is done to generate only a single sentence but we emphasize to generate the paragraph long (3-4) coherent sentences. We used the Stanford human-genome dataset which was translated into Nepali language using the Google Translate API. Along with this, we manually curated a dataset consisting of 800 images of the cultural sites of Nepal along with their Nepali captions. These two datasets were combined to train the deep learning model. The work was carried out on encoder-decoder architecture, with pre-trained CNN (Inception-V3)acting as an encoder that extracts the features from the images, and for the decoder purpose, we have used two architectures LSTM and Transformers to see which architecture works better. We used the BLEU score as an evaluation metric for this research. Experiments showed the transformer works better than LSTM in the case of Nepali language for this captioning task

Overall Tech Used:

  • Model ( /jupyter notebook)
    • LSTM
      • RESNET152 ( For Feature Extraction)
    • Transformer
      • Inception-V3 ( For encoder)
  • Frontend ( /Projet UI)
    • React
    • axios
    • scss
  • Backend ( /Server & /Backend)
    • Flask
    • Python
    • MongoDB
    • Node JS

Setup Guide

  1. Clone the repository. /Data Collection folder is heavy: Be very careful here!

    git clone https://github.com/chhetri123/Major_Project.git
    
  2. Once cloned successfully, open this project in your IDE

Backend ( Trained Model Setup)

  1. Once the above steps are done, open the terminal of your IDE and head over to the \Server:

    Cd Server
    
  2. Then Create the virutalenv.

    # For windows
    python -m venv venv
    
    <!-- OR -->
    
    # For macos
    python3 -m venv venv
    
  3. Activate virutalenv using the below command :

    source venv/bin/activate
    
  4. Install Require packages

    # For windows
    pip install -r requirements.txt
    
    <!-- OR -->
    
    # For macos
    pip3 install -r requirements.txt
    
  5. As everything is ready now, we can run the Model as

    # For windows
    python app.py
    
    <!-- OR -->
    
    # For macos
    python3 app.py
    
    

Frontend Setup

  1. Open another terminal and head over into the \Project UI :

    cd  Project UI
    
  2. Install the require packages:

    yarn install
    
  3. And run server:

  yarn run dev
  1. And you can view the page with the url http://localhost:3000

Team Members

Contributions and License

Contributions to the project are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request. The project is licensed under the MIT License.

major-project's People

Contributors

subedinab avatar

Stargazers

Kiran Shahi avatar Tek Singh Ayer avatar

Watchers

 avatar

Forkers

kiranshahi

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.