This project forked from kritinyoupane/multi-label-text-classification-with-fully-connected-nn


Multi-Label-Text-Classification-with-Fully-Connected-NN

Data Preprocessing

The preprocessing code is provided in the Google Colab notebook. Either of the following methods can be used to clean the data:

  1. Cleaning Using Regular Expression
  2. Stemming

You can either run the cleaning code on the raw dataset or import the already cleaned dataset directly from the drive. The second option is recommended because it saves time.
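The two cleaning options above can be sketched as follows. This is a minimal illustration and not the notebook's actual code: the function names `clean_text`, `naive_stem`, and `preprocess` are hypothetical, the regular expressions are assumed choices, and the stemmer is a toy suffix stripper rather than a real stemming algorithm such as Porter's.

```python
import re

def clean_text(text):
    """Lowercase, drop URLs and non-letter characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()

# Toy suffix list (an assumption for illustration, NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text, stem=False):
    """Clean with regular expressions, then optionally stem each token."""
    tokens = clean_text(text).split()
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return " ".join(tokens)
```

For example, `preprocess("Training models quickly", stem=True)` yields `"train model quick"` with this toy stemmer. A library stemmer would usually be preferable in practice.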

Vectorize the dataset

This step converts each text sequence into a numeric vector. Word vectorization maps words from the vocabulary to vectors of real numbers, which can then be used for tasks such as word prediction and measuring word similarity or semantics. We have used a TF-IDF vectorizer.
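To make the TF-IDF idea concrete, here is a small pure-Python sketch. It is not the project's vectorizer: the names `fit_tfidf` and `transform_tfidf` are hypothetical, and the smoothed IDF formula and L2 normalization are assumptions chosen to resemble common library defaults.

```python
import math
from collections import Counter

def fit_tfidf(documents):
    """Learn (vocabulary, idf) from a list of whitespace-tokenized strings."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF (assumed form): log((1 + N) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocabulary}
    return vocabulary, idf

def transform_tfidf(document, vocabulary, idf):
    """Map one document to an L2-normalized TF-IDF vector over the vocabulary."""
    counts = Counter(document.split())          # tokens outside the vocabulary are ignored
    vector = [counts[tok] * idf[tok] for tok in vocabulary]
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector
```

Rare words receive a larger IDF weight than frequent ones, so a word that appears in only one document contributes more to that document's vector than a word shared by most documents.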

Balancing the dataset

The dataset is highly imbalanced. To balance it, we have used SMOTE (Synthetic Minority Over-sampling Technique) to oversample the minority class and RandomUnderSampler to undersample the majority class.
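The following sketch illustrates the idea behind both resampling steps in plain Python. It is a simplified illustration, not the library implementation the project uses: the function names `smote` and `random_undersample`, the neighbour count `k`, and the seeds are all assumptions, and the sketch expects at least two minority samples.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])      # one of the k nearest neighbours
        gap = rng.random()                      # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class, sampled without replacement."""
    return random.Random(seed).sample(majority, n_keep)
```

Each synthetic point lies on the line segment between two real minority samples, which is why SMOTE adds variety instead of duplicating existing rows. Resampling is normally applied to the training split only, so that validation and test data keep their original class distribution.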

Label Encoding and One Hot Encoding

Label encoding is used to represent the labels in numeric form. However, the number of labels in the training dataset and the validation dataset is not equal, so one-hot encoding is used to ensure that the target classes have a uniform shape.
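One way to get targets of the same width for every split is to build a single label-to-index map and reuse it everywhere. The sketch below shows that idea under stated assumptions: `fit_label_encoder` and `one_hot` are hypothetical helper names, and building the map from the union of all splits is an assumed design choice, not necessarily what the notebook does.

```python
def fit_label_encoder(*label_lists):
    """Build one label -> index map from every split so the indices agree."""
    classes = sorted({label for labels in label_lists for label in labels})
    return {label: index for index, label in enumerate(classes)}

def one_hot(labels, label_to_index):
    """Encode labels as rows whose width equals the total number of classes."""
    width = len(label_to_index)
    rows = []
    for label in labels:
        row = [0] * width
        row[label_to_index[label]] = 1          # set the column for this label
        rows.append(row)
    return rows
```

With this approach, a validation set that contains only two of three classes still produces three-column targets, which matches the output layer of a network trained on all three.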

Training Fully Connected Neural Network

Train the neural network using the Colab code, or import the trained model directly from the model subdirectory in the drive link.

Making the Predictions

To make predictions and calculate the F1 score on either the validation or the test dataset, run the function predictions_f1score.
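For readers unfamiliar with the metric, the sketch below computes per-class and macro-averaged F1 from class indices. It does not reproduce `predictions_f1score`, whose signature and averaging mode are not described here; the names `f1_per_class` and `macro_f1` and the choice of macro averaging are assumptions for illustration.

```python
def f1_per_class(y_true, y_pred, n_classes):
    """Return the F1 score of each class, given true and predicted class indices."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_per_class(y_true, y_pred, n_classes)) / n_classes
```

Macro averaging weights every class equally, which can be informative on imbalanced data because a model that ignores rare classes is penalized for it.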

Summary

We have used a fully connected neural network as the model for this project. Packages such as TensorFlow, scikit-learn, NumPy, pandas, Matplotlib, and seaborn are used as required. Training the model takes almost 1 hour 40 minutes.

System Specifications

We have used Google Colab for training our model. Google Colab uses a Python 3 Google Compute Engine backend. Total RAM provided by Google Colab: 12.69 GB (almost 4 GB used). Disk space: 42.16 GB / 107.72 GB.

Team Info

Kriti Nyoupane, Gaurav Jyakhwa
