
fangxm233 / text_classification


This project forked from donglinchen/text_classification


Build text classifiers using the three most popular machine learning / deep learning frameworks: scikit-learn, PyTorch, and TensorFlow

Jupyter Notebook 99.63% Python 0.37%


Text classification using scikit-learn, PyTorch, and TensorFlow

Build text classifiers using the three most popular machine learning or deep learning libraries: scikit-learn, PyTorch, and TensorFlow.

Table of Contents

  1. Installation
  2. Project Motivation
  3. File Descriptions
  4. Results
  5. Licensing, Authors, and Acknowledgements

Installation

You can download Anaconda Individual Edition from https://www.anaconda.com/products/individual, which bundles most of the libraries commonly used by data scientists. Alternatively, install the following versions individually, using the pip package manager for the Python packages.

  • python 3.8.3
  • tensorflow 2.2.0
  • torch 1.5.1
  • jupyterlab 2.1.4
  • pandas 1.0.4
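To confirm that an environment matches the versions listed above, a short check along the following lines may help. This is a minimal sketch, not part of the repository: the `EXPECTED` table simply restates the list, and `installed_version` and `check_versions` are hypothetical helper names. It relies only on the standard-library `importlib.metadata` module, which is available from Python 3.8.

```python
from importlib import metadata

# Versions copied from the list above. "python" itself is not a pip package,
# so it is left out of this table.
EXPECTED = {
    "tensorflow": "2.2.0",
    "torch": "1.5.1",
    "jupyterlab": "2.1.4",
    "pandas": "1.0.4",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found is None:
            report[package] = "missing"
        elif found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(EXPECTED).items():
        print(f"{package}=={EXPECTED[package]}: {status}")
```

A mismatch does not necessarily mean the notebooks will fail; it only flags where the environment differs from the one listed here.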

Project Motivation

Text classification is widely used in real-world business processes such as email spam detection, support ticket classification, and content recommendation based on text topics. I would like to build a multi-class text classifier using the three most popular open source machine learning or deep learning libraries: scikit-learn, PyTorch, and TensorFlow. I am interested in seeing how they perform compared to each other.

File Descriptions

  1. gather_explore_data.ipynb: Gathers the sample data used for this project and explores what the data look like
  2. feature_extraction.ipynb: Transforms texts or words into numerical vector representations that can be fed into models for training
  3. util.py: Helper functions for feature extraction
  4. model_scikit_learn.ipynb: Builds and trains text classifiers using scikit-learn
  5. model_pytorch.ipynb: Builds and trains a text classifier using PyTorch
  6. model_tensorflow_tfidf.ipynb: Builds and trains a text classifier using TensorFlow, encoding the input texts with the TF-IDF algorithm
  7. model_tensorflow.ipynb: Builds and trains a text classifier using TensorFlow, encoding the input texts as padded sequences and applying word embeddings
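The two text encodings named in the list, TF-IDF vectors and padded sequences, can be illustrated in plain Python. The sketch below is not the code from the notebooks or from util.py, which presumably rely on the libraries' own utilities. The function names, the whitespace tokenizer, the unsmoothed IDF formula, and the choice to pad on the right are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    Library implementations typically add smoothing and normalization.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    # Number of documents in which each token appears at least once.
    doc_freq = Counter(token for tokens in tokenized for token in set(tokens))
    n_docs = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append([
            (counts[token] / total) * math.log(n_docs / doc_freq[token])
            for token in vocabulary
        ])
    return vocabulary, vectors


def padded_sequences(documents, max_len, pad_id=0):
    """Encode each document as word ids, then truncate or right-pad to max_len."""
    word_ids = {}
    sequences = []
    for doc in documents:
        ids = []
        for token in tokenize(doc):
            # Ids start at 1 so that pad_id=0 never collides with a real word.
            ids.append(word_ids.setdefault(token, len(word_ids) + 1))
        ids = ids[:max_len]
        sequences.append(ids + [pad_id] * (max_len - len(ids)))
    return word_ids, sequences
```

TF-IDF yields one fixed-width vector per document whose width equals the vocabulary size, while padded sequences keep word order and are the form an embedding layer usually consumes.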

Results

The results are discussed in the post at https://medium.com/@donglinchen/text-classification-using-scikit-learn-pytorch-and-tensorflow-a3350808f9f7

Licensing, Authors, and Acknowledgements

Sample data are available at: https://www.kaggle.com/yufengdev/bbc-fulltext-and-category
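Once the sample data has been downloaded, a first look at the class balance might be sketched as follows. The column names `category` and `text` are an assumption about the CSV layout, and `category_counts` is a hypothetical helper rather than code from this repository. The rows in the example are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def category_counts(csv_file):
    """Count rows per label in a CSV stream with 'category' and 'text' columns."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"] for row in reader)


# Tiny in-memory stand-in for the downloaded file (made-up rows, not real data).
sample = io.StringIO(
    "category,text\n"
    "sport,team wins the final\n"
    "business,shares climb after results\n"
    "sport,coach names new squad\n"
)
print(category_counts(sample))
```

For a real file, pass a handle opened with `open(path, newline="", encoding="utf-8")` in place of the in-memory sample.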

Contributors

  • donglinchen
