
divyansha1115 / text-classification-using-lda-and-gcn

Constructed a structured heterogeneous text corpus graph to transform the text classification problem into a node classification problem. Created semantically rich features using Text GCN and an LDA-based topic modeling approach, which are then fed into a novel classification model.

Python 34.68% Jupyter Notebook 65.32%
latent-dirichlet-allocation graph-con text-cla ohsumed

text-classification-using-lda-and-gcn's Introduction

Topic-Modelling GCN + LDA

This repository explores Latent Dirichlet Allocation (LDA) methods for text-based classification, combined with various Graph Convolutional Networks (GCNs).

Dataset

Download the dataset from https://drive.google.com/file/d/10kx3z3bjYFoeRjjg1_DZOAP39Jln0BCh/view?usp=sharing, extract it, and keep the contents under TestSGC/.

Step:: 1 Pre-Processing

The 20ng and ohsumed datasets are pre-processed and arranged into a DataFrame by step_1_data_to_pandas_normal.py. Each of the remaining datasets has its own Jupyter notebook for this step, located in the respective dataset directory.

Step:: 2 LDA Feature Vector

step_2_topic_modelling.py
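
The script name is all that is given for this step, so the following is only a minimal sketch of how per-document LDA feature vectors might be produced. It assumes scikit-learn's CountVectorizer and LatentDirichletAllocation; the toy corpus, the number of topics, and the variable names are illustrative and are not taken from step_2_topic_modelling.py.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the pre-processed documents (hypothetical data).
docs = [
    "heart disease patients treated with new drug",
    "drug trial shows reduced heart attack risk",
    "graphics card renders images faster",
    "new computer graphics software released",
]

# Bag-of-words counts: one row per document, one column per vocabulary term.
counts = CountVectorizer().fit_transform(docs)

# n_components is the number of topics; 2 is an arbitrary choice for this toy corpus.
lda = LatentDirichletAllocation(n_components=2, random_state=0)

# Each row is a document's topic distribution, usable as its LDA feature vector.
doc_topic = lda.fit_transform(counts)

print(doc_topic.shape)  # (4, 2)
```

In practice the number of topics would be tuned per dataset, since it sets the width of the LDA feature vector.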

Step:: 3 Gathering LDA Feature Vector into a Composite Feature Matrix

To match the feature matrix of the GCN in the implementation of "Graph Convolutional Networks for Text Classification", we use that implementation's files listing document names, training/test split, and document labels, with one line per document. These files are stored under document_information.
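
As a rough illustration of this alignment, the sketch below reorders per-document LDA vectors to follow the line order of a document-information file. It assumes each line holds a document name, a train/test split marker, and a label separated by tabs; that layout, the function names, and the sample values are assumptions for illustration, not the repository's code.

```python
import io

def read_document_info(handle):
    """Parse lines of the assumed form: name<TAB>split<TAB>label."""
    records = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, split, label = line.split("\t")
        records.append({"name": name, "split": split, "label": label})
    return records

def build_feature_matrix(records, lda_by_name):
    """Return LDA rows in the same order as the document-information file."""
    return [lda_by_name[r["name"]] for r in records]

# Hypothetical stand-in for one file under document_information.
info_file = io.StringIO("doc_b\ttrain\tC01\ndoc_a\ttest\tC02\n")

# Hypothetical LDA topic vectors keyed by document name.
lda_by_name = {"doc_a": [0.9, 0.1], "doc_b": [0.2, 0.8]}

records = read_document_info(info_file)
matrix = build_feature_matrix(records, lda_by_name)
print(matrix)  # [[0.2, 0.8], [0.9, 0.1]]
```

Keeping the row order tied to the file means row i of the LDA matrix and row i of the GCN feature matrix describe the same document.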

Features

Our approach ensembles features from LDA and graph convolution.

Code Description

  • LDA features are obtained from topic_modelling.py.
  • The data is then converted to an LDA probability matrix using data_to_pandas.py.
  • The GCN features are calculated under various parameter settings and stored for training.
  • SGC/downstream/TextSGC/ contains the model files for training the different network architectures.
  • The model also uses a skip architecture, which improves model performance.

The features obtained from LDA and the GCN features are merged systematically into a feature-rich map, which is then fed into a custom-built model. Experiments were carried out to obtain optimal results.
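
How the two feature sets are merged is not spelled out here. One simple possibility, shown purely as an assumption, is to concatenate each document's GCN feature vector with its LDA topic vector, provided both are in the same document order. The function name and sample values are hypothetical.

```python
def merge_features(gcn_rows, lda_rows):
    """Concatenate per-document GCN and LDA vectors row by row."""
    if len(gcn_rows) != len(lda_rows):
        raise ValueError("both feature sets must cover the same documents")
    return [list(g) + list(l) for g, l in zip(gcn_rows, lda_rows)]

# Hypothetical features for two documents: 3 GCN dimensions, 2 LDA topics.
gcn_rows = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
lda_rows = [[0.2, 0.8], [0.9, 0.1]]

merged = merge_features(gcn_rows, lda_rows)
print(merged)  # [[0.5, -1.0, 2.0, 0.2, 0.8], [1.5, 0.0, -0.5, 0.9, 0.1]]
```

Each merged row could then serve as the input vector for the downstream classifier.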

text-classification-using-lda-and-gcn's People

Contributors

abdullahkhilji, divyansha1115
