dimartinot / text-semantic-similarity

This is the repository for the code that ran the experiments presented in the following article: Introduction to Deep Similarity Learning for Sequences

Home Page: https://towardsdatascience.com/introduction-to-deep-similarity-learning-for-sequences-89d9c26f8392

License: Apache License 2.0

Python 10.96% Jupyter Notebook 89.04%

Text Semantic Similarity

File Exploration

The most important files are:

  • EDA.ipynb: Exploratory Data Analysis notebook, used to clean and analyse the dataset. It generates the pickled version of the dataset with pre-computed sentence embeddings.
  • Training.ipynb: Main training pipeline. It loads the pickled dataset generated by the EDA.ipynb notebook.
  • contrastiveModel.py: All models are kept in a single file for the moment, as they share many similarities.
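
The file name contrastiveModel.py suggests a contrastive training objective, although this README does not spell it out. As a rough illustration only, the sketch below shows one common form of the contrastive loss for a single pair of embeddings, written in plain Python. The function names, the margin value, and the toy vectors are assumptions made for this example and are not taken from the repository.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, is_similar, margin=1.0):
    """Contrastive loss for one pair (one common formulation).

    Similar pairs (is_similar=1) are penalised by their squared distance.
    Dissimilar pairs (is_similar=0) are penalised only while they are
    closer to each other than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy 2-d embeddings, made up for illustration.
close_pair = contrastive_loss([0.0, 0.0], [0.3, 0.4], is_similar=1)
far_pair = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_similar=0)
print(close_pair, far_pair)
```

A dissimilar pair that is already further apart than the margin contributes no loss, so training effort goes to the pairs that are still confused.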

Installation

I recommend using the Anaconda distribution to run the code of this project. An Anaconda environment file is provided and can be used to create a new working environment with the following command:

conda env create -f environment.yml

Dataset generation

To generate the dataset, retrieve the source data from Kaggle, import it, and run the commands shown in notebook/EDA.ipynb to save a pickled dataset file (approx. 3 GB in size).
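
For readers unfamiliar with the pickling step, here is a minimal sketch of the general idea: compute an embedding per sentence, store the result with pickle, and load it back later. It is not the code from EDA.ipynb. The `toy_embedding` function is a placeholder for a real sentence encoder, and the rows, dictionary keys, and file name are hypothetical.

```python
import os
import pickle
import tempfile


def toy_embedding(sentence, dim=4):
    """Stand-in for a real sentence encoder: NOT a meaningful embedding."""
    vec = [0.0] * dim
    for i, ch in enumerate(sentence.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


# Hypothetical rows: (sentence_1, sentence_2, label), label 1 = similar.
rows = [
    ("How do I learn Python?", "What is the best way to learn Python?", 1),
    ("How do I learn Python?", "Where is the nearest airport?", 0),
]

# Pre-compute the embeddings once so training does not have to redo it.
dataset = [
    {"emb_1": toy_embedding(s1), "emb_2": toy_embedding(s2), "label": label}
    for s1, s2, label in rows
]

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
with open(path, "wb") as f:
    pickle.dump(dataset, f, protocol=pickle.HIGHEST_PROTOCOL)

# Later (for example in the training notebook), load it back.
with open(path, "rb") as f:
    reloaded = pickle.load(f)
```

Storing pre-computed embeddings trades disk space for training speed, which is consistent with the large file size mentioned above.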

Execution

To execute the main code of this project, you can either run:

cd notebook
jupyter notebook

and then run the Training.ipynb notebook.

Or you could just run:

python main.py

Results

The training results of my initial TextSimilarityDeepSiameseLSTM class with a logistic regression (LogReg) classifier are the following:

Train Acc: 0.7993654994990785 - Val Acc: 0.7652195423623995 - Test Acc: 0.7669758812615955
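
As a reminder of what these figures measure, accuracy is the fraction of pairs whose predicted label matches the true label. The sketch below is a generic illustration with made-up toy labels. It is not the evaluation code of this repository and does not reproduce the numbers above.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy binary labels (1 = similar pair, 0 = dissimilar pair), made up here.
toy_predictions = [1, 0, 1, 1, 0]
toy_labels = [1, 0, 0, 1, 1]
toy_acc = accuracy(toy_predictions, toy_labels)
print(f"Toy Acc: {toy_acc}")
```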
