aditeyabaral / doc2sim

A simple command line utility to find similarity in content between documents using Doc2Vec.
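
Doc2Vec (as implemented in gensim) maps each document to a fixed-length vector. The similarity between two documents is then commonly scored as the cosine similarity of their vectors. The sketch below covers only that scoring step. It assumes the vectors have already been obtained (for example with gensim's `Doc2Vec.infer_vector`). The function names `cosine_similarity` and `pairwise_similarity` and the toy vectors are hypothetical and are not taken from doc2sim's source.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def pairwise_similarity(
    doc_vectors: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of documents by cosine similarity."""
    return {
        (a, b): cosine_similarity(doc_vectors[a], doc_vectors[b])
        for a, b in combinations(sorted(doc_vectors), 2)
    }


if __name__ == "__main__":
    # Toy vectors standing in for Doc2Vec output (hypothetical values).
    vectors = {
        "a.txt": [1.0, 0.0, 1.0],
        "b.txt": [1.0, 0.0, 0.9],
        "c.txt": [0.0, 1.0, 0.0],
    }
    for (a, b), score in pairwise_similarity(vectors).items():
        print(f"{a} vs {b}: {score:.3f}")
```

Cosine similarity ignores vector length and compares direction only. This is why it is a usual choice for embedding vectors whose magnitudes are not meaningful on their own.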

Languages: Python 41.63%, JavaScript 22.47%, HTML 35.25%, Shell 0.64%
Topics: doc2vec, gensim, machine, machine-learning, nlp, python, python3, word2vec

doc2sim's Introduction

Hello, I am Aditeya

I am an aspiring NLP researcher and a Master's student in Computer Science at New York University's Courant Institute.

I am inspired by our ability to learn and comprehend languages in various environments and would like to enable machines to develop human-like visual and language understanding. My dream is to build more inclusive language technologies for low-resource and under-represented languages, including code-mixed languages.

My research interests lie in the field of language-vision understanding and representation learning, with a focus on creating task-agnostic architectures to advance natural language understanding and related applications.

๐Ÿ‘จโ€๐Ÿ’ป Languages

Python Scala Java C R Shell Script Octave Markdown Apache Groovy LaTeX

🤖 Machine Learning and Statistics

PyTorch, TensorFlow, Keras, OpenCV, scikit-learn, NumPy, Pandas, SciPy, Matplotlib, Plotly

💾 Big Data and Databases

Apache Hadoop, Apache Kafka, Apache Flink, Apache Spark, RabbitMQ, Redis, Kibana, Elasticsearch, MongoDB, MySQL, Postgres

🧰 Tools

Visual Studio Code, IntelliJ IDEA, Jupyter Notebook, PyCharm, Colab, Spyder, RStudio, Arduino IDE, Overleaf

📦 Libraries and Frameworks

Anaconda, Flask, Selenium, Linux, Grafana, Jira, Postman, Docker, Apache Maven, Jenkins, Git, GitHub, GitLab, GitHub Actions, AWS, Azure, GitHub Pages, Google Cloud, Heroku, Canva, GIMP (GNU Image Manipulation Program), Microsoft Office

โ˜Ž๏ธ Contact me

Gmail Discord Twitter Telegram

๐Ÿ† GitHub Stats

@aditeyabaral's Holopin board

doc2sim's People

Contributors

aditeyabaral, andidevel

doc2sim's Issues

Rewrite code using MOSS

Rewrite the similarity metrics using MOSS. Display the similarity scores returned by MOSS in a tabular format and also export them as a CSV file. Should support all the file types supported by MOSS.

Use mosspy to integrate Python and MOSS
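
The display and export steps of this issue could look roughly like the sketch below. It assumes the MOSS scores have already been collected into a dictionary keyed by file pair. This shape is hypothetical, and obtaining the scores through mosspy is not shown. The sketch uses only Python's standard `csv` module. `format_table` and `write_csv` are illustrative names, not part of doc2sim or mosspy.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# Hypothetical shape: (file_a, file_b) -> similarity score reported by MOSS.
Scores = Dict[Tuple[str, str], float]


def format_table(scores: Scores) -> str:
    """Render the scores as a fixed-width text table, highest score first."""
    rows = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    width_a = max([len("File A")] + [len(a) for (a, _), _ in rows])
    width_b = max([len("File B")] + [len(b) for (_, b), _ in rows])
    lines = [f"{'File A':<{width_a}}  {'File B':<{width_b}}  Score"]
    for (a, b), score in rows:
        lines.append(f"{a:<{width_a}}  {b:<{width_b}}  {score:6.2f}")
    return "\n".join(lines)


def write_csv(scores: Scores, stream: TextIO) -> None:
    """Write a header plus one row per file pair to any text stream."""
    writer = csv.writer(stream)
    writer.writerow(["file_a", "file_b", "score"])
    for (a, b), score in sorted(scores.items()):
        writer.writerow([a, b, f"{score:.2f}"])


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    demo = {("a.py", "b.py"): 87.0, ("a.py", "c.py"): 12.5}
    print(format_table(demo))
    buffer = io.StringIO()
    write_csv(demo, buffer)
    print(buffer.getvalue())
```

To write a real file, pass `open("similarity.csv", "w", newline="")` as the stream. The `csv` module documentation recommends `newline=""` for files used with its writers.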

Addition of Word Embedding Models

Looking to add more word embedding models that handle context, as well as sentence vectors. Contributions such as trainable ELMo, fastText, and weighted word2vec are welcome.
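
One common reading of "weighted word2vec" is a sentence vector built as a weighted average of its word vectors. The weights can be inverse document frequency (IDF) values, so that frequent, uninformative words count for less. The sketch below assumes that interpretation. Several details are illustrative choices rather than doc2sim's design: the function names, the `log(n / df) + 1` IDF variant, and the default weight of 1.0 for tokens without a weight.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def idf_weights(documents: Sequence[Sequence[str]]) -> Dict[str, float]:
    """IDF per term as log(n / df) + 1, so a term in every document gets 1.0."""
    n = len(documents)
    df: Counter = Counter()
    for doc in documents:
        df.update(set(doc))  # count each term at most once per document
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def weighted_sentence_vector(
    tokens: Sequence[str],
    word_vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted mean of the word vectors of the in-vocabulary tokens."""
    if not word_vectors:
        raise ValueError("word_vectors must not be empty")
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        vector = word_vectors.get(token)
        if vector is None:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(token, 1.0)  # assumption: unweighted tokens count as 1.0
        for i, value in enumerate(vector):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0:
        return total  # zero vector when no token is in the vocabulary
    return [value / weight_sum for value in total]
```

Swapping `idf_weights` for another scheme, such as smooth inverse frequency, only changes the `weights` dictionary. The averaging step stays the same.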

GUI

It would be great to support the application with a neat GUI built using Flask (since it's a pretty simple script with no heavy backend). Looking for contributions towards this!

Rewrite using argparse

Use argparse to take in either a directory or a set of file paths, and compute the similarity between those files.
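
Below is a minimal sketch of the requested command-line interface using the standard `argparse` module. A mutually exclusive group accepts either `--directory` or `--files`, and both options resolve to a list of files to compare. The flag names and the `build_parser` and `resolve_inputs` helpers are hypothetical. doc2sim's actual script and options may differ.

```python
import argparse
from pathlib import Path
from typing import List, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Parser that takes either a directory or an explicit list of files."""
    parser = argparse.ArgumentParser(
        description="Compute pairwise similarity between documents."
    )
    # Exactly one of the two input styles must be given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-d", "--directory", type=Path,
                       help="compare every regular file in this directory")
    group.add_argument("-f", "--files", type=Path, nargs="+",
                       help="compare the listed files")
    return parser


def resolve_inputs(argv: Optional[Sequence[str]] = None) -> List[Path]:
    """Parse argv (sys.argv[1:] when None) into the list of files to compare."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.directory is not None:
        if not args.directory.is_dir():
            parser.error(f"not a directory: {args.directory}")
        # Non-recursive: only regular files directly inside the directory.
        files = sorted(p for p in args.directory.iterdir() if p.is_file())
    else:
        files = list(args.files)
    if len(files) < 2:
        parser.error("need at least two files to compare")
    return files
```

For example, `resolve_inputs(["-d", "docs"])` returns the files directly inside `docs`, and `resolve_inputs(["-f", "a.txt", "b.txt"])` returns the two given paths. Called with no argument, it parses the real command line. Passing both options, or ending up with fewer than two files, exits with an argparse usage error.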
