rajaswa / drift

DRIFT is a tool for Diachronic Analysis of Scientific Literature.

Home Page: https://aclanthology.org/2021.emnlp-demo.40/

License: MIT License

Makefile 0.55% Python 99.45%
diachronic-embeddings scientific-visualization nlp hacktoberfest

drift's Introduction

Hi there, I'm Rajaswa - aka rajaswa 👋

I'm a senior year undergraduate student at BITS Goa.

  • 🌱 I’m currently exploring Computational Psycholinguistics
  • 👯 I’m looking to collaborate with other researchers and linguists
  • 🥅 2020 Goals: Building computational tools for researchers working at the intersection of linguistics theory and NLP!
  • 😄 Pronouns: He/Him
  • ⚡ Fun fact: Most of the Indic languages will have grammatical genders!

Connect with me:

sites.google.com/view/rajaswa/ | RajaswaPatil (Twitter) | rajaswa-patil (LinkedIn)


Languages and Tools:

Python, AWS, TF, Azure, Docker, LaTeX, Linux, Visual Studio Code, Git, GitHub, terminal, vim, sklearn

drift's People

Contributors

abheesht17, gchhablani, harsh4799, rajaswa

drift's Issues

Data Scraping

  • Scrape data from here
  • Make a separate Excel file for every conference
  • The data can also be stored as JSON
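
A minimal sketch of the storage step only, assuming the scraped papers are already available as a list of dictionaries. The field names (`conference`, `year`, `title`), the helper `save_by_conference`, and the output file naming are all hypothetical and not DRIFT's actual schema.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def save_by_conference(records, out_dir):
    """Write one JSON file per conference and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the records by their (hypothetical) "conference" field.
    grouped = defaultdict(list)
    for record in records:
        grouped[record["conference"]].append(record)

    written = []
    for conference, items in sorted(grouped.items()):
        path = out_dir / f"{conference.lower()}.json"
        with path.open("w", encoding="utf-8") as handle:
            json.dump(items, handle, ensure_ascii=False, indent=2)
        written.append(path)
    return written


# Hypothetical records, for illustration only.
sample_records = [
    {"conference": "ACL", "year": 2020, "title": "Paper A"},
    {"conference": "EMNLP", "year": 2020, "title": "Paper B"},
    {"conference": "ACL", "year": 2021, "title": "Paper C"},
]

# Write into a temporary directory and read one file back.
with tempfile.TemporaryDirectory() as tmp:
    paths = save_by_conference(sample_records, tmp)
    written_names = [p.name for p in paths]
    with paths[0].open(encoding="utf-8") as handle:
        acl_count = len(json.load(handle))
```

An Excel export would need a third-party library, so it is left out of this sketch.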

Alternatives to top-k selection

Top-k word selection is not very useful at the moment.
The following options need to be implemented:

  • POS-based selection for keywords.
  • TF/IDF-based selection.
  • Yake-based selection.
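
The TF/IDF option could look roughly like the sketch below. It assumes each year's text is treated as one document and tokenized by whitespace; the function name `tfidf_keywords` and the toy corpus are hypothetical. The POS-based and YAKE-based options depend on external libraries and are not covered here.

```python
import math
from collections import Counter


def tfidf_keywords(docs_by_year, top_n=3):
    """Rank each year's words by TF-IDF, treating every year as one document."""
    tokenized = {year: text.lower().split() for year, text in docs_by_year.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many years does each word occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    keywords = {}
    for year, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by score (descending), then alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        keywords[year] = [word for word, _ in ranked[:top_n]]
    return keywords


# Toy corpus, for illustration only.
corpus = {
    2019: "neural parsing with neural networks",
    2020: "transformers for parsing and transformers for tagging",
}
```

Words that occur in every year (here "parsing") get a score of zero, which is what makes this more selective than plain top-k frequency. Stop words such as "for" still score well in a corpus this small, so a stop-word filter would likely be needed in practice.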

Add statistics to analysis methods

The analysis methods need to display data frames with statistical information, such as year-wise word frequency and the number of articles, words, tokens, and POS tags in each year.
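
A rough sketch of how such year-wise statistics might be collected, assuming the corpus is a mapping from year to a list of article texts and that whitespace tokenization is good enough for counting. The function name `yearly_statistics` and the sample data are hypothetical; POS counts would need a tagger and are omitted.

```python
from collections import Counter


def yearly_statistics(articles_by_year):
    """Return per-year article, token, and vocabulary counts plus word frequencies."""
    stats = {}
    for year, articles in sorted(articles_by_year.items()):
        tokens = [tok for text in articles for tok in text.lower().split()]
        stats[year] = {
            "articles": len(articles),
            "tokens": len(tokens),
            "unique_words": len(set(tokens)),
            "word_freq": Counter(tokens),
        }
    return stats


# Toy data, for illustration only.
articles = {
    2019: ["word embeddings drift", "embeddings for parsing"],
    2020: ["semantic drift over time"],
}
stats = yearly_statistics(articles)
```

The resulting dictionary is keyed by year, so it could be turned into a data frame with one row per year for display.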

LDA Topic Modelling error

When choosing the LDA Topic Modelling section, the following message appears:
ValueError: list.remove(x): x not in list
Traceback:
File "c:\users\doub2420.virtualenvs\drift-qengzvvy\lib\site-packages\streamlit\script_runner.py", line 337, in run_script
exec(code, module.__dict__)
File "C:\drift\app.py", line 1726, in <module>
year_paths.remove(os.path.join(vars["data_path"], "compass.txt"))

What does this mean?

Thanks!
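
For context, Python's `list.remove` raises this `ValueError` whenever the item is not in the list. In this case the joined path to `compass.txt` was apparently not among the entries of `year_paths`. Why it was missing is not clear from the traceback alone; one possibility is that the file does not exist in the data directory. A minimal sketch of a guarded removal follows, where the helper `remove_if_present`, the directory name, and the file names are hypothetical:

```python
import os


def remove_if_present(paths, target):
    """Remove target from paths in place; return True if it was there."""
    try:
        paths.remove(target)
    except ValueError:
        # list.remove raises ValueError when target is not in the list.
        return False
    return True


data_path = "data"  # hypothetical directory name
year_paths = [os.path.join(data_path, name) for name in ("2019.txt", "2020.txt")]
compass_path = os.path.join(data_path, "compass.txt")

removed_first = remove_if_present(year_paths, compass_path)   # compass.txt absent
year_paths.append(compass_path)
removed_second = remove_if_present(year_paths, compass_path)  # compass.txt present
```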

Changes to Productivity Plot

  1. Check whether the cluster labels are more or less correct; otherwise, remove or change the cluster table.
  2. Formatting changes to the cluster table might be required.
  3. Labels in the data frame should be named.

Top-K in Yake is unused

Top-K should either be removed or used in the plot. If top-K is used, the bar plot should show many bars.

Script for Clustering Word Embeddings

  • Use K-Means to cluster the diachronic word embeddings.
  • Rough sketch:
    • The class can have multiple functions: train, predict, store, visualisation, etc.
    • The function(s) take as input the word vectors from a particular timestamp, along with K-Means parameters such as the number of clusters.
    • Add functionality for visualisation.
    • Return the centroids and the cluster to which each word belongs.
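
The rough sketch above might translate into something like the following pure-Python class. It is only an illustration: the class name `WordVectorKMeans`, the deterministic farthest-point initialisation, and the toy vectors are assumptions. A real implementation would more likely wrap an existing K-Means library. The store and visualisation functions are left out.

```python
class WordVectorKMeans:
    """Plain K-Means over word vectors from a single timestamp."""

    def __init__(self, n_clusters, n_iter=20):
        self.n_clusters = n_clusters
        self.n_iter = n_iter
        self.centroids = []

    @staticmethod
    def _sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def _nearest(self, vector):
        distances = [self._sq_dist(vector, c) for c in self.centroids]
        return distances.index(min(distances))

    def _init_centroids(self, vectors):
        # Assumption: start from the first vector, then repeatedly add the
        # vector farthest from all centroids chosen so far (deterministic).
        centroids = [list(vectors[0])]
        while len(centroids) < self.n_clusters:
            farthest = max(
                vectors,
                key=lambda v: min(self._sq_dist(v, c) for c in centroids),
            )
            centroids.append(list(farthest))
        return centroids

    def train(self, word_vectors):
        """Fit centroids on a {word: vector} mapping; return (centroids, labels)."""
        vectors = list(word_vectors.values())
        self.centroids = self._init_centroids(vectors)
        for _ in range(self.n_iter):
            members = [[] for _ in range(self.n_clusters)]
            for vector in vectors:
                members[self._nearest(vector)].append(vector)
            for idx, group in enumerate(members):
                if group:  # keep the old centroid if a cluster is empty
                    self.centroids[idx] = [sum(dim) / len(group) for dim in zip(*group)]
        return self.centroids, self.predict(word_vectors)

    def predict(self, word_vectors):
        """Map every word to the index of its nearest centroid."""
        return {word: self._nearest(vec) for word, vec in word_vectors.items()}


# Toy 2-D vectors, for illustration only.
toy_vectors = {
    "parser": [0.0, 0.1],
    "grammar": [0.1, 0.0],
    "neural": [5.0, 5.1],
    "network": [5.1, 5.0],
}
centroids, labels = WordVectorKMeans(n_clusters=2).train(toy_vectors)
```

`train` returns both the centroids and the word-to-cluster mapping, matching the last bullet. `predict` can then be reused on vectors from another timestamp.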

Keyword Extraction from every timespan

For every timespan, identify keywords (make diagrams: https://arxiv.org/pdf/2006.01131.pdf). Frequency is generally used as a proxy for keyword identification, but we can also explore methods such as RAKE. Beyond identifying the words with the highest frequency in each timespan, we can look for the words with the largest jump in frequency between two consecutive timespans. The analysis need not be limited to single words; n-grams can be analysed too.
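
The "highest jump in frequency" idea could be sketched as below, assuming each timespan is already tokenized and that the jump is measured as the change in relative frequency between consecutive timespans. The function names and the toy data are hypothetical. RAKE is not covered here.

```python
from collections import Counter


def biggest_frequency_jumps(tokens_by_span, top_n=3):
    """For each pair of consecutive timespans, list the words whose
    relative frequency rose the most."""
    spans = sorted(tokens_by_span)
    jumps = {}
    for prev, curr in zip(spans, spans[1:]):
        prev_counts = Counter(tokens_by_span[prev])
        curr_counts = Counter(tokens_by_span[curr])
        # Guard against empty timespans to avoid division by zero.
        prev_total = max(len(tokens_by_span[prev]), 1)
        curr_total = max(len(tokens_by_span[curr]), 1)
        delta = {
            word: curr_counts[word] / curr_total - prev_counts[word] / prev_total
            for word in set(prev_counts) | set(curr_counts)
        }
        ranked = sorted(delta.items(), key=lambda item: (-item[1], item[0]))
        jumps[(prev, curr)] = [word for word, change in ranked[:top_n] if change > 0]
    return jumps


def ngrams(tokens, n):
    """Turn a token list into space-joined n-grams so they can be counted like words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy data, for illustration only.
toy_spans = {
    2018: "lstm models for lstm tagging".split(),
    2019: "bert models for bert tagging with bert".split(),
}
```

To analyse n-grams instead of single words, each token list can be passed through `ngrams` first and the result fed to `biggest_frequency_jumps` unchanged.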
