
diverse-crowd's Introduction

Installation

With pipenv:

pipenv install

Otherwise, with pip:

pip install -r requirements.txt

Run

Run python train.py --help to list all options.

For example: python train.py --filename test --refresh
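The documented flags (--filename, --refresh / --no-refresh, --word_vectors) can be illustrated with a minimal parser built on the standard-library argparse module. This is a sketch, not the project's train.py: the defaults (refresh off, --filename required) and the help strings are assumptions. argparse.BooleanOptionalAction, which generates the paired --refresh / --no-refresh flags, needs Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for train.py."""
    parser = argparse.ArgumentParser(
        description="Train or load word vectors from tweets."
    )
    # Base name for every artifact written to the static folder.
    # Assumed to be required, since the other flags are used alongside it.
    parser.add_argument("--filename", required=True,
                        help="name to save the output files as")
    # Creates both --refresh and --no-refresh; the default is an assumption.
    parser.add_argument("--refresh", action=argparse.BooleanOptionalAction,
                        default=False,
                        help="always call the Twitter API instead of the local cache")
    # Optional pretrained gensim model name; None means "train on tweets".
    parser.add_argument("--word_vectors", default=None,
                        help="pretrained model name, e.g. glove-twitter-25")
    return parser
```

With this sketch, build_parser().parse_args(["--filename", "test", "--refresh"]) returns a namespace with filename="test", refresh=True and word_vectors=None.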

--filename sets the base name for the output files. All of them are saved to the static folder using the following convention:

  • : a binary file of tweets in raw-text form
  • .model: a pickled gensim model. It is saved but not used by later steps; you can load it again if you want to train the model further on more raw data.
  • .kv: a gensim KeyedVectors instance, loaded later to look up vectors for input words.
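The .model / .kv naming can be expressed as a small path helper. This sketch covers only the two gensim artifacts; it assumes the static folder is a directory named static relative to the working directory, and the helper name artifact_paths is hypothetical.

```python
from pathlib import Path

# Assumed location of the static folder, relative to the working directory.
STATIC_DIR = Path("static")


def artifact_paths(filename: str, static_dir: Path = STATIC_DIR) -> dict[str, Path]:
    """Return where the .model and .kv files for `filename` would be stored."""
    return {
        # Full gensim model: kept so training can be resumed on more data.
        "model": static_dir / f"{filename}.model",
        # KeyedVectors only: what later steps load to look up word vectors.
        "kv": static_dir / f"{filename}.kv",
    }
```

For --filename test this gives static/test.model and static/test.kv. With gensim installed, the .kv file could then be reopened with gensim.models.KeyedVectors.load(str(paths["kv"])).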

--refresh / --no-refresh controls where tweets come from. With --refresh, the script always calls the Twitter API. With --no-refresh, it looks in the static folder for the cached tweets file and, if it is there, loads it instead; if it is not, it falls back to calling the API, exactly as with --refresh. Reading the local file speeds things up and saves money on API calls.
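The --refresh / --no-refresh behaviour is a cache-or-fetch pattern. In this sketch, fetch_from_api is a hypothetical placeholder for the Twitter API call, and the cache is assumed to be a pickle file; the README only says it is a binary file.

```python
import pickle
from pathlib import Path
from typing import Callable, List


def load_tweets(cache_path: Path, refresh: bool,
                fetch_from_api: Callable[[], List[str]]) -> List[str]:
    """Return tweets, reusing the cached file unless a refresh is forced.

    `fetch_from_api` is a hypothetical stand-in for the Twitter API call.
    """
    if not refresh and cache_path.exists():
        # --no-refresh and a cache is present: skip the (paid) API call.
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    # --refresh, or --no-refresh with no cache: call the API and cache the result.
    tweets = fetch_from_api()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as fh:
        pickle.dump(tweets, fh)
    return tweets
```

Passing refresh=False on a second run returns the pickled tweets without invoking fetch_from_api again.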

--word_vectors can be supplied in addition to --filename. Setting it to the name of a pretrained model available through gensim's downloader makes the script download that model instead of training one on tweets from the Twitter API or the cached data. The tweets are still used to calculate user similarity and related measures, but not to train a model.

Example: python train.py --word_vectors glove-twitter-25 --filename test
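The README mentions user similarity but does not say how it is computed. One common approach, assumed here rather than taken from the project, is to average the vectors of each user's words and compare users by cosine similarity. vectors can be any mapping from word to vector; a gensim KeyedVectors object should also work, since it supports `word in kv` and `kv[word]`. The function names are hypothetical.

```python
import math
from typing import List, Mapping, Optional, Sequence


def user_vector(words: Sequence[str],
                vectors: Mapping[str, Sequence[float]]) -> Optional[List[float]]:
    """Average the vectors of a user's in-vocabulary words (None if none are known)."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Out-of-vocabulary words are simply skipped, which matters for small models trained on a few hundred tweets.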

Model Development Ideas

  1. More users, and more tweets beyond the most recent 20
  2. Use bigrams
  3. Tweak gensim hyperparameters, e.g. min_count
  4. Remove more stopwords, such as 'and', 'or', 'I'll'
  5. Use representative words, phrases and tweets instead of representative users
  6. Label dimensions like liberal/conservative, pro-life/pro-choice, crypto/anti-crypto
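Ideas 2 to 4 can be prototyped in plain Python before changing the gensim pipeline. This sketch drops stopwords, appends adjacent-word bigrams, and applies a corpus-wide minimum count. The stopword set contains only the examples from the list, and the whitespace tokenizer is deliberately naive (punctuation stays attached to words); all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

# Illustrative stopword list taken from the examples above; extend as needed.
STOPWORDS: Set[str] = {"and", "or", "i'll"}


def tokenize(tweet: str, stopwords: Set[str] = STOPWORDS) -> List[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in tweet.lower().split() if tok not in stopwords]


def add_bigrams(tokens: List[str]) -> List[str]:
    """Append every adjacent pair, joined with '_', to the unigram tokens."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def filter_min_count(corpus: Iterable[List[str]], min_count: int) -> List[List[str]]:
    """Drop tokens seen fewer than `min_count` times across the whole corpus."""
    docs = list(corpus)
    counts = Counter(tok for doc in docs for tok in doc)
    return [[tok for tok in doc if counts[tok] >= min_count] for doc in docs]
```

gensim has its own tools for the same ideas: Word2Vec's min_count parameter performs the frequency cut during training, and gensim.models.phrases.Phrases keeps only statistically frequent bigrams rather than every adjacent pair.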
