bailey-j / tweedr

This project forked from dssg/tweedr

A machine learning API to analyze tweets during disasters.

Home Page: http://dssg.uchicago.edu/project/measuring-disaster-damage-with-tweets/

License: MIT License

Python 35.48% JavaScript 51.27% CSS 0.23% R 0.21% PHP 12.80%

tweedr's Introduction

Tweedr: measuring disaster damage with tweets

Tweedr makes information from social media more accessible to providers of disaster relief. There are two aspects to the application:

  1. An API / pipeline for applying machine learning techniques and natural language processing tools to analyze social media produced in response to a disaster.
  2. A user interface for manipulating, filtering, and aggregating this enhanced social media data.

Tweedr is a Data Science for Social Good project, through a partnership with the Qatar Computational Research Institute.

Problem, solution, data

web app screenshot

Project layout

  • doc/ contains various presentations, along with accompanying slides and poster.
    • doc/report/ contains a more technical and extensive write-up of this project. In progress.
  • ext/ is created by a complete install; external data sources and libraries are downloaded to this folder.
  • static/ contains static (non-Javascript) files used by the web app.
  • templates/ contains templates (both server-side and client-side) used by the web app.
  • tests/ contains unittest-like tests. Use python setup.py test to run these.
  • tools/ holds tools to aid development (currently, only a test-running git-hook).
  • tweedr/ contains the main Python app and functions as a Python package (e.g., import tweedr).

Installation guide

git clone https://github.com/dssg/tweedr.git
cd tweedr
python setup.py develop download_ext

If you want to jump straight to development, see the Contributing wiki page.

Dependencies

Tweedr uses a number of external libraries and resources. This is the dependency tree:

crfsuite and liblbfgs are the only components that can't be installed directly with Python via setuptools. If you have trouble installing any of the other packages, you might have better luck looking for them in your operating system's package manager or as binaries on the projects' websites.

Installation steps

1. Installing libLBFGS

The source code can be downloaded from the maintainer's webpage, though the GitHub fork used in the commands below attempts to simplify the install process.

git clone https://github.com/chbrown/liblbfgs.git
cd liblbfgs
./configure
make
sudo make install

2. Installing CRFsuite

As with libLBFGS, a tarball can be downloaded from the original website, though the accompanying fork on GitHub attempts to document the installation process and make compilation more automatic on both Linux and Mac OS X.

git clone https://github.com/chbrown/crfsuite.git
cd crfsuite
./configure
make
sudo make install

That installs the library, but not the Python wrapper, which takes a few more steps:

cd swig/python
python setup.py build_ext
sudo python setup.py install_lib

To test whether it installed correctly, you can run the following at your terminal, which should print out the current CRFsuite version:

python -c 'import crfsuite; print(crfsuite.version())'
> 0.12.2

The GitHub repository documents a few more options that might come in handy if the process above does not work on your operating system.
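The import check above can also be scripted, which is handy when verifying several machines. This is a minimal sketch, assuming only that the wrapper is importable as crfsuite and exposes version() as in the one-liner; module_version is a hypothetical helper name, not part of Tweedr or CRFsuite.

```python
import importlib


def module_version(name):
    """Return the module's version() string, or None if it cannot be imported.

    Assumes the module exposes a callable `version`, as the crfsuite
    wrapper does in the one-liner above; otherwise returns "unknown".
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    version = getattr(module, "version", None)
    return str(version()) if callable(version) else "unknown"


if __name__ == "__main__":
    found = module_version("crfsuite")
    if found is None:
        print("crfsuite is not importable; re-run the swig/python build steps")
    else:
        print("crfsuite version: %s" % found)
```

A None result usually means the Python wrapper step was skipped or installed for a different interpreter than the one running the check.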

3. Configuring environment variables

Tweedr also connects to a number of remote resources when running live; see the Environment wiki page for instructions on setting those up.

4. Installing Tweedr

After installing crfsuite and liblbfgs, everything else should be installable via setuptools / distutils:

git clone https://github.com/dssg/tweedr.git
cd tweedr
python setup.py install

And then to download external data requirements:

python setup.py download_ext

The download_ext command will download external data, which currently includes the following packages / sources:

You may get an error, "IOError: cmu.arktweetnlp.RunTagger error", if you try to use some parts of Tweedr before installing this component.

5. Instantiating the database

While we are not currently able to release our data, you can easily recreate the structure of our database by running the following command:

tweedr-database create

This uses SQLAlchemy to do the reverse of reflection: it creates the tables from the schema definitions in the Python code by running metadata.create_all().
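As a rough illustration of what metadata.create_all() does, the sketch below registers one table on a MetaData object and creates it in an in-memory SQLite database. The example_tweets table, its columns, and the create_schema helper are hypothetical and are not Tweedr's actual schema or code.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, inspect

# Hypothetical schema for illustration only; not Tweedr's real tables.
metadata = MetaData()
example_tweets = Table(
    "example_tweets",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("text", String(140)),
)


def create_schema(url="sqlite://"):
    """Create every table registered on `metadata` and return the table names."""
    engine = create_engine(url)
    # create_all() issues CREATE TABLE for tables that do not exist yet;
    # tables that already exist are left untouched.
    metadata.create_all(engine)
    return inspect(engine).get_table_names()


if __name__ == "__main__":
    print(create_schema())
```

Because create_all() skips tables that already exist, running it a second time against the same database is harmless.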

Running Tweedr

At this point, you should have tools like tweedr-ui and tweedr-pipeline on your PATH, and you can run each of those with the --help flag to view the usage messages.
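If you want to confirm this from a script rather than by hand, the following sketch checks whether the two commands named above resolve on PATH. It uses only the standard library; missing_tools is a hypothetical helper name.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["tweedr-ui", "tweedr-pipeline"])
    if missing:
        print("Not on PATH: " + ", ".join(missing))
    else:
        print("All Tweedr command-line tools are on PATH")
```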

See the API section of the wiki for a description of some of the fields that tweedr-pipeline adds.

Troubleshooting

If your installation is still missing packages, see the manually installing page of the wiki.

Team

Contributing to the project

Want to get in touch? Found a bug? Open up a new issue or email us at [email protected].

License

Copyright © 2013 The University of Chicago. MIT Licensed.

tweedr's People

Contributors

chbrown, shirini721, lejit, aronwc, jpvelez
