
arboj / arbogast-capstone


Project repo for mapping natural disaster locations from social media.

Python 0.35% HTML 0.07% R 0.22% JavaScript 0.02% Jupyter Notebook 99.34% PureBasic 0.01% CSS 0.01%
social-media geoprocessing r-shiny leafletjs python3 mordecai snscrape


Mapping Natural Disaster Locations from Social Media Content

Overview

This project explored a workflow for conditioning, classifying, geolocating and mapping unstructured social media data in order to discover clusters of locations mentioned in natural disaster posts. Posts were collected from Twitter with the Snscrape Python library, and their text was then processed and conditioned.
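The README does not list the individual conditioning steps. As a minimal sketch, assuming typical tweet-cleaning rules (lowercasing, dropping URLs and @mentions, keeping hashtag words, stripping punctuation and emoji), the conditioning stage might look like this. The function name `condition_tweet` and every rule in it are illustrative assumptions, not the project's code.

```python
import re

# Illustrative cleaning rules (assumed; not taken from the project's notebooks).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]|_")  # punctuation, emoji, underscores
SPACE_RE = re.compile(r"\s+")


def condition_tweet(text: str) -> str:
    """Normalize raw tweet text before it is vectorized for the classifier."""
    text = text.lower()
    text = URL_RE.sub(" ", text)             # drop links
    text = MENTION_RE.sub(" ", text)         # drop @user mentions
    text = text.replace("&amp;", " and ")    # unescape the most common HTML entity
    text = text.replace("#", " ")            # keep hashtag words, drop the symbol
    text = NON_WORD_RE.sub(" ", text)        # remove remaining punctuation and emoji
    return SPACE_RE.sub(" ", text).strip()   # collapse runs of whitespace


if __name__ == "__main__":
    print(condition_tweet("Flooding on Main St!! #HurricaneIda @NWS https://t.co/abc123"))
```

This prints `flooding on main st hurricaneida`. Because the sketch strips punctuation, the raw text would likely be the safer input for the later geoparsing step.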

A binary classification recurrent neural network was trained in TensorFlow with Keras on human-labeled tweets aggregated in the CrisisBenchmark dataset, created by the Crisis NLP project at the Qatar Computing Research Institute.
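The README does not describe the network's layer types or sizes. To show what a recurrent binary classifier computes, here is a minimal pure-Python sketch of a plain (Elman) RNN forward pass with a sigmoid output. In the project this work is done by Keras layers; the weights below are random placeholders rather than trained values, and `rnn_predict` and its parameter names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_predict(tokens: Sequence[Sequence[float]], w_x: Matrix, w_h: Matrix,
                b: Sequence[float], w_out: Sequence[float], b_out: float) -> float:
    """Return P(informative) for one tweet given its sequence of word vectors."""
    h = [0.0] * len(b)  # initial hidden state h_0 = 0
    for x_t in tokens:  # read the tweet one token vector at a time
        pre = [xi + hi + bi for xi, hi, bi in zip(matvec(w_x, x_t), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]  # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: probability of the positive class


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    emb_dim, hidden = 4, 3
    tweet = [[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(5)]  # 5 token vectors
    p = rnn_predict(tweet, random_matrix(hidden, emb_dim, rng), random_matrix(hidden, hidden, rng),
                    [0.0] * hidden, [rng.uniform(-1, 1) for _ in range(hidden)], 0.0)
    print(f"P(informative) = {p:.3f}")
```

A trained Keras model supplies learned weights and often uses gated cells (LSTM or GRU) that cope better with long sequences; which of these the project used is not stated here.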

The text of the training data was vectorized with a pre-trained text embedding built from tweets using the Global Vectors for Word Representation (GloVe) methodology, available from the GloVe Project at Stanford University (Pennington, Socher, and Manning 2014). The model's hyperparameters were tuned with the Hyperband algorithm, and the final model was evaluated with 5-fold cross-validation. The trained model and the text embedding built from the training data were then used to classify tweets collected in June and July 2021 as informative or not informative about natural disasters.
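Two of the steps above can be sketched with the standard library. Pre-trained GloVe files are plain text, one token per line followed by its vector components, so building an embedding matrix for a vocabulary is a dictionary lookup. Giving unknown words zero vectors is one common choice, assumed here rather than taken from the project. The second function produces 5-fold train/validation index splits. The function names are hypothetical, and the README does not say whether the project's folds were shuffled or stratified; this sketch uses contiguous folds, so the data should be shuffled beforehand.

```python
from typing import Dict, Iterable, List, Tuple


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's text format: a token followed by its space-separated vector."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


def embedding_matrix(vocab: List[str], vectors: Dict[str, List[float]],
                     dim: int) -> List[List[float]]:
    """Row i holds the pre-trained vector for vocab[i]; unknown words get zeros."""
    return [list(vectors.get(word, [0.0] * dim)) for word in vocab]


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; each fold is the validation set once."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        splits.append((train, val))
        start += size
    return splits


if __name__ == "__main__":
    glove = load_glove(["flood 0.1 0.2", "fire -0.3 0.4"])
    print(embedding_matrix(["flood", "tsunami"], glove, dim=2))  # [[0.1, 0.2], [0.0, 0.0]]
    for train, val in k_fold_indices(10, k=5):
        print(len(train), val)
```

In a Keras model, a matrix like this is typically used to initialize an embedding layer; each of the five splits trains one model and scores it on the held-out fold.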

The tweets classified as informative were then parsed for locations and georeferenced with the Mordecai Python library. Finally, the informative posts that mention locations were displayed in an interactive R Shiny web application, where end users can map and explore locations by filtering data by geography, date and time, and research interest; discover trending topics in their selected data with a word cloud; and export data for analysts to use in further research and visualization.
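The app itself is written in R, and the README does not describe how its filters or word cloud are computed. For illustration only, here is a Python sketch of the two underlying operations: filtering geoparsed posts to a date range and a latitude/longitude bounding box, then counting the most frequent terms, which is the table a word cloud is drawn from. The `Post` record, the tiny stopword list, the rough Texas bounding box, and the function names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple

# A deliberately tiny stopword list for illustration; real apps use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "at", "for", "in", "is", "of", "on", "or", "rt", "the", "to"}


@dataclass
class Post:
    """Hypothetical record for one informative tweet with a resolved location."""
    text: str
    lat: float
    lon: float
    created: datetime


def filter_posts(posts: Iterable[Post], start: datetime, end: datetime,
                 bbox: Tuple[float, float, float, float]) -> List[Post]:
    """Keep posts created in [start, end) inside (min_lat, min_lon, max_lat, max_lon).

    This simple box test does not handle boxes that cross the antimeridian.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    return [p for p in posts
            if start <= p.created < end
            and min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon]


def top_terms(texts: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent non-stopword terms across the given texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    posts = [
        Post("Flood in Houston", 29.76, -95.37, datetime(2021, 6, 5)),
        Post("Houston flood warning", 29.70, -95.40, datetime(2021, 6, 20)),
        Post("Wildfire smoke", 34.05, -118.24, datetime(2021, 7, 1)),
    ]
    june_texas = filter_posts(posts, datetime(2021, 6, 1), datetime(2021, 7, 1),
                              bbox=(25.8, -106.7, 36.5, -93.5))  # rough Texas box
    print(top_terms(p.text for p in june_texas))
```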

Scrape, Clean, Predict and Geoparse

Clone this repository to your local machine. The repository contains several very large files, such as the model and the tweets scraped and georeferenced from June and July 2021, so the first clone will take several minutes.

The

Other helpful links

Docker (https://www.docker.com): Docker was used to run an Elasticsearch instance over a GeoNames index for the Mordecai geoparser.

Elastic (https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html): Elasticsearch was used to let Mordecai search the GeoNames index inside a Docker container.
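Because Mordecai depends on that Elasticsearch container, it can help to confirm the container is reachable before geoparsing. A minimal standard-library sketch, assuming Elasticsearch's default HTTP port 9200 is published on localhost (the README does not give the port or image version); the function name is hypothetical. Elasticsearch's root endpoint answers with a JSON document that includes a `version` object.

```python
import http.client
import json
import urllib.request


def elasticsearch_is_up(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if an Elasticsearch node answers at `url` with its info document."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and URLError/HTTPError;
        # ValueError covers malformed URLs and non-JSON bodies;
        # HTTPException covers servers that do not speak valid HTTP.
        return False
    return isinstance(info, dict) and "version" in info


if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_is_up())
```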

Jupyter Notebooks: used for testing and data wrangling

Exploration of Variables: TextDataEDA.ipynb

Text Preprocessing: TextConditioningandMachineLearning.ipynb

Contributors: arboj

Issues

Initial Run - Thoughts

Successfully ran an OR query over tweets, which removes the need for a list. Could also compose the query string by adding "OR" after each term.
I'm not sure whether I get the same 500 results each time.
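The issue above suggests composing the search string by adding "OR" between terms. A minimal sketch of that composition, assuming Twitter's search syntax (OR between terms, double quotes for exact phrases, parentheses for grouping, and the since:/until: date operators). `build_or_query` is a hypothetical helper; the resulting string is the kind of query one would hand to Snscrape's Twitter search scraper.

```python
from typing import Iterable


def build_or_query(terms: Iterable[str], since: str = "", until: str = "") -> str:
    """Join search terms with OR, quoting multi-word phrases and adding optional date bounds."""
    cleaned = [t.strip() for t in terms if t.strip()]  # drop blank terms
    parts = [f'"{t}"' if " " in t else t for t in cleaned]
    query = " OR ".join(parts)
    if len(parts) > 1:
        query = f"({query})"  # parenthesize so the date bounds clearly apply to every term
    if since:
        query += f" since:{since}"  # date bound, YYYY-MM-DD
    if until:
        query += f" until:{until}"  # date bound, YYYY-MM-DD
    return query


if __name__ == "__main__":
    print(build_or_query(["flood", "wildfire", "storm surge"], since="2021-06-01", until="2021-08-01"))
```

This prints `(flood OR wildfire OR "storm surge") since:2021-06-01 until:2021-08-01`. A long term list pulled from a disaster glossary would be passed in the same way.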
Due outs:

  • By COB 23 March
    Need to find a "disaster" corpus or glossary, pull out the terms, and create a long search string.
    By COB 25 March

  • Handle the JSON string from the parsing. Try two solutions: make everything JSON and then flatten, or just try to flatten the column.
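For the second option in the last item (flattening the column directly), a minimal sketch: each row's JSON string is decoded with `json.loads`, and each geoparsed place becomes its own flat row. The field names (`word`, `geo`, `place_name`, `lat`, `lon`) are assumed to resemble Mordecai's output and may need adjusting to the stored format; `flatten_geoparse` is a hypothetical helper.

```python
import json
from typing import Dict, List, Optional


def flatten_geoparse(tweet_id: str, geo_json: Optional[str]) -> List[Dict[str, object]]:
    """Turn one JSON-encoded list of geoparsed places into flat rows, one per place."""
    rows: List[Dict[str, object]] = []
    for place in json.loads(geo_json or "[]"):  # empty cells become an empty list
        geo = place.get("geo")
        if not geo:
            continue  # skip place mentions that were not resolved to coordinates
        rows.append({
            "tweet_id": tweet_id,
            "word": place.get("word"),          # the text span that was geoparsed
            "place_name": geo.get("place_name"),
            "lat": float(geo["lat"]),           # coordinates may be stored as strings
            "lon": float(geo["lon"]),
        })
    return rows


if __name__ == "__main__":
    cell = json.dumps([
        {"word": "Houston", "geo": {"place_name": "Houston", "lat": "29.76328", "lon": "-95.36327"}},
        {"word": "Main St"},
    ])
    for row in flatten_geoparse("1", cell):
        print(row)
```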

Map embiggening

Make the map more cromulent, and embiggen so it takes all of the area.

Shiny app - memory

Work on memory issues in the Shiny deployment.

  • Use the loaded data more efficiently. Right now there are several nested queries; build a single function to retrieve the data.
  • Load only the relevant data in read.csv.
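The last item refers to R's read.csv. In R, column selection is also available through options such as readr::read_csv(col_select = ...) or data.table::fread(select = ...). The same idea is sketched here in Python with the standard csv module: stream the file and keep only the needed columns and the rows in a date range. The column names and the `load_relevant` helper are hypothetical, and the date filter assumes ISO YYYY-MM-DD dates, which sort correctly as text.

```python
import csv
from typing import Dict, Iterable, List


def load_relevant(lines: Iterable[str], columns: List[str], start: str, end: str,
                  date_col: str = "date") -> List[Dict[str, str]]:
    """Stream CSV lines, keeping only `columns` for rows with start <= date < end."""
    kept: List[Dict[str, str]] = []
    for row in csv.DictReader(lines):
        if start <= row[date_col][:10] < end:  # compare the YYYY-MM-DD prefix as text
            kept.append({c: row[c] for c in columns})
    return kept


if __name__ == "__main__":
    sample = [
        "id,date,text,lat,lon",
        "1,2021-06-05,flood on main st,29.76,-95.37",
        "2,2021-08-02,wildfire smoke,34.05,-118.24",
    ]
    # For a real file, pass open(path, newline="", encoding="utf-8") instead of the list.
    print(load_relevant(sample, ["id", "lat", "lon"], "2021-06-01", "2021-08-01"))
```

Because rows are filtered while reading, only the kept columns and rows are held in memory.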
