
jebyrnes / data-digging


This project forked from zooniverse/data-digging


Scripts and such for data management, analysis, visualization, etc.

License: MIT License

Python 42.29% Jupyter Notebook 36.71% HTML 14.77% R 6.23%


Data-digging

This repository contains scripts and documentation related to analyzing classification data from Zooniverse projects. Most content is tailored to Panoptes-based Project Builder projects, but there is also some legacy Ouroboros-based code.

docs: Column descriptions for Panoptes export CSV files.

example_scripts: Top-level example scripts (generally applicable to any project) and project-specific subdirectories, each with scripts and data files. These scripts convert classification data export CSV files into more useful formats and data products. In most cases, they extract information from the compact JSON-formatted “annotations” column into a flat CSV file that is easier to work with.

development: Sandbox directory for code development.

Project & Script Descriptions

Below we describe the analysis components implemented in each processing script. Feel free to pick and choose from the features described below when writing new scripts for your own project.

Some issues that all or most of these scripts address:

  • extracting classification marks/answers from within the JSON fields of the CSV classification data exports
  • cleaning the classification export files:
    • removing duplicate classifications (if they occur)
    • dealing with empty classifications (some projects throw them out, others count them as "nothing here" votes)
    • only including classifications from the most up-to-date workflow version(s)
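As a rough illustration of the cleaning steps listed above, here is a minimal Python sketch that uses only the standard library. It is not taken from the scripts in this repository. It assumes the export has `classification_id`, `user_name`, `workflow_version`, `subject_ids` and `annotations` columns, treats a repeat classification of the same subject by the same user as a duplicate, and compares only the major workflow version; all three are assumptions to adjust for your own project. The function name `clean_classifications` is hypothetical.

```python
import csv
import io
import json


def clean_classifications(csv_text, min_major_version):
    """Return de-duplicated rows from recent workflow versions, with JSON decoded."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only classifications made with a recent enough workflow version.
        # Assumption: versions look like "12.34" and only the major part matters.
        major_version = int(row["workflow_version"].split(".")[0])
        if major_version < min_major_version:
            continue
        # Assumption: a duplicate is a second classification of the same
        # subject by the same user; the first one seen is kept.
        key = (row["user_name"], row["subject_ids"])
        if key in seen:
            continue
        seen.add(key)
        # The annotations column holds a JSON string; decode it.
        row["annotations"] = json.loads(row["annotations"])
        # Flag empty classifications so the caller can either drop them
        # or count them as "nothing here" votes.
        row["is_empty"] = all(not task.get("value") for task in row["annotations"])
        cleaned.append(row)
    return cleaned
```

Whether rows flagged with `is_empty` are discarded or counted as votes is left to the caller, since projects differ on that choice.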

For R code that addresses these issues, please see www.github.com/aliburchard/DataProcessing.

Marking star cluster locations in Hubble Space Telescope images.

Script -- Creates CSV of circular marker info from simple marking workflow.

Marker type -- circle
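A minimal sketch of flattening circle marks into CSV rows, written for illustration rather than copied from the project script. It assumes each drawing-task annotation has the form `{"task": ..., "value": [{"x": ..., "y": ..., "r": ...}, ...]}`; the task label `T0` and the helper names `circle_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
import json

FIELDS = ["classification_id", "subject_id", "x", "y", "r"]


def circle_rows(classification_id, subject_id, annotations_json, task="T0"):
    """Return one flat dict per circle mark found in the given task."""
    rows = []
    for annotation in json.loads(annotations_json):
        if annotation.get("task") != task:
            continue
        for mark in annotation.get("value") or []:
            # Skip anything that does not look like a circle mark.
            if not all(key in mark for key in ("x", "y", "r")):
                continue
            rows.append({
                "classification_id": classification_id,
                "subject_id": subject_id,
                "x": mark["x"],
                "y": mark["y"],
                "r": mark["r"],
            })
    return rows


def rows_to_csv(rows):
    """Serialize the flat rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The same pattern should carry over to point markers by dropping the `r` field.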

Watch videos of bats flying around their roost and tag the behaviors that you see.

Scripts -- 1) split the original videos into shorter clips and populate a manifest; 2) upload the subjects, with the manifest, to Panoptes. Both are found in this repo.

The Decoding the Civil War project invites volunteers to transcribe contemporary, hand-written transcripts of telegrams sent between allies during the American Civil War. Portions of these transcripts are enciphered using whole-word substitutions. The ultimate goal of the project is to allow volunteers to identify these substituted words based on their contextual appropriateness.

The bespoke consensus and aggregation code written for this project is archived and documented in a separate repository.

Marker type -- line, text input attached to mark

An exoplanet-finding project run as part of Stargazing Live.

Scripts -- Aggregate a simple question task (with weighting). Save outputs to a Google Drive folder for easy data sharing. This script is adapted from the Pulsar Hunters aggregation script described below; it may be more generally applicable because it does not need additional files such as gold-standard data.

Marker type -- question task
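To illustrate what a weighted aggregation of a single question task can look like, here is a small sketch. It is not the project's script: the weighting scheme itself (how user weights are derived) is left out, and the function name `weighted_vote_fractions`, the input tuple layout and the default weight of 1.0 are all assumptions.

```python
from collections import defaultdict


def weighted_vote_fractions(votes, user_weights, default_weight=1.0):
    """Compute the weighted fraction of votes for each answer, per subject.

    votes: iterable of (subject_id, user, answer) tuples.
    user_weights: dict mapping user -> weight; other users get default_weight.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for subject_id, user, answer in votes:
        totals[subject_id][answer] += user_weights.get(user, default_weight)

    fractions = {}
    for subject_id, by_answer in totals.items():
        weight_sum = sum(by_answer.values())
        if weight_sum == 0:
            # Every voter had zero weight; no meaningful fraction exists.
            continue
        fractions[subject_id] = {
            answer: weight / weight_sum for answer, weight in by_answer.items()
        }
    return fractions


votes = [("s1", "a", "Yes"), ("s1", "b", "No"), ("s1", "c", "Yes")]
result = weighted_vote_fractions(votes, {"a": 2.0, "b": 1.0})
# result == {"s1": {"Yes": 0.75, "No": 0.25}}
```

Setting every weight to 1.0 reduces this to plain vote fractions.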

A beta project to examine HI structures in the Milky Way.

Scripts -- Extracts markings from classification file into individual files (ready for clustering).

Marker type -- line, point, ellipse, text input attached to mark

A survey project run by Cleveland Metroparks.

Scripts -- Adapts the survey aggregation script initially developed and tested for Wildwatch Kenya (described below)

Marker type -- Survey

Answering questions about the presence of bar structures and marking bar dimensions.

Scripts -- Analyzes joint question+marking workflow (but mostly the markings).

Marker type -- line

A transcription project for museum collections. The label reconciliation scripts are maintained in a separate repository.

Extracting markings of damage and other features from post-disaster satellite imagery.

Script -- puts classification information together with geocoordinate information from subject exports.

Marker type -- point, polygon (though these aren't reduced here)
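Joining classifications with coordinates from the subject export can be sketched as a simple lookup. This is an illustration under assumptions, not the project's code: it assumes the subject export has `subject_id` and a JSON `metadata` column, that the classification export identifies the subject in `subject_ids`, and that the coordinates live under metadata keys named `lon` and `lat`. Those two key names are hypothetical; use whatever keys your subject manifest actually contains.

```python
import json


def build_geo_lookup(subject_rows, lon_key="lon", lat_key="lat"):
    """Map subject_id -> (lon, lat) for subjects whose metadata has both keys."""
    lookup = {}
    for row in subject_rows:
        metadata = json.loads(row["metadata"])
        if lon_key in metadata and lat_key in metadata:
            lookup[str(row["subject_id"])] = (
                float(metadata[lon_key]),
                float(metadata[lat_key]),
            )
    return lookup


def attach_coordinates(classification_rows, lookup):
    """Return copies of the classification rows with lon/lat columns added.

    Classifications whose subject has no known coordinates are skipped.
    """
    joined = []
    for row in classification_rows:
        coords = lookup.get(str(row["subject_ids"]))
        if coords is None:
            continue
        joined.append({**row, "lon": coords[0], "lat": coords[1]})
    return joined
```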

Marking interesting objects (including moving objects) in images from the WISE satellite.

Script -- Creates CSV of point marker info from simple marking workflow.

Marker type -- point

Classification of radio observations to identify pulsar candidates.

Scripts -- Analyze responses and aggregate the object-type answer; there is also a script for counting classifications. IP address tracking was unreliable during this project, so unique non-logged-in users were identified with browser session info instead.

Marker type -- no markers, only 1 question task
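The session-based identification of non-logged-in users described above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's script: it assumes logged-in rows have a non-empty `user_id`, and that the JSON `metadata` column carries a `session` key. The helper names `user_key` and `count_by_user` are hypothetical.

```python
import json
from collections import Counter


def user_key(row):
    """Return a per-user key, falling back to the browser session."""
    if row.get("user_id"):
        return "user:" + row["user_name"]
    # Not logged in: use the session identifier from the metadata JSON.
    session = json.loads(row["metadata"]).get("session")
    if session:
        return "session:" + session
    return "unknown"


def count_by_user(rows):
    """Count classifications per user key."""
    return Counter(user_key(row) for row in rows)
```

One session is not guaranteed to correspond to exactly one person, so counts built this way are best read as estimates.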

Workflow #1: A yes/no question on whether sea lions are present.

Scripts -- 1) Extracts a flat CSV from the embedded JSON. 2) Aggregates results.

Marker type -- no marks, only question tasks

A survey of species from camera trap data in Kenya.

Scripts -- 1) Jailbreak survey annotations into a format more easily digestible by external scripts (1 line per species ID or "nothing here" classification). 2) Aggregate the jailbroken annotations into a flattened CSV file with one line per subject. Also uses general utility scripts.

Marker type -- Survey
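The "one line per species ID" flattening can be sketched as below. This is a simplified illustration rather than the project's jailbreak script: it assumes each survey annotation value is a list of `{"choice": ..., "answers": {...}}` entries, and the `NOTHINGHERE` label and function names are hypothetical.

```python
import json
from collections import Counter, defaultdict

NOTHING_HERE = "NOTHINGHERE"  # hypothetical label for empty classifications


def flatten_survey(classification_id, subject_id, annotations_json):
    """Return one flat row per species identification in a classification."""
    rows = []
    for annotation in json.loads(annotations_json):
        value = annotation.get("value")
        if not isinstance(value, list):
            continue  # not a survey task (for example a plain question task)
        for ident in value:
            if not isinstance(ident, dict) or "choice" not in ident:
                continue
            row = {
                "classification_id": classification_id,
                "subject_id": subject_id,
                "choice": ident["choice"],
            }
            # Copy sub-question answers (e.g. how many) into their own columns.
            for question, answer in ident.get("answers", {}).items():
                row[question.lower()] = answer
            rows.append(row)
    if not rows:
        # No species selected: record a single "nothing here" row.
        rows.append({
            "classification_id": classification_id,
            "subject_id": subject_id,
            "choice": NOTHING_HERE,
        })
    return rows


def choice_counts_by_subject(rows):
    """Aggregate flat rows into per-subject counts of each choice."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["subject_id"]][row["choice"]] += 1
    return counts
```

The per-subject counts can then be written out with one line per subject.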

Older Scripts (Ouroboros-based)

Galaxy Zoo

Misc

Includes scripts that generate progress reports for the Ouroboros-based GZ project, as well as scripts for decision tree processing.

Talk

Scripts that compute statistics and analyze Talk data for the Ouroboros-based GZ project.

Reduction

Fairly general scripts to process Galaxy Zoo classification database dumps into vote fractions for each subject and match with subject metadata. Note that this does not (yet) include debiasing.

Contributors

aliburchard, bamford, camallen, ckrawczyk, eatyourgreens, fiona-jones, hughdickinson, lcjohnso, mcbouslog, mkosmala, philrosenfield, pmasonff, shaunanoordin, trouille, vrooje, willettk, zambonee
