kdm9 / ooni-pipeline

This project forked from openobservatory/ooni-pipeline.backup

This is the OONI pipeline for running data analytics on the collected reports.

Python 100.00%

OONI Pipeline

This repository contains all the tasks needed to implement the OONI data analytics pipeline.

Usage

To run the pipeline you should set the following environment variables:

export OONI_RAW_DIR=/data/pipeline/raw/
export OONI_SANITISED_DIR=/data/pipeline/sanitised/
export OONI_PUBLIC_DIR=/data/pipeline/public/
export OONI_ARCHIVE_DIR=/data/pipeline/archive/

export OONI_REMOTE_SERVERS_FILE=/data/pipeline/remote_servers.txt

export OONI_BRIDGE_DB_FILE=/data/pipeline/bridge_db.json

export OONI_DB_IP=127.0.0.1
export OONI_DB_PORT=27017
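
As a rough illustration of how these settings could be consumed, the sketch below reads the variables listed above and fails early if any are missing. The variable names come from the export lines; the `PipelineConfig` class and `load_config` helper are hypothetical and are not taken from the ooni-pipeline code.

```python
import os
from dataclasses import dataclass

# Names copied from the export lines above. Everything else in this block
# (PipelineConfig, load_config) is a hypothetical illustration.
REQUIRED_VARS = (
    "OONI_RAW_DIR",
    "OONI_SANITISED_DIR",
    "OONI_PUBLIC_DIR",
    "OONI_ARCHIVE_DIR",
    "OONI_REMOTE_SERVERS_FILE",
    "OONI_BRIDGE_DB_FILE",
    "OONI_DB_IP",
    "OONI_DB_PORT",
)


@dataclass(frozen=True)
class PipelineConfig:
    raw_dir: str
    sanitised_dir: str
    public_dir: str
    archive_dir: str
    remote_servers_file: str
    bridge_db_file: str
    db_ip: str
    db_port: int


def load_config(environ=None):
    """Build a PipelineConfig, raising KeyError if a variable is unset."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return PipelineConfig(
        raw_dir=env["OONI_RAW_DIR"],
        sanitised_dir=env["OONI_SANITISED_DIR"],
        public_dir=env["OONI_PUBLIC_DIR"],
        archive_dir=env["OONI_ARCHIVE_DIR"],
        remote_servers_file=env["OONI_REMOTE_SERVERS_FILE"],
        bridge_db_file=env["OONI_BRIDGE_DB_FILE"],
        db_ip=env["OONI_DB_IP"],
        db_port=int(env["OONI_DB_PORT"]),  # the port is the only numeric value
    )
```

Checking everything up front means a missing directory setting is reported before any task starts moving data.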

Then you can run the tool with:

./bin/oonipipeline sync

To check if data should be copied from the remote probes into the RAW directory.

./bin/oonipipeline sanitise

To move data from the RAW state into the SANITISED state.

./bin/oonipipeline import

To import the data into the database and publish it to the directory set in OONI_PUBLIC_DIR.

./bin/oonipipeline export

To export the data in the JSON format for the visualisation team.
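
The four subcommands above can also be chained from a script. The sketch below assumes they are meant to run in the order listed and that each exits non-zero on failure; neither assumption is confirmed by this README. The `build_commands` and `run_stages` helpers are hypothetical.

```python
import subprocess

# Subcommand names come from the usage section above; the order is assumed.
STAGES = ("sync", "sanitise", "import", "export")


def build_commands(binary="./bin/oonipipeline", stages=STAGES):
    """Return one argv list per stage, in execution order."""
    return [[binary, stage] for stage in stages]


def run_stages(commands, runner=subprocess.run):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit status, so later stages are skipped after a failure.
        runner(cmd, check=True)


if __name__ == "__main__":
    run_stages(build_commands())
```

The `runner` parameter exists only so the helper can be exercised without the real binary, for example by passing a function that records the commands it receives.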

How it works

The data pipeline consists of 3 steps (or states, depending on how you want to look at it). When data is submitted to an OONI collector, it is synchronized with the aggregator, a central machine responsible for running all the data processing tasks, storing the collected data in a database, and hosting a public interface to the sanitised reports. Since the steps are independent of one another, they do not all have to run on the same machine; the pipeline can also be distributed across several machines.

Once the data is on the aggregator machine it is said to be in the RAW state. The sanitise task is then run on the RAW data to remove sensitive information and strip out some superfluous information. A RAW copy of every report is also stored in a private compressed archive for future reference. Once the data is sanitised it is said to be in the SANITISED state. At this point an import task is run on the data to place it inside a database. The SANITISED reports are then placed in a directory that is publicly exposed to the internet, so that people can also download a copy of the YAML reports.
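
To make the RAW to SANITISED transition more concrete, here is a minimal sketch of what a sanitise step could look like on a single report loaded as a dictionary. The field names (`probe_ip`, `debug_log`) and the `[REDACTED]` marker are assumptions chosen for illustration; the real task decides for itself what counts as sensitive or superfluous.

```python
import copy

# Hypothetical field names, used only to illustrate the idea.
SENSITIVE_FIELDS = {"probe_ip"}     # values are masked
SUPERFLUOUS_FIELDS = {"debug_log"}  # keys are dropped entirely


def sanitise_report(raw_report):
    """Return a sanitised copy of a report.

    The input is deep-copied so the RAW version stays intact and can
    still be written to the private archive.
    """
    report = copy.deepcopy(raw_report)
    for field in SENSITIVE_FIELDS:
        if field in report:
            report[field] = "[REDACTED]"
    for field in SUPERFLUOUS_FIELDS:
        report.pop(field, None)
    return report
```

Working on a copy keeps the two states separate: the untouched input can be archived while only the returned dictionary moves on to the import step.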

At this point it is possible to run any export task that queries the database and outputs documents to be used in the data visualizations (think JSON, CSV, etc.).
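
An export task of this kind could be sketched as follows. A plain list of dictionaries stands in for the result of a database query, and the helper names (`export_json`, `export_csv`) are hypothetical; only the Python standard library is used.

```python
import csv
import io
import json


def export_json(rows):
    """Serialise query results as a JSON document."""
    return json.dumps(rows, indent=2, sort_keys=True)


def export_csv(rows, fieldnames):
    """Serialise query results as CSV, keeping only the given columns."""
    buf = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fieldnames.
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up rows standing in for a database query result.
    rows = [{"country": "IT", "test_name": "http_requests", "count": 3}]
    print(export_json(rows))
    print(export_csv(rows, ["country", "count"]))
```

Separating the query from the serialisation lets the same rows feed both output formats.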

Contributors

hellais, oripka, kudrom, aagbsn, asn-d6
