
ConnectIn project

About

ConnectIn is a CIRA CIP-funded program that evaluates the internet connectivity of First Nations communities in western Canada. It is led by representatives from the Manitoba First Nations Education Resource Centre, the First Nations Technology Council, and the First Nations Technical Services Advisory Group, with additional support from Cybera and the University of Alberta.

For more information, visit here.

Set up

Follow these instructions to set up the data analysis tools for the project. They assume that the devices are already set up, collecting data, and saving it to an MS SQL database.

Credentials

Copy creds.env.example to creds.env:

cp creds.env.example creds.env

Update creds.env with your MS SQL host IP address, database name, user, and password:

MSSQL_HOST=
MSSQL_DATABASE=
MSSQL_USER=
MSSQL_PASSWORD=

In creds.env, also update the following from their default values:

INFLUXDB_ADMIN_PASSWORD
INFLUXDB_READ_USER_PASSWORD
DASH_PASSWORD
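
For reference, a completed creds.env might look like the following (every value below is a placeholder; substitute your own):

MSSQL_HOST=192.0.2.50
MSSQL_DATABASE=connectin
MSSQL_USER=connectin_reader
MSSQL_PASSWORD=change-me-1
INFLUXDB_ADMIN_PASSWORD=change-me-2
INFLUXDB_READ_USER_PASSWORD=change-me-3
DASH_PASSWORD=change-me-4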

Docker

Run docker-compose up to create the Docker containers for the following components:

  • InfluxDB: a container for the time series database that stores the data locally. The database and two users (one to write, one to read) are created automatically.
  • Cronjobs: a container that runs scripts every night to update InfluxDB with new data from MS SQL and to export MS SQL tables as CSV files into the "data" directory. The scripts first run when the container is built, and then every day at 20:30 and 21:30 UTC.
  • Jupyter: a container that runs the Jupyter notebook service, accessible at http://localhost:8888/ with a token from the docker-compose output (see the commands after this list).
  • Dash: a container that runs the Dash analytics dashboard, accessible at http://localhost:8050/ and populated with data within about 5 minutes after docker-compose has completed.
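
The services can also be started in the background, in which case the Jupyter token has to be read from the container logs. A minimal sketch, assuming the Compose service is named jupyter (check docker-compose.yml for the actual service names):

docker-compose up -d
docker-compose logs jupyter | grep token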

Notebooks

The Jupyter notebook service is accessible at http://localhost:8888/ with a token copied from the docker-compose output.

Interactive notebooks

Description of available data analysis notebooks:

  • Raw data, number of datapoints and monitoring intervals.ipynb - time series graphs of raw speedtest and iperf test data by device for the entire time period and trailing 6 months.
  • Aggregated data by year, month, day, hour.ipynb - graphs of aggregated speedtest and iperf test data (by month, year, hour, day of the week) by device for the entire time period and trailing 6 months.
  • Speedtest data by test server and service provider.ipynb - speedtest data analysis by test server and service provider by device for the entire time period and trailing 6 months.
  • Statistics and map.ipynb - summary statistics and map for speedtest and iperf data for the entire time period, trailing 6 months and last month by device and for all devices.

Original notebooks

Original analysis and data exploration in Jupyter notebooks, organized by project stage. A summary from each stage is included.

Location data

To show devices on a map (on the dashboard and in some of the notebooks), the latitude and longitude coordinates of the devices must be saved in data_analysis/coordinates2.csv.

Use the example file coordinates2.csv.example as a template: copy it to coordinates2.csv, then add the device geo-coordinates to the CSV directly, or use the notebook data_analysis/Interactive_notebooks/Coordinates helper.ipynb to add them interactively.

cp data_analysis/coordinates2.csv.example data_analysis/coordinates2.csv
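
The exact columns are defined by coordinates2.csv.example. Purely as an illustration, a file of this shape might look like the following (the column names and coordinates here are assumptions, not values from the project):

SK_PI,latitude,longitude
1,53.5461,-113.4938
2,49.8951,-97.1384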

Geo-coordinates used for the ConnectIn project are here (access is restricted).

Timezones

A common timezone for all devices is set in config.json.

If some devices are in different timezones, these can be specified by device number in data_analysis/timezone_by_device.csv. Use the example file timezone_by_device.csv.example as a template: copy it to timezone_by_device.csv, then add the timezone data directly in the CSV file.

cp data_analysis/timezone_by_device.csv.example data_analysis/timezone_by_device.csv
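
Again, the authoritative layout is in timezone_by_device.csv.example; an illustrative file might map device numbers to IANA timezone names (column names assumed):

SK_PI,timezone
1,America/Edmonton
2,America/Winnipeg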

Dashboard

The analytics dashboard is accessible at http://localhost:8050/. It has basic authentication enabled; the credentials are stored in the creds.env file.

To use it, select a metric (Upload/Download/Ping) and a time interval. Pressing "Get data" queries the database and stores the result in the browser cache; all plots in all tabs are then populated from the cached data. To fetch a different metric or time interval, press "Get data" again.

Note: the dashboard is faster when run locally than when hosted on the web. When the dashboard is web-hosted, the cached data must first travel over the network, which slows it down.
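
The "fetch once, then read from the cache" flow described above maps naturally onto Dash's dcc.Store component, which keeps data in the browser. Below is a minimal sketch of that pattern; it assumes the InfluxDB 1.x Python client, a measurement named after the metric, and placeholder credentials, and is not the project's actual dashboard code:

# Illustrative sketch only -- not the ConnectIn dashboard code.
# Assumptions: Dash 2.x, the influxdb 1.x Python client, a measurement
# named after the metric (e.g. "Download"), placeholder credentials.
import dash
from dash import dcc, html, Input, Output, State
from influxdb import InfluxDBClient

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(id="metric",
                 options=[{"label": m, "value": m}
                          for m in ["Upload", "Download", "Ping"]],
                 value="Download"),
    html.Button("Get data", id="get-data"),
    dcc.Store(id="cache"),  # the query result lives in the browser
    dcc.Graph(id="timeseries"),
])

@app.callback(Output("cache", "data"),
              Input("get-data", "n_clicks"),
              State("metric", "value"),
              prevent_initial_call=True)
def fetch(n_clicks, metric):
    # One database round trip per "Get data" press; rows are cached client-side.
    client = InfluxDBClient(host="localhost", port=8086,
                            username="reader", password="change-me",
                            database="connectin")  # placeholder names
    result = client.query(f'SELECT * FROM "{metric}" WHERE time > now() - 30d')
    return list(result.get_points())

@app.callback(Output("timeseries", "figure"),
              Input("cache", "data"),
              prevent_initial_call=True)
def plot(rows):
    # Plot callbacks read from the cache, so no new query is needed per tab.
    return {"data": [{"x": [r["time"] for r in rows],
                      "y": [r.get("value") for r in rows],
                      "type": "scatter"}]}

if __name__ == "__main__":
    app.run_server(debug=True)  # app.run(debug=True) on Dash 3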

InfluxDB structure

InfluxDB is a time series database that stores everything in measurements (similar to tables) using tags (metadata) and fields (values).

The InfluxDB schema used for the project is stored in config.json.

There are separate measurements (tables) for Ping, Upload and Download data.

Both iperf and speedtest test results are stored in the same measurement (table) with different metadata.

Metadata

The following metadata is stored for every measurement:

  • Provider - the ISP (Internet Service Provider) for speedtest tests, or "iperf" for iperf tests (from the FCT_SPEEDTEST MS SQL table)
  • IP - the IP address of the device (from the FCT_SPEEDTEST MS SQL table)
  • Test Server - the name of the test server (from the FCT_SPEEDTEST MS SQL table)
  • Province - the province for speedtest tests, or "iperf" for iperf tests (from the FCT_SPEEDTEST MS SQL table)
  • SK_PI - the device number (from the FCT_SPEEDTEST MS SQL table)
  • PI_MAC - the device MAC address (from the DIM_PI MS SQL table)
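
To make the schema concrete, here is a sketch of writing one speedtest datapoint with these tags, using the InfluxDB 1.x Python client. The measurement name "Download", the field name "value", and the database name are assumptions; the real schema lives in config.json:

# Sketch: one speedtest datapoint with the metadata tags listed above.
# Measurement, field, and database names are assumed -- see config.json.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086,
                        username="admin", password="change-me",
                        database="connectin")
point = {
    "measurement": "Download",        # assumed measurement name
    "time": "2021-06-01T03:30:00Z",
    "tags": {
        "Provider": "ExampleISP",     # "iperf" for iperf tests
        "IP": "192.0.2.10",
        "Test Server": "Example Server",
        "Province": "MB",             # "iperf" for iperf tests
        "SK_PI": "42",                # device number
        "PI_MAC": "b8:27:eb:00:00:00",
    },
    "fields": {"value": 12.3},        # assumed field name (Mbps)
}
client.write_points([point])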

Collectd data

The MS SQL database also stores metrics coming from the collectd daemon.
These metrics are collected every 5 seconds and saved in the MS SQL table FCT_PI.

The following metrics are collected:
'CONNTRACK', 'CONNTRACK_MAX', 'CONNTRACK_PERCENT_USED', 'ETH1_IF_DROPPED_RX', 'ETH1_IF_DROPPED_TX', 'ETH1_IF_ERRORS_RX', 'ETH1_IF_ERRORS_TX', 'ETH1_IF_OCTETS_RX', 'ETH1_IF_OCTETS_TX', 'ETH1_IF_PACKETS_RX', 'ETH1_IF_PACKETS_TX', 'ETH2_IF_DROPPED_RX', 'ETH2_IF_DROPPED_TX', 'ETH2_IF_ERRORS_RX', 'ETH2_IF_ERRORS_TX', 'ETH2_IF_OCTETS_RX', 'ETH2_IF_OCTETS_TX', 'ETH2_IF_PACKETS_RX', 'ETH2_IF_PACKETS_TX', 'ETH3_IF_DROPPED_RX', 'ETH3_IF_DROPPED_TX', 'ETH3_IF_ERRORS_RX', 'ETH3_IF_ERRORS_TX', 'ETH3_IF_OCTETS_RX', 'ETH3_IF_OCTETS_TX', 'ETH3_IF_PACKETS_RX', 'ETH3_IF_PACKETS_TX', 'PING_DROPRATE', 'PING_STDDEV', 'PING'.

These metrics were not used in the analysis.

If you want to use them, replace config.json with config_full.json and recreate the Docker containers. This imports 3 additional metrics from MS SQL into InfluxDB: ping latency (PING), ping drop rate (PING_DROPRATE), and number of connections (CONNTRACK). These are not included on the dashboard, but some of the original Jupyter notebooks analyze them.
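
A minimal way to do the swap, assuming both JSON files sit in the repository root (back up the original first):

cp config.json config.json.bak
cp config_full.json config.json
docker-compose up -d --build --force-recreate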
