
covidtoday / backend


Code for the statistical methods that estimate outbreak indicators at Covid Today.

Home Page: https://www.covidtoday.in/

License: GNU General Public License v3.0

Jupyter Notebook 97.27% R 0.13% Python 0.78% Batchfile 0.01% HTML 0.07% Ruby 0.01% CSS 1.01% Shell 0.01% SCSS 0.73%
covid-19 india outbreak indicators reproduction positivity tests

backend's Introduction



Covid Today | www.covidtoday.in

Tracking India's progress through the coronavirus epidemic.





Latest datasets and API: https://covidtoday.github.io/backend/
Backend repo: https://github.com/CovidToday/backend
Frontend repo: https://github.com/CovidToday/FrontEnd
How to start contributing: CONTRIBUTING.md



As I write this, there is an eerie feeling that tells me that there are enough dashboards tracking COVID-19 in India. But none, like this one. Read me and join in if you find this worth something.


The Concept

It is beyond doubt that reckless relaxation of public health measures and social distancing can have a devastating effect in the form of a rapid rise in COVID-19 cases and deaths. Relaxation of control measures and return to normal social life must happen in a phased manner, based on continuous tracking of public health metrics (WHO, CDC). These metrics tell us if our hospitals are equipped enough to handle the patient load, whether our testing system is adequate to catch cases, and whether the rate of spread is under control so that public health measures can be optimally adjusted, both spatially and temporally.

Since no one is tracking these important public health indicators and making them available publicly, we are.

We intend to do this using every bit of publicly available data on COVID-19 in India. We determine public health indicators/metrics across three domains.

  1. Transmission: Measures how fast the virus is spreading, e.g. Reproduction Number, Doubling Time, Daily cases.
  2. Testing: Measures if we are testing enough (remember, there is no other data without testing), e.g. Daily tests, Test positivity rate, Tests per million, Contact tracing metrics.
  3. Healthcare System: Measures if our hospitals are ready and how deaths are being prevented, e.g. Case Fatality Rate (various kinds), Number and occupancy of hospital beds, ICU beds and ventilators, Number of quarantined.

Objectives

  1. Make a scientifically sound framework which can track the epidemic's progress across multiple relevant domains; based on latest scientific evidence, national advisories and WHO guidance.
  2. Choose appropriate indicators for each domain- Transmission (scale and speed of spread), Testing (testing ramp-up and is it enough), Healthcare system (capacity and outcomes of healthcare).
  3. Integrate required raw data from multiple sources (already existing datasets + de novo data collection).
  4. Analyse the raw data through statistically and epidemiologically robust methods to get the desired indicators.
  5. Visualise these indicators in an intuitive and easy-to-understand way on the website, so that people without in-depth knowledge of the topic can understand what we present.
  6. Provide periodical worded analyses of what the numbers show and what that means. We have started that on Twitter (@icart_india), but all ideas are welcome.
  7. Make the indicator datasets publicly available for use by citizens, epidemiologists, researchers, analysts and journalists.

Open Sourced. Covid Today needs collaborators!

This project also follows an 'open vision', which means although we are committed to the original objectives of the website, we are open to new ideas on how the website/platform can be expanded to better fight the pandemic.

Some areas where you can contribute

Anyone! Help us gather essential data from the ground up. Contribute if you can collect any of the following data from your state source: Number of COVID Care Centres, Dedicated COVID Health Centres and Dedicated COVID Hospitals (hospital beds, ICU beds, ventilators; total and occupied), Number of quarantined.

Public health experts, Epidemiologists, Journalists, Medicos! Help us write short periodical analyses of the numbers we present, exploring and breaking down the trends that lie within and what they mean for the COVID-19 response. Curate insightful analyses on our twitter handle or write on your own platform.

Software engineers, Data scientists! Build and fine-tune the technological backbone. The backend is open sourced on GitHub. Find new issues or take your pick of the existing ones, and help build the code that crunches the numbers and oils the pipeline. We are planning on open sourcing the UI too; get in touch if you want to pitch in.

Public health experts, Data analysts, Epidemiologists, ML experts, all others! Build on the concept. Innovate with us on what indicators to show next, how to better present the data for more insightful conclusions, and how to expand the platform to make it more resourceful. We are also exploring the data we see through ML.

Not on this list? Tell us how you can contribute.

How do I start contributing?

Collaborate on the tech work on GitHub- Work on an existing issue or create a new issue in the repository.
Do read CONTRIBUTING.md to know how to collaborate on GitHub.
Backend repo- the code which imports, cleans and analyses the data to output various indicators.
UI repo- the code for the website interface and data visualisations

If you want to collaborate on other stuff- Fill a short form here (You can report a bug, suggest an improvement, ask a question, or join as a collaborator). You can also get in touch at [email protected]

Please follow the Code of Conduct this project abides by: CODE_OF_CONDUCT.md.

Raw data sources

Raw data for cases and tests- www.covid19india.org (A brilliant crowdsourced platform which gathers this data from official state bulletins and dashboards)
Data for mobility index- www.google.com/covid19/mobility
Population data- www.uidai.gov.in/images/state-wise-aadhaar-saturation.pdf

About how we calculate the indicators

Visit the Methods page on www.covidtoday.in

Limitations of the method are listed under each indicator. If you have an idea to improve upon them, start an issue in this repo.

Project Admins

Mohak (AIIMS Delhi), Rishi (IISER Pune), Pratik (IIIT Hyd; Microsoft), Aditya (VIT Vellore; Barclays), Siddharth (IIIT Gwalior), Apurva (BFCET Bathinda), Abhinav.

initiative by iCART

India COVID Apex Research Team (iCART) is a volunteer research and development group which comprises professionals and students from multiple fields. We are always open to collaboration with any individual or organisation that shares our interests and vision- A Science Driven Pandemic Response. We started as a small group from AIIMS Delhi, and have since grown into a multi-disciplinary team of doctors, biomedical researchers, epidemiologists, students, tech developers and data scientists with the primary focus to act as a catalyst for a science driven response to the COVID-19 pandemic.
Our team is engaged in clinical and epidemiological research at some of the best hospitals in the country. In addition, we have developed a comprehensive digital COVID-19 platform spanning across communities, hospitals and laboratories, which is under pilot-testing.

You may follow us on Twitter @icart_india where we try to engage in meaningful discussions regarding the COVID-19 epidemic with fellow citizens, experts and journalists.

inspired by

Multiple projects around the globe that have used data and technology in the war against COVID-19. Some of them: www.covidexitstrategy.org, www.covidtracking.com, www.covid19india.org, www.rt.live, www.covid19-projections.com

backend's People

Contributors

dependabot[bot], divyanshsinghvi, kzuri, mgmd-96, neurorishika, pateldevarsh21, pratikmandlecha, siddharthjain1611


backend's Issues

Yellow dash indicating no change- in table

Currently, changes in the data versus 1 week ago are shown by red and green arrows. Sometimes the data does not change significantly, and that warrants a separate icon. A yellow dash icon may be suitable.
A cutoff for the % change has to be decided. I think ±10% should be fine.

So, if a data point is within ±10% of the value 1 week ago, it would show a yellow dash alongside the data instead of a red or green arrow.
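
A minimal sketch of that rule, assuming the proposed 10% cutoff. The function name and the returned labels are hypothetical, and the mapping from direction to red or green is left to the frontend, since a rise is bad for some metrics and good for others.

```python
def change_marker(current, week_ago, threshold=0.10):
    """Classify a value against its value one week ago.

    Returns "dash" when the relative change is within the threshold,
    otherwise "up" or "down". The labels are illustrative only.
    """
    if week_ago == 0:
        # Relative change is undefined; treat any non-zero value as a rise.
        return "dash" if current == 0 else "up"
    change = (current - week_ago) / abs(week_ago)
    if abs(change) <= threshold:
        return "dash"
    return "up" if change > 0 else "down"


if __name__ == "__main__":
    for now, before in [(105, 100), (120, 100), (80, 100)]:
        print(now, before, change_marker(now, before))
```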

Mobility Transmission correlation

Find and quantify if there is any correlation between transmission (Rt/Growth Rate/other indicators) and mobility (individual/composite indices).

Possible tryouts:
assigning different weights to various mobility domains
a lag between mobility and transmission(except Rt) changes
trying data from apple mobility as well
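
One way to start on the weights and lag ideas is sketched below. It is only an illustration: the function names are hypothetical, Pearson correlation is one choice among several, and both series are assumed to be daily values that start on the same date.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def composite_index(domains, weights):
    """Weighted average of several mobility domains, day by day."""
    total = sum(weights[name] for name in domains)
    length = min(len(series) for series in domains.values())
    return [
        sum(weights[name] * domains[name][t] for name in domains) / total
        for t in range(length)
    ]


def lagged_correlation(mobility, transmission, max_lag=14):
    """Correlate mobility on day t with transmission on day t + lag."""
    results = {}
    for lag in range(max_lag + 1):
        x = mobility[: len(mobility) - lag]
        y = transmission[lag:]
        n = min(len(x), len(y))
        if n < 3:
            break  # too few overlapping days to say anything
        results[lag] = pearson(x[:n], y[:n])
    return results
```

The lag with the strongest correlation can then be read off with `max(results, key=results.get)`, keeping in mind that a high correlation at some lag is not evidence of causation.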

Add hotspot districts

The following districts may be a good start, and we can expand to more districts later:
MH- Mumbai, Thane, Pune, Palghar, Aurangabad
TN- Chennai, Chengalpattu, Madurai
TG- Hyderabad
KA- Bengaluru Urban
GJ- Ahmedabad, Surat
HR- Gurugram
WB- Kolkata

Source api: https://api.covid19india.org/v3/data-all.json

Have to input populations of these districts to calculate population-adjusted indicators. I have compiled the same and put in the repo as Population_districts.csv

Will run the existing estimation scripts for these districts. Where data is available, indicators will be calculated.

Output district json and csv to be sub-structured into current state level json and csv. [OR] Output json and csv for districts to be separate from state json and csv.

Limitations

  • Testing data is not available for most districts in the source API. We can look for district bulletins independently to add that data.
  • Google mobility data is not available at district level.

Statewise lockdown relaxation dates

Currently the plots show vertical lines corresponding to national lockdown relaxation dates. However, some states may have different relaxation dates due to differing local epidemic situations, which affects the accuracy of the temporal relation between indicators and lockdown events.

  1. Volunteers who can provide the dates of lockdown relaxation events in their respective states.
  2. Dates of lockdown relaxation would then become variables for each state rather than nationally uniform as of now. This has to be reflected in statewide dataset (backend), and the website plots need to pick these dates from the json to show the state's vertical lines (frontend).

Clean and integrate Doubling time code and outputs

v1: Doubling time code (dev by @TechnoSAP) and output (dbt.json) added to reproduction-number-rt folder in repo.

For doubling time script

  • Change source API to the new one (data-all.json). This script uses cumulative confirmed cases timeseries. Input cumulative confirmed cases data from new source API in cumul_cases = in both India and States script inside doublingtime.ipynb
    (shift to new API necessary so that same code can also run on districts)
  • Implement global dictionary for dates
  • In the output json (dbt.json), some values of doubling time are Infinity or NaN. These output values are to be replaced by an empty value "", otherwise they will interrupt plotting later.
  • Integrate this script into an existing script. (posrate) (@siddharthjain1611 will decide). Also output into existing json (posrate).
  • Add code to output dbt into csv as well (posrate csv).
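
The Infinity/NaN cleanup could look roughly like this. The layout of dbt.json used in the example (state code, then a "dbt" list) is an assumption, and the function names are hypothetical.

```python
import json
import math


def blank_non_finite(value):
    """Recursively replace Infinity, -Infinity and NaN with an empty string."""
    if isinstance(value, float) and not math.isfinite(value):
        return ""
    if isinstance(value, dict):
        return {key: blank_non_finite(item) for key, item in value.items()}
    if isinstance(value, list):
        return [blank_non_finite(item) for item in value]
    return value


def dump_clean(data):
    """Serialise to JSON; allow_nan=False raises if a non-finite value slips through."""
    return json.dumps(blank_non_finite(data), allow_nan=False)


if __name__ == "__main__":
    # Hypothetical structure, for illustration only.
    sample = {"MH": {"dbt": [12.5, float("inf"), float("nan")]}}
    print(dump_clean(sample))
```

Passing `allow_nan=False` is a deliberate guard: Python's default output for these values (`Infinity`, `NaN`) is not valid JSON and tends to break parsers on the plotting side.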

For Rt script

  • Change source API to the new one (data-all.json). This script uses daily confirmed cases timeseries. (shift to new API necessary so that same code can also run on districts)

CSV for choropleth maps

Create code that generates this csv file along with all other files. This code will go into the unified code only.

The generated csv will be linked to various choropleth data viz maps.

Properties of csv: https://github.com/CovidToday/backend/blob/master/Datawrapper%20Chloropleth%20Rt.csv
The first row holds the headers.
The first column has to be state names (NOT state codes).
The remaining columns need to hold the MOST RECENT DATA for each metric.

Various metrics that go in separate columns:

  • Rt
  • daily cases per million
  • daily test posrate moving avg
  • daily tests per million
  • CFR
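
A sketch of the generator, under assumptions: the input is a dict of state name to metric name to time series (oldest first), blank entries are stored as "" or None, and the first column header is "State". None of these are confirmed by the existing scripts or the linked sample csv.

```python
import csv
import io


def latest_value(series):
    """Most recent non-blank entry of a time series, or "" if none exists."""
    for value in reversed(series):
        if value not in ("", None):
            return value
    return ""


def choropleth_csv(state_series, metrics):
    """Build CSV text: header row, then one row per state with the latest metrics."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["State"] + list(metrics))
    for state in sorted(state_series):
        row = [latest_value(state_series[state].get(m, [])) for m in metrics]
        writer.writerow([state] + row)
    return buffer.getvalue()


if __name__ == "__main__":
    data = {
        "Kerala": {"Rt": [1.2, 1.1, ""], "CFR": [0.5, 0.4]},
        "Maharashtra": {"Rt": [1.3], "CFR": [3.1]},
    }
    print(choropleth_csv(data, ["Rt", "CFR"]))
```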

Doubling Time

TO CALCULATE TIMEVARYING DOUBLING TIME (NATIONAL AND STATEWISE)

Basic method: INCIDENCE DATA → FIT REGRESSION MODEL TO ESTIMATE GROWTH RATE (r) → USING r, CALCULATE DOUBLING TIME

Which regression model
Where to put 7 day windows (incidence data/growth rate/doubling time) -- I think growth rate


As another metric of outbreak progression, we estimated the rate of spread (r) using a quasipoisson regression model. The R^2 value of the regression fit was then used to assess the goodness-of-fit. In order to account for potential changes in the rate of spread over the course of the outbreak we used a 7-day sliding window to produce time-varying estimates of the rate of spread and the corresponding R^2. The doubling time was then estimated by calculating ln(2)/r for each estimate of the rate of spread.
[26] https://doi.org/10.1016/j.epidem.2018.12.002
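
The sliding-window idea can be sketched as below. Note that this is not the quasi-Poisson model from the quoted method: as a simpler stand-in it fits ordinary least squares to ln(incidence) within each 7-day window, then reports r, R^2 and ln(2)/r. Function names are hypothetical, and windows containing a zero count are skipped because the log is undefined there.

```python
import math


def growth_rate(window):
    """Least-squares slope of ln(count) against day index. Returns (r, r_squared)."""
    n = len(window)
    logs = [math.log(count) for count in window]  # requires positive counts
    mean_t = (n - 1) / 2
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(logs))
    syy = sum((y - mean_y) ** 2 for y in logs)
    r = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return r, r_squared


def sliding_doubling_time(incidence, window=7):
    """One (r, r_squared, doubling_time) tuple per window, or None if unusable."""
    results = []
    for end in range(window, len(incidence) + 1):
        chunk = incidence[end - window:end]
        if min(chunk) <= 0:
            results.append(None)  # log of zero is undefined
            continue
        r, r_squared = growth_rate(chunk)
        # Doubling time only makes sense while cases are growing (r > 0).
        doubling = math.log(2) / r if r > 0 else None
        results.append((r, r_squared, doubling))
    return results
```

Placing the 7-day window on the growth rate, as suggested above, means each doubling time inherits the smoothing of its window rather than being averaged afterwards.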


We have calculated the daily average doubling time as the length of a time period multiplied by ln(2) and divided by the natural log (ln) of the relative growth in the number of reported cases during the same period. (Ref)
Average doubling time = (t1 - t0) × ln(2) / ln(N1/N0),
where N1 and N0 are the numbers of cases at times t1 and t0, respectively. Here t1 and t0 are consecutive days, so the interval length t1 - t0 is always 1. The doubling time is expressed in the units used to measure the interval length t1 - t0.
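
As a quick worked sketch of the two-point calculation: under exponential growth N1 = N0 × e^(r × (t1 - t0)), so r = ln(N1/N0) / (t1 - t0) and the doubling time is ln(2)/r = (t1 - t0) × ln(2) / ln(N1/N0). The function name and the handling of non-growing counts are assumptions.

```python
import math


def doubling_time(n0, n1, t0=0, t1=1):
    """Doubling time between cumulative counts n0 (at t0) and n1 (at t1).

    Returns None when the count did not grow, since the doubling time
    is then undefined (infinite).
    """
    if n0 <= 0 or n1 <= n0:
        return None
    return (t1 - t0) * math.log(2) / math.log(n1 / n0)


if __name__ == "__main__":
    # Cases going from 1000 to 1100 in one day.
    print(doubling_time(1000, 1100))
```

Returning None for flat or falling counts avoids the Infinity and NaN values that would otherwise reach the output json.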

https://www.databrew.cc/double# -- code with doubling time calculation

Filter data option in table

In addition to the current sorting feature, a filter option would be highly helpful when analysing data, e.g. if I only want to compare states with, say, more than 1000 cases, I could do that.

Add indicator domain- Healthcare system

Proposed indicators under domain:
DEMAND:
  • Active cases change
  • Active cases/1000

SUPPLY: some data points may overlap due to inconsistency in data availability from each state.
  • Number of COVID facilities: COVID Care Hospitals, Dedicated COVID Health Centres, COVID Care centres
  • Number of COVID beds - total, occupied
  • Number of ICU beds - total, occupied
  • Number of ventilators - total, occupied

OUTCOME:
  • CFR - crude, outcome based, lag adjusted
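
For reference, the three CFR variants are sketched below using commonly used definitions. These are assumptions, not necessarily what the Methods page specifies; in particular the 14-day lag is only a placeholder, and the function names are hypothetical. All results are percentages.

```python
def crude_cfr(deaths, confirmed):
    """Cumulative deaths as a percentage of cumulative confirmed cases."""
    return 100 * deaths / confirmed if confirmed else None


def outcome_cfr(deaths, recovered):
    """Deaths as a percentage of closed cases (deaths + recoveries)."""
    closed = deaths + recovered
    return 100 * deaths / closed if closed else None


def lag_adjusted_cfr(cum_deaths, cum_confirmed, lag_days=14):
    """Latest cumulative deaths over confirmed cases lag_days earlier.

    lag_days is an assumed placeholder for the delay between confirmation and death.
    """
    if len(cum_confirmed) <= lag_days:
        return None  # series too short for the chosen lag
    earlier_confirmed = cum_confirmed[-1 - lag_days]
    return 100 * cum_deaths[-1] / earlier_confirmed if earlier_confirmed else None
```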

OTHER RESPONSE INDICATORS

  • Number of people contact traced
  • Number of people quarantined

  1. Volunteers to gather data from each state
  2. OCR from bulletins
  3. Find already existing data in datasets such as covid19india.org initiative

Feature Changes- Metrics to be added

Following metrics need to be calculated from source data, and printed onto output json and csv.
(code to be added in PosRateNew.ipynb)

  • daily cases per million

  • daily tests per million

  • daily tests moving avg (7 day)

  • daily deaths moving avg (7 day)

Rationale:
daily cases per million: also called incidence per million, tells us the rate of epidemic growth for that state's size and population.
daily tests per million: more informative than the currently used cumulative tests per million, since cumulative TPM will always increase and is thus limited in utility.
daily tests moving avg (7 day) and daily deaths moving avg (7 day): moving averages smooth out the weekly fluctuations, which are artifacts of the reporting system.
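
A rough sketch of the four metrics, not the code in PosRateNew.ipynb. It assumes daily counts as plain lists, a trailing (not centred) 7-day window, and an empty string for days without a full window so that plotting is not interrupted. Function names are hypothetical.

```python
def per_million(daily_counts, population):
    """Scale daily counts (cases or tests) to counts per million population."""
    return [count * 1_000_000 / population for count in daily_counts]


def moving_average(values, window=7):
    """Trailing moving average; days before a full window get "" as a blank."""
    averaged = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        averaged.append(running / window if i >= window - 1 else "")
    return averaged


if __name__ == "__main__":
    daily_tests = [700, 720, 650, 400, 800, 900, 950, 1000]
    print(per_million(daily_tests, 2_000_000))
    print(moving_average(daily_tests))
```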

Make available in local languages

For a start, we could give a Google Translate option (tried in Tamil, worked pretty fine) to expand reach.

Later, we can work on formalising through more accurate crowdsourced translation.
