RFC: Version 2 of API

Background

The COVID Tracking Project was founded in the early days of the COVID pandemic arriving in the US, and provided an API from day one. This API receives millions of requests per day, and is used by large and small organizations to inform their users. Our API expands our reach and mission by providing consistently high-quality data to others.

Since March, the data we collect has undergone several big changes. We have twice as many data fields. Definitions of data that seemed solid in March have changed considerably. Some data that used to fit in one field now needs more context, or differs from state to state.

Unfortunately, so many apps use our API that changing field names breaks things for our clients. We also support two formats of data: CSV and JSON, which means we can’t have nested or structured data if we want to keep the two formats in parity. We serve data endpoints like states/daily.json that are over 6MB in size, but cannot add pagination because CSV users would miss out on that data.

We get many feature requests, such as providing data as a percentage of population or adding calculations like a 7-day rolling average. While we have built internal tools to do this within our website, we are wary of simply adding more fields that may need to change as the methods for analyzing the pandemic change or as our understanding of our own data improves.

API Proposal

Our proposal is to create a new, versioned API for our COVID data that improves time-to-release of data, prevents changes from breaking well-built applications, and gives space for things like computed fields.

The new API will be served from api.covidtracking.com/, while the old API at covidtracking.com/api/v1 will continue to be maintained and updated daily in the meantime. On October 1, V1 of the API will stop receiving updates; it will remain online until January 1, 2021.

Delivery

Remove CSV files from the API, provide CSV downloads

CSV files are necessary tools for researchers and the public, but they are our biggest source of filed issues about formatting problems. Modern API services rarely deliver data in CSV format, because it is a format for bulk migration of data, not real-time application messaging.

Instead, the covidtracking.com website will build CSV files for users to download from the various sections of our site. Researchers and other users will be able to use these generated CSV files to download the latest data, but the files will not include computed values. We will make a best-effort attempt to keep these files in line with the latest changes in the API.

BigQuery

We have been using BigQuery as a generalized datastore for non-core data, and have a public datastore of our own COVID data. Let’s add all our API data into a public BigQuery dataset that anyone can query against.

Schema

Our JSON responses are currently long arrays of data with no surrounding structure or context. We propose standardizing all API responses on an envelope based on JSON:API:

{
   "links":{
      "self":"https://api.covidtracking.com/state/ca"
   },
   "meta":{
      "build_time":"2020-07-05T14:00:00Z",
      "data_definitions":"https://covidtracking.com/definitions/state",
      "license":"https://covidtracking.com/license",
      "version":2.1
   },
   "data":[

   ]
}

We would adopt the following standards for naming and formats:

All names are in snake_case
All fields with dates or times are in full ISO format in UTC time zone

Every endpoint would provide the last time the API data was updated, a link to license and data definitions, and the API version.
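
As a rough illustration of how a client might read this envelope, the sketch below splits a response into its links, meta, and data parts. The `Envelope` type and `parse_envelope` helper are hypothetical names introduced here, not part of the proposal; the sample payload mirrors the example above.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Envelope:
    """Hypothetical client-side view of a V2 response."""
    self_link: str
    build_time: datetime
    license_url: str
    version: float
    data: list


def parse_envelope(raw: str) -> Envelope:
    """Split a V2 response into its links, meta, and data parts."""
    body = json.loads(raw)
    meta = body["meta"]
    # Timestamps are full ISO strings in UTC. The trailing "Z" is rewritten
    # so that datetime.fromisoformat also accepts it on older Python versions.
    build_time = datetime.fromisoformat(meta["build_time"].replace("Z", "+00:00"))
    return Envelope(
        self_link=body["links"]["self"],
        build_time=build_time,
        license_url=meta["license"],
        version=meta["version"],
        data=body.get("data", []),
    )


SAMPLE = """
{
   "links": {"self": "https://api.covidtracking.com/state/ca"},
   "meta": {
      "build_time": "2020-07-05T14:00:00Z",
      "data_definitions": "https://covidtracking.com/definitions/state",
      "license": "https://covidtracking.com/license",
      "version": 2.1
   },
   "data": []
}
"""

envelope = parse_envelope(SAMPLE)
```

Keeping the build time as a timezone-aware value lets a client compare it directly against its own last-fetch time.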

Field definitions

All endpoints will include field definitions in the meta object. This will allow us to rename fields and flag them for deprecation. Each field definition will include a formerly array that lists the field’s previous names, which applications can use as a fallback if a field is renamed.

Fields have an optional “unit” designation that indicates whether the field represents people or samples.

{
   "meta":{
      "field_definitions":[
         {
            "field":"cases.cases.current",
            "deprecated":false,
            "unit":"people",
            "formerly":[
               "positive",
               "positiveCurrent"
            ]
         }
      ]
   },
   "data":[

   ]
}
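
A client could use the formerly array roughly as follows. This is a minimal sketch, assuming that a dotted field name such as cases.cases.current maps onto nested keys in each data record; the helper names `current_field_name` and `get_path` are hypothetical, and the sample record is illustrative only.

```python
from typing import Optional


def current_field_name(name: str, field_definitions: list) -> Optional[str]:
    """Return the current name for a field, given its current or a former name."""
    for definition in field_definitions:
        if name == definition["field"] or name in definition.get("formerly", []):
            return definition["field"]
    return None


def get_path(record: dict, dotted_path: str):
    """Walk a nested record along a dotted path (assumed mapping, see above)."""
    node = record
    for key in dotted_path.split("."):
        node = node[key]
    return node


FIELD_DEFINITIONS = [
    {
        "field": "cases.cases.current",
        "deprecated": False,
        "unit": "people",
        "formerly": ["positive", "positiveCurrent"],
    }
]

# Illustrative record, not real data.
RECORD = {"state": "CA", "cases": {"cases": {"current": {"value": 400}}}}

# An application written against the old name "positive" can still find its data.
resolved = current_field_name("positive", FIELD_DEFINITIONS)
value = get_path(RECORD, resolved)["value"]
```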

Row metadata

Each data element will have its own meta object that defines things like edit notes and last-update times:

{
   "data":[
      {
         "state":"CA",
         "date":"2020-04-05T00:00:00Z",
         "meta":{
            "last_update":"2020-04-06T05:00:00Z"
         }
      }
   ]
}
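
One possible use of the row-level meta object is to pick out records edited since a client last synced. The sketch below assumes every record carries meta.last_update as shown above; `parse_utc` and `updated_since` are hypothetical helper names, and the second record is invented for illustration.

```python
from datetime import datetime, timezone


def parse_utc(timestamp: str) -> datetime:
    """Parse a full ISO timestamp in UTC, accepting a trailing 'Z'."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def updated_since(records: list, cutoff: datetime) -> list:
    """Keep only the records whose last_update is later than the cutoff."""
    return [
        record
        for record in records
        if parse_utc(record["meta"]["last_update"]) > cutoff
    ]


# Illustrative records, not real data.
RECORDS = [
    {
        "state": "CA",
        "date": "2020-04-05T00:00:00Z",
        "meta": {"last_update": "2020-04-06T05:00:00Z"},
    },
    {
        "state": "NY",
        "date": "2020-04-03T00:00:00Z",
        "meta": {"last_update": "2020-04-04T05:00:00Z"},
    },
]

recent = updated_since(RECORDS, datetime(2020, 4, 5, tzinfo=timezone.utc))
```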

Data fields

All endpoints will have a data array of objects. Each object can be nested to group like data elements together. Each data element will have a computed object that includes 7-day averages and computed values as a percentage of the population.

Data elements will be nested as [category].[field].[current or cumulative].value

{
   "data":[
      {
         "state":"CA",
         "date":"2020-04-05T00:00:00Z",
         "cases":{
            "cases":{
               "current":{
                  "value":400,
                  "computed":{
                     "average_7_day":380,
                     "population_percent":0.06
                  }
               },
               "cumulative":{
                  "value":5000,
                  "computed":{
                     "population_percent":0.1
                  }
               }
            }
         },
         "tests":{
            "negative":{
               "current":{
                  "value":4500,
                  "computed":{
                     "average_7_day":4000,
                     "population_percent":0.06
                  }
               },
               "cumulative":{
                  "value":50000,
                  "computed":{
                     "population_percent":2.4
                  }
               }
            },
            "pending":{
               "current":{
                  "value":4500,
                  "computed":{
                     "average_7_day":4000,
                     "population_percent":0.06
                  }
               },
               "cumulative":{
                  "value":50000,
                  "computed":{
                     "population_percent":2.4
                  }
               }
            }
         }
      }
   ]
}
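
The proposal does not spell out how the computed values are calculated. The sketch below shows one plausible reading, assuming average_7_day is the mean of the seven most recent daily values and population_percent is the value divided by the population, times 100. The function names and the sample numbers are hypothetical.

```python
def average_7_day(daily_values: list) -> float:
    """Mean of the last seven daily values (fewer if the series is shorter)."""
    if not daily_values:
        raise ValueError("at least one daily value is required")
    window = daily_values[-7:]
    return sum(window) / len(window)


def population_percent(value: float, population: int) -> float:
    """Value as a percentage of the population, rounded to four decimals."""
    return round(value / population * 100, 4)


def build_current_field(daily_values: list, population: int) -> dict:
    """Assemble a 'current' element in the nested value/computed shape."""
    latest = daily_values[-1]
    return {
        "value": latest,
        "computed": {
            "average_7_day": average_7_day(daily_values),
            "population_percent": population_percent(latest, population),
        },
    }


# Illustrative numbers only: seven days of values and a made-up population.
field = build_current_field([350, 360, 370, 380, 390, 400, 410], 1_000_000)
```

Keeping the raw value and the computed values in separate keys means the calculation can change later without touching the value that clients already rely on.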

Field Disambiguation

Some fields, such as a simple total of test results, cannot be defined consistently across all states. For these fields we will not provide a value; instead, we will return an object that points to the most complete time series (going back to March 2020) and the most accurate time series (where we have more than 120 days of data):

[
  {
    "state": "CA",
    "date": "2020-09-01",
    "tests": {
      ...
      "total_test_results": {
        "complete_field": "tests.positive_negative",
        "preferred_field": "tests.viral.total"
      }
      ...
    }
  }
]
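
A client might follow these pointers along the lines of the sketch below. It assumes the pointer values are dotted paths into the same record, and that preferred_field is the "most accurate" series while complete_field is the "most complete" one. The `resolve_field` helper and the numbers in the sample record are hypothetical.

```python
def get_path(record: dict, dotted_path: str):
    """Walk a nested record along a dotted path such as 'tests.viral.total'."""
    node = record
    for key in dotted_path.split("."):
        node = node[key]
    return node


def resolve_field(record: dict, dotted_path: str, prefer: str = "preferred_field"):
    """Return the field's value, following a disambiguation pointer if present."""
    node = get_path(record, dotted_path)
    if isinstance(node, dict) and prefer in node:
        return get_path(record, node[prefer])
    return node


# Illustrative record: the two numeric values are made up.
RECORD = {
    "state": "CA",
    "date": "2020-09-01",
    "tests": {
        "total_test_results": {
            "complete_field": "tests.positive_negative",
            "preferred_field": "tests.viral.total",
        },
        "positive_negative": 1000,
        "viral": {"total": 900},
    },
}

most_accurate = resolve_field(RECORD, "tests.total_test_results")
most_complete = resolve_field(RECORD, "tests.total_test_results", prefer="complete_field")
```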

Option to disable computed values

Users who just want raw values can request endpoints that return simpler values instead by appending /simple to the URL:

[
   {
      "state":"CA",
      "date":"2020-04-05T00:00:00Z",
      "cases":{
         "cases":{
            "current":400,
            "cumulative":5000
         }
      },
      "tests":{
         "negative":{
            "current":4500,
            "cumulative":50000
         },
         "pending":{
            "current":4500,
            "cumulative":50000
         }
      }
   }
]
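
The relationship between the full and simple forms can be sketched as a recursive collapse: every object that carries a value key is replaced by that value. This is an assumption about how /simple relates to the full response, not a description of the actual build; `simplify` is a hypothetical name.

```python
def simplify(node):
    """Collapse each {"value": ..., "computed": ...} object to its bare value."""
    if isinstance(node, dict):
        if "value" in node:
            return node["value"]
        return {key: simplify(child) for key, child in node.items()}
    if isinstance(node, list):
        return [simplify(item) for item in node]
    return node


# A record in the full nested form, with computed values attached.
FULL = [
    {
        "state": "CA",
        "date": "2020-04-05T00:00:00Z",
        "cases": {
            "cases": {
                "current": {
                    "value": 400,
                    "computed": {"average_7_day": 380, "population_percent": 0.06},
                },
                "cumulative": {
                    "value": 5000,
                    "computed": {"population_percent": 0.1},
                },
            }
        },
    }
]

simple = simplify(FULL)
```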

Add state metadata to all endpoints

Users are making multiple API calls to get state metadata along with daily or current information. Instead, we can provide a single state endpoint that includes all state information, and prepend each state’s metadata to every state API response.

In addition, we will add unique slug metadata fields to all states and state endpoints.

Data cleanup

Fields currently marked as Deprecated in the V1 API will not be brought over to V2.

Endpoints

The new API will have the following endpoints (all prefixed by /v2/):

/changes - A running changelog of additions and changes to the API
/status - Information about the last build time and API health
/fields - A list of all fields, their definitions, and long-names
/states - A list of all states and their state metadata, same as our current State Metadata.
/states/history - A list of all historic records for all states
/state/[state-code] - All the state’s metadata, and its most recent data record
/state/[state-code]/history - All the state’s metadata, and a list of all historic records for that state
/us - The most recent record for the US
/us/history - All the US history

We will no longer use .json at the end of endpoint URLs.
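
For illustration, a client could build these URLs with a small helper like the one below. The base URL combines the api.covidtracking.com host with the /v2/ prefix, which is an inference from this document rather than a confirmed address; lowercasing the state code follows the "self" link example in the Schema section and is likewise an assumption. The helper names are hypothetical.

```python
# Assumed base URL: host from the proposal plus the /v2/ prefix.
BASE_URL = "https://api.covidtracking.com/v2"


def endpoint_url(*segments: str) -> str:
    """Join path segments onto the base URL, with no .json suffix."""
    return "/".join([BASE_URL, *segments])


def state_url(state_code: str, history: bool = False) -> str:
    """URL for a single state's latest record, or its full history."""
    segments = ["state", state_code.lower()]
    if history:
        segments.append("history")
    return endpoint_url(*segments)


latest_ca = state_url("CA")
history_ca = state_url("CA", history=True)
us_history = endpoint_url("us", "history")
```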

Change control & community outreach

Changes to endpoints and API will be communicated through a dedicated Headway page and Twitter account. We will handle changes in fields or field definitions in a consistent manner:

New fields - Released and announced as soon as possible
Changes to field names - The field definitions will be updated immediately and the new field name added. The old and new names will exist in parallel. Two weeks after the new name launches, the old name will stop receiving updates; three weeks after launch, the old field will be removed.
Removal of fields - If a field is no longer needed, its removal will be announced and the field will stop receiving updates. It will be zeroed out after two weeks and removed after three weeks.
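
The rename timeline above reduces to two fixed offsets from the launch date. The sketch below computes them; `rename_schedule` and its dictionary keys are hypothetical names, and the launch date is only an example.

```python
from datetime import date, timedelta


def rename_schedule(launch: date) -> dict:
    """Key dates for a field rename, counted from the new name's launch."""
    return {
        "new_name_available": launch,
        # Two weeks after launch the old name stops receiving updates.
        "old_name_frozen": launch + timedelta(weeks=2),
        # Three weeks after launch the old field is removed.
        "old_name_removed": launch + timedelta(weeks=3),
    }


schedule = rename_schedule(date(2020, 9, 1))
```

A field removal follows the same two-week and three-week offsets, counted from the announcement instead of a launch.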
