cmu-delphi / delphi-epidata
An open API for epidemiological data.
Home Page: https://cmu-delphi.github.io/delphi-epidata/
License: MIT License
Currently, data ingestion accepts the following geographical types:
We currently have an indicator waiting in the wings whose ideal geographical type is HHS region. To begin serving it from the API, ingestion needs to know how to validate HHS region ids.
Further reading:
the python client integration test consists of hitting the covidcast endpoints. with #67, the covidcast integration tests will use the python client to hit the server. these are equivalent and redundant.
it might make more sense for the client integration test to validate non-source-specific behavior (e.g. handling of lists and ranges) instead of particular endpoints.
One issue raised repeatedly about the COVIDcast map is that the color scale is based on the minimum, maximum, and standard deviation of the entire signal history. If a signal has changed significantly over time, the map can hence be poorly scaled.
Figuring out how to adjust the scale dynamically when the user switches days is one problem; but before the map can do that, the covidcast_meta endpoint would have to provide metadata about a specific date or date range. For example, if we said the scale is based on the variation over the past month, we'd need to be able to request the scale for 2020-04-20 and get the variation over the preceding month.
Unfortunately this would break the caching strategy we currently use, so it also remains to be seen whether there is an efficient way to do this. I wonder whether the strategy might end up being "just stick Varnish in front of the API server" instead of a clever caching system, but I don't know what will prove best.
Hello,
This is a question about understanding the data presented for covidcast endpoint. I'm a newbie to this api and trying to understand the data.
I'm trying to read daily data for US counties (fb-survey), and the results I'm getting from the API are value and sample_size. There's not much information available about what 'value' indicates.
The website https://covidcast.cmu.edu/ shows a US map with percentages. I did not understand what percentage we are looking at, or how to get a percentage from value?
Hello there!
Thanks for this amazing package. I would like to know if I can use delphi-epidata to retrieve the original, unrevised ILI rate estimates available during a given week (at the HHS or state level).
That is, the equivalent of https://www.cdc.gov/flu/weekly/weeklyarchives2013-2014/data/senAllregt08.htm but at the HHS or state level (instead of at the national level).
Is this information available through this package?
Thanks!
One of the recent direction runs took 90 minutes to complete, which seems excessive.
I suspect we are doing something that was easy to implement but not terribly scalable, like updating the direction for all dates and all geo_ids when only a small number of them were invalidated, and perhaps also using a separate query for each time series with updates (there were 72546 of those in the 90-minute run).
If it turns out the database code isn’t the issue, we might have to think about moving the direction computation outside of epidata and into the individual indicators.
e.g. using https://fastapi.tiangolo.com/, which is a nice layer that is production-ready and offers good integrations as well as testing capabilities.
atm. each result is wrapped in a {result, epidata, message} construct. It would simplify things if there were a mode in which the array is directly returned as a flat list.
result and message could be transported using regular HTTP status codes (e.g., https://httpstatuses.com/401 not authorized) or custom HTTP response headers (e.g., a has_more flag).
Especially since county-level responses are super large.
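To make the idea concrete, here is a minimal sketch of such a flat-response mode. The code-to-status mapping below is a hypothetical illustration, not the API's actual contract:

```python
# Sketch of a flat-response mode: unwrap the {result, epidata, message}
# envelope into a bare list and carry the result code via an HTTP status
# instead. The RESULT_TO_HTTP mapping is an assumption for illustration,
# not the API's actual contract.

RESULT_TO_HTTP = {
    1: 200,    # success
    -2: 404,   # no results
    -1: 400,   # invalid request / unauthenticated
}

def unwrap(envelope):
    """Return (http_status, flat_rows) for a wrapped API response."""
    status = RESULT_TO_HTTP.get(envelope.get("result"), 500)
    rows = envelope.get("epidata") or []
    return status, rows

status, rows = unwrap({"result": 1, "epidata": [{"value": 0.5}], "message": "success"})
# status == 200, rows == [{"value": 0.5}]
```

The flat list would then be the whole response body, with result/message moved out of band.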
Hi there!
My name is Bin and I work for Verily Life Sciences. I graduated from CMU (SCS 2013) so I'm very glad to see the news today that Delphi and FB are working together on the self reported symptom survey map:
https://covid-survey.dataforgood.fb.com/
I have since played with the API listed here. My main question: is this API the same one that supports the FB map? For example, this API only gives me a subset of the zip codes:
https://delphi.cmu.edu/epidata/api.php?source=covidcast&data_source=fb-survey&signal=cli&time_type=day&geo_type=county&time_values=20200406-20200410&geo_value=*
As an example, the result does not contain Denver area (80xxx).
Thanks in advance for your help!
problem: all the db queries resulting from an api request include a LIMIT that effectively truncates the results returned when they exceed a fixed size.
impact: as the data and queries for that data grow, this may mean that customers won't always get all the data they expect, and will have no way to get what they missed. this is exacerbated by the lack of guarantees around the order of the data they get, meaning they might have holes in the data without realizing it, which could impact the validity of how they use the data.
proposal: add support for pagination to the api. example reference: https://www.allphptricks.com/create-simple-pagination-using-php-and-mysqli/
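As a rough sketch of how pagination could behave (the `page`/`per_page` parameter names and the `has_more` flag are hypothetical, not part of the current API):

```python
# Sketch of offset-based pagination over query results. The `page` and
# `per_page` parameters and the `has_more` flag are hypothetical names
# for illustration; a real implementation would also need a stable
# ORDER BY so consecutive pages never overlap or skip rows.

def paginate(rows, page, per_page=100):
    """Return one page of results plus a flag indicating more pages exist."""
    start = (page - 1) * per_page
    chunk = rows[start:start + per_page]
    has_more = start + per_page < len(rows)
    return {"epidata": chunk, "has_more": has_more}

page1 = paginate(list(range(250)), page=1, per_page=100)  # 100 rows, has_more=True
page3 = paginate(list(range(250)), page=3, per_page=100)  # 50 rows, has_more=False
```

In SQL terms, each page maps to `LIMIT per_page OFFSET (page - 1) * per_page` on an ordered query.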
The current API clients, in src/client/, are pretty generic and apply to the whole Epidata API. It would be great to build on these to have more fully-featured COVIDcast clients. Specifically they should include:
get_daily_data_df: return an R or Pandas data frame for a specific signal.
We should aim for R and Python, since those will be the most common use cases.
The API is very inconvenient for one of our users because they don’t use R or Python and they’re literally running API queries manually, then running the JSONs they find through online converters to get CSVs.
For now, we can put up a python server somewhere and have it do the transformation as a middleman, to make their workflow a little less precarious.
Long-term we should consider supporting CSV-formatted output directly. What might make it tricky is the tight integration with the rest of Epidata, because this is how api.php currently ends:
// send the response as a json object
header('Content-Type: application/json');
echo json_encode($data);
?>
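For the interim middleman server, the JSON-to-CSV step could be a small transformation like the following sketch (assuming only that responses use the documented {result, epidata, message} envelope):

```python
import csv
import io

# Sketch: turn the API's {result, epidata, message} JSON envelope into CSV
# text. Column names are taken from the first row, so this assumes all rows
# share the same keys, which holds for a single-endpoint query.

def epidata_to_csv(envelope):
    rows = envelope.get("epidata") or []
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Serving this behind a single endpoint would let the user download CSVs directly instead of round-tripping through online converters.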
This is a more serious problem with larger data imports, but even for small updates it can interact poorly with automated jobs, causing them to spuriously fail. It's not clear why this happens, and testing it outside of production may prove tricky.
It's particularly bothersome for us at the moment, because the COVIDcast indicator pipelines depend on API calls for validation; when an automation job halts due to this issue it often means losing work (though not losing data, to my knowledge), which then requires human attention to fix and re-run manually.
it is a common format for defining date data, both for parameters and return values.
filtering options:
would reduce the metadata size from around 98kB to 15kB for the current signals used in the website.
delphi-epidata/src/client/delphi_epidata.js
Lines 473 to 477 in 8921bd0
Hello, I was looking at the official https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html and my understanding is that only the final ILI values are reported.
Despite one's best efforts, sometimes there are mistakes or parsing errors. How can I check that the historical ILI values (those that are revised over time) are correct when using delphi-epidata?
I am thinking about running some manual checks for a few random dates.
I am thinking about running some manual checks for a few random dates.
Thanks again for this great API!
problem: the python client integration tests currently hit the prod api since the endpoint is hardcoded into the client.
impact: lack of isolation. the success of the tests depends on whether prod is up and working. also, the tests unnecessarily add test load and risk to prod.
proposal: make client integration tests hit the local docker web server instead of the prod api so as to provide some isolation and make things more self-contained.
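One way to do this, sketched with a stand-in class (the real Python client keeps its endpoint in a BASE_URL attribute; the local port below is an assumption for illustration):

```python
# Sketch of pointing a client at the local docker web server for
# integration tests. Client is a stand-in for a client whose endpoint
# lives in a class attribute; the test setup rewrites the attribute
# before any request is made, so no test traffic reaches prod.

class Client:
    BASE_URL = "https://delphi.cmu.edu/epidata/api.php"  # prod default

    @classmethod
    def request_url(cls, params):
        query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
        return f"{cls.BASE_URL}?{query}"

# in the integration-test setup:
Client.BASE_URL = "http://localhost:10080/epidata/api.php"
url = Client.request_url({"source": "covidcast_meta"})
# url now targets the local server, not prod
```

A test fixture would restore the original value afterwards so other code is unaffected.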
Hello, thanks for all your work here. I was wondering if there is information on the source and nature of the consensus data source and signals in the COVIDcast API. I haven't found anything across any of the repos and websites.
UCSF runs COVID19 Citizen Science (https://covid19.eurekaplatform.org), which also collects daily syndrome data. It's a much smaller data source, but we appear to have more granular symptom data, and would like to consider contributing. I am one of the PIs (not a developer), so pardon my ignorance. Questions:
The covidcast_meta endpoint takes ~10 seconds to respond, and this latency will increase as the covidcast collection grows. To enable visualization on the web, it's possible to fetch a cached version of the metadata by using an optional parameter, cached, which takes only ~200 ms.
Compare:
Exposing the cached parameter in client libraries would be useful for developers using the programmatic interface to the API.
Some notes:
the covidcast_meta endpoint will not return cached data that's older than 75 minutes; instead it will gracefully fall back to the live, non-cached data.
Most COVIDcast signals come in two flavors: smoothed and raw.
Considering cmu-delphi/www-covidcast#TBD, it might be nice to explicitly combine raw+smoothed pairs of signals instead of publishing them separately, and provide them as an extra column in the API response (even if they are still stored separately in the database).
This would also help with cmu-delphi/covidcast-indicators#67 and cmu-delphi/covidcast-indicators#36, since it would permit us to display the raw signal in time series charts without interfering with map coloring. The raw time series does not display the misleading sawtooth pattern.
Since moving to a data versioning scheme, there is no longer any way to remove a row from COVIDcast without removing all previous versions of that row as well (so that it's as if it was never published at all). This is hazardous -- leaving the row in is inaccurate, and removing the row gives forecasters access to future-privileged information that will not match realtime usage.
We are developing a survey of different kinds of missingness and deletions that occur in the different COVIDcast sources here to help spec out an encoding system.
Some additional conversation on this is in a thread on the first set of performance fixes, but it looks like the column additions mentioned there didn't actually make it into staging this time around.
The database has a 32-character limit on the length of signal names. This is irregularly enforced, and it causes Automation to fail when the limit is transgressed.
Signal names are stored in the data_stdevs[source][signal] dictionary exactly as they are read from an ingested filename. When they are inserted into the database, they get truncated to 32 characters. The "Compute Missing/Stale Covidcast Direction" job reads signal names out of the database and expects to find them in the data_stdevs[source][signal] dictionary, but the truncated names are not listed there. The job fails with a KeyError:
Traceback (most recent call last):
File "/home/automation/.pyenv/versions/3.4.10/lib/python3.4/runpy.py", line 170, in _run_module_as_main
"__main__", mod_spec)
File "/home/automation/.pyenv/versions/3.4.10/lib/python3.4/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/automation/driver/delphi/epidata/acquisition/covidcast/direction_updater.py", line 189, in <module>
main(get_argument_parser().parse_args())
File "/home/automation/driver/delphi/epidata/acquisition/covidcast/direction_updater.py", line 177, in main
update_loop_impl(database)
File "/home/automation/driver/delphi/epidata/acquisition/covidcast/direction_updater.py", line 137, in update_loop
data_stdev = data_stdevs[source][signal][geo_type]
KeyError: 'wip_confirmed_7day_avg_cumulativ'
We should either extend the character limit in the database schema (perhaps at the same time we add in issue dates), or truncate the signal name after it gets read out of the filename and before it gets loaded into data_stdevs[source][signal].
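The second option could look like this sketch, which mirrors the data_stdevs[source][signal] structure described above (the stdev value is a placeholder):

```python
# Sketch of truncating signal names at read time so in-memory keys match
# the database's 32-character column. MAX_SIGNAL_LEN mirrors the schema's
# column width; the stdev value 1.23 is a placeholder for illustration.

MAX_SIGNAL_LEN = 32

def normalize_signal(signal):
    return signal[:MAX_SIGNAL_LEN]

data_stdevs = {}
source, signal = "jhu-csse", "wip_confirmed_7day_avg_cumulative"
data_stdevs.setdefault(source, {})[normalize_signal(signal)] = 1.23

# a name read back from the database (already truncated) now matches:
lookup = data_stdevs[source]["wip_confirmed_7day_avg_cumulativ"]
```

Applying the same normalization at every boundary avoids the KeyError shown in the traceback.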
https://www.sqlalchemy.org/ provides a nice level on top of any python DB connection and there are several libraries on top of it.
I appreciate what you've done and continue to do in epi-forecasting. But it seems to me like the problem in front of us is not so much lack of information as it is lack of spine. I think epidemiologists need to join together and say that we're (our governments are) doing the exact worst thing. I mean, is there an epidemiologist out there who thinks that the current approach is appropriate? It seems to me like anyone with expertise and foresight must believe that we're either doing too much or not enough, and that a shift in either direction would be better than the status quo.
What’s the plan?
•••••••••••••••••
Plan? MY view: We MUST either step up and tighten (w/ quarantines AND/or testing) our state and/or country borders AND get serious about tracing AND other steps to wipe it out, OR allow a controlled spread, by unlocking from youngest to oldest, as that is the fastest approach to restoring some kind of normality. Politician approval rates are terrible for good reason. I HATE our empowered politicians, who are (AFAIK) ALL too spineless to do anything but think unbelievably short term. So we've got the worst of both worlds. We're not wiping out the virus AND we're destroying the economy. We're doing the one thing stupider than opening up the economy further, which is not tightening it up enough to wipe out the virus and keep it out. As Mr. Rogers said, I'm so angry I could bite. I don't hear anybody asking our president or governor or mayor what the long-term plan is!!! We/our media must ASK them, and more, and demand answers and report! Leaders and health officials seem too ignorant or power-hungry to understand how a privacy-preserving app could be very effective; their myopia is maddening!
Please forgive me if I'm unable to keep entirely to the issue system guidelines a wee bit here. I just had to get this off my chest. No, I've never done this before, and I expect I won't again. Crazy times. You can just close this. Please don't delete it. Or do, but a friendly word would really mean a lot right now. Thanks 🙏.
Some of the URL links in README.md are broken.
Hey, noticed this morning that https://delphi.cmu.edu/epidata/api.php?source=covidcast_meta is not responding, has anything changed with the way to access this endpoint?
Have been using the sample string provided at: https://cmu-delphi.github.io/delphi-epidata/api/covidcast_meta.html
Thanks!
We occasionally want to remove a span of data from the covidcast API with the following constraints:
For the moment, we're just sacrificing the first constraint and having an admin do a DELETE FROM, but it would be less strain on the rapidly-shrinking devops team if sensor groups could do this themselves through the existing ingestion infrastructure. Maybe a magic value?
This would require the API serving routine and the metadata generator to be aware of whatever we design, so that the relevant spans are excluded.
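A minimal sketch of the "magic value" idea, with a hypothetical sentinel constant chosen purely for illustration:

```python
# Sketch of the "magic value" idea: a sentinel row pushed through normal
# ingestion marks a cell as deleted, and the serving routine and metadata
# generator drop such rows before responding. DELETED_SENTINEL is a
# hypothetical choice for illustration only.

DELETED_SENTINEL = -9999999

def is_tombstone(row):
    return row.get("value") == DELETED_SENTINEL

def filter_deleted(rows):
    return [r for r in rows if not is_tombstone(r)]

rows = [
    {"geo_value": "pa", "value": 1.2},
    {"geo_value": "ny", "value": DELETED_SENTINEL},  # span removed by sensor group
]
served = filter_deleted(rows)  # only the "pa" row is served
```

Because the sentinel arrives as an ordinary versioned row, the deletion itself stays in the version history.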
None of the signals update more than once a day, so we could get a substantial performance boost in the map if we allowed caches to stay good for a few hours.
Hello,
I was wondering if it's possible to put the client code as packages on the corresponding language repositories (PyPI, npm, and CRAN). That would make using/setting them up easier than the current method, where users need to pull in code updates manually.
I am not certain of the self reporting parameters you use in this "data" gathering exercise. Usually, if the data environment is not closely controlled - the very essence of self reporting - the data collected are between nearly useless and totally useless. Let's take Navajo and Apache Counties in Arizona in mid April. We are in such a high pollen environment, we get pollen danger notifications on our smartphones. This is mostly for the respiration challenged and the allergic but it is useful to all. Symptoms include difficulty breathing, mild to severe sinus headaches, persistent dry cough, occasional mild fevers, etc. Sound familiar? How many of those people have responded with positive covid symptoms out of simple fear that their allergic response is covid 19? What we need rather than more of this near useless "data" is massive nationwide testing. Perhaps Zuckerberg could throw a billion or two at that problem. Dr. Ronald L Rabie, Los Alamos National Laboratory, Retired.
problem: the auth parameter is optional for the sensors api:
delphi-epidata/src/server/api.php
Lines 1309 to 1314 in c48af8f
however, the clients are all incorrectly requiring the auth parameter to access sensors, e.g.:
delphi-epidata/src/client/delphi_epidata.py
Lines 459 to 463 in c48af8f
impact: customers of the clients will be incorrectly restricted from using the sensors source without auth.
proposal: make auth optional for sensors in the clients.
problem: fluview_clinical queries seem to always return duplicate results, even accounting for the same release date.
from dfarrow0@ offline:
I checked, and can confirm they are duplicated in the database. I think it is not supposed to be this way. I suspect a problem with the unique key constraint.
impact: if customers don't expect duplicates, their usage of the data may be incorrect.
proposal: if the duplicates are unexpected, fix the bug causing them. if they are expected, update the documentation for this source to make it clear to customers.
example: https://delphi.midas.cs.cmu.edu/epidata/api.php?source=fluview_clinical&regions=nat&epiweeks=202001 currently returns:
{"result":1,"epidata":[{"release_date":"2020-04-13","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-16","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-11","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-14","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-12","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-15","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-10","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-14","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-11","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-15","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-13","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-16","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-10","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-14","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-11","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461},{"release_date":"2020-04-15","region":"nat","issue":202014,"epiweek":202001,"lag":13,"total_specimens":64980,"total_a":5651,"total_b":9647,"percent_positive":23.5426,"percent_a":8.69652,"percent_b":14.8461}],"message":"success"}
notice, for example, the row for release_date 2020-04-14 occurs 3 times.
Hey there!
So I can use the Python library to pull the ght data by state:
delphi_epidata.Epidata.covidcast('ght', 'smoothed_search', 'day', 'state', [delphi_epidata.Epidata.range(int(start_date), int(end_date))], 'CA')
However, pulling by county doesn't return any result:
delphi_epidata.Epidata.covidcast('ght', 'smoothed_search', 'day', 'county', [delphi_epidata.Epidata.range(int(start_date), int(end_date))], '08013')
# Previously I tried '*' to pull all county data from the fb signal, which doesn't work here either.
Could you please help with the correct API call? Thanks!
e.g., when querying for a specific geo_value, one might want to exclude the geo_value from the response to save space. like an optional field in which the user defines the list of fields to return.
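A minimal sketch of such a field projection (the `fields` parameter name is hypothetical):

```python
# Sketch of an optional `fields` parameter (hypothetical name): the user
# lists the columns to return and everything else is dropped from each
# row before serialization, shrinking large county-level responses.

def project(rows, fields=None):
    if not fields:
        return rows  # no filter requested: return rows unchanged
    return [{k: r[k] for k in fields if k in r} for r in rows]

rows = [{"geo_value": "42003", "time_value": 20200406, "value": 0.8}]
slim = project(rows, fields=["time_value", "value"])
# slim == [{"time_value": 20200406, "value": 0.8}]
```

For the single-geo_value case above, the client already knows the geo_value, so omitting it loses nothing.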
The scraper raises an exception for public health lab data saying that the header row has changed. I temporarily disabled scraping here.
Stack trace (some info stripped):
Traceback (most recent call last):
[...]
File ".../epidata/acquisition/fluview/fluview_update.py", line 541, in <module>
main()
File ".../epidata/acquisition/fluview/fluview_update.py", line 538, in main
update_from_file_public(issue, date, filename, test_mode=args.test)
File ".../epidata/acquisition/fluview/fluview_update.py", line 390, in update_from_file_public
data = [get_public_data(row) for row in rows]
File ".../epidata/acquisition/fluview/fluview_update.py", line 390, in <listcomp>
data = [get_public_data(row) for row in rows]
File ".../epidata/acquisition/fluview/fluview_update.py", line 267, in get_public_data
raise Exception('header row has changed for public health lab data.')
Exception: header row has changed for public health lab data.
Please take a look at your convenience.
problem: after ingestion of a covidcast file, the file is archived. if another file already exists in the archive with the same name, that file is overwritten, whether the ingestion succeeded or failed.
impact: while this may be acceptable for failed ingestions given the current logging, i am a little concerned about the potential silent overwrite in the case of success. for example, if you somehow get a truncated version of a file that was already ingested, having both versions archived could be useful for both determining the extent of the problem and for quickly back-filling the lost data.
proposal: perhaps consider adding a timestamp to the archived names of successfully ingested files?
Originally posted by @pedritom-amzn in #70
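The proposal above could be as simple as this sketch (the name format is an assumption for illustration):

```python
from datetime import datetime
from pathlib import Path

# Sketch of the proposal: suffix each successfully ingested file with an
# ingestion timestamp when archiving, so a later file with the same name
# can never silently overwrite an earlier archive copy. The name format
# is an assumption for illustration.

def archived_name(filename, now=None):
    now = now or datetime.utcnow()
    p = Path(filename)
    return f"{p.stem}.{now.strftime('%Y%m%d%H%M%S')}{p.suffix}"

name = archived_name("20200406_county_cli.csv", datetime(2020, 4, 7, 1, 2, 3))
# name == "20200406_county_cli.20200407010203.csv"
```

Both the original and any truncated re-upload would then coexist in the archive, supporting the back-fill scenario described above.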
problem: the api for the wiki source now requires the language parameter:
delphi-epidata/src/server/api.php
Lines 1204 to 1205 in c48af8f
however, while the python client supports it, none of the other 3 clients (R, js, coffee) do. for example:
delphi-epidata/src/client/delphi_epidata.R
Line 240 in c48af8f
impact: customers won't be able to use these three clients for accessing the wiki source.
proposal: add the language parameter as a required parameter for the wiki source in these three clients.
the covidcast integration tests should use the epidata client now that it supports the covidcast endpoints
We never mention source:
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html#constructing-api-queries
The proposed tree format for multi-signal queries is really a group-by. We should consider permitting a group_by parameter that would generate this structure for any parameter that accepts multiple values (time_value certainly; geo_value might hurt).
On https://cmu-delphi.github.io/delphi-epidata/api/, the link to the definition of epiweeks ("this page") is broken.
In theory the integration tests should have caught this typo; in practice everything passed and then promptly crashed in production. We need to
There is some failure case that's not being adequately logged:
handling /common/covidcast/receiving/jhu-csse/20200828_state_deaths_7dav_cumulative_prop.csv
deaths_7dav_cumulative_prop False
archiving as failed - jhu-csse
Correct handling of a failed CSV is logged like this:
handling /common/covidcast/receiving/jhu-csse/20200828_county_deaths_incidence_num.csv
deaths_incidence_num False
invalid value for Pandas(geo_id='.0000', val='7.0', se=nan, sample_size=nan) (geo_id)
archiving as failed - jhu-csse
A Kramdown vulnerability came out over the weekend:
https://github.com/cmu-delphi/delphi-epidata/network/alert/docs/Gemfile.lock/kramdown/open
I attempted to use the auto tool to update it, but dependabot wasn't able to find its way out of a dependency conflict.
I am trying to retrieve a lot of data from the DELPHI API, but am having trouble getting all the data I am requesting. Is there a limit that I am running into? For example, I run
source("https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/master/src/client/delphi_epidata.R")
res <- Epidata$fluview(regions = list("nat", "hhs1", "hhs2", "hhs3", "hhs4", "hhs5", "hhs6", "hhs7", "hhs8", "hhs9", "hhs10"),
epiweeks = list(Epidata$range(199740, 201653)),
issues = list(Epidata$range(199740, 201653)))
df <- do.call(rbind, lapply(res$epidata, rbind))
And I end up with data from only regions "nat" and "hhs1" and only until epiweek 200450. The total number of rows in the resulting dataframe is 3650.
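This looks like the server's fixed row limit (see the pagination issue above). Until pagination exists, a client-side workaround is to split the request into smaller chunks, sketched here with a stand-in `fetch` function:

```python
# Sketch of a client-side workaround for the server's row limit: split a
# large epiweek range into smaller chunks and concatenate the results.
# `fetch` stands in for the real API call; weeks are treated as plain
# integers here for illustration (real epiweeks need year-aware arithmetic).

def chunk_ranges(start, end, size):
    """Yield (lo, hi) inclusive sub-ranges covering [start, end]."""
    lo = start
    while lo <= end:
        hi = min(lo + size - 1, end)
        yield lo, hi
        lo = hi + 1

def fetch_all(fetch, start, end, size=50):
    rows = []
    for lo, hi in chunk_ranges(start, end, size):
        rows.extend(fetch(lo, hi))
    return rows

# stand-in fetch returning one row per week
rows = fetch_all(lambda lo, hi: list(range(lo, hi + 1)), 201001, 201020, size=5)
# rows covers all 20 weeks in order
```

Querying one region per request, with epiweek ranges chunked this way, keeps each response under the limit.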