davidbau / covid-19-chart
Chart of current COVID-19 time series data. Enables a variety of county-, state-, and nation-level comparisons and data exploration.
Home Page: https://covid19chart.org/
Everybody asks for this: what's the ratio of deaths to confirmed cases?
I am unconvinced that this ratio is a meaningful number: both deaths and confirmed cases have huge sources of noise, and the time of death is shifted from the time of onset by a couple of weeks, during which, under exponential growth, the infection rate may have increased 30-fold. A ratio will amplify all these problems, and my intuition is that such a ratio may dramatically undercount the actual peril and give people a very dangerous false sense of security.
At best, the number will be almost uninterpretable.
But it is requested so often that maybe it should be an option for "advanced" mode.
Also, some analyses have suggested that "deaths divided by confirmed cases 2 weeks ago" is a better measure of something. Suggestions welcome.
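A minimal sketch of the lagged ratio next to the naive one, assuming daily cumulative deaths and confirmed counts stored in arrays aligned by date. The function names and the 14-day default lag are hypothetical choices for illustration, not code from this repository.

```javascript
// Hypothetical helper (not part of this repository): cumulative deaths on
// day t divided by cumulative confirmed cases `lag` days earlier.
// Both arrays hold daily cumulative counts aligned by date.
function laggedDeathRatio(deaths, confirmed, lag = 14) {
  return deaths.map((d, t) => {
    const earlier = t - lag;
    // No value until there is a nonzero confirmed count `lag` days back.
    if (earlier < 0 || !(confirmed[earlier] > 0)) return null;
    return d / confirmed[earlier];
  });
}

// Naive same-day ratio, for comparison.
function naiveDeathRatio(deaths, confirmed) {
  return deaths.map((d, t) => (confirmed[t] > 0 ? d / confirmed[t] : null));
}
```

With cases doubling daily and a 2-day lag, `laggedDeathRatio([0, 1, 2, 4], [10, 20, 40, 80], 2)` gives `[null, null, 0.2, 0.2]`, while the naive ratio gives `[0, 0.05, 0.05, 0.05]`: under growth, the same-day ratio understates the lagged one by the growth factor over the lag, which is the concern raised above.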
LOVE what you've done! Very nice.
I may have to borrow your excellent ideas and make a version of my program that has additional views including a single graph.
I don't mind being overwhelmed with data. Rather be overwhelmed than not have enough.
Maybe provide another option to choose the number of states to graph? I don't know how my state is doing for example because it's not (yet) in the top 10.
Here are my graphs by country, for example. I don't always want to see that many, but it's nice to be able to.
CDS is doing an increasingly good job... To merge data sources and to support things like population normalization, we need to normalize location names and data formats between CDS and JHU and us.
Our conventions (and conversions from JHU) are encoded in lib/flatten.js (exercised in test_csse.js). Each locality is either a country, "state", or "county", and is named as a suffix of "[county], [state], [country]", except in the US, where we drop the country and just call it "[county], [state]". US states are all referred to by their two-letter abbreviations.
And county names follow the JHU convention of not saying "County" at the end (maybe we should change this for clarity).
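To make the naming convention concrete, here is a hypothetical helper that builds a locality name from its parts. It is a sketch, not the code in lib/flatten.js; the `{county, state, country}` field names and the `"US"` country label are assumptions.

```javascript
// Hypothetical helper illustrating the naming convention; not the actual
// code in lib/flatten.js. Missing parts are omitted, so the result is a
// suffix of "[county], [state], [country]".
function localityName({ county, state, country }) {
  // Inside the US, drop the country ("King, WA"), but keep "US" itself
  // when it is the whole locality.
  if (country === "US" && (county || state)) {
    return [county, state].filter(Boolean).join(", ");
  }
  return [county, state, country].filter(Boolean).join(", ");
}
```

For example, `localityName({ county: "King", state: "WA", country: "US" })` yields `"King, WA"`, and `localityName({ state: "Hubei", country: "China" })` yields `"Hubei, China"`.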
We should have
lib/flatten.js creates for CSSE data.
Some context on Twitter: https://twitter.com/fearthecowboy/status/1248622602257489920
I'm not sure how interesting this is, but it's worth thinking about: e.g., chart the number of cases per sq km, or maybe some fancy formula that somehow takes both area and population into account.
Thinking aloud, I'd expect that for a given population, the situation would get worse as the area gets smaller, since people are intrinsically closer together. To take extremes, DC and Alaska have similar populations but a massive difference in size. Of course, things get tricky because large territories tend to have their population in big clusters: more than half of Alaska's population is in the Anchorage metro area, while extremely large areas are completely unpopulated (and sort of irrelevant).
Anyway, let's discuss whether this is a direction that might make sense.
One of the goals of the visualization is to show how the logarithmic and linear views are different sides of the same story. Ideally, we could animate between the two views when switching.
Similarly, we could ideally also animate between the >= threshold and fixed-date views.
This type of transition is common in D3 - but we are not currently built on D3.
Not sure how feasible it is while staying with chartist.js.
The infection rate in the U.S. is getting high enough (19821 per day, which is about one every four seconds) that it can be visualized in real time, and even listened to (especially global aggregates).
We should consider modifying the plot to communicate this:
(1) The next "point" should be estimated in real-time based on the rate from yesterday, or the last few days; as you look at the graph, it should be counting up. The unreported numbers should be shown with a dotted line to show they are estimates.
(2) Every estimated new case should be announced by an audible sound, to make clear the significance of every individual infection, like the significance of a single bullet strike in a war.
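A minimal sketch of idea (1), assuming daily cumulative counts and millisecond timestamps (as returned by `Date.now()`). All function names are hypothetical. It extrapolates linearly from the last report at the recent daily rate; a display could call `estimateNow` on a timer, and `secondsPerCase` gives the interval at which a sound could be played for idea (2).

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical sketch: average daily increase over the last `days` reports.
function dailyRate(cumulative, days = 1) {
  const n = cumulative.length;
  return (cumulative[n - 1] - cumulative[n - 1 - days]) / days;
}

// Estimated cumulative count at time `nowMs`, extrapolating linearly from
// the last report (made at `lastReportMs`) at the recent daily rate.
function estimateNow(cumulative, lastReportMs, nowMs, days = 1) {
  const elapsedDays = (nowMs - lastReportMs) / MS_PER_DAY;
  return cumulative[cumulative.length - 1] + dailyRate(cumulative, days) * elapsedDays;
}

// Average seconds between new cases at a given daily rate.
function secondsPerCase(perDay) {
  return 86400 / perDay;
}
```

At the 19821 cases per day quoted above, `secondsPerCase(19821)` is about 4.36 seconds. Estimated values would be drawn dotted, as proposed, since they are extrapolations rather than reports.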
County data is shown here:
https://covid19chart.org/test_county_map.html
Many counties are shown as missing data.
To debug this we should
(1) eliminate testdata/county_dates.csv and hook up the test directly to the live feed.
(2) track down whether the missing data is due to missing FIPS codes or to data actually absent from the JHU feed.
If the data is actually missing, should investigate if merging feeds will solve it - the CDS and NYT feeds are independently sourced and may have data JHU does not.
The virus is a biological meme: about 30K bases of RNA copied from host to host, randomly mutating in a gradual process to improve its fitness. The population-wide human response to a virus is also done by spreading memes. But here each meme is a piece of information about how the virus works and how to stop it, transmitted from person to person, processed and synthesized intentionally. The dynamics of the response seem pretty different.
So far we have only plotted time series for the virus "bad guys".
It would be interesting to see time series for the "good guys": the researchers' ideas circulating about understanding the disease and possible treatments.
Here is a dataset that contains the text of 63627 COVID research papers so far: https://www.semanticscholar.org/cord19/download
I have not yet seen time series visualizations of this data. We could simply plot number of cumulative papers by keyword every day. Or we could plot daily appearances of words within paper text or citations.
Questions that should be answerable: How many papers a day mention remdesivir, HCQ, etc.? What are this week's biggest percentage gainers?
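A minimal sketch of "cumulative papers by keyword every day", assuming the papers have already been reduced to hypothetical `{date, text}` records with ISO dates; the actual CORD-19 fields and formats may differ. It uses a plain case-insensitive substring match, so abbreviations such as HCQ would need a list of synonyms (e.g. hydroxychloroquine).

```javascript
// Hypothetical sketch: cumulative count of papers mentioning `keyword`,
// one entry per day on which at least one matching paper appeared.
// `papers` is assumed to be an array of {date: "YYYY-MM-DD", text: string}.
function cumulativeMentions(papers, keyword) {
  const needle = keyword.toLowerCase();
  const perDay = new Map();
  for (const { date, text } of papers) {
    if (text.toLowerCase().includes(needle)) {
      perDay.set(date, (perDay.get(date) || 0) + 1);
    }
  }
  // ISO dates sort chronologically as plain strings.
  const days = [...perDay.keys()].sort();
  let total = 0;
  return days.map(day => {
    total += perDay.get(day);
    return { day, total };
  });
}
```

Daily counts (for "biggest percentage gainers") are the per-day values before accumulation; comparing this week's sum to last week's for each keyword would rank them.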
With the exception of a Domain change, changing settings does not affect the legend. So ideally, changing non-Domain settings should not reset the legend selection.
That is, I should be able to select US & France, and then tweak things like scale, start, confirmed/deaths, etc., without losing my country selection.
Currently there are various options that are not clearly explained in the UI. With the current interface it is hard to expect people to
We could arrange the select options to grow to show longer names when dropped down, which might help with issues (1) and (2). We could add more explanatory text.
As requested here, testing data is helpful to know:
https://twitter.com/O_2the_L/status/1245114253524307969?s=20
Some states report number of tests done, so tests done in a locality could be plotted on a time series in some cases.
Available in the CDS data, not from JHU, so depends on issue #2.
We depend on the JHU CSSE feed being served by raw.githubusercontent.com.
That website is currently down.
We should not depend on this directly. We should
These are hard to fix quickly; it looks like GitHub Pages is also not propagating at the moment.
We currently pick the same "top N" entities based on total count, rather than using per-population count if that's what's being graphed.
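A minimal sketch of choosing the top N by the per-population statistic, assuming hypothetical `latest` and `population` objects keyed by locality name. Localities without a known population are skipped rather than ranked.

```javascript
// Hypothetical sketch (not this repository's code): pick the top N localities
// by latest value per million residents instead of by raw totals.
// `latest` and `population` map locality names to numbers.
function topNPerCapita(latest, population, n) {
  return Object.keys(latest)
    .filter(name => population[name] > 0) // skip localities without a population
    .map(name => [name, (latest[name] / population[name]) * 1e6])
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([name]) => name);
}
```

For instance, with 1000 cases in NY (19M people) and 50 in WY (0.58M), WY ranks higher per million (about 86 vs. 53) even though NY has 20 times the cases.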
Use some autocomplete interface to make it possible to easily add county/state/country-level series when you don't know exactly which ones are available.
A widget like this https://autocomplete.trevoreyre.com/#/ could help.
Possible design: after a graph is plotted, we populate the widget with the locality names that are actually nonzero (i.e., not all 3000 counties at first). Selecting one will add it to the "include" list. Maybe add a reset button to clear them.
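The filtering half of this design can be sketched independently of any widget. This assumes a hypothetical `series` object mapping locality names to arrays of daily values; wiring the result into a particular autocomplete library depends on that library's own API.

```javascript
// Hypothetical sketch: autocomplete candidates limited to localities whose
// series has at least one nonzero value, matched case-insensitively.
// `series` maps locality names to arrays of daily values.
function suggestions(series, query, limit = 10) {
  const q = query.trim().toLowerCase();
  return Object.keys(series)
    .filter(name => series[name].some(v => v > 0))
    .filter(name => name.toLowerCase().includes(q))
    .sort()
    .slice(0, limit);
}
```

Filtering out all-zero series first keeps the list short (not all 3000 counties), and the `limit` keeps the dropdown manageable.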
Should have a selectable map view for LHS, where an SVG U.S. map is used to show states or counties.
When hovering on a date on the graph, ideally the stats for that date should also be shown as a choropleth.
I would like to add onto the desktop application I built for worldwide data to include data about the individual states.
The CSV file I'm using for my current code is this time-series file:
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
The entire dataset is in the single CSV file.
For this project's data, are you aggregating the data that is contained in the individual daily reports?
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports
It seems like that's the only way to get data at a state level.
Currently the reset button resets the "include" state. But it's odd to click reset and see the plot show a small subset of the default series.
Reset should probably reset the "selected" state also, so the plot can go back to default settings.
The point of covid19chart.org is to make it easy for local officials to see and compare what is happening in relevant localities. To deliver on this, we should add by-county rollups.
Ideally:
We should also load the data from an alternate data source so that the system is more robust.
This one looks good:
https://coronadatascraper.com/timeseries-byLocation.json
Ideally we would always load from both CSSE and CDS and merge the data (e.g., take the max cumulative for a day when both report?) And ideally we would be robust to either data source going down or publishing un-parseable data.
I have factored out load_csse.js as a first step.
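A minimal sketch of the "take the max cumulative for a day when both report" rule, assuming each source has already been normalized to a hypothetical `{ [locality]: { [date]: count } }` shape with matching locality names. A source that failed to load can be passed as `null`.

```javascript
// Hypothetical sketch: merge cumulative counts from several sources. When
// more than one source reports the same locality and day, keep the larger.
function mergeSources(...sources) {
  const merged = {};
  for (const source of sources) {
    if (!source) continue; // tolerate a source that failed to load
    for (const [place, byDate] of Object.entries(source)) {
      const target = (merged[place] = merged[place] || {});
      for (const [date, count] of Object.entries(byDate)) {
        if (!Number.isFinite(count)) continue; // skip unparseable values
        target[date] = target[date] === undefined ? count : Math.max(target[date], count);
      }
    }
  }
  return merged;
}
```

One way to stay robust to either feed going down is to load both with `Promise.allSettled` and pass `null` for any that was rejected, so the chart still renders from whichever source succeeded.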
The daily deltas are very noisy. A 7-day sliding window delta would be a smoother signal, and probably worth adding as an option.
It would be ideal to change the way the subtraction is done so that the first 6 days are not left blank on the visible plot.
Related to #12, which will work better with smoothed data.
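One way to avoid the blank first 6 days, sketched here, is to compute the differences on the full series, including dates before the visible start, and only then trim to the visible range. The function name is hypothetical, and days before the first record are assumed to have a count of zero.

```javascript
// Hypothetical sketch: 7-day sliding-window change of a cumulative series,
// returned only for the visible range starting at `startIndex`. The lookback
// uses the full series, so visible days need no blank warm-up period.
function sevenDayDelta(cumulative, startIndex = 0, window = 7) {
  const out = [];
  for (let t = startIndex; t < cumulative.length; t++) {
    // Days before the first record are treated as zero (an assumption).
    const before = t >= window ? cumulative[t - window] : 0;
    out.push(cumulative[t] - before);
  }
  return out;
}
```

Dividing each value by 7 turns the weekly change into an average daily change on the same scale as the raw daily deltas.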
Might be interesting to @davidebbo
@bleroy reported going to https://covid19chart.org/#/?stat=7day&scale=linear&include=WA&top=0&series=deaths&start=3%2F1%2F20&ratio=&advanced=1 and getting a broken graph on a OnePlus 6T.
May need to get more details.
Hovering over the legend can highlight multiple series when more than 26 are shown. This is due to Chartist recycling its series class names.
To see the effect, visit https://covid19chart.org/#/?top=53 and hover over the legend.
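Chartist's default per-series classes run through the letters a to z, so series 0 and series 26 share a class. One possible fix is to give every series its own class and highlight by that. As far as we know, Chartist series objects can carry their own `className`, but check the docs for the version in use. `uniqueSeriesClass` is a hypothetical helper that generates non-repeating suffixes (a to z, then aa, ab, and so on).

```javascript
// Hypothetical helper: a unique class name per series index, using
// bijective base-26 letters (a..z, aa..az, ba..) so no two indices collide.
// The "series-" prefix avoids clashing with Chartist's own ct-series-* classes.
function uniqueSeriesClass(index) {
  let suffix = "";
  let n = index;
  do {
    suffix = String.fromCharCode(97 + (n % 26)) + suffix;
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return "series-" + suffix;
}
```

Index 26 maps to `series-aa` and index 52 to `series-ba`, so the legend hover handler can target exactly one series even with `top=53`.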
CSSE added a population column in this table a few days ago; we should use it as our source for population.js instead of CDS
This would make them persistent when refreshing that page, and would allow sharing permalinks with a subset of localities selected.
I'm not 100% sure about this, but...
When viewing population-normalized data, a >=N start rule should apply to the normalized series, not the unnormalized ones. That is, it should be interpreted as ">=N per-million".
Why?
I'd be happy to make the change, if you felt that it would be helpful! (I would also adjust the ">=N" options to be more appropriate to "/pop" mode, when it is selected.)
Thanks a bunch.
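A minimal sketch of the proposed rule: convert to per-million first, then find the first day at or above the threshold. The function name and arguments are hypothetical.

```javascript
// Hypothetical sketch: in per-population mode, apply the ">=N" start rule
// to the per-million series, so N means "N per million".
function alignAtThresholdPerMillion(cumulative, population, threshold) {
  const perMillion = cumulative.map(v => (v * 1e6) / population);
  const start = perMillion.findIndex(v => v >= threshold);
  return start < 0 ? [] : perMillion.slice(start);
}
```

For a locality of 200,000 people with cumulative counts `[1, 5, 20, 50]`, a ">=10 per million" rule starts at the second day (25 per million), whereas applying ">=10" to the raw counts would start at the third.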
With relative starts (e.g. >= 30), the locality is displayed on the right side of each line. For some reason, this doesn't happen with absolute start times (e.g. 3/23/2020).
Suggestion from Jennifer Frazier
https://twitter.com/frazierarchive/status/1245172300200067072?s=20
Hi! I'd love it if your data (county, zip code) was overlaid on the more familiar "baselines" (US, NYC, etc). Then you can do more of a comparative study. Yes - clickable boxes so you could add/toggle views.
We can do this by implementing the following state machine.
Search box. There should be three cases for the behavior of the search box, to optimize for usability after the first item is added.
Now the only problem is that in the common case, there are lots of empty boxes for unselected things cluttering the view even when the user doesn't want them. So we also change the reset button, which now has two roles:
Alan Warren wrote to observe that the >=30 starting threshold visualizes some weird artifacts having to do with Diamond Princess accounting. Basically, more than 30 passengers were flown to the US in mid-Feb, and JHU starts counting these as US cases on 2/23 (arbitrarily), which means that's the date we start counting the US as passing the threshold.
I do think these US cases should be included and plotted, but I agree that that part of the plot shows the effects of a different policy regime where every case was being well-separated and scrutinized, unlike today.
Maybe it would be clearer for the default starting threshold to be >=80, which skips past the time period with these differences. All the countries and states have a day where they have (logarithmically) a bit more than 80 known cases, so it's a good, tightly clustered threshold.
Idea from here, https://www.youtube.com/watch?v=54XLXg4fYsc
This idea should probably be plotted on a different webpage.
The idea of this plot is to understand whether a locality is still on the exponential domain or if we have succeeded in leaving it (or if we are resuming it). The idea is to plot log(recent changes) on the y axis versus log(less-recent changes) on the x axis. While we're on exponential growth, this will be on the line given by the exponent, but will quickly depart when not.
The YouTube video described graphing log(weekly change) vs. log(total cumulative), but maybe having both sides be a delta would show when (e.g., in Japan) exponential growth restarts after society resumes social behavior too early.
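A sketch of the data behind the proposed plot, using two consecutive 7-day windows (the window length and function name are assumptions). For steady growth c(t) = a·r^t, both weekly changes share the factor (r^7 - 1), so y - x = 7·log10(r): the points lie on a slope-1 line offset by the growth rate, and they fall below that line when growth slows.

```javascript
// Hypothetical sketch: for each day t, x = log10 of the change over the week
// ending `window` days earlier, y = log10 of the change over the latest week.
function logLogPoints(cumulative, window = 7) {
  const points = [];
  for (let t = 2 * window; t < cumulative.length; t++) {
    const recent = cumulative[t] - cumulative[t - window];
    const earlier = cumulative[t - window] - cumulative[t - 2 * window];
    // Logs are only defined for positive changes; skip flat or negative weeks.
    if (recent > 0 && earlier > 0) {
      points.push({ t, x: Math.log10(earlier), y: Math.log10(recent) });
    }
  }
  return points;
}
```

A series that doubles every day gives points with y - x = log10(128); a series growing linearly gives y = x, so a restart of exponential growth would show up as points lifting off the y = x line.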
Thanks for the very useful tool!
Coronadatascraper has better data for California; it matches that from the local health departments. I'd love to have the option of selecting CDS as a data source (instead of JHU). I see that it has a CSV file in JHU format. I haven't compared the files in detail; I'm sure that there are some minor differences. Even so, I confess I trust CDS (at least for California) more.
thanks!
The chart can get very crowded and difficult to read, especially in log threshold mode, where all the time series are (informatively) drawn right on top of each other. But to see the locality you care about in the pile, it should be made visible when hovering over the legend. (In the chart below, which is FL vs. LA, or CA vs. WA?)
We should consider adding a legend hover event that:
(1) Dims all the non-hovered lines, e.g., adding an opacity < 1 to them.
(2) Brings the hovered line to z-index 10 to be in front.
(3) Makes the label on the hovered sequence more contrasty and more visible (might require looking at theme css).
Idea from Alan Warren.
Add an (advanced) mode so that the log >=100 plot can normalize all series so the first day is 1.0.
Currently, whatever threshold you choose (>=80, >=100, etc.), the first day has a large artificial vertical offset, because daily growth rates are so high. E.g., the first day in Michigan >= 100 is already 334, so the whole Michigan line floats over the others, even with a similar slope.
One solution is to normalize everything by that day, i.e., report "Total cumulative cases (log scale, normalized to first day)." This should be an option, at least in advanced mode.
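A minimal sketch of the proposed normalization, with a hypothetical function name: align each series at its first day at or above the threshold, then divide by that day's value so every line starts at 1.0 and only the slopes differ.

```javascript
// Hypothetical sketch: start the series at its first day >= threshold and
// divide by that day's value, so the plotted line starts at exactly 1.0.
function normalizeToFirstDay(cumulative, threshold) {
  const start = cumulative.findIndex(v => v >= threshold);
  if (start < 0) return []; // never reaches the threshold
  const base = cumulative[start];
  return cumulative.slice(start).map(v => v / base);
}
```

With the Michigan example, a series whose first day >= 100 is 334 and which then doubles twice becomes `[1, 2, 4]`, removing the vertical offset on a log scale.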
In the U.S. there is a substantial, primarily Spanish-speaking community. We should factor the page to support localization, and then have a Spanish-language localization version.
The second largest non-English community is Chinese.
NY is chosen as the "top state" even if a graph starts on 6/1:
https://covid19chart.org/#/?advanced=-4&start=6%2F1%2F20&top=1&stat=daily
when from
https://covid19chart.org/#/?advanced=-4&start=6%2F1%2F20&top=10&stat=daily
it clearly shouldn't be.
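One likely fix, assuming the ranking currently uses all-time totals, is to rank by the plotted statistic within the visible date range only. This sketch sums daily values from a hypothetical `startIndex` onward; the function name and data shape are assumptions.

```javascript
// Hypothetical sketch: rank localities by the sum of their daily values
// inside the visible range (from `startIndex` on), not by all-time totals.
// `daily` maps locality names to arrays of daily new counts.
function topNInWindow(daily, startIndex, n) {
  return Object.entries(daily)
    .map(([name, values]) => [name, values.slice(startIndex).reduce((a, b) => a + b, 0)])
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([name]) => name);
}
```

A state with a large early peak then ranks first only when the visible range includes that peak.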
Now that total cumulative cases are a less useful metric, could it make sense to estimate current spread by computing the ratio between this week's growth and the previous week's?
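A minimal sketch of the proposed ratio from a daily cumulative series; the function name is hypothetical. A value above 1 means more new cases this week than last; below 1, fewer.

```javascript
// Hypothetical sketch: new cases in the latest 7 days divided by new cases
// in the 7 days before that, from a daily cumulative series.
function weekOverWeekRatio(cumulative) {
  const n = cumulative.length - 1;
  if (n < 14) return null; // need today plus two full weeks of history
  const thisWeek = cumulative[n] - cumulative[n - 7];
  const lastWeek = cumulative[n - 7] - cumulative[n - 14];
  return lastWeek > 0 ? thisWeek / lastWeek : null;
}
```

Steady linear growth gives a ratio of 1; a series that goes from 10 to 20 new cases a day between the two weeks gives 2.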
Repro:
(1) Default view, US states.
(2) Select NY.
(3) Enter "RI" into the search box.
Observe: RI is unselected, no graph line.
Suggestion: maybe instead of resetting selection when domain changes, reset selection when nothing selected is included in the plot.
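A minimal sketch of the suggested reset rule, with hypothetical names and an empty array meaning "nothing selected". It only covers when to reset; in the repro above, the search box would still need to add RI to the selection itself.

```javascript
// Hypothetical sketch: keep the current selection as long as at least one
// selected locality is still plotted; otherwise reset it.
function nextSelection(selected, plotted) {
  const plottedSet = new Set(plotted);
  return selected.some(name => plottedSet.has(name)) ? selected : [];
}
```

Switching from states to counties would then clear a selection of NY automatically, while tweaking options that keep NY on the plot would not.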
We should add an option to plot statistics in terms of population density.
This might reveal, for example, the terribly high rates of infection in Colorado Springs, which are not revealed by the currently graphed stats.
The coronadatascraper feed comes with by-county population information, and so could be done as part of #2 and #3.
A bit confusing for colors to switch around when the same localities appear in different views - the graph would be more useful if colors were stable.
We should add some state to track colors and minimize color changes where possible.
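A minimal sketch of such state, assuming a hypothetical `memory` map that persists across views. Each locality keeps its remembered color unless another series in the same view already has it; new or displaced localities get the first palette color not yet used in the view.

```javascript
// Hypothetical sketch: keep each locality's color stable across views.
// `memory` is a Map from locality name to color that persists between views.
function assignColors(names, palette, memory) {
  const result = new Map();
  const taken = new Set();
  // Pass 1: reuse remembered colors unless already used in this view.
  for (const name of names) {
    const color = memory.get(name);
    if (color !== undefined && !taken.has(color)) {
      result.set(name, color);
      taken.add(color);
    }
  }
  // Pass 2: give every other locality the first palette color not yet used
  // in this view, cycling through the palette if it runs out.
  let overflow = 0;
  for (const name of names) {
    if (result.has(name)) continue;
    let color = palette.find(c => !taken.has(c));
    if (color === undefined) color = palette[overflow++ % palette.length];
    result.set(name, color);
    taken.add(color);
    memory.set(name, color);
  }
  return result;
}
```

With a three-color palette, WA keeps the same color in views `[NY, WA]`, `[WA, CA]`, and `[NY, WA, CA]`; only CA changes once, when its remembered color collides with NY's.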
https://covid19chart.org/#/?top=0&include=NY&series=deaths&start=4%2F1%2F20&scale=linear&advanced=1
I assume this is a problem with the data source. Either that or zombies.