datameet / covid19
Novel Corona Virus - COVID-19 India Datasets by DataMeet. Sunsetted on 2022-10-21. Will not update anymore.
Home Page: https://projects.datameet.org/covid19/
License: Other
The data contained in data/mohfw.json has a slight error. The numbers for Kerala are:
"report_time":"2020-07-30T08:00:00.00+05:30","cured":11365,"death":68,"confirmed":21797
"report_time":"2020-07-31T08:00:00.00+05:30","cured":12159,"death":70,"confirmed":2203,
"report_time":"2020-08-01T08:00:00.00+05:30","cured":13023,"death":73,"confirmed":23613
For the 31st there is a slight inconsistency in the confirmed count, maybe from the scraping.
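A quick way to catch this kind of slip is to check that a cumulative field never decreases from one report to the next. The following is only a minimal sketch: the helper name `find_drops` is hypothetical and not part of the repository, and it assumes all `report_time` strings share the same format and UTC offset so that they sort correctly as plain text.

```python
def find_drops(records, field="confirmed"):
    """Return the report_time of every record whose cumulative
    value is lower than the value in the previous report."""
    # ISO-8601 strings with an identical offset sort chronologically.
    ordered = sorted(records, key=lambda r: r["report_time"])
    drops = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[field] < prev[field]:
            drops.append(cur["report_time"])
    return drops


# The three Kerala records quoted in the report above.
kerala = [
    {"report_time": "2020-07-30T08:00:00.00+05:30", "cured": 11365, "death": 68, "confirmed": 21797},
    {"report_time": "2020-07-31T08:00:00.00+05:30", "cured": 12159, "death": 70, "confirmed": 2203},
    {"report_time": "2020-08-01T08:00:00.00+05:30", "cured": 13023, "death": 73, "confirmed": 23613},
]

print(find_drops(kerala))  # ['2020-07-31T08:00:00.00+05:30']
```

The same check can be pointed at `cured` or `death` by passing a different `field`.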
@all-contributors please add @thejeshgn for code,data,doc
Just a heads up, the cumulative testing numbers for the 6th of Nov are incorrect in icmr_testing_status.json: 20864750 instead of the 11,54,29,095 in the ICMR report.
Hi, I noticed that Our World in Data points to this repository for data on India's test positivity rate. Do you know where I can find this data here? I was able to find the total number of tests provided by ICMR, but not the number or % of tests that are positive. Is this calculated by combining MOHFW data on the number of cases with the ICMR testing numbers? Thanks.
The JSON doc format could be:
{
"_id": "2022-01-22T09:00:00.00+05:30|positivity-rates|wikidataId",
"report_time": "2022-01-22T09:00:00.00+05:30",
"district":"wikidataId/<wikidataId>",
"positivity": null,
"rtpcr_contribution": null,
"rat_contribution": null,
"source": "icmr",
"type": "district-wise-weekly-positivity-rates"
}
report_time is also the end of the week for the data point. Do we need a start date?
Do you have district-wise data for Indian states? The district-wise PDF maintained on MOHFW is not updated. So I am trying my luck: do you have a source I can reference, or do you already have the data?
Also, a request: I hope you have seen my dashboard; if not, please take a look. Could you list it on your website under "Projects Using this dataset"? It would be really great :) Thanks in advance.
https://app.powerbi.com/view?r=eyJrIjoiNWEyNThlZTItYTY3MC00NDM5LWEyYTgtZDBiMzc4MmNlNDdiIiwidCI6ImM4ZWNhM2NhLTEyNzYtNDZkNS05ZDlkLWEwZjJhMDI4OTIwZiIsImMiOjl9
There is an entry in the non-virus deaths that has the date '2010-04-12'; it is probably '2020-04-12'.
I sorted the data by date and that's why it popped out. Thanks.
Hi, I think the data for 11/20 might be missing a digit: https://github.com/datameet/covid19/blob/master/data/icmr_testing_status.json#L244
It is 10x lower than the two surrounding dates.
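A dropped digit like this shows up as a value far smaller than both of its neighbours. The sketch below is illustrative only: the function name, the `factor` threshold, and the sample numbers are assumptions, not values taken from icmr_testing_status.json.

```python
def find_magnitude_outliers(values, factor=5.0):
    """Return indexes whose value is at least `factor` times
    smaller than both the previous and the next value."""
    outliers = []
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if cur * factor <= prev and cur * factor <= nxt:
            outliers.append(i)
    return outliers


# Made-up cumulative counts; the third entry is missing a digit.
samples = [1000000, 1010000, 102000, 1030000]

print(find_magnitude_outliers(samples))  # [2]
```

A threshold of 5 is a loose choice that still catches a 10x drop; it would need tuning for series that legitimately fluctuate.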
Error: #32
Should I fix the raw downloaded JSON file 2021-04-13T08:00:00.00+05:30_md5_fe0e82466266d5fe3b71e051d289810b.json?
As of now I have manually fixed the parsed data.
At https://github.com/datameet/covid19/tree/master/downloads/icmr-backup I missed downloading ICMR_testing_update_24Aug2020.pdf.
If you have it, can you send me a PR?
@thejeshgn
I'm running a cron job for a project that fetches the testing data from icmr_testing_status.json at 05:00 every morning.
The job failed yesterday due to missing data for the day. I just wanted to know whether this was a one-off error or whether I need to add some data validation to the script.
I appreciate the resource you have put together.
Update: the error was in the script.
1stdose to first_dose
2nddose to second_dose
Check @jishnupdas's project:
https://github.com/jishnupdas/Covid-19-IND/blob/master/data_update.sh#L9
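For downstream scripts that need to accept records from both before and after this rename, a small mapping step is one option. This is a sketch only; `RENAMES` and `rename_dose_keys` are hypothetical names and not part of the repository.

```python
# Old attribute name -> new attribute name, as listed in this issue.
RENAMES = {"1stdose": "first_dose", "2nddose": "second_dose"}


def rename_dose_keys(record):
    """Return a copy of the record with the old dose keys renamed;
    all other keys and values are left untouched."""
    return {RENAMES.get(key, key): value for key, value in record.items()}


old_record = {
    "_id": "2021-09-12T09:00:00.00+05:30|vaccinations",
    "report_time": "2021-09-12T09:00:00.00+05:30",
    "total": 738207378,
    "1stdose": 561101965,
    "2nddose": 177105413,
    "source": "mohfw",
    "type": "vaccinations",
}

new_record = rename_dose_keys(old_record)
print(new_record["first_dose"], new_record["second_dose"])  # 561101965 177105413
```

Records that already use the new names pass through unchanged, so the step is safe to run on mixed data.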
Currently for 2020-11-09, report_time is in the format 2020-11-09T08-00-00-00+05-30; it should be 2020-11-09T08:00:00.00+05:30.
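Consumers that hit the malformed value before it is fixed upstream could normalise it on read. The following is a sketch under the assumption that the broken form always replaces `:` and `.` with `-` exactly as in the example above; `normalize_report_time` is a hypothetical helper name.

```python
import re

# Matches the hyphen-only form, e.g. 2020-11-09T08-00-00-00+05-30
BROKEN = re.compile(
    r"^(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})-(\d{2})\+(\d{2})-(\d{2})$"
)


def normalize_report_time(value):
    """Rewrite the hyphen-only form into the usual report_time
    format; return any other string unchanged."""
    match = BROKEN.match(value)
    if match is None:
        return value
    date, hour, minute, second, fraction, off_h, off_m = match.groups()
    return f"{date}T{hour}:{minute}:{second}.{fraction}+{off_h}:{off_m}"


print(normalize_report_time("2020-11-09T08-00-00-00+05-30"))
# 2020-11-09T08:00:00.00+05:30
```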
Missing https://www.mohfw.gov.in/pdf/CummulativeCovidVaccinationReport18thjuly2021.pdf
Hi All,
If you follow the datameet/covid19[1] GitHub repo, you will have seen that we download[2] the cumulative COVID vaccination report daily. As of now, we extract only Total Doses at the India level and add them to the JSON[3]. The data looks like this for each day.
{
"_id": "2021-09-12T09:00:00.00+05:30|vaccinations",
"report_time": "2021-09-12T09:00:00.00+05:30",
"total":738207378,
"source": "mohfw",
"type": "vaccinations"
}
Now, I have written a script to extract the other parts of the PDF, including the state level. The dataset will be backward compatible (it shouldn't break any of your data pipelines). At the India level it will look like this, with two additional attributes, "1stdose" and "2nddose".
{
"_id": "2021-09-12T09:00:00.00+05:30|vaccinations",
"report_time": "2021-09-12T09:00:00.00+05:30",
"total":738207378,
"1stdose":561101965,
"2nddose":177105413,
"source": "mohfw",
"type": "vaccinations"
}
There will be new records at the state level, which will look like this.
{
"_id": "2021-09-12T09:00:00.00+05:30|vaccinations|ka",
"report_time": "2021-09-12T09:00:00.00+05:30",
"state": "ka",
"total": 47445632,
"1stdose":35196111,
"2nddose":12249521,
"source": "mohfw",
"type": "vaccinations"
}
Records for the Unassigned or Miscellaneous state will look like this.
{
"_id": "2021-09-12T09:00:00.00+05:30|vaccinations|unassigned",
"report_time": "2021-09-12T09:00:00.00+05:30",
"state": "unassigned",
"total": 3458791,
"1stdose":1556469,
"2nddose":12249521,
"source": "mohfw",
"type": "vaccinations"
}
Currently, I am parsing and loading the old data (since 2021-03-08). It should be available by this weekend.
Once this is done, I will look into parsing and loading the district-wise positivity rates.
You can follow the progress of this data load here.
[1] https://github.com/datameet/covid19/
[3] https://github.com/datameet/covid19/blob/master/data/mohfw_vaccination_status.json
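In the India-level and "ka" examples above, "total" equals "1stdose" plus "2nddose". Assuming that relationship is meant to hold for every record that carries both attributes (an assumption, not something the post states), a consumer could use it as a sanity check. `doses_add_up` is a hypothetical helper name.

```python
def doses_add_up(record):
    """Return True/False when both dose attributes are present,
    or None for older records that only carry "total"."""
    if "1stdose" not in record or "2nddose" not in record:
        return None
    return record["total"] == record["1stdose"] + record["2nddose"]


india = {"total": 738207378, "1stdose": 561101965, "2nddose": 177105413}
karnataka = {"state": "ka", "total": 47445632, "1stdose": 35196111, "2nddose": 12249521}
old_style = {"total": 738207378}

print(doses_add_up(india))      # True
print(doses_add_up(karnataka))  # True
print(doses_add_up(old_style))  # None
```

Returning None for records without the new attributes keeps the check compatible with the older, total-only format.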
Add "first_dose_15_17" : none to all the missing days. Currently it's not present in the JSON.
Scraper failed for some reason. I will add manually.
In the daily cases for MP state for 13-04-2021, the number of confirmed cases is wrong.
{
"id": "2021-04-12T08:00:00.00+05:30|mp",
"key": "2021-04-12T08:00:00.00+05:30|mp",
"value": {
"_id": "2021-04-12T08:00:00.00+05:30|mp",
"_rev": "1-933316a7408ba6a730c23622d73e30f3",
"state": "mp",
"report_time": "2021-04-12T08:00:00.00+05:30",
"cured": 298645,
"death": 4184,
"confirmed": 338145,
"source": "mohfw",
"type": "cases"
}
--
},
{
"id": "2021-04-13T08:00:00.00+05:30|mp",
"key": "2021-04-13T08:00:00.00+05:30|mp",
"value": {
"_id": "2021-04-13T08:00:00.00+05:30|mp",
"_rev": "1-a5060878199d1a56e015cf657efecad6",
"state": "mp",
"report_time": "2021-04-13T08:00:00.00+05:30",
"cured": 301762,
"death": 4221,
"confirmed": `34464`,
"source": "mohfw",
"type": "cases"
}
--
},
{
"id": "2021-04-14T08:00:00.00+05:30|mp",
"key": "2021-04-14T08:00:00.00+05:30|mp",
"value": {
"_id": "2021-04-14T08:00:00.00+05:30|mp",
"_rev": "1-80b865800fe853b2ee610ea6b1b43d25",
"state": "mp",
"report_time": "2021-04-14T08:00:00.00+05:30",
"cured": 305832,
"death": 4261,
"confirmed": 353632,
"source": "mohfw",
"type": "cases"
}
Identify the official source for each state and write a crawler to grab the press release/update every day.
Some of the sources are listed at:
https://telegra.ph/Covid-19-Sources-03-19
How frequently is this data refreshed?
Thank you for this wonderful piece of work. It is really good to have a source of data which truly represents what the government agencies put out.
The mohfw.json uses 2-letter codes to represent the names of states. These keys are, however, never explained. For example, does 'la' point to Lakshadweep or to Ladakh? From the ministry, it is apparent that 'la' means Lakshadweep. And it appears that in your dataset 'la' means Ladakh. I hope you can also provide a document which maps the 2-letter state codes to actual state names.
On the 20th it's still linking to this, which gives a 404.
http://mohfw.gov.in/pdf/ClativeCovidVaccinationCoverage19thJune2021.pdf
They are all returning 404 now.
I will put them here
https://github.com/datameet/covid19/tree/master/icmr
{
"_id": "2021-01-20T09:00:00.00+05:30|vaccinations",
"_rev": "1-c557793c024b73c7870406896138f504",
"report_time": "2021-01-20T09:00:00.00+05:30",
"total": 674835,
"source": "mohfw",
"type": "vaccinations"
}
It's getting published as part of the bulletin here
There is a drop. Check and verify.
I can see that MoHFW is adding State Wise Vaccination Coverage.
https://www.mohfw.gov.in/pdf/CumulativeCOVIDVaccinationCoverage7thMarch2021.pdf
It's backed up daily inside covid19/downloads/mohfw-backup/cumulative_vaccination_coverage
I will look for old ones.
Documents related to recommended hospital service costs from various state/city governments. Collect and scrape the data.
https://github.com/datameet/covid19/tree/master/downloads/hospital-service-costs-backup
MoHFW is adding "District-wise COVID-19 test positivity rates" as an Excel file every day. I noticed it yesterday, so I have it for the 9th and 10th of June.
I don't know when they started. We will have to get the old ones, but they seem to remove them. For example, the 9th June xls is not accessible anymore.
For 10th June it's
https://www.mohfw.gov.in/pdf/COVID19DistrictWisePositivityAnalysis10thJune.xlsx
Currently I am not parsing the XLS, I am just archiving them. They are available here
If you have the older documents, can you send me a MR?
From: https://stopcoronavirus.mcgm.gov.in/
Attached document:
Containment-Zones.pdf
Something went wrong with the last update to mohfw.json and it has incomplete data.
It terminates abruptly at line 439.
You guys are really doing amazing work and I used your JSON file to create a dashboard. However, it is failing now due to the incomplete JSON file.
I would really appreciate it if you could fix this.
Thanks in advance.
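On the consumer side, a dashboard can guard against a truncated file by parsing the download before replacing the copy it currently serves. This is a minimal sketch; `load_json_or_none` is a hypothetical helper and the sample strings are made up, not taken from mohfw.json.

```python
import json


def load_json_or_none(text):
    """Parse JSON text, returning None when the text is
    incomplete or otherwise malformed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


complete = '{"rows": [{"confirmed": 1}]}'
truncated = '{"rows": [{"confirmed": 1}'  # cut off mid-document

print(load_json_or_none(complete) is not None)  # True
print(load_json_or_none(truncated) is None)     # True
```

When the result is None, the caller can keep using the last good copy instead of failing.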