Keys in data tables don't match the csv/json file content (CLOSED)

open-covid-19 commented on September 27, 2024

Keys in data tables don't match the csv/json file content

Comments (6)

owahltinez commented on September 27, 2024

At least both index.csv & index.json differ from documentation.

I'll investigate. Are you seeing additional columns, or are there columns missing?

thanks for still keeping the old data.json

Backwards compatibility FTW!

Data being in arrays of ordered values forces me to see/write a schema to read the data. And every time the order of those values changes, data reading breaks. It would be more robust to provide a nested object or an array of objects.

This is a good point. I'm not happy with the current JSON format either. It is currently a "stopgap" because several tables exceeded the 100MB limit of GitHub Pages in the previous format, but we are looking to move the files to (versioned!) cloud storage very soon. Would you still prefer the record-based format if it meant that the files were 2-3X bigger?
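To make the size trade-off concrete, here is a minimal sketch comparing the two layouts on a few made-up rows. The column names and values are hypothetical and not taken from the real tables, so the ratio it prints says nothing about the actual files; it only shows where the extra size comes from (every record repeats every column name).

```javascript
// Hypothetical sample data; column names and values are made up for illustration.
const columns = ["key", "country_name", "population"];
const rows = [
  ["AA", "Country A", 1000],
  ["BB", "Country B", 2000],
  ["CC", "Country C", 3000],
];

// Record-based layout: one object per row, so column names repeat in every row.
const records = rows.map((values) =>
  Object.fromEntries(columns.map((name, i) => [name, values[i]]))
);

// Length of the serialized JSON text (equal to the byte count for ASCII-only data).
const orderedSize = JSON.stringify(rows).length;
const recordSize = JSON.stringify(records).length;

console.log({ orderedSize, recordSize, ratio: recordSize / orderedSize });
```

Gzip usually shrinks the repeated keys a lot, so the gap on the wire is likely smaller than the gap on disk.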

jmullo commented on September 27, 2024

At least the country name wasn't in the correct column, and datacommons is missing from the README.

Didn't know there's such a limit. Personally I don't really care about the size, currently data.json is about 2.5 MB gzipped, not THAT huge. Of course someone else might want to think about slower (mobile) network users.

Another option would be to include column names & order in the response, similarly as in the CSV header row.
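A rough sketch of what that could look like on the reading side, assuming a response with a header next to the rows. The field names `columns` and `data`, the `cell` helper, and the sample values are placeholders for illustration, not the project's actual format.

```javascript
// Hypothetical response shape: a header listing column names, then rows of ordered values.
const response = {
  columns: ["key", "country_name", "population"],
  data: [
    ["AA", "Country A", 1000],
    ["BB", "Country B", 2000],
  ],
};

// Build a name -> position lookup once, from the header shipped with the data.
const position = new Map(response.columns.map((name, index) => [name, index]));

// Read a cell by column name instead of a hard-coded index.
function cell(row, name) {
  if (!position.has(name)) {
    throw new Error(`Unknown column: ${name}`);
  }
  return row[position.get(name)];
}

const names = response.data.map((row) => cell(row, "country_name"));
console.log(names);
```

With this, a reordered or newly added column no longer breaks the reader, and a removed column fails loudly instead of silently returning the wrong value.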

owahltinez commented on September 27, 2024

Didn't know there's such a limit. Personally I don't really care about the size, currently data.json is about 2.5 MB gzipped, not THAT huge. Of course someone else might want to think about slower (mobile) network users.

That's because data.csv does not have any administrative level 2 data. The equivalent in the v2 files is master.csv, which is ~75MB. master.json is already too big for GitHub Pages, which is why it's currently missing.

Another option would be to include column names & order in the response, similarly as in the CSV header row.

I thought that was already the case. Depending on column order is way too fragile and puts too much burden on the documentation. Let me see if we can find a better way to format the JSON files without adding too much overhead, but if it comes down to it we might just go back to the old format once we have cloud storage ready.

owahltinez commented on September 27, 2024

@jmullo I have changed the format of the JSON files. Can you please take a look and tell me if it's reasonable? If you have any suggestions or feedback, I'm open to that as well.

jmullo commented on September 27, 2024

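I think those are now easy enough to transform into whatever the user wants:

// Turn each row (an array of values ordered like response.columns) into an object keyed by column name
const transformed = response.data.reduce((result, values) => {
    // Start from {} so an empty column list doesn't make Object.assign throw
    const obj = Object.assign({}, ...response.columns.map((key, index) => ({ [key]: values[index] })));
    // Index the record by the value of its "key" column
    result[obj.key] = obj;
    return result;
}, {});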

owahltinez commented on September 27, 2024

Great to hear, I'll close this out for now.

FYI we have finally completed the Cloud Storage setup and the new endpoint for the files is: https://storage.cloud.google.com/covid19-open-data/v2/ (we'll update the documentation shortly)

Sadly the old endpoint for v2 will have to go away since we can't upload many of the files due to GitHub limits, but the old v1 files will continue to be updated.

Please beware that we renamed the master table to main in the new v2 endpoint :-)
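For anyone updating client code, a minimal sketch of building table URLs against the new endpoint. The base URL is the one given above; the `<table>.<format>` file naming and the `tableUrl` and `fetchTable` helpers are assumptions for illustration, so check the updated documentation for the actual file names.

```javascript
// Base URL from the comment above; the "<table>.<format>" naming is an assumption.
const BASE_URL = "https://storage.cloud.google.com/covid19-open-data/v2/";

// Hypothetical helper: resolves a table name and format against the base URL.
function tableUrl(table, format) {
  return new URL(`${table}.${format}`, BASE_URL).toString();
}

// Hypothetical helper: downloads a table and parses it as JSON.
// Needs a runtime with a global fetch (browsers, Node 18+).
async function fetchTable(table) {
  const res = await fetch(tableUrl(table, "json"));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}

// The table formerly called "master" is named "main" in the new endpoint.
console.log(tableUrl("main", "csv"));
```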
