
Comments (30)

tlorusso avatar tlorusso commented on August 18, 2024 6

We're looking into it. As a first step, we will harmonize the structure of the example file as well as the cantonal files (this afternoon). As a next step, we will set up a validation process; @metaodi is thinking about doing it with CSVlinter and GitHub Actions, as you suggested, @herrstucki.

from covid_19.

jstcki avatar jstcki commented on August 18, 2024 5

@tlorusso @metaodi I opened a PR in #30 where I added schema validation. The schema is a standard JSON Schema. It's currently incomplete.

from covid_19.

jstcki avatar jstcki commented on August 18, 2024 5

Hi @tlorusso! Thanks for merging the PR. I noticed that right now the schema validation is a bit useless 😅 because a) the schema is quite incomplete and b) the files were moved.

I think two things would be useful as next steps:

  1. Change the script so that it can work with multiple schemas (probably one schema per CSV folder would make sense?)
  2. Write the schema(s)
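
A minimal sketch of what step 1 could look like, assuming one schema file per CSV folder. The folder name, the schema path and the `schema_for` helper are all hypothetical and not taken from the repository or the PR:

```python
from pathlib import Path

# Hypothetical mapping from CSV folder name to schema file.
# Both the folder name and the schema path are placeholders.
SCHEMAS_BY_FOLDER = {
    "fallzahlen_total": "schemas/fallzahlen_total.schema.json",
}


def schema_for(csv_path, schemas=SCHEMAS_BY_FOLDER):
    """Return the schema registered for the CSV's parent folder, or None."""
    folder = Path(csv_path).parent.name
    return schemas.get(folder)
```

A validation script could then skip (or fail on) files for which `schema_for` returns `None`, depending on how strict the check should be.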

I can do 1. and give 2. a shot, but I'm not sure what the progress on standardizing the format(s) is at the moment.

Also @rokroskar once this is working I think it would be OK to remove the schema-less validation step again?

from covid_19.

jstcki avatar jstcki commented on August 18, 2024 2

It probably wouldn't take too much effort to automate this for PRs with a GitHub Action and, for example, CSVLint.

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024 2

@tlorusso a first csv validation action was already merged in #24 - it's running on all pushes, but it's not required for merges. Even before setting up schemas etc this will catch and prevent manual edit errors.

from covid_19.

jstcki avatar jstcki commented on August 18, 2024 2

@tlorusso OK, sounds good. I'll work on it when I find some time. Also thanks to you and everyone else for managing this effort! 🙏

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024 1

@zdavatz I didn't see this issue before - I opened a PR to do automatic validation - see #24. I used a js library I found, but I'm sure it could be done with awk, I just don't have enough skills :)

from covid_19.

fabian avatar fabian commented on August 18, 2024 1

They are currently working on a standardised structure: https://twitter.com/OpenDataZH/status/1240934043971211264

from covid_19.

tlorusso avatar tlorusso commented on August 18, 2024 1

@herrstucki @rokroskar In the 'total_fallzahlen' folder, the first 11 columns are now identical in each file, as this allows us to merge them. However, we want to allow for the collection of further information beyond these 11 columns if necessary (that's why some files differ in the number of columns). Do you guys have a suggestion? Could you help us define a schema for the validation based on the shared columns?
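
One way to validate only the shared columns, sketched in Python under stated assumptions: the header must start with the agreed column names in the agreed order, and anything after them is ignored. The function name `check_shared_columns` and the column names in the usage note are placeholders, not the real 11 columns:

```python
import csv
import io


def check_shared_columns(csv_text, shared_columns):
    """Return a list of problems; an empty list means the leading columns match.

    Only the first len(shared_columns) header cells are compared, so files
    may carry additional columns after the shared ones.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    for index, expected in enumerate(shared_columns):
        actual = header[index] if index < len(header) else None
        if actual != expected:  # exact match, so spelling and case both count
            problems.append(
                f"column {index + 1}: expected {expected!r}, found {actual!r}"
            )
    return problems
```

For example, `check_shared_columns(text, ["date", "time", "canton"])` accepts a file whose header is `date,time,canton,extra_info` but reports a file whose second column is spelled `Time`.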

By the way, thank you all for your efforts. You guys are great!

from covid_19.

tlorusso avatar tlorusso commented on August 18, 2024

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

I would write some kind of Bash script. Are you on Linux?

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

Could you please use these templates for filling in the data per canton?

https://github.com/zdavatz/covid19_ch/blob/master/data-cantons-csv/dd-covid19-ch-cantons-latest.csv

And here is the file for the whole of Switzerland:

https://github.com/zdavatz/covid19_ch/blob/master/data-switzerland-csv/dd-covid19-ch-switzerland-latest.csv

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

Here is the README that goes with it: https://github.com/zdavatz/covid19_ch/blob/master/README.md#data-per-canton

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

@herrstucki I think that's a very good idea! Do you know how to do it?

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

https://stackoverflow.com/questions/33523362/how-to-compare-two-csv-files-in-windows/45591193

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

Here is an awk command for Linux that compares the headers of two CSV files and prints the headers if they match:
awk 'NR==FNR{A[$1]++;next}A[$1]' COVID19_Fallzahlen_Kanton_AR_total.csv COVID19_Fallzahlen_Kanton_ZH_total.csv
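
Note that, as written, the one-liner prints every line of the second file whose first whitespace-separated field also occurs in the first file, so it is not limited to the header row. A header-only comparison could be sketched in Python as follows; the helper names `read_header` and `headers_match` are made up for illustration, and UTF-8 encoding is an assumption:

```python
import csv


def read_header(path):
    """Return the first CSV row of the file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def headers_match(path_a, path_b):
    """True if both files have identical headers (same names, order and case)."""
    return read_header(path_a) == read_header(path_b)
```

Calling `headers_match("COVID19_Fallzahlen_Kanton_AR_total.csv", "COVID19_Fallzahlen_Kanton_ZH_total.csv")` would then return `True` only when the two header rows are identical.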

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

Great, whatever works, works! Just has to be used. Thank you!

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024

But, it doesn't compare the structure to some template - it just checks that all the rows are valid given the header in the same file
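
To illustrate the kind of check meant here (this is a Python sketch, not the JS library used in the actual action), the following compares each row's field count against the header of the same file. The function name `rows_with_wrong_length` is hypothetical:

```python
import csv
import io


def rows_with_wrong_length(csv_text):
    """Return (record_number, field_count) for rows that do not fit the header.

    Record numbers start at 2 because the header is record 1. Blank lines
    are parsed as empty rows and are therefore reported with a count of 0.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []
    return [
        (record_number, len(row))
        for record_number, row in enumerate(reader, start=2)
        if len(row) != len(header)
    ]
```

A check like this catches truncated or over-long rows from manual edits, but it says nothing about whether the header itself matches a template.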

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024

So I think what you are after can be a second step of this validation - happy to add it if you hand me a script

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

@rokroskar are you also validating the names of the column headers? If not, please do that too. The spelling has to be correct, including upper and lower case.

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

With awk you can run awk 'NR==FNR{A[$1]++;next}A[$1]' file_1 file_2. If they match, it prints the header of the files, as the header should be the same.

from covid_19.

jstcki avatar jstcki commented on August 18, 2024

The advantage of using something more sophisticated like CSVLint is that you can also validate the content of each cell against a defined schema, i.e. validate numbers, date formats etc.
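
As a rough illustration of cell-level validation (plain Python, not CSVLint's API), the sketch below checks dates and counts per column. The column names `date` and `cases`, the date format and the helper names are assumptions made for the example:

```python
import csv
import io
import re
from datetime import datetime

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """True for calendar-valid dates written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_optional_count(value):
    """True for an empty cell or a non-negative integer in ASCII digits."""
    return value == "" or (value.isascii() and value.isdigit())


def invalid_cells(csv_text, checks):
    """Return (record_number, column, value) for every cell failing its check."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for record_number, row in enumerate(reader, start=2):
        for column, check in checks.items():
            value = row.get(column)
            if value is None or not check(value):
                errors.append((record_number, column, value))
    return errors
```

With `checks = {"date": is_iso_date, "cases": is_optional_count}`, a row such as `19.03.2020,abc` would be reported for both columns, while `2020-03-20,` passes because empty counts are allowed in this sketch.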

I could give it a shot this afternoon unless someone's already working on it!

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024

@herrstucki certainly using something more sophisticated would be nice - at the moment there is no schema, afaik? And does CSVLint exist only as a service/API or is there a command-line tool that could be used?

@zdavatz yes we should validate the headers - what I did right now was a very quick solution to the problems caused by manual data inputs (corrupted TI data yesterday, for example). I guess that the headers, once in place, won't change much.

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

@rokroskar still, I would always validate them! Typos happen fast!

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024

Could one of the maintainers (@andreasamsler @tlorusso) make the validation check mandatory for merges to master?

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

Let's merge it in ;). Great work.

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

@tlorusso Please use these samples:

  1. https://github.com/zdavatz/covid19_ch/blob/master/data-cantons-csv/dd-covid19-ch-cantons-20200319-example.csv

  2. https://github.com/zdavatz/covid19_ch/blob/master/data-switzerland-csv/dd-covid19-ch-switzerland-20200319-example.csv

It is the Italian standard, and it is really good.

from covid_19.

rokroskar avatar rokroskar commented on August 18, 2024

@herrstucki I would say that once the schema validation is in place, my check is indeed not needed. Until the schema is in place, however, I think the schema validation action needs to be disabled, and the simpler one, which verifies that at least the individual CSV files are correctly formatted, should be a required status check. I've lost track a bit of the schema discussions tbh, unfortunately... has it been decided to deviate from what is there atm?

from covid_19.

zdavatz avatar zdavatz commented on August 18, 2024

The new README is good! Thank you!

from covid_19.

lakay avatar lakay commented on August 18, 2024

We are also making great progress with completing and cleaning up the forms.
A small PoC in Python to extract the form data is already there; more feature updates can be expected today.
Please have a look and watch the repository at:
https://github.com/lakay/COVID-19_PDF-Reporting

Testing of the forms, and later of the code, is greatly appreciated. Three of us are currently working on it to make fast progress, and contributors are welcome to join.

from covid_19.
