KatrinaHoffert commented on August 20, 2024

How the parser would work:

  1. Iterate through CSV files in a folder.
  2. Each file is a location. Generate the location ID and the SQL INSERT code to create that location.
  3. Group the rows in the file by inspection date. Each group is an inspection. Generate the inspection ID and create the SQL INSERT code for that inspection.
  4. For each row in the group (inspection), create an SQL INSERT for the violation. Note that each violation must refer to a violation_type, so the violation types must be created first.
  5. Repeat the ID and INSERT generation from points 3 and 4 for each inspection (generating a new inspection ID each time).

NB: There are some rows in the CSV files that don't have a violation (the fields are blank). E.g., see FoodInspectionReport (10).csv. Here, there would be an inspection, but no violations. So just check whether there's only one record in the group, and skip processing violations if that record's violation fields are blank.
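As a rough sketch of the grouping step and the blank-violation check (not the actual parser code): this assumes each row has already been split into fields by a CSV library, and that the column positions match the sample header shown in the example input (textbox43 = inspection date, textbox16 = violation type). The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InspectionGrouper {
    // Column positions are an assumption based on the sample header row:
    // index 2 (textbox43) is the inspection date, index 9 (textbox16) is
    // the violation type string.
    static final int INSPECTION_DATE = 2;
    static final int VIOLATION_TYPE = 9;

    /** Groups already-parsed CSV rows by inspection date, keeping file order. */
    static Map<String, List<String[]>> groupByInspectionDate(List<String[]> rows) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[INSPECTION_DATE], k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    /** True when the row has a non-blank violation type field. */
    static boolean hasViolation(String[] row) {
        // Some CSV readers drop trailing empty fields, so check the length too.
        return row.length > VIOLATION_TYPE && !row[VIOLATION_TYPE].trim().isEmpty();
    }
}
```

Each key of the returned map is one inspection. An inspection whose single row fails hasViolation gets an inspection INSERT but no violation INSERTs.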

Handling the violation types

  1. Before all the above, you should figure out what all the violation types are and hardcode them.
  2. Then whenever we need to make a reference to a violation type, you can parse the violation type string and just get the ID from the string (since it's always the first number in the string).
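Getting the ID out of the violation type string could look something like the following. This is only a sketch: the class and method names are hypothetical, and it assumes the string always starts with the numeric ID, as in the sample data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ViolationTypeIds {
    // Matches the first run of digits at the start of the string,
    // e.g. the "15" in "15 - Construction/Maintenance ...".
    private static final Pattern LEADING_NUMBER = Pattern.compile("^\\s*(\\d+)");

    /** Returns the numeric ID that prefixes a violation type string. */
    static int parseId(String violationType) {
        Matcher m = LEADING_NUMBER.matcher(violationType);
        if (!m.find()) {
            throw new IllegalArgumentException("No leading ID in: " + violationType);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

Throwing on a missing ID (rather than returning a default) makes it obvious if a CSV file contains a violation string in an unexpected format.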

Example input and output

Input:

textbox6,textbox10,textbox35,textbox17,textbox43,textbox20,textbox39,textbox23,textbox9,textbox16
"Sunday, January 25, 2015",7 - ELEVEN CENTRAL,"Thursday, December 11, 2014",1100 A Central AVE,Follow-up,"Saskatoon, SASKATCHEWAN     S7N 2H1",Low,Saskatoon Health Authority,,
"Sunday, January 25, 2015",7 - ELEVEN CENTRAL,"Thursday, November 20, 2014",1100 A Central AVE,Routine,"Saskatoon, SASKATCHEWAN     S7N 2H1",High,Saskatoon Health Authority,Critical Item,1 - Refrigeration/Cooling/Thawing (must be 4°C/40°F or lower)
"Sunday, January 25, 2015",7 - ELEVEN CENTRAL,"Thursday, November 20, 2014",1100 A Central AVE,Routine,"Saskatoon, SASKATCHEWAN     S7N 2H1",High,Saskatoon Health Authority,General Item,15 - Construction/Maintenance and/or Cleaning of Premises
"Sunday, January 25, 2015",7 - ELEVEN CENTRAL,"Thursday, November 28, 2013",1100 A Central AVE,Routine,"Saskatoon, SASKATCHEWAN     S7N 2H1",Low,Saskatoon Health Authority,General Item,15 - Construction/Maintenance and/or Cleaning of Premises

Output:

-- Note that the 42 is a generated ID. Can generate by having a static int starting
-- at 1 that is incremented each time.
INSERT INTO location
    (id, name, address, postcode, city, rha)
    VALUES(42, '7 - ELEVEN CENTRAL', '1100 A Central AVE', 'S7N 2H1', 'Saskatoon', 'Saskatoon Health Authority');

-- Note that the 20 is generated. The 42 was previously generated for the location.
INSERT INTO inspection
    (id, location_id, inspection_date, inspection_type, reinspection_priority)
    VALUES(20, 42, '2014-12-11', 'Follow-up', 'Low');

INSERT INTO inspection
    (id, location_id, inspection_date, inspection_type, reinspection_priority)
    VALUES(21, 42, '2014-11-20', 'Routine', 'High');

-- 21 was generated for the inspection
INSERT INTO violation
    (inspection_id, violation_id)
    VALUES(21, 1);

INSERT INTO violation
    (inspection_id, violation_id)
    VALUES(21, 15);

INSERT INTO inspection
    (id, location_id, inspection_date, inspection_type, reinspection_priority)
    VALUES(22, 42, '2013-11-28', 'Routine', 'Low');

INSERT INTO violation
    (inspection_id, violation_id)
    VALUES(22, 15);

Note that there might be minor errors in the above SQL, as I did not run it. Also note that the first inspection does not have any violations. Not shown is the insertion of the violation_types.

There's a small degree of parsing that must be done. I don't know if Postgres can accept the date format in the CSVs, so you might have to parse it to convert the dates into ISO 8601. There are probably libraries that can help with this.
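If the conversion does turn out to be needed, one way to do it is with the standard java.time classes (Java 8+). This is just a sketch assuming the dates always look like "Thursday, December 11, 2014"; the class and method names are made up.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class InspectionDates {
    // Pattern for dates such as "Thursday, December 11, 2014".
    // Locale.ENGLISH is set explicitly so parsing the day and month names
    // does not depend on the machine's default locale.
    private static final DateTimeFormatter CSV_FORMAT =
            DateTimeFormatter.ofPattern("EEEE, MMMM d, uuuu", Locale.ENGLISH);

    /** Converts a CSV date string to ISO 8601 (yyyy-MM-dd). */
    static String toIso(String csvDate) {
        // LocalDate.toString() already produces the ISO 8601 form.
        return LocalDate.parse(csvDate, CSV_FORMAT).toString();
    }
}
```

For example, InspectionDates.toIso("Thursday, December 11, 2014") gives "2014-12-11", which matches the value used in the sample output.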

I didn't include the conversion of uppercase strings to title case. You can do that if time allows, but it's not a big deal right now. The city and postal code must be extracted from textbox20. To simplify development, I'd leave them blank until everything else is working.
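For when you get to the city and postal code, here is a sketch of pulling them out of textbox20. It assumes the field always has the shape "City, PROVINCE     A1A 1A1" like the sample rows; the class name is hypothetical, and anything that doesn't match is left blank.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CityAndPostcode {
    final String city;
    final String postcode;

    private CityAndPostcode(String city, String postcode) {
        this.city = city;
        this.postcode = postcode;
    }

    // City is everything before the first comma; the postal code is a
    // Canadian-style code (letter digit letter, digit letter digit) at the end.
    private static final Pattern FIELD = Pattern.compile(
            "^\\s*([^,]+),.*?([A-Za-z]\\d[A-Za-z])\\s*(\\d[A-Za-z]\\d)\\s*$");

    /** Parses a value such as "Saskatoon, SASKATCHEWAN     S7N 2H1". */
    static CityAndPostcode parse(String field) {
        Matcher m = FIELD.matcher(field);
        if (!m.matches()) {
            // Unexpected shape: leave both values blank rather than guess.
            return new CityAndPostcode("", "");
        }
        String postcode = (m.group(2) + " " + m.group(3)).toUpperCase();
        return new CityAndPostcode(m.group(1).trim(), postcode);
    }
}
```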

from eatsafe.

KatrinaHoffert commented on August 20, 2024

Also, be sure to test that you can manually join up all the necessary tables. So you should be able to do something like:

SELECT *
    FROM location l, inspection i, violation v, violation_type vt
    WHERE l.name = '7 - ELEVEN CENTRAL' AND l.id = i.location_id AND
            i.id = v.inspection_id AND v.violation_id = vt.id;

And that should get the location with the name "7 - ELEVEN CENTRAL" and all its inspections and violations (the query actually misses inspections without violations, but don't worry about that -- the point is merely to make sure that the tables can be joined correctly).

LujieDuan commented on August 20, 2024

Please review the commit.

KatrinaHoffert commented on August 20, 2024

Output looks right, although I still haven't actually tested it. Will do soon. I'm assuming that you have?

I think you need to fix the formatting of the files for the parser, though. You've got mixed tabs and spaces for indentation and inconsistent brace styles. There's also inconsistency with spacing (e.g., is it if (...) or if(...)?). See the style guide issue for my proposal (although we haven't really decided on a style to use yet).

Do the dates get inserted correctly? Because they don't look like a valid Postgres DATE (the format isn't listed in the manual).

Regarding the escaping of single quotes (which you're currently replacing with -), shouldn't it be sufficient to replace them with '' (two single quotes)? So the string "It's showtime" would become "It''s showtime". This is simply an escape measure. In the database, the strings would be correctly stored with a single single quote.
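A minimal sketch of that escaping, assuming the parser builds the SQL text itself (the class and method names are made up). It also assumes Postgres's standard_conforming_strings setting is on, so backslashes in the data need no special handling.

```java
class SqlStrings {
    /**
     * Wraps a value in single quotes for use as an SQL string literal,
     * doubling any single quotes inside it.
     */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }
}
```

So SqlStrings.quote("It's showtime") produces 'It''s showtime', and the database stores it with one single quote.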

As an aside, don't forget that we need documentation on how individual group members will set up the database so that the program will work for them.

KatrinaHoffert commented on August 20, 2024

Ok, I've had a chance to test it now and I see that the dates are indeed working.

Everything inserts correctly with no errors.

Everything seems to be linked up properly. So there's just the issue of quotes and formatting. I'll do the documentation, since I've already started.

LujieDuan commented on August 20, 2024

Please review the commit.
About the date: Postgres accepts the format, which is generated by java.util.Date, as you mentioned in #24. Do I need to make any more changes to the date format?

KatrinaHoffert commented on August 20, 2024

Formatting looks good. No change is required to the date, as Postgres accepts the format. Closing this as complete. Note: should consider integrating this system with the rest of the application. Make sure it's built the same way, has tests, etc.
