
Comments (23)

raytula commented on July 19, 2024

Hi @finnshort
Today @JessyBarrette, @jenjax2, @trollpete, and I were reviewing the steps needed to complete manual QC of Hakai CTD casts. Is it (still?) appropriate for Jen to download a cast list from the EIMS, set the QC column for each measurement in each cast to 'AV', 'SVD', etc., and then upload the updated file? Also, what is the best way for Jen to add a comment for the overall cast? (i.e., can/should Jen update the Comment column to include her personal comment? If so, what happens to any existing comment that may have been entered in the field?)
Thanks, Ray

from hakai-datasets.

finnshort commented on July 19, 2024

Hi @raytula, the QC feature is still live/working. I just gave it a once-over on the development server, and I don't see anything in the CTD schema that has changed in the past couple of years that would affect it. I do recommend checking the results to make sure they come through as expected (as with any data changes).

I know @fostermh and @jodiew have been working on implementing changes to the CTD flags so maybe they can comment on whether they anticipate this affecting the QC flagging feature at all.

The comments can still be updated through the CTD QC Excel sheets. If Jen deletes an existing comment and adds her own, the old one will be removed (not recommended). It would be better if she adds her own comment after the existing one so that both are saved.
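To make the append-don't-replace idea concrete, here's a minimal sketch (the function name and reviewer tag are hypothetical, not part of the EIMS tooling):

```python
def append_comment(existing, new_comment, reviewer="JJ"):
    """Append a reviewer comment to a cast's Comment field so any
    existing comment is preserved rather than overwritten."""
    tagged = f"[{reviewer}] {new_comment}"  # tag so authorship stays clear
    if not existing or not existing.strip():
        return tagged
    return f"{existing}; {tagged}"

merged = append_comment("sensor drift suspected", "reviewed, cast looks fine")
```

Whatever the exact format, the key point is that the new text is concatenated after the old, never written over it.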

Let me know if that answers your questions!


fostermh commented on July 19, 2024

I was under the impression the old QC workflow was scrapped. Level 1 and Level 2 QC flags are being added to support the workflow of Jessy populating them automatically from a script. We have not moved on to anything else, as we are waiting for a clearly defined workflow. To date, this has been a problem, and I'm hesitant to endorse any development without it.


JessyBarrette commented on July 19, 2024

@fostermh @finnshort @jodiew @raytula Sorry for the late reply on this. I think we now have a clearly defined workflow. Just to make things clear for everyone: changes will be made on the database side to reflect the following items:

Database Changes

  1. The data used throughout all this comes from the database view 'ctd/views/file/cast/data'. I'm not sure if you want to create a new view for this project; I'll leave that up to you, but for now the QC tool uses this view as input.
  2. All the *_flag variables available within that view should be replaced by a Level 1 and a Level 2 QC flag.
    1. Level 1: QARTOD flag number [1,2,3,4,9] (Update: the column should be called *_UQL. UQL = UNESCO Quality Level [QARTOD])
    2. Level 2: an aggregated string describing the test results from the FAIL and SUSPECT flags (Update: will likely be called *_flags; we could just keep the already existing *_flag column)
  3. Position Level 1 and position Level 2 flags should be added, associated with the different tests that compare the latitude/longitude data against the expected site location.
  4. Station_Latitude and Station_Longitude columns should be added, corresponding to the latitude/longitude of the associated site within the database.
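For illustration, here's one way the two flag levels could be derived from per-test QARTOD results (the QARTOD flag values are standard; the function, test names, and severity ordering are assumptions on my part):

```python
# QARTOD flag values: 1=GOOD, 2=UNKNOWN, 9=MISSING, 3=SUSPECT, 4=FAIL
SEVERITY = {1: 0, 2: 1, 9: 2, 3: 3, 4: 4}  # assumed worst-flag ordering

def aggregate_flags(test_results):
    """Collapse per-test QARTOD results into a Level 1 (*_UQL) value --
    the worst flag across tests -- and a Level 2 (*_flags) string that
    lists only the SUSPECT/FAIL tests."""
    uql = max(test_results.values(), key=lambda f: SEVERITY[f])
    flags = ",".join(f"{name}:{flag}"
                     for name, flag in sorted(test_results.items())
                     if flag in (3, 4))
    return uql, flags

uql, flags = aggregate_flags(
    {"gross_range_test": 1, "spike_test": 3, "rate_of_change_test": 1})
```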

Workflow

  1. The CTD data itself is populated by the Seabird and RBR processing tools only. No change is made to that data by the QCing tool.
  2. The QCing tool will only affect the level_1 and level_2 flag columns.
    1. The following section of the QCing tool explains how to install the package
    2. The QCing tool can be triggered by providing either a Hakai CTD profile ID or a JSON string of the data to be QCed. See here for more detail.
  3. UPDATE: IGNORE GREY LIST: The QCing tool retrieves manual inputs contained in a CSV file present within the hakai-profile-qc repository (originally the view 'eims/views/output/ctd_flags').
    1. A hakai_id and a 'query' column should be added to it.
    2. Those will be manually populated by a QC reviewer or an automated QC tool to overwrite test results from the automated QCing tool if needed.
    3. We could instead rely on a simple CSV file within the QCing tool, if that option is preferred by Hakai IT. This is what we'll do!
  4. Research Dataset: The research dataset will be generated, based on the CTD QC log view ('need to find the endpoint'), as NetCDF files containing exclusively the hakai_id and the variables associated with a QARTOD flag = 1 and an AV value in the CTD QC log.
    • This view won't have any direct interactions with the CTD profile dataset on the database anymore.
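The selection rule in item 4 can be sketched as follows (the record layout and key names are made up for illustration):

```python
def select_research(records, variable="temperature"):
    """Keep only records the reviewer flagged AV and whose automated
    QARTOD (UQL) flag for the variable is 1 [GOOD]."""
    return [r for r in records
            if r["review"] == "AV" and r[f"{variable}_UQL"] == 1]

profiles = [
    {"hakai_id": "A", "review": "AV", "temperature_UQL": 1},   # kept
    {"hakai_id": "B", "review": "AV", "temperature_UQL": 3},   # suspect flag
    {"hakai_id": "C", "review": "SVD", "temperature_UQL": 1},  # not AV
]
kept = select_research(profiles)
```

Both conditions must hold: a profile the reviewer approved but the automated tests flagged (or vice versa) stays out of the research dataset.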

Let me know if you have any questions!


JessyBarrette commented on July 19, 2024

@jenjax2 @finnshort mentioned that we need your permission to get access to the CTD QC log. This log will be used to generate Research Ready NetCDF files, which will then be uploaded to the Hakai CTD Research Dataset.


jenjax2 commented on July 19, 2024

@JessyBarrette @finnshort you have my permission to get access to the CTD qc log.

When you say Research Ready NetCDF files, do you mean these NetCDF files will be created after I QC the data?


JessyBarrette commented on July 19, 2024

That's right Jenn, once you have a set of data QCed, a simple script will read your QC log and generate static NetCDF files with the profiles and variables you QCed as AV and for which the QARTOD flag is GOOD (== 1).

Those NetCDF files, once uploaded to our server, would never be changed (unless we want to); other variables could potentially be added, though. This is probably the only way I can think of to make sure that the reviewed data remain constant over time, even if, for example, the whole database gets rebuilt for whatever reason.
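A sketch of what such a script's driver loop might look like, with the actual NetCDF write stubbed out (the real tool would presumably use xarray or netCDF4; all names here are hypothetical):

```python
def export_reviewed(qc_log, fetch_profile, write_netcdf):
    """For every QC-log entry reviewed as AV, write a static NetCDF
    file containing only the variables whose QARTOD flag is 1."""
    written = []
    for entry in qc_log:
        if entry["review"] != "AV":
            continue  # only reviewer-approved profiles are exported
        data = fetch_profile(entry["hakai_id"])
        good = {v: vals for v, vals in data.items()
                if entry["uql"].get(v) == 1}
        path = f"{entry['hakai_id']}.nc"
        write_netcdf(path, good)  # stand-in for xarray.Dataset.to_netcdf
        written.append(path)
    return written

files = export_reviewed(
    [{"hakai_id": "X1", "review": "AV", "uql": {"temperature": 1, "salinity": 3}},
     {"hakai_id": "X2", "review": "SVD", "uql": {"temperature": 1}}],
    fetch_profile=lambda hid: {"temperature": [8.1, 8.0], "salinity": [30.1, 30.2]},
    write_netcdf=lambda path, data: None)
```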


JessyBarrette commented on July 19, 2024

@raytula @jenjax2 @trollpete A first draft of the research dataset is now available on the development ERDDAP:
https://goose.hakai.org/erddap/tabledap/HakaiWaterPropertiesInstrumentProfileResearch.html

For the moment, I kept all the variables that will be available on either the provisional or the research dataset. We will likely remove the flag columns from the research dataset, since exclusively data flagged as GOOD is kept here. Please review the data and metadata.
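For anyone reviewing the data programmatically: ERDDAP tabledap accepts a `.csv` request with comma-separated variables followed by `&`-separated constraints. A small URL builder (the variable and constraint names below are guesses; check the dataset page for the real ones):

```python
from urllib.parse import quote

def tabledap_url(base, dataset_id, variables, constraints=()):
    """Build an ERDDAP tabledap CSV request URL; double quotes in
    string constraints must be percent-encoded."""
    query = ",".join(variables) + "".join(f"&{c}" for c in constraints)
    return f"{base}/tabledap/{dataset_id}.csv?{quote(query, safe=',&=<>!')}"

url = tabledap_url("https://goose.hakai.org/erddap",
                   "HakaiWaterPropertiesInstrumentProfileResearch",
                   ["time", "depth", "temperature"],
                   ['station="QU39"'])
```

The resulting URL can be handed straight to any CSV reader.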

Once we all agree on the variables and associated attributes, this will be carried over to the provisional dataset.

The example regroups all the datasets and associated variables which were flagged as AV by the reviewer and are associated with QARTOD flag = 1 [GOOD].

I will make a little Jupyter notebook to review the conflicting results between the reviewer and QARTOD flags in the next few days.


JessyBarrette commented on July 19, 2024

I forgot to mention that you can also review the generated NetCDF files, which are used behind ERDDAP and accessible here:
https://goose.hakai.org/erddap/files/HakaiWaterPropertiesInstrumentProfileResearch/

The objective is to have a long-term file format that regroups all the information for each profile within a single file. This may evolve a bit in the near future, but hopefully we'll get something pretty stable soonish.


n-a-t-e commented on July 19, 2024

I switched this dataset from a view to a table that is recreated nightly. This is to avoid issues with rebuilds in the EIMS system; now it has no connection to EIMS tables (e.g. they could be dropped without affecting ERDDAP). Since CTD data doesn't come in that quickly anyway, I figured updating only nightly won't bother people. See 5bcb6d9
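The view-to-nightly-table switch can be sketched like this (an in-memory sqlite3 database stands in for the real one; table and column names are invented):

```python
import sqlite3

def rebuild_snapshot(conn, table="erddap_ctd_research", source="ctd_cast_data"):
    """Drop and recreate the snapshot table from the source query so
    ERDDAP reads a standalone table with no live tie to EIMS views."""
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} AS SELECT * FROM {source}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ctd_cast_data (hakai_id TEXT, temperature REAL)")
conn.execute("INSERT INTO ctd_cast_data VALUES ('X1', 8.2)")
rebuild_snapshot(conn)  # in production this would run from a nightly job
rows = conn.execute("SELECT * FROM erddap_ctd_research").fetchall()
```

Because the snapshot is a plain table, dropping or rebuilding the EIMS source afterward has no effect on what ERDDAP serves until the next rebuild.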


raytula commented on July 19, 2024

The ERDDAP links @JessyBarrette provided are not currently working. Is that perhaps due to the delayed EIMS rebuild today?


JessyBarrette commented on July 19, 2024

@raytula sorry, that was my bad; the dataset is back online. FYI, Goose ERDDAP restarts every 15 min, and this dataset is among the last ones to appear when the server refreshes.


n-a-t-e commented on July 19, 2024

> @raytula sorry that was my bad the dataset is back online. FYI Goose ERDDAP restart every 15mins and this dataset is among the last ones to appears when the servers refresh.

Goose ERDDAP should only restart if there are changes to datasets.xml, though; it checks every 15 minutes. On production it will just reload the one dataset.


JessyBarrette commented on July 19, 2024

@jenjax2 @raytula The research CTD profile dataset is now available on goose erddap:
https://goose.hakai.org/erddap/tabledap/HakaiWaterPropertiesInstrumentProfileResearch.html

Since we're only presenting data flagged as 1 and reviewed, we may want to omit all of the flag columns.


JessyBarrette commented on July 19, 2024

This dataset uses NetCDF files generated for each profile; they are available here: https://goose.hakai.org/erddap/files/HakaiWaterPropertiesInstrumentProfileResearch/

Grouped by work_area/station
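So the per-profile file layout presumably looks like this (the work area and station values here are only examples):

```python
from pathlib import Path

def profile_path(root, work_area, station, hakai_id):
    """One NetCDF file per profile, grouped by work_area/station."""
    return Path(root) / work_area / station / f"{hakai_id}.nc"

p = profile_path("HakaiWaterPropertiesInstrumentProfileResearch",
                 "QUADRA", "QU39", "080217_2017-01-08T18-03-05")
```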


raytula commented on July 19, 2024

Looks good to me. Yes, no need for the flag columns in the research dataset.


raytula commented on July 19, 2024

FYI, the link was working yesterday but not today (I think):
https://goose.hakai.org/erddap/files/HakaiWaterPropertiesInstrumentProfileResearch/

Error {
    code=404;
    message="Not Found: Currently unknown datasetID=HakaiWaterPropertiesInstrumentProfileResearch";
}


JessyBarrette commented on July 19, 2024

@raytula thanks! I made a mistake last night while trying to remove the flags from datasets.xml. It should be back in the next 15 min or so.


JessyBarrette commented on July 19, 2024

@jenjax2 @raytula Data for all stations except QU39 and KC10 have been removed from the research dataset.


raytula commented on July 19, 2024

> @jenjax2 @raytula All station except QU39 and KC10 data was removed from the research dataset.

Great. Thanks @JessyBarrette


jenjax2 commented on July 19, 2024

Thanks @JessyBarrette !


JessyBarrette commented on July 19, 2024

This dataset is now available online. We can close this issue!


JessyBarrette commented on July 19, 2024

A DOI was added to this metadata record https://doi.org/10.21966/6cz5-6d70

This will also be added to the ERDDAP datasets.

