hakaiinstitute / hakai-datasets
Hakai Datasets that are going into https://catalogue.hakai.org/erddap/
Below are listed all the different steps related to the initial submission of a dataset.
A more detailed written and visual description of every step is available respectively
here and here.
Dataset Development Branch Revision
The intention of this issue is to make a nutrient dataset, created and used by Hayley as part of a research paper, Findable and Accessible via the Hakai metadata catalogue. Similar to other synthesized/paper-specific datasets, this dataset includes data from multiple sources that has been aggregated and processed in ways specific to a particular research project/paper.
Related examples include:
Select which NetCDF files generated by the algae explorer need to be made available on ERDDAP
Generate DOI associated with Hakai CKAN dataset page
COMPLETED
ERDDAP dataset is now available here
https://goose.hakai.org/erddap/tabledap/HakaiNearShoreStandAloneRaw.html
A test dataset, IYS-chlorophyll, was created a long time ago on the development branch of the Hakai datasets. I'm not sure what its status is; I believe we should just remove it. I just want to confirm with you @timvdstap or @Br-Johnson.
This dataset fails to load within ERDDAP (see https://goose.hakai.org/erddap/status.html).
See IYS-chlorophyll.xml at:
https://github.com/HakaiInstitute/hakai-datasets/blob/development/datasets/IYS-chlorophyll.xml
This dataset is generated by the Department of Fisheries and Oceans Canada and is to be used as a proof of concept to demonstrate integration of OBIS and CIOOS.
https://goose.hakai.org/erddap/tabledap/HakaiChlorophyllSampleProvisional.html
https://cioos-siooc.github.io/metadata-entry-form/#/en/hakai/7U7b8oPpeTN6gjvXlUCTGJr5pga2/-McQFPAf457LB4-SWmyL
CKAN: https://cioos-siooc.github.io/metadata-entry-form/#/en/hakai/tV5qE0aUgaOjSVmgPgiZ6MyHuSy1/-MsbCMOYj2L_7dgICnzw
ERDDAP: https://goose.hakai.org/erddap/tabledap/HakaiMooredTimeSeriesResearch.html
The KC buoy platform has received, and is about to receive, new sensors that are captured in the Hakai database but still need to be made available on the different Hakai data platforms. All the new sensors feed data through the SBE16 unit mounted below the buoy.
We would need to add those respective data feeds to the following data platforms:
The dataset running in production is temporarily generated from NetCDF files produced occasionally. The last step is to connect the ERDDAP dataset directly to the database. To do this:
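A hedged sketch of what the direct database connection could look like, using ERDDAP's EDDTableFromDatabase dataset type (the dataset ID, host, schema, table, and credentials below are all placeholders, not the actual configuration):

```xml
<dataset type="EDDTableFromDatabase" datasetID="HakaiExampleResearch">
  <!-- placeholder connection details -->
  <sourceUrl>jdbc:postgresql://db.example.org:5432/hakai</sourceUrl>
  <driverName>org.postgresql.Driver</driverName>
  <connectionProperty name="user">erddap_reader</connectionProperty>
  <connectionProperty name="password">********</connectionProperty>
  <catalogName></catalogName>
  <schemaName>public</schemaName>
  <tableName>example_view</tableName>
  <!-- dataVariable definitions would follow, one per database column -->
</dataset>
```

ERDDAP generally recommends pointing such datasets at a read-only database view rather than the underlying tables, so the database side can evolve without breaking the ERDDAP definition.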
This is a broader issue to clearly identify the flagging convention used and accepted within both the Sensor Network and EIMS. Once agreed upon, this work will get integrated within the Jupyter Notebook data-QCing tools in development.
I have been told by @jdelbel that not all the terms listed below are accepted within EIMS. @fostermh, can you confirm that?
Code | Name | Comments/description | Sensor Network | EIMS | QARTOD Mapping
---|---|---|---|---|---
AV | Accepted value | Has been reviewed and looks good | | | GOOD (1)
SVC | Suspicious value - caution | Value appears to be suspect, use with caution | | | SUSPECT (3)
SVD | Suspicious value - reject | Value is clearly suspect, recommend discarding | | | FAIL (4)
EV | Estimated value | Value has been estimated | | |
NA | Not available | No value available | | | MISSING (9)
MV | Missing value | No measured value available because of equipment failure, etc. | | | UNKNOWN (2)
LB | Low battery | Sensor battery dropped below a threshold | | |
CD | Calibration due | Sensor needs to be sent back to the manufacturer for calibration | | |
CE | Calibration expired | Value was collected with a sensor that is past due for calibration | | |
IC | Invalid chronology | One or more non-sequential date/time values | | |
PV | Persistent value | Repeated value for an extended period | | |
AR | Above range | Value above a specified upper limit | | |
BR | Below range | Value below a specified lower limit | | |
SE | Slope exceedance | Value much greater than the previous value, resulting in an unrealistic slope | | |
SI | Spatial inconsistency | Value greatly differed from values collected from nearby sensors | | |
II | Internal inconsistency | Value was inconsistent with another related measurement | | |
BDL | Below detection limit | Value was below the established detection limit of the sensor | | |
ADL | Above detection limit | Value was above the established detection limit of the sensor | | |
Source: https://docs.google.com/spreadsheets/d/1NZcwn7zPZ-98za4HpxQH705uw3tsEgKUwQhkPl0UiS8/edit?usp=sharing
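To make the proposed mapping concrete for the Jupyter Notebook QC tools mentioned above, here is a minimal sketch in Python. The function name and the choice to return None for codes without an agreed QARTOD equivalent are assumptions for illustration, not an agreed convention:

```python
# Hypothetical sketch: Hakai flag codes -> QARTOD flags, per the table above.
# Codes without an agreed QARTOD equivalent intentionally map to None.
HAKAI_TO_QARTOD = {
    "AV": 1,   # Accepted value             -> GOOD (1)
    "SVC": 3,  # Suspicious value - caution -> SUSPECT (3)
    "SVD": 4,  # Suspicious value - reject  -> FAIL (4)
    "NA": 9,   # Not available              -> MISSING (9)
    "MV": 2,   # Missing value              -> UNKNOWN (2)
}

def to_qartod(code: str):
    """Return the QARTOD flag for a Hakai code, or None if unmapped."""
    return HAKAI_TO_QARTOD.get(code.strip().upper())
```

For example, `to_qartod("svc")` returns 3, while `to_qartod("PV")` returns None until a mapping is agreed for that code.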
As Hakai increasingly uses ERDDAP as a primary platform to share data and make it accessible, we're hoping to give our different internal Hakai groups access to some of our data through non-public ERDDAP datasets (see #92 as a first example).
Those datasets will be generated to help the different Hakai groups access their data, assess data quality and problems, and eventually either make a specific dataset available or create another data product that will be made available.
ERDDAP provides the ability to keep some datasets behind an authentication wall (see here for documentation). A number of methods can be used to authenticate. Among those, the most interesting are:
The method used needs to be:
Hakai ingests, processes, and makes available Water Properties Vertical Profiles collected by other organizations. All those datasets are available within the Hakai database system, within the same workflow as Hakai's own data.
However, that data should not be presented within the Hakai dataset and should be treated separately for each individual organization as two datasets: research and provisional.
Those datasets will follow a workflow very similar to the Hakai datasets #7 and #8.
Where should those datasets be hosted?
The development of each dataset should be made available here.
ERDDAP expects a two-digit hour format for the Quadra BoL Research dataset; however, the time variable seems to have a 1-2 digit hour format.
Fix within the dataset's XML.
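A minimal sketch of what the fix could look like, assuming the time column is parsed from strings and that ERDDAP follows the Java-style pattern behavior where a single "H" accepts one- or two-digit hours while "HH" requires two digits (the source column name below is a placeholder):

```xml
<!-- Hypothetical sketch (placeholder names): relax the hour pattern from
     "HH" to "H" so one- and two-digit hours both parse. -->
<dataVariable>
  <sourceName>collected</sourceName>
  <destinationName>time</destinationName>
  <addAttributes>
    <att name="units">yyyy-MM-dd H:mm:ss</att>
  </addAttributes>
</dataVariable>
```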
Dataset Development Branch Revision
Add a description of the update to be completed on the dataset id.
Things to do:
HakaiPruthDockProvisional fails to provide the shifted tide data, which is shifted to the average tide height.
This is affecting all the variables that use the row.columnFloat() operator in ERDDAP:
<sourceName>=row.columnFloat(PruthDock:TideHeightPLS_Avg)-2.742</sourceName>
The error given is
Error {
code=500;
message="Internal Server Error: ERROR from data source: org.apache.commons.jexl3.JexlException$Parsing: gov.noaa.pfel.erddap.dataset.EDDTable.convertScriptColumnsToDataColumns:3648@1:26 parsing error in ':'";
}
This shift was applied in order to match an existing CF standard_name
Ignore the shift in the data and provide the data as is.
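A guess at the cause, not a confirmed fix: ERDDAP's derived-variable scripts expect the column name passed to row.columnFloat() as a quoted string, so the bare ':' in the unquoted name may be what trips the JEXL parser. If the shift were kept, the expression might instead read:

```xml
<!-- Hedged sketch: quote the column name inside row.columnFloat() -->
<sourceName>=row.columnFloat("PruthDock:TideHeightPLS_Avg")-2.742</sourceName>
```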
Here's the finalized dataset to present to the data administrator:
dev ERDDAP (there's an issue with the mooring_name variable, to review)
Dataset Development Branch Revision (Reviewer Label)
The following Hakai datasets will need to have a DOI generated. Some of them are not quite yet available within the Hakai CKAN system and would also need to be made available there:
Dataset | Hakai Metadata Record | CKAN | DOI |
---|---|---|---|
Hakai Nutrient Research | Metadata | CKAN | |
Hakai Chlorophyll Research | Metadata | CKAN | https://doi.org/10.21966/wsvt-ew96 |
Hakai Water Properties Profiles Research | Metadata | CKAN | https://doi.org/10.21966/6cz5-6d70 |
Dosser et al. (2021) Hakai Nutrients | Metadata | CKAN | https://doi.org/10.21966/j3j5-wt70 |
*This table will be updated as the different components become available.*
Some submissions have been made to NCEI that are not reflected on ERDDAP
While working through details related to the Quadra BoL research datasets, I realized that the initial dataset titles, abstracts, etc. we chose for the provisional BoL datasets are pretty technical and not necessarily user friendly.
Perhaps it would be better to reuse aspects of the NCEI dataset titles, abstracts, and other metadata attributes to update and improve the provisional BoL metadata records. At a minimum, it should be very clear that the datasets include provisional data that should not be used for research purposes. We can likely keep the abstract/metadata pretty simple, though, and mainly focus on the provisional state and intended use of the data.
ERDDAP Dataset
CIOOS Metadata form
Hakai uses the following stacked filters to retrieve size-fraction measurements of the chlorophyll samples:
From those samples, the chlorophyll-a and phaeopigment concentrations are retrieved by filtration, methanol/acetone extraction, and fluorometry.
Following this protocol, the two vocabularies are missing the following terms:
Completed | standard name | Canonical units
---|---|---
 | mass_concentration_of_miscellaneous_phytoplankton_expressed_as_phaeopigments_in_sea_water | kg m-3
 | mass_concentration_of_nanophytoplankton_expressed_as_phaeopigments_in_sea_water | kg m-3
 | mass_concentration_of_phytoplankton_expressed_as_phaeopigments_in_sea_water | kg m-3
 | mass_concentration_of_picophytoplankton_expressed_as_phaeopigments_in_sea_water | kg m-3
In Process | mass_concentration_of_phaeopigments_in_sea_water | kg m-3
Include the latest data from the Rivers Inlet mooring dataset.
Handle all the different metadata issues detected by the IOOS Compliance checker runner
https://cioos-siooc.github.io/erddap-compliance-runner/catalogue.hakai.org/
CIOOS Metadata Form
ERDDAP development
For quite some time now, the Limpet CTD data has not been feeding directly into the sensor network and is only retrieved every few months. The data is then shared through emails across the different groups.
As suggested by @shawn-hateley, it would be good to create a Limpet-specific repository which could be used as the primary location to host the data and the different information associated with the platform. This data could then be harvested by the sensor network directly and through ERDDAP.
Finalized Datasets for presentation to the data administrator:
Dataset Development Branch Revision (Reviewer Label)
Hakai Sensor Network data presents, for each individual record, different statistics associated with a recorded time interval. The CF convention suggests using the cell_methods attribute to clearly define the time range of the cell associated with a variable and record, and the statistical method used.
I believe there's also a way to describe where within that cell the time variable falls (e.g., start, middle, average, end).
As described within the Pruth mooring dataset, the time variable should have:
time: point (interval: 5.0 minutes comment: time corresponds to the end of the interval)
While the other variables are described following the convention:
time: mean (interval: 5.0 minutes)
time: median (interval: 5.0 minutes)
This would need to be added to every sensor network dataset, potentially with more details too (to be reviewed).
More detail can be found here
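As a sketch, the convention above could be attached through each dataset's addAttributes blocks in its ERDDAP XML (the variable names here are placeholders, not the actual dataset's):

```xml
<dataVariable>
  <sourceName>watertemp_avg</sourceName>
  <destinationName>sea_water_temperature</destinationName>
  <addAttributes>
    <!-- statistic computed over the 5-minute cell -->
    <att name="cell_methods">time: mean (interval: 5.0 minutes)</att>
  </addAttributes>
</dataVariable>
<dataVariable>
  <sourceName>measurement_time</sourceName>
  <destinationName>time</destinationName>
  <addAttributes>
    <!-- timestamp marks the end of the interval -->
    <att name="cell_methods">time: point (interval: 5.0 minutes comment: time corresponds to the end of the interval)</att>
  </addAttributes>
</dataVariable>
```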
The Limpet underwater platform just received a new permanent sensor measuring temperature and pressure. We will need to add this data to the ERDDAP dataset. Here are the steps: