inbo / data-publication
Open biodiversity data publication by the INBO
Home Page: https://ipt.inbo.be
License: Creative Commons Attribution 4.0 International
Other than the migration of files and issues (#12), here are some other things we should do to make this repository accessible:
/occurrences to guidelines/occurrences/ or use guidelines and rename containing files.
I think it would be a good idea to add a preferred citation to our dataset repositories on GitHub, so users know how to cite them. Here's what I did for watervogels:
Want to use this repository in a scholarly publication? You can cite it as:
Brosens D, Desmet P, Devos K (2014) Watervogels - Wintering waterbirds in Flanders, Belgium. https://github.com/LifeWatchINBO/watervogels-occurrences (accessed yyyy-mm-dd)
Note: once we agree on this, we could change the LICENSE for those repositories from MIT (which really is for software) to CC0. That would mean everything, including the metadata on the repository, also becomes CC0 (while the data paper might be licensed CC BY). An alternative is to specify which parts (data, scripts, metadata) have which license, but that is more complicated.
In the gull tracking dataset on GBIF, one or more points are positioned above Madagascar (50,98 / 0,00): http://www.gbif.org/dataset/83e20573-f7dd-4852-9159-21566e1e691e. Those points are included in the download as well (I noticed it in QGIS)
These might be outliers that should have been filtered out. I'll try to find the ID of those coordinates.
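A minimal sketch of how such points could be located in a download, assuming the suspicious positions are those where either coordinate is exactly zero (that reading of "50,98 / 0,00" is an assumption). The records below are hypothetical; decimalLatitude and decimalLongitude are the Darwin Core field names.

```python
def find_zero_coordinates(records):
    """Return records where latitude or longitude is exactly 0.0.

    Assumption: a zero coordinate signals a failed or missing GPS fix,
    not a real position on the equator or the prime meridian.
    """
    return [
        r for r in records
        if r["decimalLatitude"] == 0.0 or r["decimalLongitude"] == 0.0
    ]


# Hypothetical sample records, for illustration only.
sample = [
    {"occurrenceID": "x1", "decimalLatitude": 51.33, "decimalLongitude": 3.18},
    {"occurrenceID": "x2", "decimalLatitude": 0.0, "decimalLongitude": 50.98},
]
suspect_ids = [r["occurrenceID"] for r in find_zero_coordinates(sample)]
print(suspect_ids)
```

The returned occurrenceIDs can then be checked against the source database before deciding whether to filter them.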
The bird tracking gulls dataset shows 1.6 million records on GBIF: http://www.gbif.org/dataset/83e20573-f7dd-4852-9159-21566e1e691e, while there are only 1.1 million. This might have something to do with the changed occurrenceIDs.
The dataset contains over 440,000 occurrences, recorded in 2013 by 27 GPS tags mounted on 22 Lesser Black-backed Gulls and 5 Herring Gulls breeding at the Belgian coast.
The dataset contains tracking data from 22 Lesser Black-Backed Gulls (Larus fuscus) and 5 Herring Gulls (Larus argentatus) breeding at the Belgian coast.
The birds breed at the Belgian coast in two colonies: in the port of Zeebrugge and in the city of Ostend. Their foraging range includes the West of Belgium, Northern France, the North Sea, and the English Channel. The Lesser Black-backed Gulls migrate south in winter, wintering in the South of Spain, Portugal and Northern Africa.
Update coordinates of bounding box
Add new formation period (e.g. winter season 2013-2014)
Update end date
Most birds were trapped on their nest using a walk-in cage. In 2013 and 2014 respectively 22 and 24 ground-nesting Lesser Black-backed Gulls (LBBG) were caught in the port of Zeebrugge and respectively 5 and 8 Herring Gulls (HG) on the roof of the Vismijn in Ostend. Additionally, in 2014 one ground-nesting HG was caught in the port of Zeebrugge and 3 HG were caught with a small cannon net when feeding on the Visserskaai in Ostend. We took biometrics of all captured gulls (bill length, bill depth, tarsus length, wing length, and body mass) and a feather sample to determine the sex. The UvA-BiTS GPS trackers were attached to the back of the gull using a harness of Teflon tape.
This bird tracking network was funded for LifeWatch by the Hercules Foundation (http://www.herculesstichting.be/in_English/), with additional contributions from the Terrestrial Ecology Unit (TEREC) at the University of Ghent.
Note: if more species or breeding locations are added, a more substantial review of the metadata is in order
legend: mandatory, recommended/useful, optional, don't use
Current format:
Emilie Gelaude;Seth Martens;Yves Jacobs
Standard format (notice space):
Emilie Gelaude | Seth Martens | Yves Jacobs
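The conversion from the current to the standard format could be sketched as follows. The function name is hypothetical, and the space-pipe-space separator is assumed from the "notice space" remark.

```python
def to_pipe_separated(names, old_separator=";"):
    """Convert a delimited list of names to ' | ' separated form."""
    parts = [part.strip() for part in names.split(old_separator)]
    # Drop empty fragments caused by trailing or doubled separators.
    return " | ".join(part for part in parts if part)


converted = to_pipe_separated("Emilie Gelaude;Seth Martens;Yves Jacobs")
print(converted)
```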
If we really care about the localities as a dataset that could be used, I propose to standardize it as a data package. Advantages:
We can start with this creator and look at this example.
Note: data packages in a subfolder of a GitHub repository can currently not be read by the tools out there. Should we create a new repository?
The latest publication (March 6) has 3,638,726 records, while GBIF indicates 3,638,729 (3 more). This might be due to an old problem where GBIF records were not deleted, see: http://dev.gbif.org/issues/browse/POR-2343
Changes to apply:
occurrenceID
license instead of rights
accessRights.
datasetID (see #21)
informationWithheld
recordedBy (e.g. encoding) and identifiedBy
recordedBy?
individualCount: x 2 because these are pairs
eventDate: 22 records with 1969-03-01 (km/punt onderzoek)
eventDate: Some records from 2003 (km/punt onderzoek)
samplingProtocol (was surveys on both 5km x 5km and 1km x 1km scales and a total of 645 squares; Bird Census News 2004:1/2:35-47): now Bird Census News 2004 1/2 p.36 or loose observations
samplingEffort (was uncomplete 5X5 UTM survey; observation hours=29): now {"observationHours":29}
verbatimLocality
verbatimCoordinates
verbatimCoordinateSystem: Where UTM squares are described
verbatimSRS
lat and long are not switched.
coordinateUncertaintyInMeters is blank on empty coordinates
georeferenceProtocol
georeferenceSources: Where UTM squares are described
georeferenceRemarks
georeferenceVerificationStatus
taxonRank for non-species
scientificNameAuthorship for Ringtaling and Kleine barmsijs
vernacularName for Soepeend' and Soepgans' (has extra quote)
All our datasets have DOIs assigned by GBIF, which link to their dataset page on GBIF. We propose the DOIs as the link for our datasets: in our resource citations and usage norms.
With the release of IPT 2.2, we can now mint our own DOIs, which would then link to the resource page on our IPT. That is arguably a better representation of the source of our datasets.
Here's what needs to happen:
legend: mandatory, recommended/useful, optional
Note: Data verified for version 14.4 on 2015-10-01. Metadata has been completely verified, updated, and republished to comply with our new guidelines regarding usage norms, resource citation, etc.
Changes to apply for occurrence core:
occurrenceID instead of GUID
modified
language to en: #25
rights to license
accessRights
datasetID (see #18)
datasetName to Visherintroductie - Reintroduction of the fishes chub, dace, burbot, and brown trout in Flanders, Belgium
recordedBy
verbatimDate: it is exactly the same as eventDate
sex: #27
verbatimCoordinateSystem to Belgian Lambert 72: #31
verbatimSRS to Belgian Datum 1972: #31
electrofishing (one word) in samplingProtocol: #28
identifiedBy to Tom Van den Neucker: it is correct
dateIdentified
taxonRank: are those all species?
vernacularName
Changes to apply for measurement extension:
measurementType to use lowercase: #32
modified
Questions:
Does Tom know what sampling effort was used to catch the fish?
Dimitri Brosens 0000-0002-0846-9116
Peter Desmet 0000-0002-8442-8025
Tim Adriaens 0000-0001-7268-4200
Willem Bouten 0000-0002-5250-8872
Anny Anselin
Bert Van Der Krieken
Dirk Maes
Eric Stienen
Filiep T'Jollyn
Gerlinde Van Thuyne
Francisco Hernandez
Frederic Piesschaert
Gilles San Martin
Glenn Vermeersch
Hendrik Devriese
Hugo Verreycken
Jan Breine
Jan Gabriëls
Jan Stevens
Jorg Lambrechts
Koen Devos
Koen Lock
Kris Decleer
Luc Lens
Marc Herremans
Tom De Boeck
Tom Van Den Neucker
From:
Electro Fishing
unknown
To:
electrofishing
(leave blank)
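A minimal sketch of the From/To mapping above, assuming it is applied to the samplingProtocol field. The function name is hypothetical, and values not listed in the mapping are passed through unchanged.

```python
# Mapping taken from the From/To lists above; "unknown" becomes blank.
PROTOCOL_MAPPING = {
    "Electro Fishing": "electrofishing",
    "unknown": "",
}


def normalize_sampling_protocol(value):
    """Return the standardized value, or the input if no rule applies."""
    return PROTOCOL_MAPPING.get(value.strip(), value.strip())


print(normalize_sampling_protocol("Electro Fishing"))
```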
Is currently set to undetermined for all records. Decided to leave blank in these cases.
A bunch of occurrence records (around one lake) fall outside the bounding box defined in the metadata: http://www.gbif.org/dataset/823dc56e-f987-495c-98bf-43318719e30f Are these correct? Should we expand the bounding box?
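One way to list the affected records is to test each coordinate pair against the metadata bounding box. This is a sketch only: the bounding box values and sample records are hypothetical, and the box is assumed not to cross the antimeridian.

```python
def outside_bounding_box(records, west, south, east, north):
    """Return records whose coordinates fall outside the box.

    Points exactly on an edge count as inside.
    """
    outside = []
    for record in records:
        lat = record["decimalLatitude"]
        lon = record["decimalLongitude"]
        if not (south <= lat <= north and west <= lon <= east):
            outside.append(record)
    return outside


# Hypothetical records and bounding box, for illustration only.
sample_records = [
    {"occurrenceID": "a", "decimalLatitude": 51.0, "decimalLongitude": 4.0},
    {"occurrenceID": "b", "decimalLatitude": 52.5, "decimalLongitude": 4.0},
]
flagged = outside_bounding_box(
    sample_records, west=2.5, south=50.6, east=6.0, north=51.6
)
print([r["occurrenceID"] for r in flagged])
```

If the flagged records turn out to be correct, their minimum and maximum coordinates give the values needed to expand the bounding box.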
We currently populate datasetID with something like: http://dataset.inbo.be/vis-inland-occurrences. This links to the resource on our IPT. This, however, is not the dataset key used in the GBIF portal and through the GBIF API, which is 823dc56e-f987-495c-98bf-43318719e30f. Should we change to this GBIF UUID?
Changes to apply:
occurrenceID instead of GUID
license instead of rights
accessRights.
datasetID (see #21)
recordedBy: uses pipes
individualCount
individualCount
georeferenceProtocol: http://git.io/vvDVR.
georeferenceProtocol in metadata.md
georeferenceProtocol in eml
georeferenceSources: http://git.io/vvDVL.
georeferenceSources in metadata.md
georeferenceSources in eml
georeferenceSources in blog post
identifiedBy: uses pipes
dateIdentified.
taxonRank: are those all species?
See issue #2
See also this IPT issue
From:
verbatimCoordinateSystem: Lambert 72
verbatimSRS: BD72
To:
verbatimCoordinateSystem: Belgian Lambert 72
verbatimSRS: Belgian Datum 1972
It seems verbatimLocality is concatenated in the SQL:
Zeeploop - stroomopwaarts hoekpaal wei coniferenhaag
Just like for VIS, maybe we could provide waterBody as well and have verbatimLocality concatenated with ", " (see https://github.com/LifeWatchINBO/vis-inland-occurrences/issues/4)
waterBody: Zeeploop
verbatimLocality: Zeeploop, stroomopwaarts hoekpaal wei coniferenhaag
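The proposal could be sketched as below. The actual concatenation happens in the SQL, which is not shown here, so the function and argument names are hypothetical.

```python
def build_verbatim_locality(water_body, description):
    """Join the water body and the locality description with ', '.

    Empty or missing parts are skipped, so no stray separator appears.
    """
    parts = [p.strip() for p in (water_body, description) if p and p.strip()]
    return ", ".join(parts)


locality = build_verbatim_locality(
    "Zeeploop", "stroomopwaarts hoekpaal wei coniferenhaag"
)
print(locality)
```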
Note: Data verified for unpublished version on 2015-10-06. Metadata has not yet been updated (waiting for gull data paper).
Changes to apply:
datasetID (once issued)
Questions:
Is the datasetName OK?
What do we choose as organismID? Ideally it should be a code that is used internationally and across projects. For gull we chose ring_code. https://github.com/LifeWatchINBO/bird-tracking-wmh-occurrences/issues/6 Candidates:
```
device_info_serial: 586
bird_name: Mia
color_ring_code: -
ring_code: H173481
```
Do all the fields in bird_tracking_devices make sense in the context of wmh?
Do outlier criteria make sense for wmh?
There is some confusion on who to include where for personnel, especially the difference between principal investigator and point of contact.
There is some good documentation in this issue.
Also, review datasets to include the proper fields.
Note: Data verified for version 9.2 on 2015-10-07. Metadata has been completely verified, updated, and republished to comply with our new guidelines regarding usage norms, resource citation, etc.
Changes to apply:
occurrenceID instead of GUID
rights to license
accessRights
datasetID (see #18)
estuary for habitat
individualCount
individualCount
verbatimCoordinateSystem to Belgian Lambert 72: #31
verbatimSRS to Belgian Datum 1972: #31
taxonRank: are those all species?
Questions:
bibliographicCitation? It might unnecessarily increase the size. Note: the data paper DOI is mentioned in the resource citation.
Since we started using the IPT 2.2 and republished a dataset, the source website appears twice on GBIF:
This is already the case for:
The IPT .rtf output, the VASCAN datapaper, our datapapers, and regular papers all have a different order for the sections. I think we should settle on the order that we think is the most logical and use that from now on.
Here's the current order we use, based on the VASCAN paper:
Title
Authors
Affiliations
Corresponding author
Review dates
Citation
Resource citation
Abstract
Keywords
Data published through
Project details
Project title
Personnel
Funding
Study area description (default in IPT = short, while in paper this is long)
Design description
Purpose
Additional information (default = extra data, such as measurements)
Taxonomic coverage
*No title: description of taxonomic coverage*
Taxonomic ranks
Common names
Spatial coverage
*No title: description of spatial coverage*
Bounding box for covered area
Temporal coverage
Sampling methods
Study extent description (default in IPT = longer, while in paper this is short)
Sampling description
Quality control description
Method step description
Dataset
*No title: Description of dataset*
*No title: Norms*
*No title: List with characteristics*
Suggested citation for the latest version of the dataset
External datasets (we don't use this)
Acknowledgements
References
@DimEvil, here's my suggestion for the titles of the invasive datasets: Grouping name - Vernacular name (scientific name) in Flanders, Belgium. At first I thought of using IAS as a grouping name, but a Google search on this doesn't reveal much. I think Invasive species is more informative.
invasive-duck-occurrences
invasive-bullfrog-occurrences
invasive-crab-occurrences
invasive-muntjac-occurrences
invasive-raccoon-occurrences
invasive-geese-occurrences
FYI, the original format was Invasive Oxyura jamaicensis - Ruddy duck occurrences in Flanders.
Is this OK for you?
All occurrence records seem to fall outside the bounding box defined in the metadata: http://www.gbif.org/dataset/0ac24b3c-feb9-48d5-bf02-da4a103f024e. One of the two is incorrect, probably the metadata. This Yser dataset has a correct bounding box: best just copy that one?
Do we have coordinates for the kilometerhokken (1 km grid squares)? If not, I need to update some location fields.
Note: Data verified for version 9.2 on 2015-10-07. Metadata has been completely verified, updated, and republished to comply with our new guidelines regarding usage norms, resource citation, etc.
Changes to apply to the data:
occurrenceID instead of GUID
rights (just keep license)
accessRights (non GitHub)
datasetID (see #18)
individualCount
individualCount
verbatimCoordinateSystem to Belgian Lambert 72: #31
verbatimSRS to Belgian Datum 1972: #31
taxonRank: are those all species?
Questions:
bibliographicCitation? It might unnecessarily increase the size. Note: the data paper DOI is mentioned in the resource citation.
In the extension, please use:
weight
length
for measurementType.
The two VIS datasets currently do not have a DOI issued by GBIF. One option - just like for Florabank - would be to give it the DOI of the datapaper (http://doi.org/10.3897/zookeys.475.8556). The difference would be that the two datasets would then share the same data paper DOI.
@DimEvil, thoughts? If OK, I'll contact GBIF.
These grids are missing coordinates:
31UFS8674 90 km square
31UFS88B 1 atlas square
31UFT2107 4 km square
31UFT49B 52 atlas square
Suggestion by @DimEvil:
Now we publish only the active areas! What about publishing the non-active areas as a historical dataset? We should be able to georeference most of the localities... (UTM5). It's about 31,000 records.
Suggestion by @timrobertson100: something like 891-20150301112532 (device info serial and date time) has the advantage that it is recognizable and sortable, which a hash is not.
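A sketch of how such an identifier could be built, assuming the timestamp is formatted as yyyymmddhhmmss, as the example suggests. The function name is hypothetical.

```python
from datetime import datetime


def make_occurrence_id(device_info_serial, date_time):
    """Combine the device serial and the fix timestamp into one identifier."""
    return f"{device_info_serial}-{date_time:%Y%m%d%H%M%S}"


occurrence_id = make_occurrence_id(891, datetime(2015, 3, 1, 11, 25, 32))
print(occurrence_id)
```

Identifiers built this way sort chronologically within one device. Sorting across devices as plain text only follows numeric order when the serials have the same number of digits.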
This issue groups metadata issues of the old repository and includes some new ones.
Changes to apply:
Questions:
In data, use en
See this tutorial. Steps:
accessRights to data mapping and populate with: http://www.inbo.be/en/norms-for-data-use
license to data mapping and populate with: http://creativecommons.org/publicdomain/zero/1.0/
rights in the data mapping.
Remove the dataset labels for which:
IPT is now providing an automatic citation, which we will use from now on. Don't forget to add the DOI as the Resource citation identifier. The result should be this:
Vermeersch G, Anselin A, Devos K, Herremans M, Stevens J, Gabriëls J, Van Der Krieken B, Brosens D, Desmet P (2014): Broedvogels - Atlas of the breeding birds in Flanders 2000-2002. v1.5. Research Institute for Nature and Forest (INBO). Dataset/Occurrence. https://doi.org/10.15468/sccg5a
Notes:
Example for a resource citation with data paper:
Breine J, Verreycken H, De Boeck T, Brosens D, Desmet P (2013): VIS - Fishes in estuarine waters in Flanders, Belgium. Research Institute for Nature and Forest (INBO). Dataset/Occurrence. https://doi.org/10.15468/estwpt Data paper: https://doi.org/10.3897/zookeys.475.8556
Install IPT 2.1 on:
Bugs and issues to verify by @DimEvil and @peterdesmet
occurrenceID
collectionCode
Note: this issue was first reported on July 25, 2014
I noticed a massive drop in the temporal precision of our data that can only be attributed to a change in this (big) dataset.
We indeed started to use ISO 8601 date ranges about a year ago and it seems that GBIF has problems interpreting those: 2004-05-01/2004-05-31 is interpreted as having no date, while it should be interpreted as 2004-05 with a precision up to a month. See this record for an example.
Issue recorded for GBIF at http://dev.gbif.org/issues/browse/POR-2339
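The interpretation expected in this issue can be sketched as follows. This is not GBIF's implementation. It only illustrates collapsing a range that spans exactly one calendar month or year to the matching reduced-precision date, and it assumes both ends of the range are full yyyy-mm-dd dates.

```python
import calendar
from datetime import date


def collapse_date_range(value):
    """Collapse an ISO 8601 date range to a month or year where possible.

    Ranges that do not cover exactly one day, month, or year, and values
    without a "/" separator, are returned unchanged.
    """
    if "/" not in value:
        return value
    start_text, end_text = value.split("/", 1)
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if start == end:
        return start.isoformat()
    if start.year == end.year:
        # Whole calendar year: 1 January to 31 December.
        if (start.month, start.day) == (1, 1) and (end.month, end.day) == (12, 31):
            return f"{start.year:04d}"
        # Whole calendar month: first to last day of the same month.
        last_day = calendar.monthrange(start.year, start.month)[1]
        if start.month == end.month and start.day == 1 and end.day == last_day:
            return f"{start.year:04d}-{start.month:02d}"
    return value


print(collapse_date_range("2004-05-01/2004-05-31"))
```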
Some occurrence records fall outside the bounding box defined in the metadata: http://www.gbif.org/dataset/b2d0f29e-4614-4001-93c8-f651878a86d2