gbif-api's Introduction

GBIF API

The GBIF API library provides:

  • The model objects used by the GBIF service interfaces and the internal messaging systems
  • Enumerations representing standardized vocabularies (country codes, database enumerations, etc.)
  • The Java interface definitions for the public GBIF API (note: each implementation is responsible for mapping to the RESTful URL)
  • Utilities to simplify common operations when working with model objects (JSON serialization, filtered iterators etc)

To build the project

mvn clean install

Policies

  • All changes must go to the dev branch for testing before merging to master.
  • A pre-commit peer review is expected on all commits, ideally referencing the review in the commit message. Simple changes can be committed without review.
  • All commits must reference a GitHub issue to which they relate
  • PRs are preferred for complex functionality. Please target the dev branch.

Dev and master versions must be different to avoid issues with the many work-in-progress tasks. When the master version is released, increment the patch version; when the released version is merged back into dev, increment the minor version manually.

Example of releasing dev branch:

  • Current dev and master versions
    1) dev version    - 1.7.0-SNAPSHOT
    2) master version - 1.6.0-SNAPSHOT
    
  • Merge changes into master
    1) dev version    - 1.7.0-SNAPSHOT
    2) master version - 1.7.0-SNAPSHOT
    
  • Release master and increment patch version
    1) dev version    - 1.7.0-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    
  • Merge changes into dev
    1) dev version    - 1.7.1-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    
  • Bump dev minor version
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    

Example of releasing a fix for master branch:

  • Current dev and master versions
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    
  • Release master and increment patch version
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.2-SNAPSHOT
    
  • Merge changes into dev, without version bumping
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.2-SNAPSHOT
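
For illustration, the version bumps above can be done with the versions-maven-plugin; the plugin choice is an assumption, since the policy above doesn't prescribe tooling:

# on master, after releasing 1.7.0: set the next patch snapshot
mvn versions:set -DnewVersion=1.7.1-SNAPSHOT versions:commit

# on dev, after merging master back: bump the minor version manually
mvn versions:set -DnewVersion=1.8.0-SNAPSHOT versions:commit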
    

Change Log

Documentation

gbif-api's People

Contributors

aalbatross, ahakanzn, ansell, cgendreau, fmendezh, gbif-jenkins, marcos-lg, mattblissett, mdoering, mike-podolskiy90, muttcg, omeyn, timrobertson100


gbif-api's Issues

Make GBIF API compatible with semantic web a.k.a. Linked Open Data (LOD)

As you probably know, LOD is a way to create semantic links between datasets, using relatively standard, self-describing vocabularies.
As with ordinary web APIs, this is also a way to decouple the data and the application(s).
So, connecting GBIF data to standard LOD syntaxes and vocabularies can lead to many unforeseen use cases. My current first use case is to make maps with both my LOD botanical observations and GBIF observations.

I started mapping the GBIF JSON to standard LOD vocabularies in this project, using the JSON-LD technique (see also https://json-ld.org/), which provides a LOD "reading grid" for the plain JSON data:
https://github.com/jmvanel/rdf-convert/tree/master/gbif.org
I hope I have your blessing.

To make things smoother technically, there are a few things that you could do.

  • The most annoying is that most IDs in your JSON are integers; it would be more convenient for JSON-LD tools if they were strings.
  • Also, you could add new JSON-LD keys in the API: @id, @type.
  • After that, you could expose the JSON-LD side of your JSON URLs by one of the ways here:
    https://www.w3.org/TR/json-ld/#modifying-behavior-with-link-relationships
    or simply add an "@context" key in the JSON (a sketch follows this list).
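
For illustration only, a minimal occurrence fragment with the proposed keys added; the context URL, type and field values here are hypothetical, not an existing GBIF resource:

{
  "@context": "https://api.gbif.org/contexts/occurrence.jsonld",
  "@id": "https://api.gbif.org/v1/occurrence/1234567890",
  "@type": "dwc:Occurrence",
  "key": "1234567890",
  "scientificName": "Abies alba Mill."
}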

I hope I have filed this issue in the relevant GitHub project.

NOTE: I also wrote a translator from GBIF DwC archives to RDF (Turtle):
https://github.com/jmvanel/dwca-rdf
but having RDF URLs like I propose here is more in line with the Semantic Web architecture.

Extend Rank enum with ZooBank ranks

ZooBank as the zoological nomenclator contains a few very rare ranks that we should nevertheless add to our rank enumeration:

altera: Cyclopterus liparis altera minor Fabricius, 1780

type: Serrivomer sector type brevidentatus Roule & Bertin, 1929
-> Serrivomer sector typ. longidentatus Roule & Bertin, 1929

facies: Salmo trutta major facies rhodanensis Roule, 1923
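
A minimal sketch of what the additions might look like; the constant names are assumptions, not the final API:

public enum Rank {
  // existing constants elided for brevity; proposed additions below
  ALTERA,
  TYPE,    // also written "typ." in zoological names
  FACIES
}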

Changes to support building with Java-9

A few minor changes are needed to support Java 9, similar to other packages. This includes the motherpom update, and a few test changes to replace ClassLoader.class with getClass().

Decide on the future of unused continent related occurrence remarks

We currently have 3 occurrence remarks related to continent that are not used (based on Solr facets):

  • CONTINENT_COUNTRY_MISMATCH
  • CONTINENT_INVALID
  • CONTINENT_DERIVED_FROM_COORDINATES

Considering there is no official list of continents, we could consider deprecating CONTINENT_COUNTRY_MISMATCH and CONTINENT_DERIVED_FROM_COORDINATES.

CONTINENT_INVALID could be kept to flag values that cannot be matched against the dictionary.

What and how to return Occurrence eventDate?

Please see the comment below with a proposal for changes.

The comment below this line is unchanged since 2016.


The goal of this issue is to document and collect feedback about the future response of the API related to the date on which an occurrence record occurred. This is strictly about the response of the API.

Supported ISO date formats

The following ISO date formats, based on the granularity provided by the data publisher, will be returned by the API:

Pattern                  Example value
yyyy                     2016
yyyy-mm                  2016-03
yyyy-mm-dd               2016-03-05
yyyy-mm-dd'T'hh:mm:ss    2016-03-05T13:03:07

Date intervals using the formats above separated with a slash (/): 2016-03-05/2016-03-06.
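
As a client-side sketch (not part of the API itself), all four granularities can be parsed with a single java.time formatter using optional sections:

import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

public class EventDateParser {

  // 'u' (proleptic year) avoids era resolution; optional sections cover all granularities.
  private static final DateTimeFormatter FLEXIBLE =
      DateTimeFormatter.ofPattern("uuuu[-MM[-dd['T'HH:mm:ss]]]");

  /** Returns the most precise temporal the text supports. */
  public static TemporalAccessor parse(String text) {
    return FLEXIBLE.parseBest(text,
        LocalDateTime::from, LocalDate::from, YearMonth::from, Year::from);
  }

  public static void main(String[] args) {
    System.out.println(parse("2016"));                // Year
    System.out.println(parse("2016-03"));             // YearMonth
    System.out.println(parse("2016-03-05"));          // LocalDate
    System.out.println(parse("2016-03-05T13:03:07")); // LocalDateTime
  }
}

Date intervals can be handled by splitting on the slash first and parsing each side.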

How should it be returned by the API?

Option 1 - Update/reuse the current eventDate

This option consists of reusing the current content of the JSON field eventDate returned by the API to return all supported ISO format dates.
Following this change, the eventDate could contain values like:

"eventDate":"2012-03/2012-04"
"eventDate":"2012-03-01/2012-04-02"
"eventDate":"2012-03-01"
"eventDate":"2012-03"
"eventDate":"2016-03-05T13:03:07"

Option 2 - Introduce new field(s)

This option consists of adding new field(s) to the JSON response.
For example:

"isoEventDate":"2012-03/2012-04"

The current eventDate could be one of the following:

  • maintained but always empty
  • maintained and provided only for non-range dates
  • maintained and always filled (for date ranges it would contain the start of the range)

What to do with year, month, day

This is an open question for both options.

The different options are:

  • remove them to avoid confusion
  • maintain them but provide values only for non-range dates
  • maintain them and always return a value (for date ranges it would contain the start of the range)

API number of results limit by default

I am having problems with the API calls to obtain information on the datasets published by GBIF Colombia's node. The call currently has a default limit of 1000 records, and that number of datasets was exceeded a few weeks ago ;). I would like to know if you can change the default value, since the "limit" parameter is being overridden as the API documentation describes, or maybe give me some advice about how I can get this information otherwise.

The API call is:
http://api.gbif.org/v1/node/7e865cba-7c46-417b-ade5-97f2cf5b7be0/dataset?limit=10000
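
Until the cap changes, the usual workaround is to page with offset and limit; a minimal Java 11+ sketch (the endOfRecords flag is the paged-response convention; JSON parsing is elided here):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DatasetPager {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    final int limit = 200; // well under the server-side cap
    int offset = 0;
    boolean endOfRecords = false;
    while (!endOfRecords) {
      String url = "https://api.gbif.org/v1/node/7e865cba-7c46-417b-ade5-97f2cf5b7be0/dataset"
          + "?offset=" + offset + "&limit=" + limit;
      HttpResponse<String> response = client.send(
          HttpRequest.newBuilder(URI.create(url)).build(),
          HttpResponse.BodyHandlers.ofString());
      // A real client would parse the JSON page; here we only check the paging flag.
      endOfRecords = response.body().contains("\"endOfRecords\":true");
      System.out.println("fetched page at offset " + offset);
      offset += limit;
    }
  }
}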

Move InterpretationRemarkSeverity from occurrence-common

In relation to issue #5, the gbif-api should also define the default "severity" of each interpretation remark.

The severity of an interpretation remark can differ depending on the context. Here, we only want to define the default severity.

Support very long (≥8k) search requests

Users would like to search using long polygons or many taxon keys. To support this with the current search API, a long (>8k character) URL must pass through:

  1. The user's web browser or other client
  2. Potentially a not-very-good proxy (corporate or education filter etc)
  3. Varnish
  4. a. occurrence-ws
    b. vectortile-ws / mapnik-server
  5. SOLR

4.a. is easily fixed for gbif-microservice, 4.b. can be fixed for Dropwizard with

  applicationConnectors:
    - type: http
      port: 7001
      maxRequestHeaderSize: 1MiB

although there are then issues somewhere in Jersey's regex handling.

  2. is probably OK, since using HTTPS should prevent most proxies from modifying the request

  3. requires regexes in Varnish to use .*? rather than .* for the maps rules, and there's a related note in Varnish saying needing this is "madness".

That leaves 1. There's a concern from a Jetty developer suggesting all of this is a bad idea, for compatibility and security.

So we need some way to communicate the search terms without using >8kiB, at least for website and API. We could:

  • Use POST and cache POST requests in Varnish
    • No longer possible to share URLs, easily switch between website and API, etc., but the reasonable length limit is very high
  • Use POST requests to get a key (time limited?), presumably stored in a database somewhere, which maps the key to the search string.
    • Allows sharing etc, but adds more complexity
  • Compress the search parameters in the URL (see the sketch after this list):
    • xz compression followed by Base64 encoding reduces an 11.5 kiB string (a polygon of Brandenburg) to 2.5 kiB
    • not an incredible saving
  • Make a protocol buffers format for encoding the query
    • A quick try, using part of geobuf.proto for the geometry then base64 encoding, uses 5.8kiB
    • even less of a saving
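
A minimal sketch of the compression option using the JDK's Deflater with URL-safe Base64 instead of xz (so the ratio will differ from the numbers above):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.Deflater;

public class QueryCompressor {

  /** Deflates a query string and encodes it as a URL-safe token. */
  public static String compress(String query) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(query.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    while (!deflater.finished()) {
      out.write(buffer, 0, deflater.deflate(buffer));
    }
    deflater.end();
    return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
  }

  public static void main(String[] args) {
    // An illustrative WKT polygon; real geometries are far longer.
    String geometry = "POLYGON((13.0 52.3,14.8 52.3,14.8 53.6,13.0 53.6,13.0 52.3))";
    System.out.println(compress(geometry));
  }
}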

Introduce Occurrence isoEventDate field(s)

The current Occurrence eventDate field is a Java Date and cannot always represent the value that was provided. For example, the value "2013-02" is a valid ISO date but cannot be stored in a Date field without introducing a false precision: "2013-02-01T00:00:00". The issue is that we have no mechanism to return the date to the user at the same level of granularity that was originally provided.

I suggest we introduce fields isoEventDateStart and isoEventDateEnd to store the rendered ISO date as a String. The start and end concepts will be used to support date ranges.

field "issue" not returned in species API

Searching occurrences by issue via the API returns the field issues (see example with issue DEPTH_UNLIKELY) as expected. The same is not true when searching names/species (see example1 with issue NO_SPECIES and example2 with issue ORIGINAL_NAME_DERIVED). The data are clearly well filtered, but it would be nice to get the issues field for species just as for occurrences.
Or am I missing something? Thanks.

Add occurrence count from latest harvest to endpoints in /dataset

As in subject. This would be immensely useful for me to help prevent unnecessary generation of DwC-As of all GBIF specimen-based occurrences if I can first discover any datasets that have dropped a significant number of records since I last downloaded them. For example, it appears https://www.gbif.org/dataset/4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4 has mistakenly dropped half its 4.5M occurrences from its DwC-A at some time between today and two weeks ago. I have been in touch with Niels Klazenga to see if he can get them restored. And then I'd generate yet another 65GB DwC-A file. Incidentally, I am faced with 12hr+ download times for such files. If however I create a Droplet on DigitalOcean in Amsterdam to download the file before hopping the pond, I can get it to my machine in NA in approx. 1.5hrs.

Enumeration for literature topics

Is there an enumeration available through the REST API which contains the topics applied to literature monitored by the GBIF DOI tracking programme? View source on the web UI (https://www.gbif.org/resource/search?contentType=literature) shows a reference to enums.cms.vocabularyTerms.topics, but I can't see this listed under https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/package-summary.html. I assume only those listed there are available through the API access to enumerations: https://www.gbif.org/developer/summary#enumerations

Revise NomenclaturalStatus

The NomenclaturalStatus enumeration is missing an entry for a correct/valid name and uses confusing terminology in some places. There should be 2 clear entries for a correct name and a validly published/available name.

As botanical and zoological terminology use the same term, "valid", for different things, we should avoid it.

Botanical / Zoological terms:

  • correct (incorrect) / valid (invalid)
  • valid (invalid) / available (unavailable)

Suggest to use:

  • CORRECT & INCORRECT
  • AVAILABLE & UNAVAILABLE

All other, more specific status values should have two methods to indicate whether that more specific status is correct and available in the above sense: isCorrect() and isAvailable().
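
A minimal sketch of the proposal; the constant set and flag values here are illustrative, not a worked-out classification:

public enum NomenclaturalStatus {
  // the two clear base entries proposed above
  CORRECT(true, true),
  AVAILABLE(false, true),
  // examples of more specific statuses carrying the same flags
  CONSERVED(true, true),
  ILLEGITIMATE(false, true),
  NUDUM(false, false);

  private final boolean correct;
  private final boolean available;

  NomenclaturalStatus(boolean correct, boolean available) {
    this.correct = correct;
    this.available = available;
  }

  /** Is a name with this status the correct name (botanical sense)? */
  public boolean isCorrect() { return correct; }

  /** Is a name with this status validly published / available? */
  public boolean isAvailable() { return available; }
}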

Search for occurrences without date isn't an option

total 703,245,140

592,565,753 occ before 2020
(zero invalid and zero unlikely)

56,748,687 recorded date invalid or unlikely or mismatch
(8.6 mil mismatch duplicates with before 2020 ignored)

592 mil + 56 mil is far from 703 mil. The difference, I assume, is the occurrences without any date at all. But wouldn't it be practical to have them searchable as well?

I guess one could argue that if you want a date you set a filter, and if you are a publisher you can look at your own data, whereas the other issues have to do with our processing. But then again, we add TAXON_MATCH_NONE when no taxon is provided, and we flag occurrences without coordinates. And our issue filters are targeting the publishers in the first place.


fbitem-4cd51ab3c35a7b3305c9ec1422d26bc93a21a482
Reported by: @MortenHofft
System: Chrome 55.0.2883 / Mac OS X 10.10.5
Referer: https://demo.gbif.org/occurrence/search?issue=RECORDED_DATE_INVALID&issue=RECORDED_DATE_UNLIKELY&issue=RECORDED_DATE_MISMATCH

Accept old and new forms for EML userId

Following this issue on the EML repository, it looks like when the IPT and registry/GBIF API were written there was no guidance on how to format an ORCID in the EML.

The related commit on EML for v2.2 (not yet released) suggests it should be like this:

<userId directory="https://orcid.org/">https://orcid.org/0000-0003-0623-6682</userId>

But the IPT generates it like this:

<userId directory="http://orcid.org/">0000-0003-0623-6682</userId>

We will need to handle both, and probably also the mixture:

<userId directory="http://orcid.org/">https://0000-0003-0623-6682</userId>
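
A minimal sketch of a normalizer accepting all three forms; this is not the registry's actual parser, and the class name and canonical form are assumptions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class OrcidNormalizer {

  // The bare identifier: four groups of four digits, the last character may be X.
  private static final Pattern ORCID =
      Pattern.compile("(\\d{4}-\\d{4}-\\d{4}-\\d{3}[\\dX])");

  /** Extracts the ORCID from any of the userId variants above; null if absent. */
  public static String normalize(String userId) {
    Matcher m = ORCID.matcher(userId);
    return m.find() ? "https://orcid.org/" + m.group(1) : null;
  }
}

All three variants above then yield the same canonical https form.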

Improve Interpretation remarks definition

Interpretation remarks are also known as "issues" (e.g. OccurrenceIssue).

Currently for occurrences this information is defined in InterpretationRemarksDefinition.

In order to make the definition more explicit we should bring it to the gbif-api project. The same mechanism is also required to identify remarks in ChecklistBank.

The idea would be to have an interface like the following:

public interface InterpretationRemark {
  /** The terms (e.g. Darwin Core terms) to which this remark relates. */
  Set<Term> getRelatedTerms();
}

OccurrenceIssue and NameUsageIssue would then implement it.
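
A sketch of how an issue enum might implement it; the related-term set shown is illustrative:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.gbif.dwc.terms.DwcTerm;
import org.gbif.dwc.terms.Term;

public enum OccurrenceIssue implements InterpretationRemark {
  // one illustrative constant; the real enum has many more
  RECORDED_DATE_INVALID(DwcTerm.eventDate, DwcTerm.year, DwcTerm.month, DwcTerm.day);

  private final Set<Term> relatedTerms;

  OccurrenceIssue(Term... terms) {
    this.relatedTerms = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(terms)));
  }

  @Override
  public Set<Term> getRelatedTerms() {
    return relatedTerms;
  }
}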

Representing imprecise dates in the Java object model

There's some similar discussion on #2, for occurrences.

The motivation is to fix gbif/portal-feedback#1676 and similar issues around dates in metadata properly.

The pubDate and temporalCoverage on a Dataset are very often given only as a year, and we should retain that. Changing the JSON response is straightforward enough; I now have:

"pubDate": "2016",

"temporalCoverages": [{
    "@type": "range",
    "start": "2013",
    "end": "2015"
}],

instead of the current

"pubDate": "2015-12-31T23:00:00.001+0000",

"temporalCoverages": [{
    "@type": "range",
    "start": "2012-12-31T23:00:00.001+0000",
    "end": "2014-12-31T23:00:00.001+0000"
}],

I could fix only the time zone issue of pubDate, and leave the 1 extra millisecond, which seems to be an undocumented way to say this is a year-precision date. However, that still leaves the end of the range one year too soon, though I suppose it could be serialized as 2015-12-31T00:00:00.001+0000.

Anyway, for this I used a TemporalAccessor, since it can represent a Year, YearMonth, LocalDate (= year-month-day) etc. However, there's a strong warning against using this class, since it can also represent things like JapaneseDate which break the usual assumptions we have about ISO dates. It's also a bit cumbersome to use: to get the year means checking that it holds a year, then requesting it. So, it's much looser than we require.

I think what @mdoering wrote in #2, of creating an IsoDate class, makes most sense. This would only represent a date (not a date range), either as a year, year and month, or year month and day. I think it would be a fairly simple wrapper around Year, YearMonth, and LocalDate, e.g. returning the most precise available, or fetching a year. It can serialize into a single, ISO 8601 format field of 4, 6 or 8 digits.
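
A minimal sketch of that IsoDate idea; the name comes from the discussion above, but the details are assumptions:

import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.temporal.ChronoField;
import java.time.temporal.Temporal;

public final class IsoDate {

  private final Temporal value; // exactly one of Year, YearMonth or LocalDate

  private IsoDate(Temporal value) { this.value = value; }

  public static IsoDate of(Year y)      { return new IsoDate(y); }
  public static IsoDate of(YearMonth m) { return new IsoDate(m); }
  public static IsoDate of(LocalDate d) { return new IsoDate(d); }

  /** The year is present at every precision. */
  public int getYear() { return value.get(ChronoField.YEAR); }

  /** Serializes as yyyy, yyyy-MM or yyyy-MM-dd depending on precision. */
  @Override
  public String toString() { return value.toString(); }
}

Jackson could serialize it via toString(), giving the single ISO 8601 field of 4, 6 or 8 digits described above.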

I'm not too concerned with @cgendreau's concerns. Deserialization of three well-defined formats (YYYY, YYYY-MM and YYYY-MM-DD) is easy and fast. (Faster than deserializing an ISO date!)

What does everyone else think?

  1. One millisecond hack to say it's only the year
  2. TemporalAccessor
  3. New IsoDate class
  4. Something else

The Varnish logs suggest two regular users of the Java API for this endpoint. (Plazi and GBIF Japan.)

gbif api (and portal?) should tell about usage of extensions

I just discovered that the regular API calls do not expose concepts published through the usage of extensions (or at least, some extensions).

An example from a taxon checklist extended with the vernacular names, species distribution, literature references and MeasurementOrFact extensions:

The portal entry overview tab shows at least some of those linked concepts (for some reason, only the distribution data but nothing else): https://www.gbif.org/species/164940120
But the equivalent API normal call doesn't show anything: https://api.gbif.org/v1/species/164940120

If you go to the verbatim versions, there is no problem finding all the linked data:
Portal verbatim tab: https://www.gbif.org/species/164940120/verbatim
API verbatim call: https://api.gbif.org/v1/species/164940120/verbatim

Of course, a user who is browsing the portal overview tab (the default one) can always click the verbatim tab to see if there is extended information in there. In my opinion, there should be a visual notification on the page to tell the user about the additional extension data available when clicking the verbatim tab.

But when you are programming a web app which uses the API, this notification is much more important: it would make sense to have some API flag notifying about it (something like extensionsUsage: boolean, or even better usedExtensions: [array of extensions], for example).
With such a notification, the app could decide whether or not to make a second API call to retrieve the verbatim data (only if those extensions are relevant for what the app needs to show).
Otherwise, the only way to retrieve those extended data is to always make verbatim calls, which in most cases would be a useless waste of bandwidth.

@abubelinha

Any place where the API's response structures are explained?

Hello!

I'm looking for documentation of the GBIF API's responses. If this is not the place to ask, please direct me to the correct one.

I'm currently working with the GBIF API, and I don't understand some of the responses. For example, when asking for dataset metrics with GET /v1/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c/metrics I get this response:

{
    "key": 210029,
    "datasetKey": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
    "usagesCount": 5858142,
    "synonymsCount": 2361744,
    "distinctNamesCount": 5844412,
    "nubMatchingCount": 3805144,
    "colMatchingCount": 2845782,
    "nubCoveragePct": 64,
    "colCoveragePct": 48,
    "countByConstituent": ,
    "countByKingdom": {
        "ANIMALIA": 1529953,
        "PLANTAE": 523372,
        "FUNGI": 201853,
        "CHROMISTA": 15357,
        "BACTERIA": 10847,
        "PROTOZOA": 7063,
        "INCERTAE_SEDIS": 787,
        "ARCHAEA": 439,
        "VIRUSES": 119
    },
    ...
}

What is the meaning of properties such as nubMatchingCount, colMatchingCount, nubCoveragePct, etc.? On the website I can see the API documentation for making requests, but not an explanation of the responses.

I will appreciate any help. Thanks!

Add citations to registry API for datasets

Since you're now tracking citations, it would be nice to obtain these via a GET to /dataset/{UUID}/citations. This need not be particularly verbose, perhaps nothing more than an array of DOIs, because metadata for these can be had elsewhere, such as from Crossref's API.

Clearly specify how geometry polygons wrapping a pole or crossing the antimeridian should be specified

A polygon can cross the antimeridian (e.g. Fiji) or cross the antimeridian when wrapping around the north or south pole (e.g. around the Arctic).

I need to investigate a bit more, but either WKT doesn't really define how these polygons should be specified, or there are conflicting implementations.

Within this API project, we validate WithinPredicates using the JTS library, which lacks support for this. Some ways of specifying the polygon happen to work.

We should work out what we want to support, and convert as necessary to our backend systems.

As an example of the differences between implementations, locationtech/spatial4j#46 says:

Strict interpretation of WKT/OGC rules refer to the "right-hand-rule" which basically says the outer shell is counter-clockwise order. Based on that Spatial4j has a bug and your test shows it. WKT/OGC, AFAIK, also don't talk about the notion of a dateline. There is a special case in Spatial4j that when given a 4-point rectangular polygon (NOT multipolygon), then it is processed according to the right hand rule. And hence your test will pass if the shape is trivially converted. But I'm planning on removing that or making it optional in the next release because numerous people have complained, even if it meets specs.

Spatial4j does it this way because it seemed a real-world polygon wouldn't have longitudinal points >= 180 degrees apart…
…yet it's reasonable to search GBIF based on a polygon around Eurasia.

(Spatial4J is a derivative of the JTS library we use.)

This is the secondary issue from https://dev.gbif.org/issues/browse/POR-3042/, which concerns a polygon of the Arctic. It's a reasonable polygon, but doesn't validate with JTS. It can be made into a rectangle -180 90…180 90,-180 90 which does validate, but we either need to do this conversion ourselves, or clearly document it.

Change GADM search API

We've found some limitations with the initial, basic GADM search implementation.

One is an inconvenience. A query for "the island of Ireland" seems very reasonable, but because different query terms are ANDed (and multiple values for the same term are ORed), this is the necessary query, listing every appropriate Level 1 GID:

https://api.gbif.org/v1/occurrence/search?gadmLevel1Gid=IRL.1_1&gadmLevel1Gid=IRL.2_1&gadmLevel1Gid=IRL.3_1&gadmLevel1Gid=IRL.4_1&gadmLevel1Gid=IRL.5_1&gadmLevel1Gid=IRL.6_1&gadmLevel1Gid=IRL.7_1&gadmLevel1Gid=IRL.8_1&gadmLevel1Gid=IRL.9_1&gadmLevel1Gid=IRL.10_1&gadmLevel1Gid=IRL.11_1&gadmLevel1Gid=IRL.12_1&gadmLevel1Gid=IRL.13_1&gadmLevel1Gid=IRL.14_1&gadmLevel1Gid=IRL.15_1&gadmLevel1Gid=IRL.16_1&gadmLevel1Gid=IRL.17_1&gadmLevel1Gid=IRL.18_1&gadmLevel1Gid=IRL.19_1&gadmLevel1Gid=IRL.20_1&gadmLevel1Gid=IRL.21_1&gadmLevel1Gid=IRL.22_1&gadmLevel1Gid=IRL.23_1&gadmLevel1Gid=IRL.24_1&gadmLevel1Gid=IRL.25_1&gadmLevel1Gid=IRL.26_1&gadmLevel1Gid=GBR.2_1

A similar case, "Andalucia and Gibraltar", cannot be done:

https://api.gbif.org/v1/occurrence/search?gadmLevel1Gid=ESP.1_1&gadmLevel0Gid=GIB

Gibraltar doesn't have level 1 subdivisions, so there's no way to make a suitable OR-query.

Secondly, allowing queries by name will probably lead to confusion. Bolívar is the name of 30 level 2 areas, in 6 countries: https://api.gbif.org/v1/occurrence/search?gadmLevel2Name=Bol%C3%ADvar&limit=0&facet=gadmLevel0Name plus additional level 1 and 3 areas.


Proposal:

  1. Just remove the name parameters. Allowing their use is too fragile. We don't accept country=Ireland anyway.

  2. An additional search parameter gadmGid, so the above queries can be represented like this:

https://api.gbif.org/v1/occurrence/search?gadmGid=IRL&gadmGid=GBR.2_1

https://api.gbif.org/v1/occurrence/search?gadmGid=ESP.1_1&gadmGid=GIB

Question 1

This still leaves the facets. Do we still need gadmLevel1Gid etc. terms to return in facets, or should something else be done?

Working in a similar way to searching taxa would suggest we have both, as we can already facet on both kingdomKey, phylumKey etc. and taxonKey.

To help anyone who can ponder this, these are the GADM areas for Ireland + Northern Ireland:
gid_0 gid_1 gid_2 gid_3 name_0 name_1 name_2 name_3
GBR GBR.2_1 GBR.2.10_1 GBR.2.10.1_1 United Kingdom Northern Ireland Newry, Mourne and Down Down
GBR GBR.2_1 GBR.2.10_1 GBR.2.10.2_1 United Kingdom Northern Ireland Newry, Mourne and Down Newry and Mourne
GBR GBR.2_1 GBR.2.11_1 GBR.2.11.1_1 United Kingdom Northern Ireland North Down and Ards Ards
GBR GBR.2_1 GBR.2.11_1 GBR.2.11.2_1 United Kingdom Northern Ireland North Down and Ards North Down
GBR GBR.2_1 GBR.2.1_1 GBR.2.1.1_1 United Kingdom Northern Ireland Antrim and Newtownabbey Antrim
GBR GBR.2_1 GBR.2.1_1 GBR.2.1.2_1 United Kingdom Northern Ireland Antrim and Newtownabbey Newtownabbey
GBR GBR.2_1 GBR.2.2_1 GBR.2.2.1_1 United Kingdom Northern Ireland Armagh, Banbridge and Craigavon Armagh
GBR GBR.2_1 GBR.2.2_1 GBR.2.2.2_1 United Kingdom Northern Ireland Armagh, Banbridge and Craigavon Banbridge
GBR GBR.2_1 GBR.2.2_1 GBR.2.2.3_1 United Kingdom Northern Ireland Armagh, Banbridge and Craigavon Craigavon
GBR GBR.2_1 GBR.2.3_1 GBR.2.3.1_1 United Kingdom Northern Ireland Belfast Belfast
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.1_1 United Kingdom Northern Ireland Causeway Coast and Glens Ballymoney
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.2_1 United Kingdom Northern Ireland Causeway Coast and Glens Coleraine
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.3_1 United Kingdom Northern Ireland Causeway Coast and Glens Limavady
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.4_1 United Kingdom Northern Ireland Causeway Coast and Glens Moyle
GBR GBR.2_1 GBR.2.5_1 GBR.2.5.1_1 United Kingdom Northern Ireland Derry and Strabane Derry
GBR GBR.2_1 GBR.2.5_1 GBR.2.5.2_1 United Kingdom Northern Ireland Derry and Strabane Strabane
GBR GBR.2_1 GBR.2.6_1 GBR.2.6.1_1 United Kingdom Northern Ireland Fermanagh and Omagh Fermanagh
GBR GBR.2_1 GBR.2.6_1 GBR.2.6.2_1 United Kingdom Northern Ireland Fermanagh and Omagh Omagh
GBR GBR.2_1 GBR.2.7_1 GBR.2.7.1_1 United Kingdom Northern Ireland Lisburn and Castlereagh Castlereagh
GBR GBR.2_1 GBR.2.7_1 GBR.2.7.2_1 United Kingdom Northern Ireland Lisburn and Castlereagh Lisburn
GBR GBR.2_1 GBR.2.8_1 GBR.2.8.1_1 United Kingdom Northern Ireland Mid and East Antrim Ballymena
GBR GBR.2_1 GBR.2.8_1 GBR.2.8.2_1 United Kingdom Northern Ireland Mid and East Antrim Carrickfergus
GBR GBR.2_1 GBR.2.8_1 GBR.2.8.3_1 United Kingdom Northern Ireland Mid and East Antrim Larne
GBR GBR.2_1 GBR.2.9_1 GBR.2.9.1_1 United Kingdom Northern Ireland Mid Ulster Cookstown
GBR GBR.2_1 GBR.2.9_1 GBR.2.9.2_1 United Kingdom Northern Ireland Mid Ulster Dungannon
GBR GBR.2_1 GBR.2.9_1 GBR.2.9.3_1 United Kingdom Northern Ireland Mid Ulster Magherafelt
IRL IRL.10_1 Ireland Kilkenny
IRL IRL.11_1 Ireland Laoighis
IRL IRL.12_1 Ireland Leitrim
IRL IRL.13_1 Ireland Limerick
IRL IRL.14_1 Ireland Longford
IRL IRL.15_1 Ireland Louth
IRL IRL.16_1 Ireland Mayo
IRL IRL.17_1 Ireland Meath
IRL IRL.18_1 Ireland Monaghan
IRL IRL.19_1 Ireland Offaly
IRL IRL.1_1 Ireland Carlow
IRL IRL.20_1 Ireland Roscommon
IRL IRL.21_1 Ireland Sligo
IRL IRL.22_1 Ireland Tipperary
IRL IRL.23_1 Ireland Waterford
IRL IRL.24_1 Ireland Westmeath
IRL IRL.25_1 Ireland Wexford
IRL IRL.26_1 Ireland Wicklow
IRL IRL.2_1 Ireland Cavan
IRL IRL.3_1 Ireland Clare
IRL IRL.4_1 Ireland Cork
IRL IRL.5_1 Ireland Donegal
IRL IRL.6_1 Ireland Dublin
IRL IRL.7_1 Ireland Galway
IRL IRL.8_1 Ireland Kerry
IRL IRL.9_1 Ireland Kildare

Question 2

Should there be any special behaviour with the existing country parameter? At the moment, a query like https://api.gbif.org/v2/map/occurrence/adhoc/5/30/[email protected]?srs=EPSG:4326&bin=hex&hexPerTile=17&style=classic.poly&country=IE&gadmLevel1Gid=GBR.2_1 returns only those occurrences in both NaturalEarthMarineRegions Ireland and GADM Northern Ireland, i.e. the errors and inaccuracies from the data for occurrences close to the border between the two.

CORS?

Can we please have CORS?

Definition of eventDate returned by the API

According to Darwin Core, the eventDate is "the date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded."

This brings 2 concepts:

  1. The date and time relative to the location of the event
  2. The exact moment the event occurred on the timeline

To illustrate, let's take the worst-case scenario:

eventDate: 2016-09-15T00:05:00+1400 (LINT, Kiritimati, Kiribati - Christmas Island, UTC+14)

At the exact same moment, it is:

  • 2016-09-14T10:05:00+0000 (UTC)
  • 2016-09-14T11:05:00+0100 (BST, London, United Kingdom, UTC+1)
  • 2016-09-13T22:05:00-1200 (AoE, Baker Island, US Minor Outlying Islands, UTC-12)

In the worst case, a single eventDate 2016-09-15 can actually be on 3 different dates depending where you are on Earth.
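
A quick java.time illustration of the three renderings above (the zone IDs are standard tz database names):

import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class EventDateDemo {
  public static void main(String[] args) {
    // The Kiritimati local time above, as an absolute instant (2016-09-15T00:05+14:00).
    Instant instant = Instant.parse("2016-09-14T10:05:00Z");
    for (String zone : new String[] {"Pacific/Kiritimati", "Europe/London", "Etc/GMT+12"}) {
      LocalDate date = instant.atZone(ZoneId.of(zone)).toLocalDate();
      System.out.println(zone + " -> " + date); // 2016-09-15, 2016-09-14, 2016-09-13
    }
  }
}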

This raises some questions:

  • When a user searches for all records that occurred on 2016-09-14, what should be returned?
  • When no timezone is provided, which one should be used?

The trade-off solution that answers the 2 questions above is to handle eventDate as a local date-time. This means the eventDate is relative to its location, and the time zone is therefore ignored. When you query GBIF for a specific date, you then get all records that were recorded on that date relative to their location. The downside of such a solution is that it makes it very difficult to order eventDates along the timeline (in the order they actually happened in the real world) with a precision better than 26 hours (the biggest difference between 2 time zones).

Please let us know if you have good reason to think that handling eventDate as a local date-time could affect your program/work.

Extend Language to ISO 639-3 codes

Based on gbif/checklistbank#73 we need to deal with more languages than the current enum, based on ISO 639-2 codes, holds. The simplest option would be to extend the Language enum with all ~7700 codes and store both the 2- and 3-letter codes as enum properties. But is a large enum like this still a good idea? Enum constants are only initialised when the enum class is first used, so this should not be a JVM issue.
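
A minimal sketch of the enum-property layout (the two entries shown are illustrative):

public enum Language {
  // two illustrative entries; the full enum would carry all ~7700 ISO 639-3 codes
  ENGLISH("en", "eng"),
  DANISH("da", "dan");

  private final String iso2; // ISO 639-1; may be null for languages without a 2-letter code
  private final String iso3; // ISO 639-3

  Language(String iso2, String iso3) {
    this.iso2 = iso2;
    this.iso3 = iso3;
  }

  public String getIso2LetterCode() { return iso2; }
  public String getIso3LetterCode() { return iso3; }
}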

Add version to Dataset object

In order to generate the citation described by gbif/registry#4 , we need to store the version of the dataset (if available).

The version can be extracted from the packageId in the EML document.
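
A sketch of the extraction, assuming an IPT-style packageId with a trailing version marker such as .../v2.3 (the exact layout is an assumption here):

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PackageIdVersion {

  // Matches a trailing "/v<major>[.<minor>...]" marker.
  private static final Pattern VERSION = Pattern.compile("/v(\\d+(?:\\.\\d+)*)$");

  /** Returns the version, e.g. "2.3", if the packageId carries one. */
  public static Optional<String> extract(String packageId) {
    Matcher m = VERSION.matcher(packageId);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }
}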

API analytics

Use of api.gbif.org by specialized users is estimated to be increasing, and this deserves to be reported and analyzed in its own right, perhaps as similarly as possible to how use of the portal is tracked using Google Analytics.

In particular, we are interested in collecting basic metrics like:

  • Geographic distribution of API users.
  • Usage per sub-domain: occurrence, dataset, species, etc.
  • Average response times and data traffic.

There are existing solutions and services to manage, track and analyze APIs; however, such solutions are, in our case, unviable because of cost:

  1. https://aws.amazon.com/api-gateway/
  2. https://cloud.google.com/apigee/
  3. https://www.mulesoft.com/platform/api/anypoint-analytics
  4. https://www.ibm.com/dk-en/cloud/api-connect

Other solutions can be explored and extended to achieve similar results, for example:

  1. https://github.com/Netflix/zuul
  2. https://spring.io/projects/spring-cloud-gateway
  3. https://konghq.com/kong/

Interpretation remarks on Occurrence for absence of taxa, coordinates or date

This is a discussion to harmonize how we handle the absence of data for taxa, coordinates and dates on Occurrence records.

Currently it is different in all 3 cases:

  • If no taxon is provided, we flag a TAXON_MATCH_NONE
  • If no coordinates are provided, we have a HAS_COORDINATE = FALSE in our search index (and in the table behind it)
  • If no date is provided, we do nothing
