gbif-api's Introduction

GBIF API

The GBIF API library provides:

  • The model objects used by the GBIF service interfaces and the internal messaging systems
  • Enumerations representing standardized vocabularies (country codes, database enumerations, etc.)
  • The Java interface definitions for the public GBIF API (note: each implementation is responsible for mapping to the RESTful URL)
  • Utilities to simplify common operations when working with model objects (JSON serialization, filtered iterators etc)

To build the project

mvn clean install

Policies

  • All changes must go to the dev branch for testing before merging to master.
  • A pre-commit peer review is expected on all commits, ideally referencing the review in the commit message. Simple changes can be committed without review.
  • All commits must reference a GitHub issue to which they relate
  • PRs are preferred for complex functionality. Please target the dev branch.

Dev and master versions must be different to avoid issues with the many work-in-progress tasks. When the master version is released, increment the patch version; when the released version is merged back into dev, increment the minor version manually.

Example of releasing dev branch:

  • Current dev and master versions
    1) dev version    - 1.7.0-SNAPSHOT
    2) master version - 1.6.0-SNAPSHOT
    
  • Merge changes into master
    1) dev version    - 1.7.0-SNAPSHOT
    2) master version - 1.7.0-SNAPSHOT
    
  • Release master and increment patch version
    1) dev version    - 1.7.0-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    
  • Merge changes into dev
    1) dev version    - 1.7.1-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    
  • Bump dev minor version
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    

Example of releasing a fix for master branch:

  • Current dev and master versions
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.1-SNAPSHOT
    
  • Release master and increment patch version
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.2-SNAPSHOT
    
  • Merge changes into dev, without version bumping
    1) dev version    - 1.8.0-SNAPSHOT
    2) master version - 1.7.2-SNAPSHOT
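
For illustration, the version bumps above can be done with the versions-maven-plugin; the plugin choice is an assumption, since the policy above doesn't prescribe tooling:

# on master, after releasing 1.7.0: set the next patch snapshot
mvn versions:set -DnewVersion=1.7.1-SNAPSHOT versions:commit

# on dev, after merging master back: bump the minor version manually
mvn versions:set -DnewVersion=1.8.0-SNAPSHOT versions:commit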
    

Change Log

Documentation

gbif-api's People

Contributors

aalbatross, ahakanzn, ansell, cgendreau, fmendezh, gbif-jenkins, marcos-lg, mattblissett, mdoering, mike-podolskiy90, muttcg, omeyn, timrobertson100


gbif-api's Issues

Make GBIF API compatible with semantic web a.k.a. Linked Open Data (LOD)

As you probably know, LOD is a way to create semantic links between datasets, using relatively standard, self-describing vocabularies.
As with ordinary web APIs, this is also a way to decouple the data and the application(s).
So, connecting GBIF data to standard LOD syntaxes and vocabularies can lead to many unforeseen use cases. My current first use case is to make maps with both my LOD botanical observations and GBIF observations.

I started mapping the GBIF JSON to standard LOD vocabularies in this project, using the JSON-LD technique (see also https://json-ld.org/), which provides a LOD "reading grid" for the plain JSON data:
https://github.com/jmvanel/rdf-convert/tree/master/gbif.org
I hope I have your blessing.

To make things smoother technically, there are a few things that you could do.

  • The most annoying is that most IDs in your JSON are integers; it would be more convenient for JSON-LD tools if they were strings.
  • Also, you could add new JSON-LD keys in the API: @id, @type.
  • After that, you could expose the JSON-LD side of your JSON URLs by one of the ways here:
    https://www.w3.org/TR/json-ld/#modifying-behavior-with-link-relationships
    or simply add an "@context" key in the JSON (a sketch follows this list).
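
For illustration only, a minimal occurrence fragment with the proposed keys added; the context URL, type and field values here are hypothetical, not an existing GBIF resource:

{
  "@context": "https://api.gbif.org/contexts/occurrence.jsonld",
  "@id": "https://api.gbif.org/v1/occurrence/1234567890",
  "@type": "dwc:Occurrence",
  "key": "1234567890",
  "scientificName": "Abies alba Mill."
}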

I hope I have filed this issue in the relevant GitHub project.

NOTE: I also wrote a translator from GBIF DwC archives to RDF (Turtle):
https://github.com/jmvanel/dwca-rdf
but having RDF URLs like I propose here is more in line with the Semantic Web architecture.

Extend Rank enum with ZooBank ranks

ZooBank as the zoological nomenclator contains a few very rare ranks that we should nevertheless add to our rank enumeration:

altera: Cyclopterus liparis altera minor Fabricius, 1780

type: Serrivomer sector type brevidentatus Roule & Bertin, 1929
-> Serrivomer sector typ. longidentatus Roule & Bertin, 1929

facies: Salmo trutta major facies rhodanensis Roule, 1923
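
A minimal sketch of what the additions might look like; the constant names are assumptions, not the final API:

public enum Rank {
  // existing constants elided for brevity; proposed additions below
  ALTERA,
  TYPE,    // also written "typ." in zoological names
  FACIES
}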

Changes to support building with Java-9

A few minor changes are needed to support Java 9, similar to other packages. This includes the motherpom update, and a few test changes to replace ClassLoader.class with getClass().

Decide on the future of unused continent related occurrence remarks

We currently have 3 occurrence remarks related to continent that are not used (based on Solr facets):

  • CONTINENT_COUNTRY_MISMATCH
  • CONTINENT_INVALID
  • CONTINENT_DERIVED_FROM_COORDINATES

Considering there is no official list of continents, we could consider deprecating CONTINENT_COUNTRY_MISMATCH and CONTINENT_DERIVED_FROM_COORDINATES.

CONTINENT_INVALID could be kept to flag values that cannot be matched against the dictionary.

What and how to return Occurrence eventDate?

Please see the comment below with a proposal for changes.

The comment below this line is unchanged since 2016.


The goal of this issue is to document and collect feedback about the future response of the API related to the date on which an occurrence record occurred. This is strictly about the response of the API.

Supported ISO date formats

The following ISO date formats, based on the granularity provided by the data publisher, will be returned by the API:

Pattern                  Example value
yyyy                     2016
yyyy-mm                  2016-03
yyyy-mm-dd               2016-03-05
yyyy-mm-dd'T'hh:mm:ss    2016-03-05T13:03:07

Date intervals using the formats above separated with a slash (/): 2016-03-05/2016-03-06.
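
As a client-side sketch (not part of the API itself), all four granularities can be parsed with a single java.time formatter using optional sections:

import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Year;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;

public class EventDateParser {

  // 'u' (proleptic year) avoids era resolution; optional sections cover all granularities.
  private static final DateTimeFormatter FLEXIBLE =
      DateTimeFormatter.ofPattern("uuuu[-MM[-dd['T'HH:mm:ss]]]");

  /** Returns the most precise temporal the text supports. */
  public static TemporalAccessor parse(String text) {
    return FLEXIBLE.parseBest(text,
        LocalDateTime::from, LocalDate::from, YearMonth::from, Year::from);
  }

  public static void main(String[] args) {
    System.out.println(parse("2016"));                // Year
    System.out.println(parse("2016-03"));             // YearMonth
    System.out.println(parse("2016-03-05"));          // LocalDate
    System.out.println(parse("2016-03-05T13:03:07")); // LocalDateTime
  }
}

Date intervals can be handled by splitting on the slash first and parsing each side.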

How should it be returned by the API?

Option 1 - Update/reuse the current eventDate

This option consists of reusing the current content of the JSON field eventDate returned by the API to return all supported ISO format dates.
Following this change, the eventDate could contain values like:

"eventDate":"2012-03/2012-04"
"eventDate":"2012-03-01/2012-04-02"
"eventDate":"2012-03-01"
"eventDate":"2012-03"
"eventDate":"2016-03-05T13:03:07"

Option 2 - Introduce new field(s)

This option consists of adding new field(s) to the JSON response.
For example:

"isoEventDate":"2012-03/2012-04"

The current eventDate could be one of the following:

  • maintained but always empty
  • maintained and provided only for non-range dates
  • maintained and always filled (for date ranges it would contain the start of the range)

What to do with year, month, day

This is an open question for both options.

The different options are:

  • remove them to avoid confusion
  • maintain them but provide values only for non-range dates
  • maintain them and always return a value (for date ranges it would contain the start of the range)

API number of results limit by default

I am having problems with the API calls to obtain information on the datasets published by GBIF Colombia's node. The call currently has a default limit of 1000 records, and that number of datasets was exceeded a few weeks ago ;). I would like to know if you can change the default value, since the "limit" parameter is being overridden as the API documentation describes, or maybe give me some advice about how I can get this information otherwise.

The API call is:
http://api.gbif.org/v1/node/7e865cba-7c46-417b-ade5-97f2cf5b7be0/dataset?limit=10000
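
Until the cap changes, the usual workaround is to page with offset and limit; a minimal Java 11+ sketch (the endOfRecords flag is the paged-response convention; JSON parsing is elided here):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class DatasetPager {
  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    final int limit = 200; // well under the server-side cap
    int offset = 0;
    boolean endOfRecords = false;
    while (!endOfRecords) {
      String url = "https://api.gbif.org/v1/node/7e865cba-7c46-417b-ade5-97f2cf5b7be0/dataset"
          + "?offset=" + offset + "&limit=" + limit;
      HttpResponse<String> response = client.send(
          HttpRequest.newBuilder(URI.create(url)).build(),
          HttpResponse.BodyHandlers.ofString());
      // A real client would parse the JSON page; here we only check the paging flag.
      endOfRecords = response.body().contains("\"endOfRecords\":true");
      System.out.println("fetched page at offset " + offset);
      offset += limit;
    }
  }
}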

Move InterpretationRemarkSeverity from occurrence-common

In relation to issue #5, the gbif-api should also define the default "severity" of each interpretation remark.

The severity of an interpretation remark can differ depending on the context. Here, we only want to define the default severity.

Support very long (≥8k) search requests

Users would like to search using long polygons or many taxon keys. To support this with the current search API, a long (>8k character) URL must pass through:

  1. The user's web browser or other client
  2. Potentially a not-very-good proxy (corporate or education filter etc)
  3. Varnish
  4. a. occurrence-ws
    b. vectortile-ws / mapnik-server
  5. SOLR

4.a. is easily fixed for gbif-microservice, 4.b. can be fixed for Dropwizard with

  applicationConnectors:
    - type: http
      port: 7001
      maxRequestHeaderSize: 1MiB

although there are then issues somewhere in Jersey's regex handling.

  2. is probably OK, since using HTTPS should prevent most proxies from modifying the request

  3. requires regexes in Varnish to use .*? rather than .* for the maps rules, and there's a related note in Varnish saying needing this is "madness".

That leaves 1. There's a concern from a Jetty developer suggesting all of this is a bad idea, for compatibility and security.

So we need some way to communicate the search terms without using >8kiB, at least for website and API. We could:

  • Use POST and cache POST requests in Varnish
    • No longer possible to share URLs, easily switch between website and API, etc., but the reasonable length limit is very high
  • Use POST requests to get a key (time limited?), presumably stored in a database somewhere, which maps the key to the search string.
    • Allows sharing etc, but adds more complexity
  • Compress the search parameters in the URL (see the sketch after this list):
    • xz compression followed by Base64 encoding reduces an 11.5 kiB string (a polygon of Brandenburg) to 2.5 kiB
    • not an incredible saving
  • Make a protocol buffers format for encoding the query
    • A quick try, using part of geobuf.proto for the geometry then base64 encoding, uses 5.8kiB
    • even less of a saving
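
A minimal sketch of the compression option using the JDK's Deflater with URL-safe Base64 instead of xz (so the ratio will differ from the numbers above):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.Deflater;

public class QueryCompressor {

  /** Deflates a query string and encodes it as a URL-safe token. */
  public static String compress(String query) {
    Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
    deflater.setInput(query.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[8192];
    while (!deflater.finished()) {
      out.write(buffer, 0, deflater.deflate(buffer));
    }
    deflater.end();
    return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
  }

  public static void main(String[] args) {
    // An illustrative WKT polygon; real geometries are far longer.
    String geometry = "POLYGON((13.0 52.3,14.8 52.3,14.8 53.6,13.0 53.6,13.0 52.3))";
    System.out.println(compress(geometry));
  }
}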

Introduce Occurrence isoEventDate field(s)

The current Occurrence eventDate field is a Java Date and cannot always represent the value that was provided. For example, the value "2013-02" is a valid ISO date but cannot be stored in a Date field without introducing a false precision: "2013-02-01T00:00:00". The issue is that we have no mechanism to return the date to the user at the same level of granularity that was originally provided.

I suggest we introduce fields isoEventDateStart and isoEventDateEnd to store the rendered ISO date as a String. The start and end concepts will be used to support date ranges.

field "issue" not returned in species API

Searching occurrences by issue via the API returns the field issues (see example with issue DEPTH_UNLIKELY) as expected. The same is not true when searching names/species (see example1 with issue NO_SPECIES and example2 with issue ORIGINAL_NAME_DERIVED). The data are clearly well filtered, but it would be nice to get the issues field for species just as for occurrences.
Or am I missing something? Thanks.

Add occurrence count from latest harvest to endpoints in /dataset

As in subject. This would be immensely useful for me to help prevent unnecessary generation of DwC-As of all GBIF specimen-based occurrences if I can first discover any datasets that have dropped a significant number of records since I last downloaded them. For example, it appears https://www.gbif.org/dataset/4ce8e3f9-2546-4af1-b28d-e2eadf05dfd4 has mistakenly dropped half its 4.5M occurrences from its DwC-A at some time between today and two weeks ago. I have been in touch with Niels Klazenga to see if he can get them restored. And then I'd generate yet another 65GB DwC-A file. Incidentally, I am faced with 12hr+ download times for such files. If however I create a Droplet on DigitalOcean in Amsterdam to download the file before hopping the pond, I can get it to my machine in NA in approx. 1.5hrs.

Enumeration for literature topics

Is there an enumeration available through the REST API which contains the topics applied to literature monitored by the GBIF DOI tracking programme? View source on the web UI (https://www.gbif.org/resource/search?contentType=literature) shows a reference to enums.cms.vocabularyTerms.topics, but I can't see this listed under https://gbif.github.io/gbif-api/apidocs/org/gbif/api/vocabulary/package-summary.html. I assume only those listed there are available through the API access to enumerations: https://www.gbif.org/developer/summary#enumerations

Revise NomenclaturalStatus

The NomenclaturalStatus enumeration is missing an entry for a correct/valid name and uses confusing terminology in some places. There should be 2 clear entries for a correct name and a validly published/available name.

As botanical and zoological terminology use the same term, "valid", for different things, we should avoid it.

Botanical / Zoological terms:

  • correct (incorrect) / valid (invalid)
  • valid (invalid) / available (unavailable)

Suggest to use:

  • CORRECT & INCORRECT
  • AVAILABLE & UNAVAILABLE

All other, more specific status values should have two methods to indicate whether that more specific status is correct and available in the above sense: isCorrect() and isAvailable().
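
A minimal sketch of the proposal; the constant set and flag values here are illustrative, not a worked-out classification:

public enum NomenclaturalStatus {
  // the two clear base entries proposed above
  CORRECT(true, true),
  AVAILABLE(false, true),
  // examples of more specific statuses carrying the same flags
  CONSERVED(true, true),
  ILLEGITIMATE(false, true),
  NUDUM(false, false);

  private final boolean correct;
  private final boolean available;

  NomenclaturalStatus(boolean correct, boolean available) {
    this.correct = correct;
    this.available = available;
  }

  /** Is a name with this status the correct name (botanical sense)? */
  public boolean isCorrect() { return correct; }

  /** Is a name with this status validly published / available? */
  public boolean isAvailable() { return available; }
}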

Search for occurrences without date isn't an option

total 703,245,140

592,565,753 occ before 2020
(zero invalid and zero unlikely)

56,748,687 recorded date invalid or unlikely or mismatch
(8.6 mil mismatch duplicates with before 2020 ignored)

592 mil + 56 mil is far from 703 mil. The difference, I assume, is the occurrences without any date at all. But wouldn't it be practical to have them searchable as well?

I guess one could argue that if you want a date you set a filter, and if you are a publisher you can look at your own data, whereas the other issues have to do with our processing. But then again, we add TAXON_MATCH_NONE when no taxon is provided, and we flag occurrences without coordinates. And our issue filters are targeting the publishers in the first place.


fbitem-4cd51ab3c35a7b3305c9ec1422d26bc93a21a482
Reported by: @MortenHofft
System: Chrome 55.0.2883 / Mac OS X 10.10.5
Referer: https://demo.gbif.org/occurrence/search?issue=RECORDED_DATE_INVALID&issue=RECORDED_DATE_UNLIKELY&issue=RECORDED_DATE_MISMATCH

Accept old and new forms for EML userId

Following this issue on the EML repository, it looks like when the IPT and registry/GBIF API were written there was no guidance on how to format an ORCID in the EML.

The related commit on EML for v2.2 (not yet released) suggests it should be like this:

<userId directory="https://orcid.org/">https://orcid.org/0000-0003-0623-6682</userId>

But the IPT generates it like this:

<userId directory="http://orcid.org/">0000-0003-0623-6682</userId>

We will need to handle both, and probably also the mixture:

<userId directory="http://orcid.org/">https://0000-0003-0623-6682</userId>
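
A minimal sketch of a normalizer accepting all three forms; this is not the registry's actual parser, and the class name and canonical form are assumptions:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class OrcidNormalizer {

  // The bare identifier: four groups of four digits, the last character may be X.
  private static final Pattern ORCID =
      Pattern.compile("(\\d{4}-\\d{4}-\\d{4}-\\d{3}[\\dX])");

  /** Extracts the ORCID from any of the userId variants above; null if absent. */
  public static String normalize(String userId) {
    Matcher m = ORCID.matcher(userId);
    return m.find() ? "https://orcid.org/" + m.group(1) : null;
  }
}

All three variants above then yield the same canonical https form.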

Improve Interpretation remarks definition

Interpretation remarks are also known as "issues" (e.g. OccurrenceIssue).

Currently for occurrences this information is defined in InterpretationRemarksDefinition.

In order to make the definition more explicit we should bring it to the gbif-api project. The same mechanism is also required to identify remarks in ChecklistBank.

The idea would be to have an interface like the following:

public interface InterpretationRemark {
  /** The terms (e.g. Darwin Core terms) to which this remark relates. */
  Set<Term> getRelatedTerms();
}

OccurrenceIssue and NameUsageIssue would then implement it.
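
A sketch of how an issue enum might implement it; the related-term set shown is illustrative:

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import org.gbif.dwc.terms.DwcTerm;
import org.gbif.dwc.terms.Term;

public enum OccurrenceIssue implements InterpretationRemark {
  // one illustrative constant; the real enum has many more
  RECORDED_DATE_INVALID(DwcTerm.eventDate, DwcTerm.year, DwcTerm.month, DwcTerm.day);

  private final Set<Term> relatedTerms;

  OccurrenceIssue(Term... terms) {
    this.relatedTerms = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(terms)));
  }

  @Override
  public Set<Term> getRelatedTerms() {
    return relatedTerms;
  }
}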

Representing imprecise dates in the Java object model

There's some similar discussion on #2, for occurrences.

The motivation is to fix gbif/portal-feedback#1676 and similar issues around dates in metadata properly.

The pubDate and temporalCoverage on a Dataset are very often given only as a year, and we should retain that. Changing the JSON response is straightforward enough; I now have:

"pubDate": "2016",

"temporalCoverages": [{
    "@type": "range",
    "start": "2013",
    "end": "2015"
}],

instead of the current

"pubDate": "2015-12-31T23:00:00.001+0000",

"temporalCoverages": [{
    "@type": "range",
    "start": "2012-12-31T23:00:00.001+0000",
    "end": "2014-12-31T23:00:00.001+0000"
}],

I could fix only the time zone issue of pubDate, and leave the 1 extra millisecond, which seems to be an undocumented way to say this is a year-precision date. However, that still leaves the end of the range one year too soon, though I suppose it could be serialized as 2015-12-31T00:00:00.001+0000.

Anyway, for this I used a TemporalAccessor, since it can represent a Year, YearMonth, LocalDate (= year-month-day) etc. However, there's a strong warning against using this class, since it can also represent things like JapaneseDate which break the usual assumptions we have about ISO dates. It's also a bit cumbersome to use: to get the year means checking that it holds a year, then requesting it. So, it's much looser than we require.

I think what @mdoering wrote in #2, of creating an IsoDate class, makes most sense. This would only represent a date (not a date range), either as a year, year and month, or year month and day. I think it would be a fairly simple wrapper around Year, YearMonth, and LocalDate, e.g. returning the most precise available, or fetching a year. It can serialize into a single, ISO 8601 format field of 4, 6 or 8 digits.
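
A minimal sketch of that IsoDate idea; the name comes from the discussion above, but the details are assumptions:

import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.temporal.ChronoField;
import java.time.temporal.Temporal;

public final class IsoDate {

  private final Temporal value; // exactly one of Year, YearMonth or LocalDate

  private IsoDate(Temporal value) { this.value = value; }

  public static IsoDate of(Year y)      { return new IsoDate(y); }
  public static IsoDate of(YearMonth m) { return new IsoDate(m); }
  public static IsoDate of(LocalDate d) { return new IsoDate(d); }

  /** The year is present at every precision. */
  public int getYear() { return value.get(ChronoField.YEAR); }

  /** Serializes as yyyy, yyyy-MM or yyyy-MM-dd depending on precision. */
  @Override
  public String toString() { return value.toString(); }
}

Jackson could serialize it via toString(), giving the single ISO 8601 field of 4, 6 or 8 digits described above.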

I'm not too concerned with @cgendreau's concerns. Deserialization of three well-defined formats (YYYY, YYYY-MM and YYYY-MM-DD) is easy and fast. (Faster than deserializing an ISO date!)

What does everyone else think?

  1. One millisecond hack to say it's only the year
  2. TemporalAccessor
  3. New IsoDate class
  4. Something else

The Varnish logs suggest two regular users of the Java API for this endpoint. (Plazi and GBIF Japan.)

gbif api (and portal?) should tell about usage of extensions

I just discovered that the regular API calls do not expose concepts published through the usage of extensions (or at least, some extensions).

An example from a taxon checklist extended with the vernacular names, species distribution, literature references and MeasurementOrFact extensions:

The portal entry overview tab shows at least some of those linked concepts (for some reason, only the distribution data but nothing else): https://www.gbif.org/species/164940120
But the equivalent API normal call doesn't show anything: https://api.gbif.org/v1/species/164940120

If you go to the verbatim versions, there is no problem finding all the linked data:
Portal verbatim tab: https://www.gbif.org/species/164940120/verbatim
API verbatim call: https://api.gbif.org/v1/species/164940120/verbatim

Of course, a user who is browsing the portal overview tab (the default one) can always click the verbatim tab to see if there is extended information in there. In my opinion, there should be a visual notification on the page to tell the user about the additional extension data available when clicking the verbatim tab.

But when you are programming a web app which uses the API, this notification is much more important: it would make sense to have some API flag notifying about it (something like extensionsUsage: boolean, or even better usedExtensions: [array of extensions], for example).
With such a notification, the app could decide whether or not to make a second API call to retrieve the verbatim data (only if those extensions are relevant for what the app needs to show).
Otherwise, the only way to retrieve those extended data is to always make verbatim calls, which in most cases would be a useless waste of bandwidth.

@abubelinha

Any place where the API's response structures are explained?

Hello!

I'm looking for documentation of the GBIF API's responses. If this is not the place to ask, please direct me to the correct one.

I'm currently working with the GBIF API, and I don't understand some of the responses. For example, when asking for dataset metrics with GET /v1/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c/metrics I get this response:

{
    "key": 210029,
    "datasetKey": "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c",
    "usagesCount": 5858142,
    "synonymsCount": 2361744,
    "distinctNamesCount": 5844412,
    "nubMatchingCount": 3805144,
    "colMatchingCount": 2845782,
    "nubCoveragePct": 64,
    "colCoveragePct": 48,
    "countByConstituent": ,
    "countByKingdom": {
        "ANIMALIA": 1529953,
        "PLANTAE": 523372,
        "FUNGI": 201853,
        "CHROMISTA": 15357,
        "BACTERIA": 10847,
        "PROTOZOA": 7063,
        "INCERTAE_SEDIS": 787,
        "ARCHAEA": 439,
        "VIRUSES": 119
    },
    ...
}

What is the meaning of properties such as nubMatchingCount, colMatchingCount, nubCoveragePct, etc.? On the website I can see the API documentation for making requests, but not an explanation of the responses.

I will appreciate any help. Thanks!

Add citations to registry API for datasets

Since you're now tracking citations, it would be nice to obtain these via a GET to /dataset/{UUID}/citations. This need not be particularly verbose, perhaps nothing more than an array of DOIs, because metadata for these can be had elsewhere, such as from Crossref's API.

Clearly specify how geometry polygons wrapping a pole or crossing the antimeridian should be specified

A polygon can cross the antimeridian (e.g. Fiji) or cross the antimeridian when wrapping around the north or south pole (e.g. around the Arctic).

I need to investigate a bit more, but either WKT doesn't really define how these polygons should be specified, or there are conflicting implementations.

Within this API project, we validate WithinPredicates using the JTS library, which lacks support for this. Some ways of specifying the polygon happen to work.

We should work out what we want to support, and convert as necessary to our backend systems.

As an example of the differences between implementations, locationtech/spatial4j#46 says:

Strict interpretation of WKT/OGC rules refer to the "right-hand-rule" which basically says the outer shell is counter-clockwise order. Based on that Spatial4j has a bug and your test shows it. WKT/OGC, AFAIK, also don't talk about the notion of a dateline. There is a special case in Spatial4j that when given a 4-point rectangular polygon (NOT multipolygon), then it is processed according to the right hand rule. And hence your test will pass if the shape is trivially converted. But I'm planning on removing that or making it optional in the next release because numerous people have complained, even if it meets specs.

Spatial4j does it this way because it seemed a real-world polygon wouldn't have longitudinal points >= 180 degrees apart…
…yet it's reasonable to search GBIF based on a polygon around Eurasia.

(Spatial4J is a derivative of the JTS library we use.)

This is the secondary issue from https://dev.gbif.org/issues/browse/POR-3042/, which concerns a polygon of the Arctic. It's a reasonable polygon, but doesn't validate with JTS. It can be made into a rectangle -180 90…180 90,-180 90 which does validate, but we either need to do this conversion ourselves, or clearly document it.

Change GADM search API

We've found some limitations with the initial, basic GADM search implementation.

One is an inconvenience. A query for "the island of Ireland" seems very reasonable, but because different query terms are ANDed (and multiple values for the same term are ORed), this is the necessary query, listing every appropriate Level 1 GID:

https://api.gbif.org/v1/occurrence/search?gadmLevel1Gid=IRL.1_1&gadmLevel1Gid=IRL.2_1&gadmLevel1Gid=IRL.3_1&gadmLevel1Gid=IRL.4_1&gadmLevel1Gid=IRL.5_1&gadmLevel1Gid=IRL.6_1&gadmLevel1Gid=IRL.7_1&gadmLevel1Gid=IRL.8_1&gadmLevel1Gid=IRL.9_1&gadmLevel1Gid=IRL.10_1&gadmLevel1Gid=IRL.11_1&gadmLevel1Gid=IRL.12_1&gadmLevel1Gid=IRL.13_1&gadmLevel1Gid=IRL.14_1&gadmLevel1Gid=IRL.15_1&gadmLevel1Gid=IRL.16_1&gadmLevel1Gid=IRL.17_1&gadmLevel1Gid=IRL.18_1&gadmLevel1Gid=IRL.19_1&gadmLevel1Gid=IRL.20_1&gadmLevel1Gid=IRL.21_1&gadmLevel1Gid=IRL.22_1&gadmLevel1Gid=IRL.23_1&gadmLevel1Gid=IRL.24_1&gadmLevel1Gid=IRL.25_1&gadmLevel1Gid=IRL.26_1&gadmLevel1Gid=GBR.2_1

A similar case, "Andalucia and Gibraltar", cannot be done:

https://api.gbif.org/v1/occurrence/search?gadmLevel1Gid=ESP.1_1&gadmLevel0Gid=GIB

Gibraltar doesn't have level 1 subdivisions, so there's no way to make a suitable OR-query.

Secondly, allowing queries by name will probably lead to confusion. Bolívar is the name of 30 level 2 areas, in 6 countries: https://api.gbif.org/v1/occurrence/search?gadmLevel2Name=Bol%C3%ADvar&limit=0&facet=gadmLevel0Name plus additional level 1 and 3 areas.


Proposal:

  1. Just remove the name parameters. Allowing their use is too fragile. We don't accept country=Ireland anyway.

  2. An additional search parameter gadmGid, so the above queries can be represented like this:

https://api.gbif.org/v1/occurrence/search?gadmGid=IRL&gadmGid=GBR.2_1

https://api.gbif.org/v1/occurrence/search?gadmGid=ESP.1_1&gadmGid=GIB

Question 1

This still leaves the facets. Do we still need gadmLevel1Gid etc. terms to return in facets, or should something else be done?

Working in a similar way to searching taxa would suggest we have both, as we can already facet on both kingdomKey, phylumKey etc. and taxonKey.

To help anyone who can ponder this, these are the GADM areas for Ireland + Northern Ireland:
gid_0 gid_1 gid_2 gid_3 name_0 name_1 name_2 name_3
GBR GBR.2_1 GBR.2.10_1 GBR.2.10.1_1 United Kingdom Northern Ireland Newry, Mourne and Down Down
GBR GBR.2_1 GBR.2.10_1 GBR.2.10.2_1 United Kingdom Northern Ireland Newry, Mourne and Down Newry and Mourne
GBR GBR.2_1 GBR.2.11_1 GBR.2.11.1_1 United Kingdom Northern Ireland North Down and Ards Ards
GBR GBR.2_1 GBR.2.11_1 GBR.2.11.2_1 United Kingdom Northern Ireland North Down and Ards North Down
GBR GBR.2_1 GBR.2.1_1 GBR.2.1.1_1 United Kingdom Northern Ireland Antrim and Newtownabbey Antrim
GBR GBR.2_1 GBR.2.1_1 GBR.2.1.2_1 United Kingdom Northern Ireland Antrim and Newtownabbey Newtownabbey
GBR GBR.2_1 GBR.2.2_1 GBR.2.2.1_1 United Kingdom Northern Ireland Armagh, Banbridge and Craigavon Armagh
GBR GBR.2_1 GBR.2.2_1 GBR.2.2.2_1 United Kingdom Northern Ireland Armagh, Banbridge and Craigavon Banbridge
GBR GBR.2_1 GBR.2.2_1 GBR.2.2.3_1 United Kingdom Northern Ireland Armagh, Banbridge and Craigavon Craigavon
GBR GBR.2_1 GBR.2.3_1 GBR.2.3.1_1 United Kingdom Northern Ireland Belfast Belfast
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.1_1 United Kingdom Northern Ireland Causeway Coast and Glens Ballymoney
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.2_1 United Kingdom Northern Ireland Causeway Coast and Glens Coleraine
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.3_1 United Kingdom Northern Ireland Causeway Coast and Glens Limavady
GBR GBR.2_1 GBR.2.4_1 GBR.2.4.4_1 United Kingdom Northern Ireland Causeway Coast and Glens Moyle
GBR GBR.2_1 GBR.2.5_1 GBR.2.5.1_1 United Kingdom Northern Ireland Derry and Strabane Derry
GBR GBR.2_1 GBR.2.5_1 GBR.2.5.2_1 United Kingdom Northern Ireland Derry and Strabane Strabane
GBR GBR.2_1 GBR.2.6_1 GBR.2.6.1_1 United Kingdom Northern Ireland Fermanagh and Omagh Fermanagh
GBR GBR.2_1 GBR.2.6_1 GBR.2.6.2_1 United Kingdom Northern Ireland Fermanagh and Omagh Omagh
GBR GBR.2_1 GBR.2.7_1 GBR.2.7.1_1 United Kingdom Northern Ireland Lisburn and Castlereagh Castlereagh
GBR GBR.2_1 GBR.2.7_1 GBR.2.7.2_1 United Kingdom Northern Ireland Lisburn and Castlereagh Lisburn
GBR GBR.2_1 GBR.2.8_1 GBR.2.8.1_1 United Kingdom Northern Ireland Mid and East Antrim Ballymena
GBR GBR.2_1 GBR.2.8_1 GBR.2.8.2_1 United Kingdom Northern Ireland Mid and East Antrim Carrickfergus
GBR GBR.2_1 GBR.2.8_1 GBR.2.8.3_1 United Kingdom Northern Ireland Mid and East Antrim Larne
GBR GBR.2_1 GBR.2.9_1 GBR.2.9.1_1 United Kingdom Northern Ireland Mid Ulster Cookstown
GBR GBR.2_1 GBR.2.9_1 GBR.2.9.2_1 United Kingdom Northern Ireland Mid Ulster Dungannon
GBR GBR.2_1 GBR.2.9_1 GBR.2.9.3_1 United Kingdom Northern Ireland Mid Ulster Magherafelt
IRL IRL.10_1 Ireland Kilkenny
IRL IRL.11_1 Ireland Laoighis
IRL IRL.12_1 Ireland Leitrim
IRL IRL.13_1 Ireland Limerick
IRL IRL.14_1 Ireland Longford
IRL IRL.15_1 Ireland Louth
IRL IRL.16_1 Ireland Mayo
IRL IRL.17_1 Ireland Meath
IRL IRL.18_1 Ireland Monaghan
IRL IRL.19_1 Ireland Offaly
IRL IRL.1_1 Ireland Carlow
IRL IRL.20_1 Ireland Roscommon
IRL IRL.21_1 Ireland Sligo
IRL IRL.22_1 Ireland Tipperary
IRL IRL.23_1 Ireland Waterford
IRL IRL.24_1 Ireland Westmeath
IRL IRL.25_1 Ireland Wexford
IRL IRL.26_1 Ireland Wicklow
IRL IRL.2_1 Ireland Cavan
IRL IRL.3_1 Ireland Clare
IRL IRL.4_1 Ireland Cork
IRL IRL.5_1 Ireland Donegal
IRL IRL.6_1 Ireland Dublin
IRL IRL.7_1 Ireland Galway
IRL IRL.8_1 Ireland Kerry
IRL IRL.9_1 Ireland Kildare

Question 2

Should there be any special behaviour with the existing country parameter? At the moment, a query like https://api.gbif.org/v2/map/occurrence/adhoc/5/30/[email protected]?srs=EPSG:4326&bin=hex&hexPerTile=17&style=classic.poly&country=IE&gadmLevel1Gid=GBR.2_1 returns only those occurrences in both NaturalEarthMarineRegions Ireland and GADM Northern Ireland, i.e. the errors and inaccuracies from the data for occurrences close to the border between the two.

CORS?

Can we please have CORS?

Definition of eventDate returned by the API

According to Darwin Core, the eventDate is "the date-time or interval during which an Event occurred. For occurrences, this is the date-time when the event was recorded."

This brings 2 concepts:

  1. The date and time relative to the location of the event
  2. The exact moment the event occurred on the timeline

To illustrate, let's take the worst-case scenario:

eventDate: 2016-09-15T00:05:00+1400 (LINT, Kiritimati, Kiribati - Christmas Island, UTC+14)

At the exact same moment, it is:

  • 2016-09-14T10:05:00+0000 (UTC)
  • 2016-09-14T11:05:00+0100 (BST, London, United Kingdom, UTC+1)
  • 2016-09-13T22:05:00-1200 (AoE, Baker Island, US Minor Outlying Islands, UTC-12)

In the worst case, a single eventDate 2016-09-15 can actually be on 3 different dates depending where you are on Earth.
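
A quick java.time illustration of the three renderings above (the zone IDs are standard tz database names):

import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;

public class EventDateDemo {
  public static void main(String[] args) {
    // The Kiritimati local time above, as an absolute instant (2016-09-15T00:05+14:00).
    Instant instant = Instant.parse("2016-09-14T10:05:00Z");
    for (String zone : new String[] {"Pacific/Kiritimati", "Europe/London", "Etc/GMT+12"}) {
      LocalDate date = instant.atZone(ZoneId.of(zone)).toLocalDate();
      System.out.println(zone + " -> " + date); // 2016-09-15, 2016-09-14, 2016-09-13
    }
  }
}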

This raises some questions:

  • When a user searches for all records that occurred on 2016-09-14, what should be returned?
  • When no timezone is provided, which one should be used?

The trade-off solution that answers the 2 questions above is to handle eventDate as a local date-time. This means the eventDate is relative to its location, and the time zone is therefore ignored. When you query GBIF for a specific date, you then get all records that were recorded on that date relative to their location. The downside of such a solution is that it makes it very difficult to order eventDates along the timeline (in the order they actually happened in the real world) with a precision better than 26 hours (the biggest difference between 2 time zones).

Please let us know if you have good reason to think that handling eventDate as a local date-time could affect your program/work.

Extend Language to ISO 639-3 codes

Based on gbif/checklistbank#73 we need to deal with more languages than the current enum, based on ISO 639-2 codes, holds. The simplest option would be to extend the Language enum with all ~7700 codes and store both the 2- and 3-letter codes as enum properties. But is a large enum like this still a good idea? Enum constants are only initialised when the enum class is first used, so this should not be a JVM issue.
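
A minimal sketch of the enum-property layout (the two entries shown are illustrative):

public enum Language {
  // two illustrative entries; the full enum would carry all ~7700 ISO 639-3 codes
  ENGLISH("en", "eng"),
  DANISH("da", "dan");

  private final String iso2; // ISO 639-1; may be null for languages without a 2-letter code
  private final String iso3; // ISO 639-3

  Language(String iso2, String iso3) {
    this.iso2 = iso2;
    this.iso3 = iso3;
  }

  public String getIso2LetterCode() { return iso2; }
  public String getIso3LetterCode() { return iso3; }
}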

Add version to Dataset object

In order to generate the citation described by gbif/registry#4 , we need to store the version of the dataset (if available).

The version can be extracted from the packageId in the EML document.
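
A sketch of the extraction, assuming an IPT-style packageId with a trailing version marker such as .../v2.3 (the exact layout is an assumption here):

import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PackageIdVersion {

  // Matches a trailing "/v<major>[.<minor>...]" marker.
  private static final Pattern VERSION = Pattern.compile("/v(\\d+(?:\\.\\d+)*)$");

  /** Returns the version, e.g. "2.3", if the packageId carries one. */
  public static Optional<String> extract(String packageId) {
    Matcher m = VERSION.matcher(packageId);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }
}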

API analytics

Use of api.gbif.org by specialized users is estimated to be increasing, and this deserves to be reported and analyzed in its own right, perhaps as similarly as possible to how use of the portal is tracked using Google Analytics.

In particular, we are interested in collecting basic metrics like:

  • Geographic distribution of API users.
  • Usage per sub-domain: occurrence, dataset, species, etc.
  • Average response times and data traffic.

There are existing solutions and services to manage, track and analyze APIs; however, such solutions are, in our case, unviable because of cost:

  1. https://aws.amazon.com/api-gateway/
  2. https://cloud.google.com/apigee/
  3. https://www.mulesoft.com/platform/api/anypoint-analytics
  4. https://www.ibm.com/dk-en/cloud/api-connect

Other solutions can be explored and extended to achieve similar results, for example:

  1. https://github.com/Netflix/zuul
  2. https://spring.io/projects/spring-cloud-gateway
  3. https://konghq.com/kong/

Interpretation remarks on Occurrence for absence of taxa, coordinates or date

This is a discussion to harmonize how we handle the absence of data for taxa, coordinates and dates on Occurrence records.

Currently it is different in all 3 cases:

  • If no taxon is provided, we flag a TAXON_MATCH_NONE
  • If no coordinates are provided, we have a HAS_COORDINATE = FALSE in our search index (and in the table behind it)
  • If no date is provided, we do nothing
