cmr-stac's Introduction

NASA CMR STAC

NASA's Common Metadata Repository (CMR) is a metadata catalog of NASA Earth Science data. STAC, or SpatioTemporal Asset Catalog, is a specification for describing geospatial data with JSON and GeoJSON. The related STAC API specification defines an API for searching and browsing STAC catalogs.

CMR-STAC

CMR-STAC acts as a proxy between the CMR repository and STAC API queries. The goal is to expose CMR's vast collections of geospatial data as a STAC-compliant API. Even though the core metadata remains the same, a benefit of the CMR-STAC proxy is the ability to use the growing ecosystem of STAC software. Underneath, STAC API queries are translated into CMR queries which are sent to CMR and the responses are translated into STAC Collections and Items. This entire process happens dynamically at runtime, so responses will always be representative of whatever data is currently stored in CMR. If there are any deletions of data in CMR by data providers, those deletions are represented in CMR-STAC immediately.

CMR-STAC follows the STAC API 1.0.0-beta.1 specification; see the OpenAPI Documentation.

Usage

Endpoints

  • CMR-STAC: The entire catalog of NASA CMR data, organized by provider.
  • CMR-CLOUDSTAC: Also organized by provider, this API only contains STAC Collections where the Item Assets are available "in the cloud" (i.e., on S3).

Navigating

CMR-STAC can be navigated manually using the endpoints provided above, or you can utilize available STAC software to browse and use the API.

See the Usage Documentation for examples of how to interact with the API and search for data.

Limitations

While CMR-STAC provides some advantages over the CMR, there are some limitations that you should be aware of:

  • Limited search functionality: CMR-STAC does not support all of the search capabilities that CMR provides. For example, with CMR you can search for data using temporal and spatial criteria as well as specific parameters such as platform, instrument, and granule size. With CMR-STAC, however, you can only search using the parameters defined by the STAC API specification.
  • Limited metadata availability: CMR-STAC only provides metadata that follows the STAC specification. While this metadata is very rich and comprehensive, it may not provide all of the information that you need for your specific use case.

For Developers

Developer README

License

NASA Open Source Agreement v1.3 (NASA-1.3). See LICENSE.txt.

cmr-stac's People

Contributors

abarciauskas-bgse, atdaniel, cayvonh, daniel-zamora, dependabot[bot], dpesall, dylnclrk, eriknelson11, fgabelmannjr, jasongilman, jaybarra, johnwteague, m4cy, matthewhanson, mitchstartzel, snyk-bot, sxu123, ygliuvt, zimzoom

cmr-stac's Issues

Which item properties get exposed from original metadata?

Does this utility only convert required STAC Item metadata, or is it possible to expose additional properties?

For example, a direct search with the ASF Vertex API looks like this:
# https://search.asf.alaska.edu/#/?dataset=SENTINEL-1%20INTERFEROGRAM%20(BETA)&polygon=POLYGON((-122.3525%2041.365,-122.0961%2041.365,-122.0961%2041.5252,-122.3525%2041.5252,-122.3525%2041.365))&zoom=9.044887353337925&center=-123.314132,41.235575&resultsLoaded=true&granule=S1-GUNW-A-R-035-tops-20200304_20200203-020818-42666N_40796N-PP-0a7f-v2_0_2&mission=S1%20I-grams%20(BETA)%20-%20Northern%20CA&beamModes=slc&polarizations=VV&flightDirs=Ascending&path=35-35&maxResults=1000&productTypes=GUNW_STD

I'm not sure how STAC Extensions like the SAR extension in this case would be incorporated here, but note some additional very useful search terms such as polarizations=VV&flightDirs=Ascending. These properties are not exposed in the STAC item metadata and therefore not searchable:

https://cmr.earthdata.nasa.gov/stac/ASF/collections/C1595422627-ASF/items/G1714418970-ASF

A "metadata" asset is linked though, which is an XML file with a bunch of additional information.

GET /search with 'collections' parameter always returns 0 results

Expected:

These two requests should return identical results.

OAFeat:

https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/collections/NIMBUS7_ERB_SEFDT.v1/items

STAC API Item Search:

https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search?collections=NIMBUS7_ERB_SEFDT.v1

Actual:

The Item Search GET query returns 0 results.

Note that this does work correctly with POST:

curl -X "POST" "https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search" \
     -H 'Content-Type: application/json; charset=utf-8' \
     -d $'{
  "collections": [
    "NIMBUS7_ERB_SEFDT.v1"
  ]
}'
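For comparison, the two request shapes can be sketched in Python with only the standard library. This is a minimal sketch that assumes the STAC API convention: GET takes `collections` as one comma-separated string, and POST takes a JSON array. `build_search_url` and `build_search_body` are hypothetical helpers for illustration, not part of cmr-stac, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, collections: list, **params: str) -> str:
    """Build a STAC API Item Search GET URL.

    In the GET form, `collections` is a single comma-separated value.
    """
    query = {"collections": ",".join(collections), **params}
    return f"{base_url.rstrip('/')}/search?{urlencode(query)}"


def build_search_body(collections: list, **params) -> dict:
    """Equivalent POST body: `collections` is a JSON array."""
    return {"collections": list(collections), **params}


url = build_search_url(
    "https://cmr.earthdata.nasa.gov/stac/LARC_ASDC", ["NIMBUS7_ERB_SEFDT.v1"]
)
print(url)
# https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search?collections=NIMBUS7_ERB_SEFDT.v1
```

If the server handled both forms the same way, the GET URL and the POST body above would select the same Items.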

Need better Collection IDs

The ids surfaced in cmr-stac appear to be internal CMR ids, which are meaningless to users, rather than the more typical SHORTNAME-VERSION nomenclature used to identify a collection.

cmr-stac should surface these as the ids instead.
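Elsewhere on this page, collection IDs appear in a SHORTNAME.vVERSION form (for example `HLSS30.v1.5` and `NIMBUS7_ERB_SEFDT.v1`). Below is a sketch of hypothetical helpers that convert between that form and a (short name, version) pair. It assumes the version is everything after the last `.v`.

```python
def to_collection_id(short_name: str, version: str) -> str:
    """Combine a CMR short name and version into a SHORTNAME.vVERSION id."""
    return f"{short_name}.v{version}"


def from_collection_id(collection_id: str) -> tuple:
    """Split a SHORTNAME.vVERSION id on its last '.v' separator."""
    short_name, sep, version = collection_id.rpartition(".v")
    if not sep or not short_name or not version:
        raise ValueError(f"not a SHORTNAME.vVERSION id: {collection_id!r}")
    return short_name, version


print(from_collection_id("HLSS30.v1.5"))    # ('HLSS30', '1.5')
print(to_collection_id("NSIDC-0723", "4"))  # NSIDC-0723.v4
```

An internal id such as `C1595422627-ASF` contains no `.v` separator, so `from_collection_id` rejects it.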

bbox should accept an array

The endpoint should accept bbox passed as an array instead of as a string of comma-separated floats. cc @matthewhanson

Mirroring an issue documented in sat-utils/sat-search#106 (comment)

from satsearch import Search

results = Search(url='https://cmr.earthdata.nasa.gov/stac/LPCLOUD',
                 collections=['HLSS30.v1.5'],
                 #bbox = [-122.4,41.3,-122.1,41.5], #SatSearchError: "Request failed with status code 400"
                 bbox = '-122.4,41.3,-122.1,41.5', #works!
                 datetime='2021-01-01/2021-02-01',
                )
results.found()
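One way to accept both spellings is to normalize the value before the CMR query is built. The following is a minimal sketch, not cmr-stac's code. `normalize_bbox` is a hypothetical name, and the 4-or-6 length rule reflects STAC's 2D/3D bbox convention.

```python
from typing import Sequence, Union

BBox = Union[str, Sequence[float]]


def normalize_bbox(bbox: BBox) -> list:
    """Accept a bbox as a comma-separated string or a sequence of numbers.

    Returns a list of 4 (2D) or 6 (3D) floats.
    """
    if isinstance(bbox, str):
        # float() tolerates surrounding whitespace, e.g. "1, 2, 3, 4".
        values = [float(part) for part in bbox.split(",")]
    else:
        values = [float(v) for v in bbox]
    if len(values) not in (4, 6):
        raise ValueError(f"bbox must have 4 or 6 numbers, got {len(values)}")
    return values


# Both spellings from the satsearch example normalize to the same list.
print(normalize_bbox([-122.4, 41.3, -122.1, 41.5]))
print(normalize_bbox("-122.4,41.3,-122.1,41.5"))
```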

Search is very fast, but loading STAC Items is very slow

Performing a search can return >1000 items in less than 1 second, but loading those search results as STAC Items takes about 70 times longer (>1 min). Is this expected?

%%time
import pystac_client #0.3.0

URL = 'https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS'
catalog = pystac_client.Client.open(URL)

results = catalog.search(
                 collections=['NSIDC-0723.v4'], 
                 bbox = '-54.85,69.31,-52.18,70.26',
                 datetime='2000-01-01/2021-12-31', 
                )

print(f"{results.matched()} items found") # 1387 items found, Wall time: 909 ms
%%time
items = results.get_all_items() # Wall time: 1min 5s

pagination not supported

Pagination is not currently supported, or at least there is no next link as per the core STAC spec:
https://github.com/radiantearth/stac-spec/blob/master/api-spec/api-spec.md#filter-parameters-and-fields

Although limit is supported, there is presumably some maximum limit at which requests time out, so there is no way to get items beyond that.

Note that STAC 0.9.0 will align with OGC API Features 1.0.0 for pagination, which is to provide a complete next link along with an optional body for POST requests.
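If every page carried a complete rel=next link, a client could collect all Items with a loop like the following. This is a sketch under stated assumptions. `iter_items` is hypothetical. `fetch` stands in for any function that GETs a URL and returns parsed JSON. Only GET-style next links are handled, so a POST next link with a body would need more work.

```python
from typing import Callable, Iterator


def iter_items(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every Item, following rel=next links until none remain."""
    url = first_url
    seen = set()
    while url and url not in seen:
        seen.add(url)  # guard against a page that links back to itself
        page = fetch(url)
        yield from page.get("features", [])
        url = next(
            (link["href"] for link in page.get("links", []) if link.get("rel") == "next"),
            None,
        )
```

A test can use a dictionary of fake pages in place of real HTTP requests, for example by passing `pages.__getitem__` as `fetch`.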

temporal extent in most Collections incorrect

Some of the temporal extents given appear to be incorrect.

For instance the LANCE MODIS L2 swath data: https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev//collections/C1426717545-LANCEMODIS
has 2017-10-20T00:00:00.000Z as both the start and end temporal extent, which is clearly wrong.

And the non NRT version also has a single date for the start and end extent:
https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev//collections/C1443528505-LAADS

For ICESat-2 ATL06 we also see the same thing, where there is a single date for start and end:
https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev//collections/C1631076765-NSIDC_ECS

It appears that nearly all the Collections are incorrect, although the NCEP Reanalysis: https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev//collections/C1237113465-GES_DISC
appears to have reasonable values for the temporal extent.
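A quick check for this symptom is to flag intervals whose start equals their end. This sketch assumes the STAC Collection layout `extent.temporal.interval`, which is a list of [start, end] pairs where either end may be null. `degenerate_intervals` is a hypothetical helper. A genuinely single-day collection would also be flagged, so treat hits as candidates for review rather than confirmed errors.

```python
def degenerate_intervals(collection: dict) -> list:
    """Return temporal intervals whose start equals their end."""
    intervals = collection.get("extent", {}).get("temporal", {}).get("interval", [])
    return [iv for iv in intervals if iv[0] is not None and iv[0] == iv[1]]


suspect = {"extent": {"temporal": {"interval": [
    ["2017-10-20T00:00:00.000Z", "2017-10-20T00:00:00.000Z"]]}}}
print(degenerate_intervals(suspect))
```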

Invalid stac-api-spec conformance reporting.

The conformance classes item-search#fields, item-search#query and item-search#sort are reported at the landing page (curl --location --request GET 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD') but are currently non-functional. Failing examples:

curl --location --request GET 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30.v1.5/items?fields=-geometry'
curl --location --request GET 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/HLSL30.v1.5/items?sortby=+properties.datetime'
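To audit this kind of mismatch, list which item-search extensions a landing page claims, then pair each claim with a smoke test like the requests above. The sketch assumes conformance URIs of the form `https://api.stacspec.org/v1.0.0-beta.1/item-search#fields` and matches them regardless of the version prefix. `advertised_item_search_extensions` is a hypothetical helper.

```python
EXTENSIONS = ("fields", "query", "sort")


def advertised_item_search_extensions(landing_page: dict) -> set:
    """Return the item-search extensions named in a landing page's conformsTo."""
    found = set()
    for uri in landing_page.get("conformsTo", []):
        prefix, _, fragment = uri.partition("#")
        if prefix.endswith("/item-search") and fragment in EXTENSIONS:
            found.add(fragment)
    return found
```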

Limited results returned for Client.open('https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS')

I'm trying to use cmr-stac to search collections at NSIDC. However, the following code only returns 10 datasets, so there appears to be a limit on the number of datasets returned. I get a similar result for PODAAC collections. Is there a workaround for this, or is it a bug?

from pystac_client import Client
nsidc_cat = Client.open('https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS')
nsidc_cat.get_links()

[<Link rel=self target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS>,
 <Link rel=root target=https://cmr.earthdata.nasa.gov/stac/>,
 <Link rel=collections target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections>,
 <Link rel=search target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/search>,
 <Link rel=search target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/search>,
 <Link rel=conformance target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/conformance>,
 <Link rel=service-desc target=https://api.stacspec.org/v1.0.0-beta.1/openapi.yaml>,
 <Link rel=service-doc target=https://api.stacspec.org/v1.0.0-beta.1/index.html>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/ABLVIS0.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/ABOLVIS1A.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/ABLVIS1B.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/ABLVIS2.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/AFLVIS0.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/AFOLVIS1A.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/AFLVIS1B.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/AFLVIS2.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/AU_Rain.v1>,
 <Link rel=child target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS/collections/AU_Land.v1>,
 <Link rel=next target=https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS?page=2>]

stac-api-spec query extension no longer functions.

The test spec appears to indicate that the stac-api-spec query extension is supported, but it no longer works.

This search results in 1304 matched items:

curl --location --request POST 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "datetime": "2021-01-28", "collections": ["HLSL30.v2.0"] }'

But including a query object (which should not affect the number of matched items) results in 0 items being returned:

curl --location --request POST 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "datetime": "2021-01-28", "collections": ["HLSL30.v2.0"], "query": { "eo:cloud_cover": { "gte": 0, "lte": 100 }} }'
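Under the query extension's comparison semantics, that filter should keep every Item whose cloud cover lies between 0 and 100. The local evaluator below shows this. It is a sketch that covers only the comparison operators (the extension defines more), and `matches_query` is a hypothetical name, not cmr-stac code.

```python
import operator

# Comparison operators from the STAC API query extension (subset only).
OPS = {
    "eq": operator.eq,
    "neq": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def matches_query(properties: dict, query: dict) -> bool:
    """True if an Item's properties satisfy every condition in `query`."""
    for name, conditions in query.items():
        if name not in properties:
            return False  # an Item without the property cannot match
        for op, target in conditions.items():
            if not OPS[op](properties[name], target):
                return False
    return True


q = {"eo:cloud_cover": {"gte": 0, "lte": 100}}
print(all(matches_query({"eo:cloud_cover": cc}, q) for cc in range(101)))  # True
```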

service-desc link type is different than href's response content-type header

Validating against https://cmr.earthdata.nasa.gov/stac/USGS_EROS, the service-desc link:

{'rel': 'service-desc', 'href': 'https://api.stacspec.org/v1.0.0-beta.1/openapi.yaml', 'title': 'OpenAPI Doc', 'type': 'application/vnd.oai.openapi;version=3.0'}

advertises the type as 'application/vnd.oai.openapi;version=3.0', but the endpoint response (https://api.stacspec.org/v1.0.0-beta.1/openapi.yaml) sets the Content-Type header to 'text/yaml'.
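A validator can catch this by comparing the type/subtype of the link's `type` with the response's Content-Type. The sketch below deliberately ignores parameters such as `version=3.0` and `charset=utf-8`, which is a simplification. Both function names are hypothetical.

```python
def media_type_essence(media_type: str) -> str:
    """Drop parameters (e.g. ';version=3.0') and lowercase the type/subtype."""
    return media_type.split(";", 1)[0].strip().lower()


def link_type_matches(link: dict, content_type_header: str) -> bool:
    """Does a response Content-Type agree with the link's advertised type?"""
    return media_type_essence(link["type"]) == media_type_essence(content_type_header)


service_desc = {
    "rel": "service-desc",
    "href": "https://api.stacspec.org/v1.0.0-beta.1/openapi.yaml",
    "title": "OpenAPI Doc",
    "type": "application/vnd.oai.openapi;version=3.0",
}
print(link_type_matches(service_desc, "text/yaml"))  # False
```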

pagination for collections should use rel=next instead rel=child

There is pagination of sorts in the listing of provider collections:
https://cmr-stac-api.dev.element84.com/stac/catalog/providers/LAADS

The last link is:

{
  "rel": "child",
  "title": "Page 2",
  "href": "https://cmr-stac-api.dev.element84.com/stac/catalog/providers/LAADS/page/2"
}

Neither STAC nor OGC API Features defines pagination for collection endpoints, but the same approach should be used as for items: a rel=next link rather than rel=child.
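Until the server emits rel=next itself, a client could rewrite such links with a heuristic. The sketch below treats a child link titled `Page <number>` as pagination. The heuristic and the `paging_links_as_next` name are assumptions for illustration only.

```python
import re

PAGE_TITLE = re.compile(r"^Page \d+$")


def paging_links_as_next(links: list) -> list:
    """Return a copy of `links` in which 'Page N' child links become rel=next."""
    fixed = []
    for link in links:
        if link.get("rel") == "child" and PAGE_TITLE.match(link.get("title", "")):
            link = {**link, "rel": "next"}  # new dict; the input is left untouched
        fixed.append(link)
    return fixed
```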

Bad Gateway limit errors

from satsearch import Search

# NOTE: `limit` is not defined in the original snippet; set it before running.
bbox = '-53.0172669999999968,-9.5331669999999988,-48.4956669999999974,-3.1035670000000000'
#datetime = '2020-04-23/2021-04-23'
datetime = '2000-04-23/2021-04-23'  # no trailing comma, which would make this a tuple

results = Search(url='https://cmr.earthdata.nasa.gov/stac/LPDAAC_ECS',
                 #collections=['ASTGTM.v003'],
                 #collections=['AST_09XT.v003'],
                 collections=['AST_L1T.v003'],
                 #collections=['AST_L1BE.v003'],
                 limit=limit,
                 bbox=bbox,
                 #bbox='-74.5,40.2128,-73.5,41.2128',
                 datetime=datetime,
                )

results.found()

POST /search with bbox fails with 400 with only a formatting change.

This query with a bbox parameter fails with a 400:

curl -X "POST" "https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search" \
     -H 'Content-Type: application/json; charset=utf-8' \
     -d $'{ "bbox": [ 0, 0, 1, 1 ] }'

But this one, which should be identical, does not:

curl -X "POST" "https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search" \
     -H 'Content-Type: text/plain; charset=utf-8' \
     -d $'{
  "bbox": [
0,
0,
1,
1
  ]
}'
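Whitespace in a JSON body is not significant, so the two payloads above are the same JSON value. The small check below confirms this. The two requests also send different Content-Type headers (application/json versus text/plain), so that header may also be worth isolating when reproducing the 400.

```python
import json

compact = '{ "bbox": [ 0, 0, 1, 1 ] }'
multiline = """{
  "bbox": [
0,
0,
1,
1
  ]
}"""

# Both bodies parse to the same Python object.
print(json.loads(compact) == json.loads(multiline))  # True
```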

This isn't specific to curl, as I'm getting this same error using the Paw http client.

datetime field is incorrect

The datetime property in a STAC Item is a single datetime, but the datetime field in cmr-stac is two datetimes separated by a slash.

The datetime-range extension should be used to represent the start_datetime and end_datetime fields: https://github.com/radiantearth/stac-spec/tree/master/extensions/datetime-range

Note that in the upcoming STAC 0.9.0 the datetime-range extension will change to not use a prefix in the field names (i.e., start_datetime rather than dtr:start_datetime).
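The conversion amounts to splitting the slash-separated value into range fields. This sketch assumes STAC 1.0 Item semantics, where `datetime` may be null when both `start_datetime` and `end_datetime` are set, and it uses the unprefixed field names. `split_datetime_range` is a hypothetical helper.

```python
def split_datetime_range(value: str) -> dict:
    """Map a 'start/end' string onto STAC Item datetime properties.

    A single instant is returned as `datetime` alone.
    """
    start, sep, end = value.partition("/")
    if not sep:
        return {"datetime": value}
    return {"datetime": None, "start_datetime": start, "end_datetime": end}


print(split_datetime_range("2020-01-01T00:00:00Z/2020-01-02T00:00:00Z"))
```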

media types often wrong for links

Collections have links to an HTML overview page, but the type is given as application/json.
It seems that almost all links have type set to application/json, even when they are not JSON.

Many types of geometries fail with intersects filter and POST

Running against https://cmr.earthdata.nasa.gov/stac/LARC_ASDC

These queries fail:

  • POST Search with intersects:{'type': 'LineString', 'coordinates': [[100.0, 0.0], [101.0, 1.0]]} returned status code 400
  • POST Search with intersects:{'type': 'Polygon', 'coordinates': [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]]]} returned status code 502
  • POST Search with intersects:{'type': 'Polygon', 'coordinates': [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]], [[100.8, 0.8], [100.8, 0.2], [100.2, 0.2], [100.2, 0.8], [100.8, 0.8]]]} returned status code 502
  • POST Search with intersects:{'type': 'MultiPoint', 'coordinates': [[100.0, 0.0], [101.0, 1.0]]} returned status code 400
  • POST Search with intersects:{'type': 'MultiLineString', 'coordinates': [[[100.0, 0.0], [101.0, 1.0]], [[102.0, 2.0], [103.0, 3.0]]]} returned status code 400
  • POST Search with intersects:{'type': 'MultiPolygon', 'coordinates': [[[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0]]], [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]], [[100.2, 0.2], [100.2, 0.8], [100.8, 0.8], [100.8, 0.2], [100.2, 0.2]]]]} returned status code 502

Oddly, Point and GeometryCollection both work.

LPDAAC_ECS Catalogue collection returns error

Thanks to Scott Henderson for the approach

Some experiments have found that some collections are fine while others return errors. I was interested in looking at ASTER coverage, so:

https://github.com/RichardScottOZ/Pangeo-Experiments/blob/main/STAC-Catalogue-NASA-LPDAAC-ECS-AST.ipynb

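import intake  # open_stac_catalog requires the intake-stac plugin
import pandas as pd

# NOTE: `limit` is not defined in the original snippet; set it before running.
cat2 = intake.open_stac_catalog(f'https://cmr.earthdata.nasa.gov/stac/LPDAAC_ECS/collections?limit={limit}')
col_info2 = pd.DataFrame(cat2.metadata['collections'])
col_info2.head(1)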

Need better Item IDs

Similar to #20, Items also seem to be using the same generic IDs, probably from CMR.

The Item IDs are less important than for collections, but these are clearly different than the 'Granule ID' that is normally used. The granule ID is what should be the STAC Item id.

collection id and link missing from Items

While Collections properly link to Items under them, the Items do not link or identify the collection they belong to.

There should be a collection field in the Item containing the id of the collection. In addition there should be a link with rel=collection and a URL linking to the specific collection endpoint.
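A minimal sketch of the requested fix: add the `collection` field and one rel=collection link. `attach_collection` is hypothetical. It assumes collection endpoints live at `{root}/collections/{id}` under the provider root, as in the URLs on this page, and it assumes `application/json` as the link type.

```python
def attach_collection(item: dict, collection_id: str, root_url: str) -> dict:
    """Return a copy of `item` with a `collection` field and a rel=collection link."""
    href = f"{root_url.rstrip('/')}/collections/{collection_id}"
    # Drop any existing collection link so exactly one remains.
    links = [link for link in item.get("links", []) if link.get("rel") != "collection"]
    links.append({"rel": "collection", "href": href, "type": "application/json"})
    return {**item, "collection": collection_id, "links": links}
```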

Items and search endpoints are not ItemCollections

The /collections/{collectionId}/items and /stac/search endpoints should return an ItemCollection, as per:
https://github.com/radiantearth/stac-spec/blob/v0.8.1/api-spec/api-spec.md#ogc-api---features-endpoints

ItemCollection definition:
https://github.com/radiantearth/stac-spec/blob/v0.8.1/item-spec/itemcollection-spec.md

Basically, the type and links fields are missing, so the response is not a valid ItemCollection (a GeoJSON FeatureCollection).

Also note that OGC API - Features defines some additional fields as well, such as numberMatched and numberReturned.
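Here is a sketch of the envelope the endpoints could return. It uses the count field names from OGC API - Features Part 1 (numberMatched, numberReturned). `to_item_collection` is a hypothetical helper, and the `application/geo+json` link type is an assumption.

```python
from typing import Optional


def to_item_collection(
    features: list,
    self_href: str,
    number_matched: int,
    next_href: Optional[str] = None,
) -> dict:
    """Wrap Items in a GeoJSON FeatureCollection with links and counts."""
    links = [{"rel": "self", "href": self_href, "type": "application/geo+json"}]
    if next_href is not None:
        links.append({"rel": "next", "href": next_href, "type": "application/geo+json"})
    return {
        "type": "FeatureCollection",  # required for a valid GeoJSON FeatureCollection
        "features": features,
        "links": links,
        "numberMatched": number_matched,  # total hits for the query
        "numberReturned": len(features),  # hits on this page
    }
```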

LPCLOUD HLS status code 400

Yesterday at one stage I was getting the above error

from satsearch import Search

stac_items = Search(url='https://cmr.earthdata.nasa.gov/stac/LPCLOUD',
                 collections=['HLSL30.v1.5'],
                 #bbox = '-53.0172669999999968,-9.5331669999999988,-48.4956669999999974,-3.1035670000000000' ,
                 bbox = '-53.0232820986343754,-8.1236837545427090, -49.4688521093868800,-4.8677173521785928',
                 datetime='2016-04-23/2021-04-23',
                ).items()

 File "stackstac_NASA_HLS_cropped.py", line 119, in <module>
    stac_items = Search(url='https://cmr.earthdata.nasa.gov/stac/LPCLOUD',
  File "C:\Users\rscott\AppData\Local\Continuum\anaconda3\envs\stackstac\lib\site-packages\satsearch\search.py", line 90, in items
    found = self.found(headers=headers)
  File "C:\Users\rscott\AppData\Local\Continuum\anaconda3\envs\stackstac\lib\site-packages\satsearch\search.py", line 62, in found
    results = self.query(url=url, headers=headers, **kwargs)
  File "C:\Users\rscott\AppData\Local\Continuum\anaconda3\envs\stackstac\lib\site-packages\satsearch\search.py", line 80, in query
    raise SatSearchError(response.text)
satsearch.search.SatSearchError: "Request failed with status code 400"

Then it worked, and then it stopped working again.

Some sort of rate limiting or something along those lines perhaps?

Incorrect item asset "types"

First of all, thanks for putting this together! I think it is going to be an amazing resource for consistent, programmatic access to NASA data.

After a bit of initial testing I've noticed some incorrect assignment of asset type (which complicates opening remote assets):

https://cmr.earthdata.nasa.gov/cmr-stac/NSIDC_ECS/collections/C1908075185-NSIDC_ECS/items/G1921160949-NSIDC_ECS

  "assets": {
    "0": {
      "href": "https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/GL_S1bks_mosaic_01Jan15_12Jan15_shape_v03.0.shx",
      "type": "application/x-geotiff"
    },

https://cmr.earthdata.nasa.gov/cmr-stac/NSIDC_ECS/collections/C1908075185-NSIDC_ECS/items/G1923750834-NSIDC_ECS

  "assets": {
    "data": {
      "href": "https://n5eil01u.ecs.nsidc.org/DP4/MEASURES/NSIDC-0723.003/2015.01.01/greenland_v03.0.mov",
      "type": "application/x-geotiff"
    },

I'm guessing "type": "application/x-geotiff" is set as default somewhere if the filetype doesn't exist or isn't recognized in the original CMR metadata?
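The alternative is to derive the type from the file extension and leave `type` unset when the extension is unknown, instead of defaulting to GeoTIFF. This is a sketch. The `MEDIA_TYPES` table and `asset_media_type` are hypothetical. The GeoTIFF value follows the STAC best-practice media type `image/tiff; application=geotiff`.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension-to-media-type table; extend as needed.
MEDIA_TYPES = {
    ".tif": "image/tiff; application=geotiff",
    ".tiff": "image/tiff; application=geotiff",
    ".mov": "video/quicktime",
    ".xml": "application/xml",
    ".json": "application/json",
}


def asset_media_type(href: str) -> Optional[str]:
    """Guess an asset's media type from its extension, or None if unknown."""
    suffix = PurePosixPath(urlparse(href).path).suffix.lower()
    return MEDIA_TYPES.get(suffix)
```

Returning None lets the caller omit `type` entirely, which is less misleading than a wrong value.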

search takes invalid parameters

I found the proper way to search a collection by using the search link from the collection, except that it uses a parameter called collectionId, which is not a valid parameter:

https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev/stac/search?collectionId=C1443528505-LAADS

Collections are searched via a collections parameter, and a list may be provided.

I think the confusion came from the fact that 'collectionId' is given in this table:
https://github.com/radiantearth/stac-spec/blob/master/api-spec/api-spec.md#filter-parameters-and-fields

But it is a path-only parameter, meaning that it is used to specify the collection in an items endpoint like:
https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev//collections/C1631076765-NSIDC_ECS/items

where C1631076765-NSIDC_ECS is the collectionId.

Many valid datetime values are not accepted

The following datetime values do not work correctly and return a 400 against https://cmr.earthdata.nasa.gov/stac/LARC_ASDC

Search with datetime=1978-01-01T00:00:00.000Z extracted from an Item returned status code 400
Search with datetime=1985-04-12T23:20:50.52Z returned status code 400
Search with datetime=1985-04-12T23:20:50,52Z returned status code 400
Search with datetime=1996-12-19T16:39:57-00:00 returned status code 400
Search with datetime=1996-12-19T16:39:57+00:00 returned status code 400
Search with datetime=1996-12-19T16:39:57-08:00 returned status code 400
Search with datetime=1996-12-19T16:39:57+08:00 returned status code 400
Search with datetime=../1985-04-12T23:20:50.52Z returned status code 400
Search with datetime=1985-04-12T23:20:50.52Z/.. returned status code 400
Search with datetime=/1985-04-12T23:20:50.52Z returned status code 400
Search with datetime=1985-04-12T23:20:50.52Z/ returned status code 400
Search with datetime=1985-04-12T23:20:50.52Z/1986-04-12T23:20:50.52Z returned status code 400
Search with datetime=1985-04-12T23:20:50.52+01:00/1986-04-12T23:20:50.52Z+01:00 returned status code 400
Search with datetime=1985-04-12T23:20:50.52-01:00/1986-04-12T23:20:50.52Z-01:00 returned status code 400
Search with datetime=1937-01-01T12:00:27.87+01:00 returned status code 400
Search with datetime=1985-04-12T23:20:50.52Z returned status code 400
Search with datetime=1937-01-01T12:00:27.8710+01:00 returned status code 400
Search with datetime=1937-01-01T12:00:27.8+01:00 returned status code 400
Search with datetime=1937-01-01T12:00:27.8Z returned status code 400
Search with datetime=2020-07-23T00:00:00.000+03:00 returned status code 400
Search with datetime=2020-07-23T00:00:00+03:00 returned status code 400
Search with datetime=1985-04-12t23:20:50.000z returned status code 400
Search with datetime=2020-07-23T00:00:00Z returned status code 400
Search with datetime=2020-07-23T00:00:00.0Z returned status code 400
Search with datetime=2020-07-23T00:00:00.01Z returned status code 400
Search with datetime=2020-07-23T00:00:00.012Z returned status code 400
Search with datetime=2020-07-23T00:00:00.0123Z returned status code 400
Search with datetime=2020-07-23T00:00:00.01234Z returned status code 400
Search with datetime=2020-07-23T00:00:00.012345Z returned status code 400
Search with datetime=2020-07-23T00:00:00.0123456Z returned status code 400
Search with datetime=2020-07-23T00:00:00.01234567Z returned status code 400
Search with datetime=2020-07-23T00:00:00.012345678Z returned status code 400
Search with datetime=1986-04-12T23:20:50.52Z/1985-04-12T23:20:50.52Z returned status code 400

The issue is likely in the file datetime.js.

This regex:

const DATE_TIME_RX = new RegExp('\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d{2:4})?Z');

should be

const DATE_TIME_RX = new RegExp('\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(\\.\\d{2,4})?Z');

where the fractional-seconds matcher is \d{2,4} rather than \d{2:4}. The syntax {2:4} was intended to mean 2 to 4 digits, but it actually matches the characters {2:4} literally, e.g., 1985-04-12T23:20:50.5{2:4}Z.

According to the RFC 3339 spec, this should allow one or more digits, not just 2 to 4.

Also, the T and Z may be lowercase, per:

 NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
      syntax may alternatively be lower case "t" or "z" respectively.

Also, I believe the ISO8601 one is missing the fractional seconds:

const ISO_8601_DATE_RX = new RegExp(
  '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})[+-](\\d{2}):(\\d{2})')
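For reference, here is one pattern that covers the points above, written in Python rather than the project's JavaScript. It allows one or more fractional-second digits, accepts T and Z in either case, and takes either Z or a ±HH:MM offset. It follows RFC 3339 section 5.6 strictly, so it rejects the ISO 8601 comma form (1985-04-12T23:20:50,52Z). It checks syntax only and does not range-check fields such as the month. `is_rfc3339_datetime` is a hypothetical name.

```python
import re

# RFC 3339 date-time: optional fraction of 1+ digits, T/Z in either case,
# and either Z or a +/-HH:MM offset.
RFC3339_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$"
)


def is_rfc3339_datetime(value: str) -> bool:
    return RFC3339_DATETIME.match(value) is not None


print(is_rfc3339_datetime("1985-04-12t23:20:50.000z"))     # True
print(is_rfc3339_datetime("1985-04-12T23:20:50.5{2:4}Z"))  # False
```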

Item Search datetime query parameter with only date is allowed, but should not

Date-only datetime strings are not allowed for the datetime query parameter per OAFeat and STAC API Item Search.

However, cmr-stac allows these when it should not. E.g. https://cmr.earthdata.nasa.gov/stac/LARC_ASDC?datetime=1985-04-12 does not return a 400 status code and instead implements the semantics of ../1985-04-12T00:00:00Z. This is also a confusing behavior, as https://cmr.earthdata.nasa.gov/stac/LARC_ASDC?datetime=../1985-04-12 returns no results and https://cmr.earthdata.nasa.gov/stac/LARC_ASDC?datetime=1985-04-12/.. returns a 400.
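Here is a sketch of interval parsing that rejects date-only values consistently in every position. It assumes the OGC API - Features interval grammar, where an open end is written as `..` or left empty and at least one end must be a full date-time. It repeats the RFC 3339 pattern so that it stands alone. `parse_datetime_param` is a hypothetical helper.

```python
import re

DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_datetime_param(value: str) -> tuple:
    """Split a `datetime` parameter into (start, end); None marks an open end."""
    def check(part: str) -> str:
        if not DATETIME.match(part):
            raise ValueError(f"not an RFC 3339 date-time: {part!r}")
        return part

    if "/" not in value:
        instant = check(value)  # date-only values like '1985-04-12' fail here
        return instant, instant
    start, end = value.split("/", 1)
    start = None if start in ("", "..") else check(start)
    end = None if end in ("", "..") else check(end)
    if start is None and end is None:
        raise ValueError("interval must have at least one closed end")
    return start, end
```

With this approach, `1985-04-12`, `../1985-04-12` and `1985-04-12/..` all fail in the same way, which removes the inconsistency described above.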

Invalid path components

cmr-stac allows one to browse through some top level catalogs before getting to collections, which is fine, but the path components do not follow the STAC spec, so the existing clients are unable to work with it.

The provider:
https://cmr-stac-api.dev.element84.com/stac/catalog/providers/ASF

seems like it should essentially be a root catalog, and the STAC endpoints should go from here, which would be useful and allow for cross-collection searches from a single provider, but:
https://cmr-stac-api.dev.element84.com/stac/catalog/providers/ASF/stac
https://cmr-stac-api.dev.element84.com/stac/catalog/providers/ASF/stac/search
etc. do not work

Instead a collection goes back a level to something like:
https://cmr-stac-api.dev.element84.com/stac/catalog/collections/C1213928843-ASF

This implies that the root is
https://cmr-stac-api.dev.element84.com/stac/catalog
which is invalid, and that:
https://cmr-stac-api.dev.element84.com/stac/catalog/stac
would be the root catalog, which is also invalid

Sortby fails on namespaced properties

Sorting functions as expected on raw properties but fails on properties with a namespace prefix such as eo:cloud_cover.

curl --location --request GET 'https://cmr.earthdata.nasa.gov/stac/LPCLOUD/search?bounding_box=-10,5,-9,11&datetime=2021-01-28&sortby=properties.eo:cloud_cover'
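One way to keep namespaced names intact is to treat everything after the optional direction sign as the field name and never split on `:`. The sketch below also strips the optional `properties.` prefix and surrounding whitespace. The whitespace handling matters because an unescaped `+` in a query string decodes to a space. `parse_sortby` is a hypothetical name and not cmr-stac's parser.

```python
def parse_sortby(value: str) -> list:
    """Parse a GET `sortby` value such as '-properties.eo:cloud_cover,+id'.

    Returns (field, direction) pairs with namespaced names kept verbatim.
    """
    fields = []
    for term in value.split(","):
        term = term.strip()
        if not term:
            continue
        direction = "desc" if term.startswith("-") else "asc"
        field = term.lstrip("+-")
        if field.startswith("properties."):
            field = field[len("properties."):]
        fields.append((field, direction))
    return fields


print(parse_sortby("properties.eo:cloud_cover"))  # [('eo:cloud_cover', 'asc')]
```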

Fields missing from individual Item

I noticed a bit of a discrepancy between what information is given when looking at a list of items versus what's displayed when looking at a singular item. Particularly, the stac_extensions and eo:cloud_cover fields.

Again, this information is displayed when looking at the .../<collectionID>/items, but not when I click on a specific Item's self link .../<collectionID>/items/<itemID>. Below I've put two links as an example.

Displayed: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items
Not Displayed: https://cmr.earthdata.nasa.gov/stac/LPCLOUD/collections/C1711924822-LPCLOUD/items/G1969487860-LPCLOUD

"next" links use URL escaping that the server doesn't understand

original metadata should be an asset, not a link

See this example:
https://pcmii5dog3.execute-api.us-east-1.amazonaws.com/dev//collections/C1443528505-LAADS/items/G1465827297-LAADS

there is a link that is "metadata". Metadata should be an asset, not a link.

Links are primarily for including the hierarchical links (self, parent, child, root, collection), but may also be external related links such as: related software, scientific papers, license info, etc.
If it is the data itself or associated metadata for the data, it should be an asset.
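Applied to an Item, the suggestion amounts to moving rel=metadata links into `assets`. This is a sketch. `metadata_links_to_assets` is hypothetical, and it tags the moved entries with the STAC `metadata` asset role. It never overwrites an existing asset key.

```python
def metadata_links_to_assets(item: dict) -> dict:
    """Return a copy of `item` with rel=metadata links moved into `assets`."""
    assets = dict(item.get("assets", {}))
    links = []
    for link in item.get("links", []):
        if link.get("rel") != "metadata":
            links.append(link)
            continue
        # Pick a free key: 'metadata', then 'metadata_1', 'metadata_2', ...
        key, n = "metadata", 1
        while key in assets:
            key, n = f"metadata_{n}", n + 1
        asset = {k: link[k] for k in ("href", "type", "title") if k in link}
        asset["roles"] = ["metadata"]
        assets[key] = asset
    return {**item, "links": links, "assets": assets}
```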

/search endpoint is advertised with media type `application/geo+json` in Landing Page link rel=search, but returns Content-Type header `application/json`

The landing page has a link with rel=search advertising the correct media type of application/geo+json.

    {
      "rel": "search",
      "href": "https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search",
      "title": "Provider Item Search",
      "type": "application/geo+json",
      "method": "GET"
    },
    {
      "rel": "search",
      "href": "https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search",
      "title": "Provider Item Search",
      "type": "application/geo+json",
      "method": "POST"
    },

However, invoking this search endpoint returns Content-Type header application/json in violation of the advertised conformance class https://api.stacspec.org/v1.0.0-beta.1/item-search:

$ http https://cmr.earthdata.nasa.gov/stac/LARC_ASDC/search
HTTP/1.1 200 OK
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json; charset=utf-8

'limit' does not work as expected

I've noticed that the 'limit' keyword is either ignored or causes API failures if set to a large value (>1000).

import pystac_client #0.3
URL = 'https://cmr.earthdata.nasa.gov/stac/NSIDC_ECS'
catalog = pystac_client.Client.open(URL)
results = catalog.search(
                 collections=['NSIDC-0723.v4'], 
                 bbox = '-54.85,69.31,-52.18,70.26',
                 datetime='2000-01-01/2021-12-31', 
                limit=20,
                )

# For limit=20
print(f"{results.matched()} items found") # 1387 items found (expecting 20)
items = results.get_all_items_as_dict()
print(len(items['features'])) # 1387 

# For limit=1000
print(f"{results.matched()} items found") # 1387 items found (expecting 1000)
print(results.get_all_items_as_dict()) # {'type': 'FeatureCollection', 'features': []}

seemingly related #192, #152 (comment) cc @matthewhanson
