microsoft / planetarycomputerdatacatalog
Data catalog for the Microsoft Planetary Computer
Home Page: https://planetarycomputer.microsoft.com
License: MIT License
As @TomAugspurger and I discussed in closed issues and PRs, I would like to learn about the datasets and help document them and their usage better. I am currently working on my own project and spending significant time organizing data pipelines, so I thought I would open this issue to offer help and see if we could push for some documentation while at it.
I am particularly focused on the GOES datasets for now, and I can be verbose (good for documenting stuff, I guess!), so I don't mind helping write something if I am pointed towards the resources and how to go about things.
My current "best" workflows:
(Note: netCDF4 seems to outperform h5py by almost an order of magnitude in my local testing, hence the decision to use netCDF4 rather than unifying the approach with 1 --- one could easily use fsspec with h5py for local files too, and thus only need to change a single keyword in the workflow rather than significant portions.)
(An interesting caveat: netCDF4 returns the scaled data directly, whereas with h5py one needs to apply the scaling and offset manually, so it is actually more work 😅)
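For readers hitting the same caveat: the packing that netCDF4 undoes automatically is just the CF scale/offset convention. A minimal sketch, with made-up `scale_factor` and `add_offset` values rather than ones from an actual GOES file:

```python
# CF "packed data" convention: netCDF4 applies this automatically,
# while h5py returns the raw stored integers and leaves it to the caller.
# The attribute values below are illustrative only.
def unpack(raw, scale_factor, add_offset):
    """physical = raw * scale_factor + add_offset (CF convention)."""
    return raw * scale_factor + add_offset

raw_counts = [0, 1000, 4095]   # hypothetical packed sensor counts
scale, offset = 0.04, 170.0    # hypothetical scale_factor / add_offset
physical = [unpack(r, scale, offset) for r in raw_counts]
```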
The items for 3dep-lidar-hag, e.g. https://planetarycomputer.microsoft.com/api/stac/v1/collections/3dep-lidar-hag/items/UT_StatewideSouth_2_2020-hag-2m-0-7 , have invalid raster:bands. It must be an array of Band Objects rather than a single Band Object. It is also part of the properties instead of the asset, which is why it is not caught by validators...
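To illustrate the shape the raster extension expects, here is a hedged sketch (the field values and asset key are made up); the point is only that raster:bands belongs on the asset as a list of Band Objects:

```python
# Hypothetical illustration of the raster extension layout: "raster:bands"
# must be a list of Band Objects and live on an asset, not in properties.
invalid_item = {
    "properties": {
        # wrong: a single Band Object placed in properties
        "raster:bands": {"data_type": "float32", "nodata": -9999},
    },
    "assets": {"data": {"href": "example.tif"}},
}

def fix_raster_bands(item, asset_key="data"):
    """Move a bare Band Object from properties into a one-element
    list on the named asset (values are illustrative)."""
    band = item["properties"].pop("raster:bands")
    item["assets"][asset_key]["raster:bands"] = [band]
    return item
```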
The geoparquet-items asset in the cil-gdpcir-cc0 collection looks weird:
Top-level:
3. Also, the cube:dimensions object has lat and lon dimensions, but both have the y axis assigned, which seems wrong to me.
4. The description contains a table, but CommonMark doesn't support tables.
I'm not sure if this repository is actually the right place for this issue, but here is the description:
We read Landsat data for a study region in central Africa. While this works fine in most cases, we get errors when attempting to open a Landsat 8 OLI scene acquired in 2022. The error only occurs when trying to load B4.
import rasterio as rio
import planetary_computer
# URL to dataset (we load band 4)
url = 'https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B4.TIF'
url_signed = planetary_computer.sign_url(url)
ds = rio.open(url_signed)
This gives the following error message:
Traceback (most recent call last):
File "rasterio/_base.pyx", line 310, in rasterio._base.DatasetBase.__init__
File "rasterio/_base.pyx", line 221, in rasterio._base.open_dataset
File "rasterio/_err.pyx", line 221, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsicurl/https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B4.TIF?st=2023-07-13T09%3A03%3A45Z&se=2023-07-14T09%3A48%3A45Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-07-14T07%3A40%3A28Z&ske=2023-07-21T07%3A40%3A28Z&sks=b&skv=2021-06-08&sig=rpO6Ia5Y8JQvnxLCUdC8Bv%2BNjzTbEEbdnItAAizP/Lg%3D' not recognized as a supported file format.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/ides/Lukas/venvs/GeoPython/lib64/python3.11/site-packages/rasterio/env.py", line 451, in wrapper
return f(*args, **kwds)
^^^^^^^^^^^^^^^^
File "/mnt/ides/Lukas/venvs/GeoPython/lib64/python3.11/site-packages/rasterio/__init__.py", line 304, in open
dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "rasterio/_base.pyx", line 312, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/vsicurl/https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B4.TIF?st=2023-07-13T09%3A03%3A45Z&se=2023-07-14T09%3A48%3A45Z&sp=rl&sv=2021-06-08&sr=c&skoid=c85c15d6-d1ae-42d4-af60-e2ca0f81359b&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2023-07-14T07%3A40%3A28Z&ske=2023-07-21T07%3A40%3A28Z&sks=b&skv=2021-06-08&sig=rpO6Ia5Y8JQvnxLCUdC8Bv%2BNjzTbEEbdnItAAizP/Lg%3D' not recognized as a supported file format.
When we do the same for, e.g., Band 3 (url = https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/oli-tirs/2022/173/059/LC08_L2SP_173059_20220706_20220722_02_T1/LC08_L2SP_173059_20220706_20220722_02_T1_SR_B3.TIF), the rio call works without any problems and as expected:
src = rio.open(url_signed) # now set to B3 instead of B4
src.meta
outputs
{'driver': 'GTiff', 'dtype': 'uint16', 'nodata': 0.0, 'width': 7591, 'height': 7741, 'count': 1, 'crs': CRS.from_epsg(32635), 'transform': Affine(30.0, 0.0, 716385.0,
0.0, -30.0, 275715.0)}
as expected.
Any hint as to why the B4 dataset cannot be read?
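One way to narrow this down, sketched under the assumption that the server returns something other than a GeoTIFF for B4 (e.g. an XML error document), is to inspect the first bytes of the response and check the TIFF signature:

```python
def looks_like_tiff(first_bytes: bytes) -> bool:
    """Check the 4-byte TIFF signature: little-endian b'II*\\x00'
    or big-endian b'MM\\x00*'."""
    return first_bytes[:4] in (b"II*\x00", b"MM\x00*")

# In practice one would fetch the first bytes of the signed URL, e.g.
# requests.get(url_signed, headers={"Range": "bytes=0-3"}).content,
# and inspect those; a response starting with b'<?xml' would indicate
# an Azure error document rather than the raster itself.
assert looks_like_tiff(b"II*\x00" + b"rest-of-file")
assert not looks_like_tiff(b"<?xml version=")
```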
When scrolling through the layer panel, I've noticed that the panel closes immediately after reaching the top or bottom. Because the panel is positioned over the map, this results in suddenly zooming in or out.
OS: macOS 12.0.1
Browser: Chrome 97.0.4692.71
Input: Trackpad
The tooltip on disabled filters does not always close as expected. This can be annoying because the tooltip covers useful information and actions and feels "stuck".
Adding the medium (300ms) delay on the tooltip would help too, reducing how often the tooltip triggers by accident: https://developer.microsoft.com/en-us/fluentui#/controls/web/tooltip#TooltipDelay
Tooltip persists when the cursor moves slowly, or when the cursor hovers over the tooltip
I think the bounding box for some collections is invalid; the latitude values seem to be swapped or have the wrong sign:
More collections could be affected, I stopped at G or so.
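Swapped or wrongly signed latitudes are easy to catch mechanically. A minimal sanity check for a [west, south, east, north] bbox (antimeridian-crossing boxes are ignored for simplicity):

```python
def bbox_is_valid(bbox):
    """Basic sanity check for a [west, south, east, north] bbox:
    latitudes within [-90, 90], longitudes within [-180, 180],
    and south <= north (antimeridian crossing not handled)."""
    west, south, east, north = bbox
    return (-90 <= south <= north <= 90
            and -180 <= west <= 180 and -180 <= east <= 180)

# A bbox with swapped latitude values fails the check:
assert not bbox_is_valid([-180, 80, 180, -60])   # south > north
assert bbox_is_valid([-180, -60, 180, 80])
```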
https://planetarycomputer.microsoft.com/api/stac/v1/collections/noaa-mrms-qpe-24h-pass2 reports pass 1 (instead of 2) and period 1 (instead of 24) in its summaries. It looks like the wrong parameters were passed to the stactools package...
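The expected values are recoverable from the collection id itself, so a hedged consistency check could parse them out and compare against whatever the summaries report (the summary field names are not shown here, so they are left out):

```python
import re

def expected_pass_and_period(collection_id):
    """Derive the expected pass number and accumulation period (hours)
    from a collection id like 'noaa-mrms-qpe-24h-pass2'."""
    m = re.search(r"qpe-(\d+)h-pass(\d+)", collection_id)
    period_hours, pass_number = int(m.group(1)), int(m.group(2))
    return pass_number, period_hours

# For the collection above, the summaries should report pass 2 / period 24:
assert expected_pass_and_period("noaa-mrms-qpe-24h-pass2") == (2, 24)
```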
For example, this is the bounding box stored in bbox for tile 10SGG at timestamp 2016-05-05T19:04:02.027000Z with id S2A_MSIL2A_20160505T190402_R113_T10SGG_20210211T085211.
But this is what the tile actually contains:
For tile 10SGG there are several more of these cases. More examples are:
S2A_MSIL2A_20160714T184922_R113_T10SGG_20210212T043726
S2A_MSIL2A_20160624T184922_R113_T10SGG_20210211T224812
S2A_MSIL2A_20160614T190352_R113_T10SGG_20210211T193819
S2A_MSIL2A_20160604T184922_R113_T10SGG_20210211T163042
S2A_MSIL2A_20160525T190352_R113_T10SGG_20210211T141725
S2A_MSIL2A_20160515T184922_R113_T10SGG_20210211T113732
S2A_MSIL2A_20160505T190402_R113_T10SGG_20210211T085211
S2A_MSIL2A_20160415T190352_R113_T10SGG_20210211T040941
S2A_MSIL2A_20160405T190352_R113_T10SGG_20210211T015225
S2A_MSIL2A_20160326T185252_R113_T10SGG_20210528T193911
S2A_MSIL2A_20160306T190352_R113_T10SGG_20210528T112222
S2A_MSIL2A_20160215T190352_R113_T10SGG_20210528T070200
S2A_MSIL2A_20160205T190342_R113_T10SGG_20210528T030140
S2A_MSIL2A_20160126T185642_R113_T10SGG_20210527T232223
S2A_MSIL2A_20160116T190342_R113_T10SGG_20210527T184113
S2A_MSIL2A_20160106T190352_R113_T10SGG_20210527T104143
S2A_MSIL2A_20151227T185812_R113_T10SGG_20210526T202139
S2A_MSIL2A_20151217T190352_R113_T10SGG_20210526T062535
This is also not the only Sentinel tile affected.
Hi,
I am trying to access the ERA5 dataset from the Planetary Computer Data Catalog using Python.
https://planetarycomputer.microsoft.com/dataset/era5-pds
On the page, it says that the temporal extent for this dataset is 01/01/1979 – Present, but if I follow the tutorial on the Example Notebook and change the code to
search = catalog.search(
    collections=["era5-pds"], datetime="2021-01", query={"era5:kind": {"eq": "an"}}
)
for example, the search returns no items.
In fact, https://planetarycomputer.microsoft.com/api/stac/v1/collections/era5-pds/items makes it clear that the "end_datetime" is "2020-12-31T23:00:00Z".
Is this correct?
I am quite frustrated by this, because the service has worked very well for my research, but I need data up to the present.
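A quick local check of why the search comes back empty, using the end_datetime reported by the items endpoint above (hardcoded here rather than fetched):

```python
from datetime import datetime, timezone

def month_within_extent(year_month, extent_end):
    """True if the first instant of a 'YYYY-MM' query month falls
    on or before the collection's end_datetime."""
    start = datetime.strptime(year_month, "%Y-%m").replace(tzinfo=timezone.utc)
    return start <= extent_end

# end_datetime from the era5-pds items endpoint quoted above
end = datetime(2020, 12, 31, 23, tzinfo=timezone.utc)
assert not month_within_extent("2021-01", end)  # hence the empty search
assert month_within_extent("2020-12", end)
```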
Thanks for the service; it accesses the data faster than I was expecting.
Best,
The values for file:checksum are not valid multihash values, e.g. in https://planetarycomputer.microsoft.com/api/stac/v1/collections/sentinel-3-olci-wfr-l2-netcdf/items/S3A_OL_2_WFR_20240310T021506_20240310T021806_0179_110_046_1980
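For reference, a valid multihash prefixes the raw digest with a code byte and a length byte; for sha2-256 that is 0x12 followed by 0x20 (32). A minimal encoder as a sketch:

```python
import hashlib

def sha256_multihash(data: bytes) -> str:
    """Encode a sha2-256 digest as a multihash hex string:
    0x12 (sha2-256 multihash code) + 0x20 (digest length) + digest."""
    digest = hashlib.sha256(data).digest()
    return (bytes([0x12, len(digest)]) + digest).hex()

mh = sha256_multihash(b"example")
# every sha2-256 multihash starts with the '1220' prefix
assert mh.startswith("1220") and len(mh) == 2 * (2 + 32)
```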
Hi, thanks for providing these datasets. I would like to access the goes-cmi dataset through fsspec and adlfs (i.e. pip install fsspec adlfs), but I cannot seem to figure it out.
It seems the account_name associated with these datasets is pcstacitems? That doesn't seem to be documented. In any case, I cannot get straightforward anonymous access to any dataset. For example, this code
import fsspec
fs = fsspec.filesystem("abfs", account_name="pcstacitems")
fs.ls("/")  # or fs.ls("abfs://items/"), and so on
errors with
ErrorCode:NoAuthenticationInformation
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>NoAuthenticationInformation</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
Or sometimes it fails with things like "not found". Is there an easy way to access these datasets through fsspec? I also tried generating a SAS token, but to no avail.
I also tried to figure out the details from this item, but to no avail: https://planetarycomputer.microsoft.com/api/stac/v1/collections/goes-cmi/items/OR_ABI-L2-F-M6_G17_s20222200300319
For example, this also didn't work:
import fsspec
fs = fsspec.filesystem("abfs", account_name="goeseuwest")
fs.ls("noaa-goes-cogs/goes-17")
It would be nice if these datasets could be provided more transparently. I understand the desire to make them easy to use via Jupyter notebooks (like the examples), but I found those extremely hard to use for proper applications/deployments (beyond the simple examples). For comparison, the GOES datasets can be used on AWS and GCP without any issue:
import fsspec
gcp = fsspec.filesystem("gs", token="anon", anon=True)
gcp.ls("gcp-public-data-goes-17/") # prints ['gcp-public-data-goes-17/ABI-L1b-RadC'...
aws = fsspec.filesystem("s3", token="anon", anon=True)
aws.ls("noaa-goes17/") # prints ['noaa-goes17/ABI-L1b-RadC', 'no...
Now trying:
az = fsspec.filesystem("abfs", token="anon", anon=True)
results in this error:
ValueError: Must provide either a connection_string or account_name with credentials!!
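As a workaround, public blobs can also be addressed directly over HTTPS using the standard Azure Blob Storage URL scheme. The account and container names below are taken from the snippets above; whether anonymous reads are actually permitted for them is not confirmed here:

```python
def blob_https_url(account: str, container: str, path: str) -> str:
    """Build the standard Azure Blob Storage HTTPS URL for a blob path."""
    return f"https://{account}.blob.core.windows.net/{container}/{path}"

# account/container from the attempts above; anonymous access unverified
url = blob_https_url("goeseuwest", "noaa-goes-cogs", "goes-17")
assert url == "https://goeseuwest.blob.core.windows.net/noaa-goes-cogs/goes-17"
```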
Thank you!
I was just debugging why I had duplicate tiles in a certain pipeline. I found out that this was related to the datastrip_id property, which depends on the downlink station (see the S2 specification). The data seems to be the same for both items, although I only checked this for one band. Why do you keep duplicate data when multiple downlink stations are used?
Anyway, I then noticed that the datastrip id included in item.id does not match the one provided in item.properties.s2:datastrip_id. Not sure if this is important, but I thought it would be worth mentioning. Please see the example below.
from copy import deepcopy

import pandas as pd
import planetary_computer
import pystac_client


def items_to_dataframe(items):
    _items = []
    for i in items:
        _i = deepcopy(i)
        _items.append(_i)
    df = pd.DataFrame(pd.json_normalize(_items))
    for field in ["properties.datetime"]:
        if field in df:
            df[field] = pd.to_datetime(df[field])
    df = df.sort_values("properties.datetime")
    return df


catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

roi = {
    "type": "Polygon",
    "coordinates": [
        [
            [146.0678527, -15.3746464],
            [147.0909455, -15.3765786],
            [147.0913918, -16.369226],
            [146.0632786, -16.3671625],
            [146.0678527, -15.3746464],
        ]
    ],
}

search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects=roi,
    datetime="2022-01-01/2022-11-01",
)
items = search.item_collection()
items_ = [i.to_dict() for i in items]
df = items_to_dataframe(items_)


def split_id(x):
    return pd.Series(x.id.split("_"))


df[
    [
        "mission_id",
        "product_level",
        "datetake_start_time",
        "relative_orbit_number",
        "tilenumber",
        "id_datastrip",
    ]
] = df.apply(split_id, axis=1)

# two examples for which I found the same data, but different datastrips
SAME_DATA_DIFFERENT_DATASTRIP = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
df_ = df.loc[df["id"].isin(SAME_DATA_DIFFERENT_DATASTRIP)].copy()


# makes it a bit easier to see the difference
def split_s2_datastrip(x):
    return x["properties.s2:datastrip_id"].split("_")[6]


df_["s2_datastrip"] = df_.apply(split_s2_datastrip, axis=1)
df_[["id_datastrip", "s2_datastrip"]]
| id_datastrip | s2_datastrip |
|---|---|
| 20220227T190716 | 20220227T190717 |
| 20220212T221526 | 20220212T221527 |
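Until the duplicates are resolved upstream, one pragmatic workaround (a sketch, assuming ids differ only in the trailing processing-time token) is to keep the most recently processed item per datatake/tile:

```python
def dedupe_by_processing_time(item_ids):
    """Keep, for each id-without-processing-time prefix, the id with the
    latest trailing processing timestamp (lexicographic order works for
    the 'YYYYMMDDTHHMMSS' format)."""
    latest = {}
    for item_id in item_ids:
        prefix, proc_time = item_id.rsplit("_", 1)
        if proc_time > latest.get(prefix, ("",))[0]:
            latest[prefix] = (proc_time, item_id)
    return [item_id for _, item_id in latest.values()]

ids = [
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220227T190716",
    "S2A_MSIL2A_20220128T002711_R016_T55LDC_20220212T221526",
]
assert dedupe_by_processing_time(ids) == [ids[0]]  # later processing kept
```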
After adding Swagger UI React, the app generates two types of warnings.
On build:
No parser and no filepath given, using 'babel' the parser now but this will throw an error in the future. Please specify a parser or a filepath so one can be inferred.
At runtime:
react_devtools_backend.js:2430 Warning: componentWillReceiveProps has been renamed, and is not recommended for use. See https://reactjs.org/link/unsafe-component-lifecycles for details.
This is at least partly being tracked by swagger-api/swagger-ui#5729, though hasn't gotten much attention.
https://planetarycomputer.microsoft.com/api/stac/v1/collections/mtbs lists two bboxes, but according to the STAC Collection spec the first bbox must be the union of all the others, so three bboxes must be listed here: first the union and then the two sub-bboxes.
This leads to downstream issues in clients where the first bbox is omitted: https://mspc.lutana.de/collections/mtbs
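Per that reading of the spec, the first entry is simply the union of the sub-bboxes. A sketch of computing it (coordinates are illustrative, and antimeridian crossing is ignored):

```python
def union_bbox(bboxes):
    """Compute the union of [west, south, east, north] bboxes
    (ignoring antimeridian crossing for simplicity)."""
    wests, souths, easts, norths = zip(*bboxes)
    return [min(wests), min(souths), max(easts), max(norths)]

# illustrative sub-bboxes, not the actual mtbs extents
subs = [[-170.0, 15.0, -65.0, 72.0], [144.6, 13.2, 145.9, 20.6]]
full = [union_bbox(subs)] + subs  # union first, then the sub-bboxes
assert full[0] == [-170.0, 13.2, 145.9, 72.0]
```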