roocs / daops
License: BSD 3-Clause "New" or "Revised" License
workflow.apply()
In our consolidate step, we read the datasets and decide which are in the requested time range, by opening them all with xarray:
https://github.com/roocs/daops/blob/master/daops/utils/consolidate.py#L42-L69
In our new intake catalog approach, we have the time information for each file directly accessible. We could allow daops to look up an intake catalog (if we can work out a clean way to make this connection).
This would speed things up.
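A rough sketch of what the lookup could look like, assuming an intake-esm style catalog whose dataframe carries ds_id, path, start_time and end_time columns (the column names are assumptions, not the real catalog schema):

import intake

def files_in_range(catalog_url, ds_id, start, end):
    # Open the catalog and filter rows on dataset id and time overlap,
    # so we never have to open the NetCDF files themselves.
    cat = intake.open_esm_datastore(catalog_url)
    df = cat.df
    subset = df[
        (df.ds_id == ds_id)
        & (df.start_time <= end)
        & (df.end_time >= start)
    ]
    return list(subset.path)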
Tagging: @cehbrecht @ellesmith88
Follow roocs-utils approach
See code here:
https://github.com/roocs/daops/blob/master/daops/utils/consolidate.py#L131-L133
Need to move the catalog lookup to work on a per-dataset level. (Could cache them all in an object before looping through).
Propose that we try to synchronise our argument names with those defined in OGC Common:
bbox: http://docs.opengeospatial.org/is/17-069r3/17-069r3.html#_parameter_bbox
datetime: http://docs.opengeospatial.org/is/17-069r3/17-069r3.html#_parameter_datetime
And any others we can find.
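As a minimal illustration (names only; this is not a real daops signature), an aligned subset interface might read:

def subset(collection, bbox=None, datetime=None):
    """Subset a collection.

    bbox: "minx,miny,maxx,maxy" bounding box, following the OGC parameter
    datetime: a single value or a "start/end" interval, following the OGC parameter
    """
    ...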
@huard @cehbrecht: we are working on a branch that will allow daops to be called as a command-line utility (for subset only) - for testing with the ESA Earth Observation Exploitation Platform Common Architecture (EOEPCA) framework. EOEPCA uses ADES (Application Deployment and Execution Service) to generate a WPS and allow applications to be deployed to it and run via a new daops/cli.py interface.
Are you happy for us to add these features to the master branch of daops? There should be no disruption to the existing components.
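For context, a rough sketch of the shape such a command-line entry point could take (option names are illustrative; the real branch may differ):

# daops/cli.py (sketch only)
import argparse

from daops.ops.subset import subset

def main():
    parser = argparse.ArgumentParser(prog="daops")
    sub = parser.add_subparsers(dest="command")
    sp = sub.add_parser("subset")
    sp.add_argument("collection")
    sp.add_argument("--area")
    sp.add_argument("--time")
    sp.add_argument("--output-dir", default=".")
    args = parser.parse_args()
    if args.command == "subset":
        # Delegate straight to the existing python operation
        subset(collection=args.collection, area=args.area,
               time=args.time, output_dir=args.output_dir)

if __name__ == "__main__":
    main()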
We have a common set of arguments that need to be sent to either of the open functions in xarray:
xr.open_dataset(...)
xr.open_mfdataset(...)
We need to make sure that all calls to these go through:
roocs_utils.xarray_utils.xarray_utils.open_xr_dataset(...)
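For reference, a minimal sketch of what such a wrapper can look like (the use_cftime=True default is taken from the traceback elsewhere in this document; the rest is illustrative, not the actual roocs-utils code):

import glob

import xarray as xr

def open_xr_dataset(dset):
    # Accept a glob pattern or a list of paths and choose the right opener,
    # so the common options are set in exactly one place.
    paths = sorted(glob.glob(dset)) if isinstance(dset, str) else list(dset)
    if len(paths) == 1:
        return xr.open_dataset(paths[0], use_cftime=True)
    return xr.open_mfdataset(paths, use_cftime=True, combine="by_coords")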
We need to review the daops, dachar and clisops code to check they are all doing this correctly. @ellesmith88, please can you take a look at this. Thanks
Create a unit tests module for the new ResultsDB class.
ECMWF have classified variables into their required regridding types; they can provide this classification.
They describe the regridding problem in terms of a sparse matrix calculation where a set of weights are applied. Once the matrices are pre-computed the computation is efficient.
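In other words, for pre-computed weights W the target field is just t = W @ s on the flattened source field. A toy illustration with scipy (the shapes and weights below are invented for the example):

import numpy as np
from scipy import sparse

n_src, n_tgt = 4, 2
# Each target cell is a weighted average of two source cells
rows = [0, 0, 1, 1]
cols = [0, 1, 2, 3]
weights = [0.5, 0.5, 0.5, 0.5]
W = sparse.csr_matrix((weights, (rows, cols)), shape=(n_tgt, n_src))

src = np.array([1.0, 3.0, 5.0, 7.0])  # flattened source field
tgt = W @ src                          # -> array([2., 6.])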
Error seen on user request on production service.
See error 41 from 26.03 in this notebook:
https://nbviewer.jupyter.org/github/roocs/rooki/blob/master/notebooks/tests/test-c3s-cmip6-subset-errors-dkrz-2021-03-23.ipynb
Run this request:
wf = ops.Subset(
    ops.Input(
        'tas', ['c3s-cmip6.ScenarioMIP.NIMS-KMA.KACE-1-0-G.ssp245.r1i1p1f1.Amon.tas.gr.v20191217']
    ),
    time="2021-01-01/2100-12-31",
    area="-10,30,35,70"
)
resp = wf.orchestrate()
resp.status
Get into this error:
Process error: list index out of range
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/daops/utils/consolidate.py", line 42, in consolidate
ds = open_xr_dataset(dset)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/roocs_utils/xarray_utils/xarray_utils.py", line 33, in open_xr_dataset
return xr.open_dataset(dset[0], use_cftime=True)
IndexError: list index out of range
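The likely cause is that the file lookup for this dataset returned an empty list, so dset[0] raises. A defensive variant (a sketch; not the actual roocs-utils code) would fail with a clearer message:

import xarray as xr

def open_first_file(file_paths, dset_id):
    # Guard against an empty file list before indexing into it
    if not file_paths:
        raise FileNotFoundError(f"No files found for dataset: {dset_id}")
    return xr.open_dataset(file_paths[0], use_cftime=True)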
Please check whether the Fixer approach that we have implemented will be flexible enough to address all issues with CMIP5, CMIP6 and CORDEX data.
Use the ESMValTool repository to review examples of fixes that are needed.
Do we need something more than:
We're getting ready to open a rook PR adding the average_shape process, and we'd need a daops release including the latest PR that supports that new operation here.
Compatible with clisops 0.12.1
Should we support a "start,end,interval" selection of coordinates?
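If we did, a parser might look like this (the syntax and helper name are hypothetical; nothing here is an agreed interface):

import numpy as np

def parse_interval_selection(value):
    # "0,1000,100" -> array([0., 100., ..., 1000.])
    start, end, interval = (float(v) for v in value.split(","))
    return np.arange(start, end + interval, interval)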
We have a system for generating fixes for adding the lead time variable, which uses:
https://github.com/roocs/daops/blob/decadal_fixes/daops/data_utils/coord_utils.py#L47-L76
For each fix, we include a string containing a list of values, e.g.:
{
    "fix_id": "AddCoordFix",
    "operands": {
        "var_id": "leadtime",
        "value": "15,45,74,105,135,166,196,227,258,288,319,349,380,410,439,470,500,531,561,592,623,653,684,714,745,775,804,835,865,896,926,957,988,1018,1049,1079,1110,1140,1169,1200,1231,1262,1292,1323,1354,1384,1415,1445,1476,1506,1535,1566,1596,1627,1657,1688,1719,1749,1780,1810,1841,1871,1900,1931,1961,1992,2022,2053,2084,2114,2145,2175,2206,2236,2265,2296,2326,2357,2387,2418,2449,2479,2510,2540,2571,2601,2630,2661,2692,2723,2753,2784,2815,2845,2876,2906,2937,2967,2996,3027,3057,3088,3118,3149,3180,3210,3241,3271,3302,3332,3361,3392,3422,3453,3483,3514,3545,3575,3606,3636,3667,3697",
        "dim": [
            "time"
        ],
        "dtype": "float64",
        "attrs": {
            "long_name": "Time elapsed since the start of the forecast",
            "standard_name": "forecast_period",
            "units": "days"
        },
        "encoding": {
            "dtype": "double"
        }
    },
    "source": {
        "name": "ceda",
        "version": "",
        "comments": "",
        "url": "https://github.com/cp4cds/c3s34g_master/tree/master/Decadal"
    }
}
An alternative would be to encode a rule that tells the fix function to look up the required values and add them into the new coordinate variable. Instead of value being set as a list of values, it could be some kind of rule such as:
"value": "derive: daops.data_utils.time_utils._get_lead_times"
@ellesmith88 This might be overkill but it is probably worth a discussion.
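A sketch of how the Fixer might interpret such a rule (importlib-based lookup; the "derive:" prefix and function path come from the proposal above, the rest is illustrative):

import importlib

def resolve_value(value, ds):
    # "derive: pkg.module.func" -> import pkg.module and call func(ds);
    # anything else is treated as a literal value
    if isinstance(value, str) and value.startswith("derive:"):
        path = value.split(":", 1)[1].strip()
        module_path, func_name = path.rsplit(".", 1)
        func = getattr(importlib.import_module(module_path), func_name)
        return func(ds)
    return value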
Update the JSON and the Fixer code so that fixes are represented as:
Hi everyone,
I firstly wanted to say thank you for all the efforts that have already been put into this framework. I would love to contribute to, and integrate, daops more in my workflow.
I am maintaining cmip6_preprocessing and am very interested in migrating some of the things I fix (in a quite ad-hoc fashion for now) in a more general way over here.
My primary goal for cmip6_preprocessing is to use it with python and the scientific pangeo stack, but I like the idea of documenting the actual problems (needing 'fixes') in a general and language-agnostic way over here. I was very impressed by the demo @agstephens gave a while ago during the CMIP6 cloud meeting and am now thinking of finally getting to work on this.
I am still really unsure how to actually contribute fixes to this repo, though. What I propose is to work my way through this using some quite simple fixes that are relatively easy to apply and are already documented in errata.
Specifically, I am currently testing this python code, which changes some of the metadata necessary to determine the point in time where a dataset was branched off the parent model run.
def fix_metadata_issues(ds):
    # https://errata.es-doc.org/static/view.html?uid=2f6b5963-f87e-b2df-a5b0-2f12b6b68d32
    if ds.attrs["source_id"] == "GFDL-CM4" and ds.attrs["experiment_id"] in [
        "1pctCO2",
        "abrupt-4xCO2",
        "historical",
    ]:
        ds.attrs["branch_time_in_parent"] = 91250
    # https://errata.es-doc.org/static/view.html?uid=61fb170e-91bb-4c64-8f1d-6f5e342ee421
    if ds.attrs["source_id"] == "GFDL-CM4" and ds.attrs["experiment_id"] in [
        "ssp245",
        "ssp585",
    ]:
        ds.attrs["branch_time_in_child"] = 60225
    return ds
This ingests an xarray.Dataset, checks certain conditions within the attributes, and then overwrites attributes accordingly. I could easily split those out into dataset-specific 'fixes'.
Where exactly could I translate this into a fix within the daops framework? Very happy to start a PR (and then test the implementation from cmip6_preprocessing), but I am afraid I am still a bit unsure about the daops internals. Any pointers would be greatly appreciated.
import collections
import os

class ResultSet(object):
    def __init__(self, inputs=None):
        self._results = collections.OrderedDict()
        self.metadata = {"inputs": inputs, "process": "something", "version": 0.1}
        self.file_paths = []

    def add(self, dset, result):
        self._results[dset] = result
        for item in result:
            # Record any output file paths found in the result
            if isinstance(item, str) and os.path.isfile(item):
                self.file_paths.append(item)
The error logs have shown that this request fails:
from daops.ops.subset import subset

inputs = {
    'collection': 'c3s-cmip6.ScenarioMIP.NCC.NorESM2-MM.ssp245.r1i1p1f1.day.tasmax.gn.v20191108',
    'area': (8.37, 39.12, 8.56, 39.26),
    'level': None,
    'time': ('2006-01-01T00:00:00', '2099-12-30T00:00:00'),
    'output_type': 'netcdf',
    'output_dir': '.',
    'split_method': 'time:auto',
    'file_namer': 'standard'
}
resp = subset(**inputs)
The error we are seeing is:
/usr/local/Miniconda3-py39_4.9.2-Linux-x86_64/envs/rook/lib/python3.7/site-packages/clisops/core/subset.py: UserWarning: ... not found within input date time range. Defaulting to minimum time step in xarray object.
  da = subset_time(da, start_date=start_date, end_date=end_date)
/usr/local/Miniconda3-py39_4.9.2-Linux-x86_64/envs/rook/lib/python3.7/site-packages/clisops/core/subset.py: UserWarning: ... has been nudged to nearest valid time step in xarray object.
  da = subset_time(da, start_date=start_date, end_date=end_date)
ZeroDivisionError: float divmod()
My first guess is that the lat and lon selection is coming back with no data because the requested range is too small. That doesn't actually trigger an exception in xarray itself, but subsequent processing then exposes the error.
@ellesmith88 please take a look. Thanks
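A minimal guard we might add after the spatial subset (the helper name is hypothetical; this is not existing daops code):

def check_non_empty(ds):
    # Raise a clear error if any dimension collapsed to zero length,
    # e.g. when a bounding box falls between grid points.
    empty = [dim for dim, size in ds.sizes.items() if size == 0]
    if empty:
        raise ValueError(f"Subset selected no data along: {empty}")
    return ds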
Can we re-use or learn from Dask internal DAG model for representing workflows - in python?
The subset() operation already supports detailed parsers. Therefore, it is not necessary to check or modify the command-line arguments before they get sent through.
The consolidate function works for a dataset id:
parameterise(
    collection='c3s-cmip5.output1.ICHEC.EC-EARTH.historical.day.atmos.day.r1i1p1.tas.latest',
    time='1850/1855')
... and for files with a DRS folder structure:
parameterise(
    collection='/data/c3s-cmip5/output1/ICHEC/EC-EARTH/historical/day/atmos/day/r1i1p1/tas/latest/tas_day_EC-EARTH_historical_r1i1p1_18500101-18591231.nc',
    time='1850/1855')
But it fails when only the file name is given without the DRS folders:
parameterise(
    collection='/data/tas_day_EC-EARTH_historical_r1i1p1_18500101-18591231.nc',
    time='1850/1855')
Error message:
p = parameterise(
    collection='/data/tas_day_EC-EARTH_historical_r1i1p1_18500101-18591231.nc',
    time='1850/1855')
result = subset(**p)
ValueError: max() arg is an empty sequence
The results of subset only have the filename but not the DRS folder structure:
p = parameterise(collection='c3s-cmip5.output1.ICHEC.EC-EARTH.historical.day.atmos.day.r1i1p1.tas.latest', time='1850/1855')
result = subset(**p)
result.file_paths
Out[9]: ['./tas_day_EC-EARTH_historical_r1i1p1_18500101-18551229.nc']
When we chain the subset operators, the second subset operation will fail on the output generated by the first one.
Running in ipython:
In [1]: from daops.ops.subset import subset
In [2]: from roocs_utils.parameter import parameterise
In [3]: p = parameterise(collection='/Users/pingu/tmp/data/tas_day_EC-EARTH_historical_r1i1p1_18500101-18591231.nc', time='1850/1855')
In [4]: result = subset(**p)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-b4d4a92f06bc> in <module>
----> 1 result = subset(**p)
~/miniconda3/envs/rook/lib/python3.7/site-packages/daops/ops/subset.py in subset(collection, time, area, level, output_dir, output_type, split_method, file_namer)
59
60 collection = consolidate.consolidate(
---> 61 parameters.get("collection"), time=parameters.get("time")
62 )
63
~/miniconda3/envs/rook/lib/python3.7/site-packages/daops/utils/consolidate.py in consolidate(collection, **kwargs)
79 # convert dset to ds_id to work with elasticsearch index
80 if not dset.count(".") > 6:
---> 81 dset = convert_to_ds_id(dset)
82
83 if "time" in kwargs:
~/miniconda3/envs/rook/lib/python3.7/site-packages/daops/utils/consolidate.py in convert_to_ds_id(dset)
48 elif os.path.isfile(dset) or dset.endswith(".nc"):
49 dset = dset.split("/")
---> 50 i = max(loc for loc, val in enumerate(dset) if val.lower() in projects)
51 ds_id = ".".join(dset[i:-1])
52 return ds_id
ValueError: max() arg is an empty sequence
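A possible guard for convert_to_ds_id, sketched under the assumption that we fail with a clear message when no known project facet appears in the path (the projects argument stands in for however the real code obtains the project list):

def convert_to_ds_id(dset, projects):
    # Find the right-most path component that names a known project
    parts = dset.split("/")
    locs = [loc for loc, val in enumerate(parts) if val.lower() in projects]
    if not locs:
        # Bare file name without DRS folders: raise a meaningful error
        # instead of letting max() fail on an empty sequence.
        raise ValueError(f"Cannot derive a dataset id from: {dset}")
    return ".".join(parts[max(locs):-1])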
I have got daops working with another data set (not ESGF) that uses file path mappings in the config, e.g.:
[project:haduk_grid]
base_dir = {{ ceda_base_dir }}/archive/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/HadUK-Grid
file_name_template = {__derive__var_id}_hadukgrid_uk_{spatial_average}_{frequency}_{__derive__time_range}.{__derive__extension}
facet_rule = project version_major version_minor version_patch version_extra spatial_average frequency variable version
fixed_path_modifiers =
variable:groundfrost pv rainfall sfcWind snowLying sun tas tasmin
frequency:mon
fixed_path_mappings =
haduk_grid.v1.0.3.0.1km.{frequency}.{variable}.v20210712:v1.0.3.0/1km/{variable}/{frequency}/v20210712/*.nc
haduk_grid.v1.0.2.1.1km.{frequency}.{variable}.v20200731:v1.0.2.1/1km/{variable}/{frequency}/v20200731/*.nc
The following code is inefficient:
https://github.com/roocs/daops/blob/master/daops/utils/consolidate.py#L58-L87
It reads:
We could try to provide hints to tell daops the date ranges in the files without having to open them. That would speed things up massively.
E.g.:
fixed_path_mappings =
    haduk_grid.v1.0.3.0.1km.{frequency}.{variable}.v20210712:v1.0.3.0/1km/{variable}/{frequency}/v20210712/*_(?P<startYYYYMM>\d{6})-(?P<endYYYYMM>\d{6}).nc
Then the code could parse the regex in the file name and not have to open the file(s).
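A sketch of that filename check, assuming the requested range is given as "YYYYMM" strings (the pattern mirrors the named groups proposed above; the helper itself is hypothetical):

import re

# Pattern with the named groups from the proposed mapping entry
FNAME_PATTERN = re.compile(r".*_(?P<startYYYYMM>\d{6})-(?P<endYYYYMM>\d{6})\.nc$")

def in_time_range(fname, req_start, req_end):
    # Compare the YYYYMM range encoded in the file name with the requested
    # range, avoiding the need to open the file at all.
    match = FNAME_PATTERN.match(fname)
    if not match:
        return True  # no hint available: fall back to opening the file
    start, end = match.group("startYYYYMM"), match.group("endYYYYMM")
    return start <= req_end and end >= req_start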
Hi @ellesmith88, I note that we have these empty modules:
https://github.com/roocs/daops/blob/master/tests/test_operations/test_orchestrate.py
https://github.com/roocs/daops/blob/master/daops/ops/orchestrate.py
Can they be deleted?
daops and dachar should use the same common root dirs.
We should remove root_dir from the operation function parameters in daops, but it should be possible to override their values somewhere, e.g. in a python object or an environment variable.
daops needs clisops:: all
dachar needs clisops:: general xarray utils
daops needs dachar:: root dirs
roocs-utils - must be lightweight and have no dependencies except xarray
The consolidate function appears to return different object types in this block:
https://github.com/roocs/daops/blob/master/daops/utils/consolidate.py#L151-L168
We think this is wrong - but it might affect how daops and rook interact with the function - we need to fix this.
Follow OGC good practice and other relevant services to define sensible inputs for spatial/temporal parameters etc.
Needs some research into existing services.
E.g. what should a time window look like?
"1999-01-01T00:00:00Z/2000-10-10T12:00:00Z"
... but works locally in my conda env.
Hi @alaniwi,
In the check_result(...) function, here, ...
Please can you add a check on the entire output array to assert that it is not all NaNs or fill values, e.g. using something like numpy.isnan or the equivalent for xarray.
This would help us spot cases where a subsetting operation has gone wrong and returned an xarray.Dataset with no valid data in the array.
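A minimal version of that check could look like this (helper name hypothetical; var_id is the variable to validate):

import numpy as np

def assert_not_all_missing(ds, var_id):
    # Fail if the subset produced an array with no valid data at all
    data = ds[var_id].values
    if np.isnan(data).all():
        raise ValueError(f"All values of '{var_id}' are NaN/fill values")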
In consolidate(): we look for the intersection of the years requested with the years in the files.
Should we be doing a complete time selection rather than just looking at years? I can't remember why we made that decision :-)
Ouranos is using ncml to provide data fixes.
Point @huard to an example where ncml does not cover the needed fix.
A fix currently looks like this:
https://github.com/roocs/proto-lib-34e/blob/master/fixes/cmip5.output1.INM.inmcm4.rcp45.mon.ocean.Omon.r1i1p1.latest.zostoga.json
Could our scanner also generate the ncml fixes needed by Ouranos? Then the scanner could be a shared component.
Currently this is all prototyping ...
We need to return some kind of provenance information after processing. Could we include details of which datasets were fixed and how?
Extend the subset operation to support the proposed extension in: roocs/34e-mngmt#105
Key issues:
- level: allow x1/x2 and x1,x2,x3,x4
- time: allow x1/x2 and x1,x2,x3,x4
- year, month, day: add these arguments as options instead of time - either one or the other; if both are given, default to using time
- a range (<start>, <end>) versus a sequence of (<value1>, <value2>): we need a way for our parser to know the difference - maybe work from rook downwards... rook will know if it is a range or a sequence; maybe the range should be a special object rather than a tuple!? (See the sketch after this list.)
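One way to make the distinction explicit is a small wrapper type, sketched below (the Interval class and parse_selection helper are hypothetical, not existing roocs code):

class Interval:
    """Marks a (start, end) pair as a range rather than a sequence."""
    def __init__(self, start, end):
        self.start, self.end = start, end

def parse_selection(value):
    # "x1/x2" denotes a range; "x1,x2,x3" denotes an explicit sequence
    if "/" in value:
        start, end = value.split("/", 1)
        return Interval(start, end)
    return value.split(",")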
Pre-requisites:
Task:
normalise will look up whether fixes need to be applied to each dataset and will then apply them.
Required steps to manually add fixes:
- add the fix under dachar.fixes...., with associated unit tests if required
- dachar propose-fixes -p cmip6 <json_file>
- dachar propose-fixes -p cmip6 --file-list=datasets_files.txt <fix_template.json>
Relevant to:
@agstephens @ellesmith88 can we make a 0.1.0 release with a reference to clisops 0.1.0? After that we can move to the xclim subset module integrated in clisops.
I can make the release ... but probably need permissions.
See also: roocs/clisops#1
tests/test_cli.py
- run: python -m pytest -c null tests/test_cli.py
- call daops from the command-line instead of from inside python
- look at tests/test_operations/test_subset.py and copy these tests into tests/test_cli.py (renaming them as being called by cli):
To get the mapping just use the <index_name>/_mapping endpoint (e.g.: https://es14.ceda.ac.uk:9200/c3s-roocs-fix-prop/_mapping). It is worth paring this down, as you will get all the default settings in there too; you only need the mappings which are non-standard.
Loading is as simple as:
from elasticsearch import Elasticsearch
import json

with open('mapping_file.json') as reader:
    mapping = json.load(reader)

index_name = 'index_name'
es = Elasticsearch()
if not es.indices.exists(index_name):
    es.indices.create(index_name, body=mapping)
You can do a cross-cluster re-index to copy the data across:
https://elasticsearch-py.readthedocs.io/en/v7.11.0/helpers.html#reindex
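For example, a minimal sketch of a cross-cluster re-index using the helper linked above (assuming the target index and mapping already exist, and that the hosts below are placeholders):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import reindex

source_es = Elasticsearch(["https://elasticsearch.ceda.ac.uk"])  # source cluster
target_es = Elasticsearch(["http://localhost:9200"])             # our own cluster

# Scans documents from the source index and bulk-indexes them into the target
reindex(source_es, source_index="c3s-roocs-fix-prop",
        target_index="c3s-roocs-fix-prop", target_client=target_es)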
Note: CEDA public end-point is: https://elasticsearch.ceda.ac.uk/c3s-roocs-fix-prop/_mapping
Example Search with no body specified:
https://elasticsearch.ceda.ac.uk/c3s-roocs-fix-prop/_search
@ellesmith88: I have created the following unit test module:
https://github.com/roocs/daops/blob/master/tests/test_xarray/test_xarray_aggregation.py
Most of it is in the form of skeleton code/stubs. Please can you get it working as a valid unit test.
The purpose of it is to ensure that we have tested the normal behaviour of xarray.open_mfdataset() - just to make sure that our assumptions throughout roocs are appropriate.
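For instance, a first concrete test along those lines might look like this (a sketch, not the content of the linked module):

import xarray as xr

def test_open_mfdataset_concats_on_time(tmp_path):
    # Two tiny files that should aggregate along "time"
    for i in range(2):
        ds = xr.Dataset(
            {"tas": ("time", [float(i)])},
            coords={"time": ("time", [i], {"units": "days since 2000-01-01"})},
        )
        ds.to_netcdf(tmp_path / f"part{i}.nc")
    combined = xr.open_mfdataset(str(tmp_path / "*.nc"), combine="by_coords",
                                 use_cftime=True)
    assert combined.tas.size == 2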
Philosophically, we created daops and rook to deal with dataset identifiers, which tend to include only a single data variable (along with its metadata and coordinate variables). As we consider the wider use of roocs we find, as with the ESA CCI datasets at CEDA, that some datasets have many variables. For example, this kerchunk file links to NetCDF files that contain 204 variables!
Here is an example request to remind us of the existing interface (using the command-line daops subset approach):
daops subset --area 30,-10,65,30 --time 2000-01-01/2000-02-30 --levels "/" --time-components "" --output-dir /tmp --file-namer simple https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json
So, should we extend the daops interface to allow specific selection of variables?
If we decide to support this extension, then maybe we have two options. The first is to encode the requested variables in a URI fragment on the collection identifier:
https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json#toa_swup,toa_swup_clr,toa_swup_hig
So a full command might be:
daops subset \
    --area 30,-10,65,30 \
    --time 2000-01-01/2000-02-30 \
    --levels "/" \
    --time-components "" \
    --output-dir /tmp \
    --file-namer simple \
    https://data.ceda.ac.uk/neodc/esacci/cloud/metadata/kerchunk/version3/L3C/ATSR2-AATSR/v3.0/ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1.json#toa_swup,toa_swup_clr,toa_swup_hig
The second option is to add a variables argument alongside the existing collection, time, area and level parameters:
variables: list of strings (variable IDs) - DEFAULT = None (i.e. include all variables)
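The fragment form in the first option is easy to parse out of the collection identifier; a minimal sketch (helper name hypothetical):

from urllib.parse import urldefrag

def split_variables(collection):
    # ".../file.json#toa_swup,toa_swup_clr" ->
    # (".../file.json", ["toa_swup", "toa_swup_clr"])
    uri, frag = urldefrag(collection)
    variables = frag.split(",") if frag else None  # None = include all variables
    return uri, variables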
@cehbrecht: what are your thoughts on this proposal?
Error on user request in production system.
See notebook, error 21, 24.03:
https://nbviewer.jupyter.org/github/roocs/rooki/blob/master/notebooks/tests/test-c3s-cmip6-subset-errors-dkrz-2021-03-23.ipynb
Run:
wf = ops.Subset(
    ops.Input(
        'tos', ['c3s-cmip6.ScenarioMIP.CNRM-CERFACS.CNRM-CM6-1.ssp245.r1i1p1f2.Omon.tos.gn.v20190219']
    ),
    # time="2021-01-01/2050-12-31",
    area="1,40,2,4"
)
resp = wf.orchestrate()
resp.status
Error:
Process error: Cannot apply_along_axis when any iteration dimensions are 0
(Possibly because the area "1,40,2,4", read as west,south,east,north, gives a south bound of 40 and a north bound of 4, selecting an empty latitude range.)
Traceback (most recent call last):
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/rook/director/director.py", line 156, in process
file_uris = runner(self.inputs)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/rook/utils/subset_utils.py", line 5, in run_subset
result = subset(**args)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/daops/ops/subset.py", line 77, in subset
result_set = Subset(**locals()).calculate()
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/daops/ops/base.py", line 88, in calculate
process(self.get_operation_callable(), norm_collection, **self.params),
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/daops/processor.py", line 19, in process
result = operation(dset, **kwargs)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/clisops/ops/subset.py", line 165, in subset
return op.process()
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/clisops/ops/base_operation.py", line 89, in process
processed_ds = self._calculate()
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/clisops/ops/subset.py", line 63, in _calculate
result = subset_bbox(ds, **self.params)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/clisops/core/subset.py", line 251, in func_checker
return func(*args, **kwargs)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/clisops/core/subset.py", line 875, in subset_bbox
da[var] = da[var].where(lon_cond & lat_cond, drop=True)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/common.py", line 1273, in where
return ops.where_method(self, cond, other)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/ops.py", line 203, in where_method
keep_attrs=True,
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/computation.py", line 1134, in apply_ufunc
keep_attrs=keep_attrs,
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/computation.py", line 271, in apply_dataarray_vfunc
result_var = func(*data_vars)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/computation.py", line 632, in apply_variable_ufunc
for arg, core_dims in zip(args, signature.input_core_dims)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/computation.py", line 632, in <listcomp>
for arg, core_dims in zip(args, signature.input_core_dims)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/computation.py", line 542, in broadcast_compat_data
data = variable.data
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/variable.py", line 374, in data
return self.values
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/variable.py", line 554, in values
return _as_array_or_item(self._data)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/variable.py", line 287, in _as_array_or_item
data = np.asarray(data)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/indexing.py", line 693, in __array__
self._ensure_cached()
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/indexing.py", line 690, in _ensure_cached
self.array = NumpyIndexingAdapter(np.asarray(self.array))
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/indexing.py", line 663, in __array__
return np.asarray(self.array, dtype=dtype)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/numpy/core/_asarray.py", line 102, in asarray
return array(a, dtype, copy=False, order=order)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/indexing.py", line 568, in __array__
return np.asarray(array[self.key], dtype=None)
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 86, in __getitem__
key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
File "/usr/local/anaconda/envs/rook/lib/python3.7/site-packages/xarray/core/indexing.py", line 853, in explicit_indexing_adapter
result = raw_indexing_method(raw_key.tuple)
Matt says we might get a lot for free if we use CWL. Look into whether Python implementations could manage orchestration for us.
See examples of time and lat/lon inputs here:
https://github.com/bird-house/finch/blob/master/finch/processes/wpsio.py
More generally, we will have some inputs that are common across our packages. These might include data_refs or resources - that specify ESGF datasets. Maybe we should define these as part of daops in a generalised way that rook can use.
Once the roocs-utils functions have been written:
daops.utils.consolidate:

import os
import roocs_utils

def _consolidate_dset(dset):
    if dset.startswith("http"):
        raise Exception("Not supported (yet)")
    if os.path.isfile(dset):
        return dset
    if dset.count(".") > 6:
        # dataset id: resolve the project base dir
        base_dir = roocs_utils.utils.project_utils.get_project_base_dir(dset)
    if os.path.isdir(dset):
        return os.path.join(dset, "*.nc")
    raise Exception("No idea what it is")
With split method implemented, the output of subset is a list of lists. Is this correct?
All of these need fixing to say what order the area elements are in:
Lines 54 to 55 in 3282f63
It turns out that the answer is west, south, east, north: