aditya-grover / climate-learn
Source code for ClimateLearn
License: MIT License
Describe the bug
I tried to use DataModule for geopotential at 500 hPa and for temperature at 850 hPa, in the same way as for surface variables such as 2m temperature or total precipitation. However, when I use DataModule for one variable at one pressure level (e.g., geopotential_500), it raises an error while loading the data (in load_from_nc).
To Reproduce
Steps to reproduce the behavior:
from climate_learn.utils.datetime import Year, Days, Hours
from climate_learn.data import DataModule
data_module = DataModule(
dataset = "ERA5",
task = "forecasting",
root_dir = DATADIR,
in_vars = ["geopotential"],
out_vars = ["geopotential"],
train_start_year = Year(2015),
val_start_year = Year(2016),
test_start_year = Year(2017),
end_year = Year(2018),
pred_range = Days(3),
subsample = Hours(1),
batch_size = 128,
num_workers = 1
)
The error I got:
----> 5 data_module = DataModule(
6 dataset = "ERA5",
7 task = "forecasting",
8 root_dir = DATADIR,
9 in_vars = ["geopotential"],
10 out_vars = ["geopotential"],
11 train_start_year = Year(2015),
12 val_start_year = Year(2016),
13 test_start_year = Year(2017),
14 end_year = Year(2018),
15 pred_range = Days(3),
16 subsample = Hours(1),
17 batch_size = 128,
18 num_workers = 1
19 )
File ~/.conda/envs/pyTT/lib/python3.10/site-packages/climate_learn/data/module.py:112, in DataModule.__init__(self, dataset, task, root_dir, in_vars, out_vars, train_start_year, val_start_year, test_start_year, end_year, root_highres_dir, history, window, pred_range, subsample, batch_size, num_workers, pin_memory)
109 caller = eval(f"{dataset.upper()}{task_string}")
111 train_years = range(train_start_year, val_start_year)
--> 112 self.train_dataset = caller(
113 root_dir,
114 root_highres_dir,
115 in_vars,
116 out_vars,
117 history,
118 window,
119 pred_range.hours(),
120 train_years,
121 subsample.hours(),
122 "train",
123 )
125 val_years = range(val_start_year, test_start_year)
126 self.val_dataset = caller(
127 root_dir,
128 root_highres_dir,
(...)
136 "val",
137 )
File ~/.conda/envs/pyTT/lib/python3.10/site-packages/climate_learn/data/modules/era5_module.py:113, in ERA5Forecasting.__init__(self, root_dir, root_highres_dir, in_vars, out_vars, history, window, pred_range, years, subsample, split)
99 def __init__(
100 self,
101 root_dir,
(...)
110 split="train",
111 ):
112 print(f"Creating {split} dataset")
--> 113 super().__init__(root_dir, root_highres_dir, in_vars, years, split)
115 self.in_vars = list(self.data_dict.keys())
116 self.out_vars = out_vars
File ~/.conda/envs/pyTT/lib/python3.10/site-packages/climate_learn/data/modules/era5_module.py:28, in ERA5.__init__(self, root_dir, root_highres_dir, variables, years, split)
25 self.years = years
26 self.split = split
---> 28 self.data_dict = self.load_from_nc(self.root_dir)
29 if self.root_highres_dir is not None:
30 self.data_highres_dict = self.load_from_nc(self.root_highres_dir)
File ~/.conda/envs/pyTT/lib/python3.10/site-packages/climate_learn/data/modules/era5_module.py:69, in ERA5.load_from_nc(self, data_dir)
67 if len(xr_data.shape) == 3: # 8760, 32, 64
68 xr_data = xr_data.expand_dims(dim="level", axis=1)
---> 69 data_dict[var].append(xr_data)
70 else: # pressure level
71 for level in DEFAULT_PRESSURE_LEVELS:
KeyError: 'geopotential'
I checked era5_module.py in more detail, and it seems (to me) that there might be a bug in the for loop at line 60:
if len(xr_data.shape) == 3:  # 8760, 32, 64
    xr_data = xr_data.expand_dims(dim="level", axis=1)
    data_dict[var].append(xr_data)
In this case, no level is considered when using a variable at only one pressure level.
Or did I miss something about how DataModule is meant to be used?
Thanks!
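One possible explanation for the KeyError is that data_dict is keyed per pressure level while a single-level file is routed to the surface branch. Below is a minimal sketch of how a loader could key such data. It assumes pressure-level entries are named `{variable}_{level}` (as in `geopotential_500`), that a single-level WeatherBench file stores `level` as a scalar coordinate, and that multi-level arrays are ordered (time, level, lat, lon). `split_by_level` is a hypothetical helper, not ClimateLearn's code.

```python
import numpy as np
import xarray as xr


def split_by_level(var: str, da: xr.DataArray) -> dict:
    """Hypothetical helper: map one variable to {key: DataArray of shape (time, 1, lat, lon)}."""
    if "level" in da.dims:
        # Multi-level data: one entry per pressure level, keeping a length-1 level axis.
        return {f"{var}_{int(lvl)}": da.sel(level=[lvl]) for lvl in da["level"].values}
    if "level" in da.coords:
        # Single-level file with a scalar level coordinate: promote it to a length-1 dimension.
        lvl = int(da["level"].item())
        return {f"{var}_{lvl}": da.expand_dims(dim="level", axis=1)}
    # Surface variable: add an unlabeled level axis so that every entry is 4D.
    return {var: da.expand_dims(dim="level", axis=1)}


# Synthetic single-level (scalar "level" coordinate) and multi-level arrays.
z500 = xr.DataArray(np.zeros((4, 2, 3)), dims=("time", "lat", "lon"), coords={"level": 500})
t = xr.DataArray(np.zeros((4, 2, 2, 3)), dims=("time", "level", "lat", "lon"), coords={"level": [500, 850]})
single = split_by_level("geopotential", z500)
multi = split_by_level("temperature", t)
```

With this keying, `geopotential_500` ends up under the same kind of key whether it comes from a single-level file or from a multi-level one.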
Describe the bug
The climatology dimension is incorrect.
To Reproduce
Run the script here: https://gist.github.com/jasonjewik/c339325e4ae33c85e4cecc1356fdae38.
Expected behavior
I expect a 2D tensor of shape [32, 64], but I get a 3D tensor of shape [3, 32, 64].
Environment
Additional context
Even if I set history to 1, I still get a shape error.
Hi,
The current way to do forecasting and downscaling involves creating a DataModule and calling load_model, whose results are then passed to the trainer. The former inherits from LightningDataModule, whereas the latter is a function that returns a LightningModule.
For reproducibility, and for a clear separation between source code and hyperparameters, I believe load_model should be refactored.
By refactoring, I mean that load_model should be converted into a class that inherits from pl.LightningModule, so that we can then use something like LightningCLI (a similar use case is shown in the docs).
This way, one can control through YAML alone which model class, data module, etc. should be created for a given experiment.
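LightningCLI configs select classes through `class_path` / `init_args` entries. The dependency-free sketch below illustrates only that idea, building an object from such an entry with `importlib` so that the choice of class lives in configuration rather than in code. It is not LightningCLI itself, and `instantiate` is a hypothetical helper.

```python
import importlib
from typing import Any, Dict


def instantiate(config: Dict[str, Any]) -> Any:
    """Hypothetical helper: build an object from {"class_path": "pkg.mod.Class", "init_args": {...}}."""
    module_name, _, class_name = config["class_path"].rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("init_args", {}))


# In a real experiment, class_path would point at a LightningModule subclass, such as
# "my_project.models.ResNetForecaster" (hypothetical). A stdlib class stands in here
# so that the example runs anywhere.
experiment = {"class_path": "datetime.timedelta", "init_args": {"days": 3}}
pred_range = instantiate(experiment)
```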
Hello,
I've noticed the use of bilinear interpolation for downscaling climate data in loaders.py (here). Given the complexity of climate data, I'm curious about this choice.
Could you share the rationale behind using bilinear interpolation?
Thank you in advance.
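For reference, bilinear interpolation estimates each fine-grid value as a weighted average of the four surrounding coarse-grid values. It is cheap and has no trainable parameters, but it cannot recover fine-scale structure that is absent from the coarse field. That is a common reason to use it as a simple downscaling baseline, though the authors' actual rationale may differ. Below is a NumPy sketch that assumes corner-aligned grids and an integer upsampling factor. `bilinear_upsample` is a hypothetical name and is not necessarily how loaders.py implements it.

```python
import numpy as np


def bilinear_upsample(field: np.ndarray, factor: int) -> np.ndarray:
    """Upsample a 2D (lat, lon) field by an integer factor (corner-aligned bilinear)."""
    h, w = field.shape
    # Fractional source coordinates of every output row and column.
    ys = np.linspace(0, h - 1, h * factor)
    xs = np.linspace(0, w - 1, w * factor)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, broadcast over columns
    wx = (xs - x0)[None, :]  # horizontal weights, broadcast over rows
    # Interpolate along longitude on the two neighbouring rows, then along latitude.
    top = field[np.ix_(y0, x0)] * (1 - wx) + field[np.ix_(y0, x1)] * wx
    bottom = field[np.ix_(y1, x0)] * (1 - wx) + field[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy


coarse = np.add.outer(np.arange(4.0), 2 * np.arange(8.0))  # f(y, x) = y + 2x
fine = bilinear_upsample(coarse, 2)
```

Because bilinear interpolation reproduces linear fields exactly, `fine` equals y + 2x sampled on the finer grid. The limitation shows up on fields with sharp, non-linear detail, which is exactly what downscaling tries to recover.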
Hi,
Adding support for #38 to the current code could get quite messy, because the dataset source class and the task class (e.g., forecasting) are somewhat coupled. As a result, adding support for a new dataset source, task, or data-loading strategy would involve creating many new classes and writing redundant code.
To solve this, I believe that the source dataset class, the task-specific class, and the data-loading strategy class (e.g., IterableDataset) should be decoupled.
I tried to formalize this as requirements that a good implementation/refactor should follow.
DataModule would act as an interface to all data handling (DataModule inherits from LightningDataModule).
… ERA5).
… corgipile, etc.), one should write code only for the loading strategy, and it should be dataset agnostic.
Unfortunately, as of now I can't think of any structure that satisfies all the requirements. Hence, I opened this issue, hoping someone would chime in and share their thoughts.
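One way to decouple the three concerns is to give each its own small class and combine them only in a thin iterator. That way, a new source, task, or loading strategy touches exactly one class. The sketch below is hypothetical (`InMemorySource`, `Forecasting`, `ShuffledIndices`, and `iterate` are illustrative names, not ClimateLearn's API) and assumes every variable is an in-memory array of shape (time, lat, lon).

```python
import random
from typing import Dict, Iterator, List, Tuple

import numpy as np

Arrays = Dict[str, np.ndarray]  # variable name -> array of shape (time, lat, lon)


class InMemorySource:
    """Dataset source: only knows how to return raw arrays per variable."""

    def __init__(self, arrays: Arrays):
        self.arrays = arrays

    def load(self, variables: List[str]) -> Arrays:
        return {v: self.arrays[v] for v in variables}


class Forecasting:
    """Task: only knows how to turn raw arrays into (input, target) pairs."""

    def __init__(self, in_vars: List[str], out_vars: List[str], lead: int):
        self.in_vars, self.out_vars, self.lead = in_vars, out_vars, lead

    def variables(self) -> List[str]:
        # Order-preserving union, so output-only variables are loaded too.
        return list(dict.fromkeys(self.in_vars + self.out_vars))

    def num_samples(self, data: Arrays) -> int:
        return len(next(iter(data.values()))) - self.lead

    def sample(self, data: Arrays, t: int) -> Tuple[np.ndarray, np.ndarray]:
        x = np.stack([data[v][t] for v in self.in_vars])
        y = np.stack([data[v][t + self.lead] for v in self.out_vars])
        return x, y


class ShuffledIndices:
    """Loading strategy: only decides the order in which samples are visited."""

    def __init__(self, seed: int = 0):
        self.seed = seed

    def order(self, n: int) -> List[int]:
        idx = list(range(n))
        random.Random(self.seed).shuffle(idx)
        return idx


def iterate(source, task, strategy) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Compose the three pieces; none of them depends on the others."""
    data = source.load(task.variables())
    for t in strategy.order(task.num_samples(data)):
        yield task.sample(data, t)


times = np.arange(10.0)[:, None, None] * np.ones((1, 2, 3))  # value == time index
source = InMemorySource({"2m_temperature": times, "total_precipitation": 10 * times})
task = Forecasting(["2m_temperature"], ["total_precipitation"], lead=2)
pairs = list(iterate(source, task, ShuffledIndices(seed=0)))
```

Swapping in another source only requires a new class with a `load` method, and a different loading strategy only needs an `order` method. The task code stays untouched in both cases.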
Describe the bug
I am following the code on page 25 of the ClimateLearn paper (https://arxiv.org/pdf/2307.01909.pdf), and when I try to import DataModule I get the following error:
ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from climate_learn.data import DataModule
ImportError: cannot import name 'DataModule' from 'climate_learn.data' (/home/user/miniconda3/envs/precip/lib/python3.11/site-packages/climate_learn/data/__init__.py)
Could someone explain why this import is not working? Perhaps it is related to missing information in the file climate_learn/data/__init__.py.
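The installed release may differ from the one the paper was written against, and public names can move between releases. Two stdlib checks narrow this down: which version is installed, and which names the module actually exposes. The helper names are hypothetical, and the usage comments assume the package is installed under the distribution name `climate-learn`.

```python
import importlib
from importlib import metadata
from typing import List, Optional


def public_names(module_name: str) -> List[str]:
    """Public attribute names currently defined on a module."""
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if not name.startswith("_"))


def installed_version(dist_name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Usage:
# print(installed_version("climate-learn"))
# print("DataModule" in public_names("climate_learn.data"))
```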
As a side question, are there any examples of downscaling with a CNN, similar to the one used in the paper, that could be included as another QuickStart notebook? If so, that would be incredibly helpful.
Thank you for your help!
Jeremy
Environment
Dear climate-learn Team!
I tried to install your repository, but unfortunately both methods described in your README failed.
The first output looks like this:
The second was a bit weirder. After downloading your code base, I tried to install it with the requirements file:
Do you have any ideas what I could do?
The models in ClimateLearn are both cumbersome to use (no snappy way to load presets, climatology has to be manually set, baselines are baked into models) and insufficiently flexible (hard to add new models). In this issue, I outline proposed changes to resolve these issues. Ideally, I want the model quickstart to change from this:
dm = DataModule(...)
model_kwargs = {...}
optim_kwargs = {...}
mm = load_model(
name="resnet",
task="forecasting",
model_kwargs=model_kwargs,
optim_kwargs=optim_kwargs
)
set_climatology(mm, dm)
fit_lin_reg_baseline(mm, dm, reg_hparam=0.0)
To this:
dm = DataModule(...)
mm = load_forecasting_module(dm, preset="rasp-thuerey-2020")
trainer = Trainer(...)
trainer.fit(mm, dm)
trainer.test(mm, dm)
Here, I show examples for load_forecasting_module. Everything is analogous for load_downscaling_module. I split the original load_model function by task to mirror the data module design, where ForecastingArgs is a distinct class from DownscalingArgs.
load_forecasting_module(data_module, preset)
Loads a preset model module. For example, the following loads the model and optimizers described in this paper.
load_forecasting_module(dm, preset="rasp-thuerey-2020")
Presets also exist for baselines.
load_forecasting_module(dm, preset="climatology")
load_forecasting_module(dm, preset="persistence")
load_forecasting_module(dm, preset="linear-regression")
load_forecasting_module(data_module, preset, model_kwargs)
Loads a preset model module. The user can also pass keyword arguments to modify the model architecture. For example, the following loads Rasp and Thuerey's model, but changes the dropout.
load_forecasting_module(
dm,
preset="rasp-thuerey-2020",
model_kwargs={"dropout": 0.3}
)
load_forecasting_module(data_module, preset, model_kwargs, optim, optim_kwargs)
Loads a preset model module. The user can also pass keyword arguments to modify the model architecture. They can also specify the name of an optimizer (which is built into ClimateLearn) and keyword arguments for the optimizer. For example,
load_forecasting_module(
dm,
preset="rasp-thuerey-2020",
model_kwargs={"dropout": 0.3},
optim="adamw",
optim_kwargs={"betas": (0.9, 0.95)}
)
load_forecasting_module(data_module, preset, model_kwargs, optimizer)
Loads a preset model module. The user can also pass keyword arguments to modify the model architecture. They can also specify an already instantiated optimizer. For example,
load_forecasting_module(
dm,
preset="rasp-thuerey-2020",
model_kwargs={"dropout": 0.3},
optimizer=my_cool_optimizer
)
load_forecasting_module(data_module, model, model_kwargs, optim, optim_kwargs)
Loads a model module with the given model and optimizer, which are defined in ClimateLearn but can be customized by model_kwargs and optim_kwargs. For example,
load_forecasting_module(
dm,
model="resnet",
model_kwargs={"n_blocks": 2},
optim="adamw",
optim_kwargs={"betas": (0.9, 0.95)}
)
load_forecasting_module(data_module, model, model_kwargs, optimizer)
Loads a model module with the given model, which is defined in ClimateLearn but can be customized by model_kwargs. The optimizer is specified separately. For example:
load_forecasting_module(
data_module,
model="resnet",
model_kwargs={"n_blocks": 2},
optimizer=my_cool_optimizer
)
load_forecasting_module(data_module, net, optimizer)
Loads a model module which wraps the user-specified network and optimizer. For example:
load_forecasting_module(
data_module,
net=my_cool_network,
optimizer=my_cool_optimizer,
)
from typing import Any, Callable, Dict, List, Optional, Union

import pytorch_lightning as pl
import torch


def load_xxx_module(  # xxx is "forecasting" or "downscaling"
    data_module: pl.LightningDataModule,
    preset: Optional[str] = None,
    model: Optional[str] = None,
    model_kwargs: Optional[Dict[str, Any]] = None,
    optim: Optional[str] = None,
    optim_kwargs: Optional[Dict[str, Any]] = None,
    net: Optional[torch.nn.Module] = None,
    # Either an optimizer instance or {"optimizer": ..., "lr_scheduler": ...}
    optimizer: Optional[Union[torch.optim.Optimizer, Dict[str, Any]]] = None,
    train_loss: Optional[Union[Callable, List[Callable]]] = None,
    val_loss: Optional[Union[Callable, List[Callable]]] = None,
    test_loss: Optional[Union[Callable, List[Callable]]] = None,
) -> pl.LightningModule: ...
Note that preset and model are aliases for each other. They are kept as two distinct arguments for the sake of clarity. For example, the following two function calls return the same module:
load_forecasting_module(dm, preset="rasp-thuerey-2020")
load_forecasting_module(dm, model="rasp-thuerey-2020")
But in the first case, it is more obvious that the user wants the model defined in Rasp and Thuerey (2020). If both preset and model are specified, a RuntimeError will be thrown. The same happens when net is passed even though model is specified, or when any other arguments conflict.
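A minimal sketch of the argument-conflict checks described above, using the proposed parameter names. `check_model_args` is a hypothetical helper. Treating `optim` together with `optimizer` as a conflict is an assumption based on the signatures listed earlier.

```python
from typing import Any, Optional


def check_model_args(
    preset: Optional[str] = None,
    model: Optional[str] = None,
    net: Any = None,
    optim: Optional[str] = None,
    optimizer: Any = None,
) -> None:
    """Raise RuntimeError when mutually exclusive arguments of the proposed API are combined."""
    if preset is not None and model is not None:
        raise RuntimeError("pass either preset or model, not both")
    if net is not None and (preset is not None or model is not None):
        raise RuntimeError("net cannot be combined with preset or model")
    if optim is not None and optimizer is not None:  # assumed conflict
        raise RuntimeError("pass either optim (a name) or optimizer (an instance), not both")
```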
The optimizer argument can either be a PyTorch optimizer or a dictionary with two keys: "optimizer" and "lr_scheduler". If it is just a PyTorch optimizer, no scheduler is used.
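PyTorch Lightning's `configure_optimizers()` can return either a single optimizer or a dictionary with an `"optimizer"` key and an optional `"lr_scheduler"` key, so the proposed argument maps onto it directly. Below is a dependency-free sketch of normalizing the argument; `as_optimizer_config` is a hypothetical name.

```python
from typing import Any, Dict


def as_optimizer_config(optimizer: Any) -> Dict[str, Any]:
    """Normalize the proposed `optimizer` argument to a configure_optimizers()-style dict."""
    if isinstance(optimizer, dict):
        if set(optimizer) != {"optimizer", "lr_scheduler"}:
            raise ValueError(f"expected keys 'optimizer' and 'lr_scheduler', got {sorted(optimizer)}")
        return dict(optimizer)
    # A bare optimizer: no learning-rate scheduler is used.
    return {"optimizer": optimizer}
```

In practice, the values would be objects such as `torch.optim.AdamW(net.parameters(), lr=1e-4)` and `torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=10)`.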
I also add arguments for specifying loss functions. If these are left as None, the default loss functions specified in ClimateLearn will be used. However, the user might want this flexibility. For example, someone might be interested in using the AtmoDist loss for downscaling.
The user can easily load presets. I've shown this for Rasp and Thuerey, but we could also include ClimaX, Weyn et al. (2020), and others. Besides loading the architectures, we can also load pre-trained models when possible. For example, we could have both "climax", which loads the untrained ClimaX model, and "climax-pretrained", which loads the pre-trained ClimaX model.
Climatology is set automatically. ClimateLearn requires climatology to be set before training, and it doesn't make sense to require the user to remember to do this. Here, climatology is set in the load_xxx_module function. I show how this is done below.
Baselines are not baked into models. As pointed out in Issue 83, it doesn't always make sense to run persistence because the data module might not support it. Furthermore, the user might not care to see these baselines. In my proposed changes, we separate out the baselines into their own models. If the user wishes to run climatology, persistence, or linear regression, they can do that the same as any other model. For example,
load_forecasting_module(dm, preset="climatology")
New models are easier to add. The user can modify ClimateLearn's presets (e.g., Rasp and Thuerey, ClimaX) and built-in architectures (e.g., ResNet, ViT), and they can define their own network and/or optimizer and pass these to the load_xxx_module function. We can include a page in the documentation about what API is expected for forecasting networks versus downscaling networks.
In the load_xxx_module function, we can do the following to set climatology automatically.
def load_forecasting_module(dm, ...):
    # ...
    mm = ForecastingLitModule(...)
    mm.set_climatology(dm.get_climatology("all"))
    # ...
This relies upon pull request 81 being merged, and also on a minor change to DataModule.get_climatology.
For the persistence baseline, we can do the following to determine if it is available.
def load_forecasting_module(dm, ...):
    # ...
    if preset == "persistence":
        if set(dm.out_vars).issubset(dm.in_vars):
            mm = ForecastingLitModule(...)
        else:
            raise RuntimeError("persistence requires out_vars to be a subset of in_vars")
    # ...
Again, this would require just a minor change so that both the input and the output variables of the dataset are available at the DataModule level.
In making these changes, I aim for the following two goals. First, to make it easier to run benchmark models. Second, to make it easier to add a custom model. The flexibility of my proposed API allows for a balance between these two goals.
Is your feature request related to a problem? Please describe.
Currently, the iter approach first saves files into .npz chunks and then reloads them using NpyReader.
Describe the solution you'd like
Instead of duplicating the data into .npz files, can we read from the NetCDF files without loading all of them into memory at the same time?
I don't know how to do this. Just creating it as an issue for now.
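xarray opens NetCDF files lazily: values are read from disk only when they are requested. `xr.open_mfdataset` (which requires dask) can also combine many files into one lazily loaded dataset. Below is a minimal sketch of streaming one variable in blocks of time steps without an intermediate .npz copy. It assumes a dimension named `time`. `iter_time_chunks` is a hypothetical helper, and shuffling across blocks is not handled.

```python
from typing import Iterator

import numpy as np
import xarray as xr


def iter_time_chunks(ds: xr.Dataset, var: str, chunk: int) -> Iterator[np.ndarray]:
    """Yield `var` in consecutive blocks of `chunk` time steps, materializing one block at a time."""
    n_time = ds.sizes["time"]
    for start in range(0, n_time, chunk):
        yield ds[var].isel(time=slice(start, start + chunk)).values


# With real files, open lazily instead of loading everything (paths are hypothetical):
# ds = xr.open_mfdataset(sorted(glob.glob("era5/2m_temperature/*.nc")), combine="by_coords")
# for block in iter_time_chunks(ds, "t2m", chunk=256): ...
ds = xr.Dataset({"t2m": (("time", "lat", "lon"), np.arange(24.0).reshape(6, 2, 2))})
blocks = list(iter_time_chunks(ds, "t2m", chunk=4))
```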
Describe the bug
When __init__ is called for ERA5Forecasting, the arguments passed to its parent class are (root_dir, root_highres_dir, in_vars, years, split). Because of this, only variables that are part of in_vars are loaded from the NetCDF files when data_dict is created. See lines 45 and 61.
This can cause a bug at line 122, where the output data is created, if the output variables are not a subset of the input variables.
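A minimal fix is to pass the parent class the union of in_vars and out_vars instead of in_vars alone. ERA5Forecasting currently sets `self.in_vars` from `data_dict.keys()` (visible in the first traceback on this page). After this change, that attribute would need to be derived in a way that excludes output-only variables. Below is a sketch of the union step; `variables_to_load` is a hypothetical name.

```python
from typing import List


def variables_to_load(in_vars: List[str], out_vars: List[str]) -> List[str]:
    """Order-preserving union: every input variable, then any output-only variable."""
    return list(dict.fromkeys(in_vars + out_vars))


to_load = variables_to_load(["2m_temperature"], ["total_precipitation", "2m_temperature"])
```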
Describe the bug
ShardDataset doesn't work with DDP but works with DDP_spawn. The training just hangs before the start of the first epoch.
Hello, I recently tried to load ERA5 using the updated climate-learn package:
era5_data_module = DataModule(
dataset = "ERA5",
task = "forecasting",
root_dir = era_path,
in_vars = ["temperature"],
out_vars = ["temperature"],
train_start_year = Year(1979),
val_start_year = Year(2011),
test_start_year = Year(2013),
end_year = Year(2014),
pred_range = Days(5),
subsample = Hours(6),
batch_size = 128,
num_workers = 1
)
Running the above code produces the following error:
KeyError Traceback (most recent call last)
Cell In [16], line 1
----> 1 era5_data_module = DataModule(
2 dataset = "ERA5",
3 task = "forecasting",
4 root_dir = era_path,
5 in_vars = ["temperature"],
6 out_vars = ["temperature"],
7 train_start_year = Year(1979),
8 val_start_year = Year(2011),
9 test_start_year = Year(2013),
10 end_year = Year(2014),
11 pred_range = Days(5),
12 subsample = Hours(6),
13 batch_size = 128,
14 num_workers = 1
15 )
File ~/climate-learn/src/climate_learn/data/module.py:58, in DataModule.__init__(self, dataset, task, root_dir, in_vars, out_vars, train_start_year, val_start_year, test_start_year, end_year, root_highres_dir, history, window, pred_range, subsample, batch_size, num_workers, pin_memory)
55 caller = eval(f"{dataset.upper()}{task_string}")
57 train_years = range(train_start_year, val_start_year)
---> 58 self.train_dataset = caller(
59 root_dir,
60 root_highres_dir,
61 in_vars,
62 out_vars,
63 history,
64 window,
65 pred_range.hours(),
66 train_years,
67 subsample.hours(),
68 "train",
69 )
71 val_years = range(val_start_year, test_start_year)
72 self.val_dataset = caller(
73 root_dir,
74 root_highres_dir,
(...)
82 "val",
83 )
File ~/climate-learn/src/climate_learn/data/modules/era5_module.py:122, in ERA5Forecasting.__init__(self, root_dir, root_highres_dir, in_vars, out_vars, history, window, pred_range, years, subsample, split)
119 self.pred_range = pred_range
121 inp_data = xr.concat([self.data_dict[k] for k in self.in_vars], dim="level")
--> 122 out_data = xr.concat([self.data_dict[k] for k in self.out_vars], dim="level")
123 self.inp_data = inp_data.to_numpy().astype(np.float32)
124 self.out_data = out_data.to_numpy().astype(np.float32)
File ~/climate-learn/src/climate_learn/data/modules/era5_module.py:122, in <listcomp>(.0)
119 self.pred_range = pred_range
121 inp_data = xr.concat([self.data_dict[k] for k in self.in_vars], dim="level")
--> 122 out_data = xr.concat([self.data_dict[k] for k in self.out_vars], dim="level")
123 self.inp_data = inp_data.to_numpy().astype(np.float32)
124 self.out_data = out_data.to_numpy().astype(np.float32)
KeyError: 'temperature'
Same problem happens with other pressure-level variables such as geopotential.
Is your feature request related to a problem? Please describe.
Currently, ShardDataset implements __iter__() to build batches. Inside __iter__(), the order of the data is determined by self.epoch. Unfortunately, self.epoch is incremented only in the child (worker) process and not in the parent process. As a result, every epoch uses the same shuffling order.
Describe the solution you'd like
Give __iter__() access to the trainer to retrieve the epoch number, or set up communication across the child processes.
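A common pattern for this is to derive the shuffle order from (seed, epoch) and to set the epoch from the main process before each epoch starts. With a torch DataLoader using `num_workers > 0` and `persistent_workers=False`, worker processes are recreated for every epoch from the main process's copy of the dataset, so an epoch set there reaches them. The sketch below omits torch so that it runs anywhere. In real code, the class would subclass `torch.utils.data.IterableDataset` and split shards across workers using `torch.utils.data.get_worker_info()`. `EpochShuffledShards` is a hypothetical name.

```python
import random
from typing import Iterator, List


class EpochShuffledShards:
    """Hypothetical shard iterator whose order depends only on (seed, epoch)."""

    def __init__(self, shards: List[str], seed: int = 0):
        self.shards = list(shards)
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Call this in the main process before the epoch's iterator is created.
        self.epoch = epoch

    def __iter__(self) -> Iterator[str]:
        order = list(self.shards)
        random.Random(self.seed + self.epoch).shuffle(order)
        return iter(order)


dataset = EpochShuffledShards([f"shard_{i:02d}.npz" for i in range(20)], seed=42)
orders = []
for epoch in range(2):
    dataset.set_epoch(epoch)
    orders.append(list(dataset))
```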
Describe the bug
The persistence baseline for forecasting assumes that the last input value can be used as the output. But this assumes that the input variables are the same as the output variables. See this.
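Below is a minimal sketch of a persistence baseline that selects the output variables by name from the most recent input, instead of assuming that the two lists match. It assumes inputs of shape (batch, len(in_vars), lat, lon) with no history axis. `persistence_forecast` is a hypothetical helper.

```python
from typing import List

import numpy as np


def persistence_forecast(x: np.ndarray, in_vars: List[str], out_vars: List[str]) -> np.ndarray:
    """Predict each output variable as its latest input value. x: (batch, len(in_vars), lat, lon)."""
    if x.shape[1] != len(in_vars):
        raise ValueError(f"x has {x.shape[1]} channels but {len(in_vars)} input variables")
    missing = [v for v in out_vars if v not in in_vars]
    if missing:
        raise RuntimeError(f"persistence needs every output variable among the inputs; missing {missing}")
    channels = [in_vars.index(v) for v in out_vars]
    return x[:, channels]  # (batch, len(out_vars), lat, lon), in out_vars order


x = np.arange(2 * 3 * 2 * 2, dtype=float).reshape(2, 3, 2, 2)
pred = persistence_forecast(x, ["z500", "t850", "t2m"], ["t2m", "z500"])
```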
Describe the bug
Table 3 in Section 4.2 of the paper "ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling" reports RMSE, but Appendix B.4.3 (Climate downscaling metrics) refers to latitude-weighted RMSE (Eq. 2 in the Appendix). I ran the code locally and confirmed that the numbers reported in the paper are RMSE, not latitude-weighted RMSE.
Also, I have a question: why is lat_mse used for training the forecasting module, but mse for training the downscaling module?
Snapshots
The following snippet shows that load_forecasting_module uses lat_rmse as test_loss, but load_downscaling_module uses rmse.
climate-learn/src/climate_learn/utils/loaders.py
Lines 215 to 246 in 1a46b08
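For comparison, below is a NumPy sketch of plain and latitude-weighted RMSE following the WeatherBench convention (Rasp et al., 2020). Each latitude row is weighted by cos(lat) divided by the mean of cos(lat) over all rows, and per-sample RMSEs are averaged over time. This may differ in detail from Eq. 2 of the ClimateLearn paper and from ClimateLearn's own `lat_rmse`. The function names are hypothetical.

```python
import numpy as np


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Unweighted RMSE per sample over (lat, lon), averaged over samples. Shapes: (time, lat, lon)."""
    return float(np.sqrt(((pred - target) ** 2).mean(axis=(1, 2))).mean())


def lat_weighted_rmse(pred: np.ndarray, target: np.ndarray, lat_deg: np.ndarray) -> float:
    """Same as rmse, but each latitude row is weighted by cos(lat) / mean(cos(lat))."""
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.mean()
    sq_err = (pred - target) ** 2 * weights[None, :, None]
    return float(np.sqrt(sq_err.mean(axis=(1, 2))).mean())


lat = np.array([0.0, 60.0])
pred = np.zeros((1, 2, 4))
target = np.zeros((1, 2, 4))
target[:, 0, :] = 1.0  # error only in the equatorial row
```

With the error confined to the equator, the weighted score is larger: the equatorial row gets weight 4/3 and the 60° row gets 2/3, giving sqrt(2/3) instead of sqrt(1/2).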
Is your feature request related to a problem? Please describe.
Not exactly a problem, but it would be nice to support optimizers other than Adam and AdamW.
Describe the solution you'd like
Give users the flexibility to choose their own optimizer, as long as it inherits from torch.optim.Optimizer.
Hello.
I am modifying quickstart.ipynb for the CMIP6 case. I downloaded CMIP6 data using download_mpi_esm1_2_hr, but when I then process the *.nc files, it throws an error. It would be good to have sample scripts for downloading and processing the other two datasets as well (CMIP6 and PRISM).
I also tried downloading with WeatherBench, and there I get a different error.
import climate_learn as cl
from climate_learn.data.processing.nc2npz import convert_nc2npz
cl.data.download_mpi_esm1_2_hr(
dst="./dataset/cmip6/temperature",
variable="temperature",
)
cl.data.download_mpi_esm1_2_hr(
dst="./dataset/cmip6/geopotential",
variable="geopotential",
)
convert_nc2npz(
root_dir="./dataset/cmip6",
save_dir="./dataset/cmip6/processed",
variables=["temperature", "geopotential"],
start_train_year=1850,
start_val_year=2000,
start_test_year=2005,
end_year=2015,
num_shards=16
)
#########################
Error Message (download_mpi_esm1_2_hr):
#########################
ValueError Traceback (most recent call last)
Cell In[4], line 1
----> 1 convert_nc2npz(
2 root_dir="../dataset/cmip6",
3 save_dir="../dataset/cmip6/processed",
4 variables=["temperature", "geopotential"],
5 start_train_year=1850,
6 start_val_year=2000,
7 start_test_year=2005,
8 end_year=2015,
9 num_shards=16
10 )
File /#########################/climate-learn/src/climate_learn/data/processing/nc2npz.py:189, in convert_nc2npz(root_dir, save_dir, variables, start_train_year, start_val_year, start_test_year, end_year, num_shards)
185 test_years = range(start_test_year, end_year)
187 os.makedirs(save_dir, exist_ok=True)
--> 189 nc2np(root_dir, variables, train_years, save_dir, "train", num_shards)
190 nc2np(root_dir, variables, val_years, save_dir, "val", num_shards)
191 nc2np(root_dir, variables, test_years, save_dir, "test", num_shards)
File /#########################/climate-learn/src/climate_learn/data/processing/nc2npz.py:58, in nc2np(path, variables, years, save_dir, partition, num_shards_per_year)
56 for var in variables:
57 ps = glob.glob(os.path.join(path, var, f"{year}.nc"))
---> 58 ds = xr.open_mfdataset(
59 ps, combine="by_coords", parallel=True
60 ) # dataset for a single variable
61 code = NAME_TO_VAR[var]
63 if len(ds[code].shape) == 3: # surface level variables
File /#########################/lib/python3.9/site-packages/xarray/backends/api.py:1046, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
1041 datasets = [preprocess(ds) for ds in datasets]
1043 if parallel:
1044 # calling compute here will return the datasets/file_objs lists,
1045 # the underlying datasets will still be stored as dask arrays
-> 1046 datasets, closers = dask.compute(datasets, closers)
1048 # Combine all datasets, closing them in case of a ValueError
1049 try:
File /#########################/lib/python3.9/site-packages/dask/base.py:595, in compute(traverse, optimize_graph, scheduler, get, *args, **kwargs)
592 keys.append(x.dask_keys())
593 postcomputes.append(x.dask_postcompute())
--> 595 results = schedule(dsk, keys, **kwargs)
596 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
File /#########################/lib/python3.9/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
86 elif isinstance(pool, multiprocessing.pool.Pool):
87 pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
90 pool.submit,
91 pool._max_workers,
92 dsk,
93 keys,
94 cache=cache,
95 get_id=_thread_get_id,
96 pack_exception=pack_exception,
97 **kwargs,
98 )
100 # Cleanup pools associated to dead threads
101 with pools_lock:
File /#########################/lib/python3.9/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
509 _execute_task(task, data) # Re-execute locally
510 else:
--> 511 raise_exception(exc, tb)
512 res, worker_id = loads(res_info)
513 state["cache"][key] = res
File /#########################/lib/python3.9/site-packages/dask/local.py:319, in reraise(exc, tb)
317 if exc.traceback is not tb:
318 raise exc.with_traceback(tb)
--> 319 raise exc
File /#########################/lib/python3.9/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
222 try:
223 task, data = loads(task_info)
--> 224 result = _execute_task(task, data)
225 id = get_id()
226 result = dumps((result, id))
File /#########################/lib/python3.9/site-packages/dask/core.py:121, in _execute_task(arg, cache, dsk)
117 func, args = arg[0], arg[1:]
118 # Note: Don't assign the subtask results to a variable. numpy detects
119 # temporaries by their reference count and can execute certain
120 # operations in-place.
--> 121 return func(*(_execute_task(a, cache) for a in args))
122 elif not ishashable(arg):
123 return arg
File /#########################/lib/python3.9/site-packages/dask/utils.py:73, in apply(func, args, kwargs)
42 """Apply a function given its positional and keyword arguments.
43
44 Equivalent to func(*args, **kwargs)
(...)
70 >>> dsk = {'task-name': task} # adds the task to a low level Dask task graph
71 """
72 if kwargs:
---> 73 return func(*args, **kwargs)
74 else:
75 return func(*args)
File /#########################/lib/python3.9/site-packages/xarray/backends/api.py:547, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
544 kwargs.update(backend_kwargs)
546 if engine is None:
--> 547 engine = plugins.guess_engine(filename_or_obj)
549 if from_array_kwargs is None:
550 from_array_kwargs = {}
File /#########################/lib/python3.9/site-packages/xarray/backends/plugins.py:197, in guess_engine(store_spec)
189 else:
190 error_msg = (
191 "found the following matches with the input file in xarray's IO "
192 f"backends: {compatible_engines}. But their dependencies may not be installed, see:\n"
193 "https://docs.xarray.dev/en/stable/user-guide/io.html \n"
194 "https://docs.xarray.dev/en/stable/getting-started-guide/installing.html"
195 )
--> 197 raise ValueError(error_msg)
ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'scipy']. Consider explicitly selecting one of the installed engines via the engine
parameter, or installing additional IO dependencies, see:
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
https://docs.xarray.dev/en/stable/user-guide/io.html
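xarray raised this error because neither installed engine (netcdf4, scipy) recognized the file. The netcdf4 engine, for instance, checks a file's leading bytes for the netCDF or HDF5 signature. One thing worth checking is whether the downloaded .nc files really are NetCDF: an interrupted download or an HTML error page saved under a .nc name would fail this way. Below is a small stdlib sketch for that check; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Union

# Leading bytes ("magic numbers") of formats that can end up behind a .nc name.
SIGNATURES = {
    b"CDF\x01": "netCDF-3 classic",
    b"CDF\x02": "netCDF-3 64-bit offset",
    b"CDF\x05": "netCDF-3 64-bit data (CDF-5)",
    b"\x89HDF\r\n\x1a\n": "netCDF-4 / HDF5",
    b"PK\x03\x04": "zip archive",
}


def sniff_bytes(head: bytes) -> str:
    """Guess a file format from its first bytes."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return f"unknown (first bytes: {head[:16]!r})"


def sniff(path: Union[str, Path]) -> str:
    """Read the first 16 bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))


# Usage:
# for p in sorted(Path("./dataset/cmip6/temperature").glob("*.nc")):
#     print(p, sniff(p))
```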
#########################
Error Message (weatherbench):
#########################
File /#########################/climate-learn/src/climate_learn/data/processing/nc2npz.py:189, in convert_nc2npz(root_dir, save_dir, variables, start_train_year, start_val_year, start_test_year, end_year, num_shards)
185 test_years = range(start_test_year, end_year)
187 os.makedirs(save_dir, exist_ok=True)
--> 189 nc2np(root_dir, variables, train_years, save_dir, "train", num_shards)
190 nc2np(root_dir, variables, val_years, save_dir, "val", num_shards)
191 nc2np(root_dir, variables, test_years, save_dir, "test", num_shards)
File /#########################/climate-learn/src/climate_learn/data/processing/nc2npz.py:95, in nc2np(path, variables, years, save_dir, partition, num_shards_per_year)
93 else: # pressure-level variables
94 assert len(ds[code].shape) == 4
---> 95 all_levels = ds["level"][:].to_numpy()
96 all_levels = np.intersect1d(all_levels, DEFAULT_PRESSURE_LEVELS)
97 for level in all_levels:
File /#########################/lib/python3.9/site-packages/xarray/core/dataset.py:1473, in Dataset.getitem(self, key)
1471 return self.isel(**key)
1472 if utils.hashable(key):
-> 1473 return self._construct_dataarray(key)
1474 if utils.iterable_of_hashable(key):
1475 return self._copy_listed(key)
File /#########################/lib/python3.9/site-packages/xarray/core/dataset.py:1384, in Dataset._construct_dataarray(self, name)
1382 variable = self._variables[name]
1383 except KeyError:
-> 1384 _, name, variable = _get_virtual_variable(self._variables, name, self.dims)
1386 needed_dims = set(variable.dims)
1388 coords: dict[Hashable, Variable] = {}
File /#########################/lib/python3.9/site-packages/xarray/core/dataset.py:196, in _get_virtual_variable(variables, key, dim_sizes)
194 split_key = key.split(".", 1)
195 if len(split_key) != 2:
--> 196 raise KeyError(key)
198 ref_name, var_name = split_key
199 ref_var = variables[ref_name]
KeyError: 'level'
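The traceback shows nc2np looking up a variable literally named "level". One possible cause is that the file uses a different name for its vertical dimension. CMIP6 output, for example, typically calls it plev and stores it in Pa, while the level values ClimateLearn works with (e.g., 500 in geopotential_500) are in hPa. Below is a sketch of a more tolerant lookup. The candidate names and the Pa-to-hPa conversion are assumptions, and `pressure_levels_hpa` is a hypothetical helper.

```python
from typing import Sequence

import numpy as np
import xarray as xr


def pressure_levels_hpa(da: xr.DataArray, candidates: Sequence[str] = ("level", "plev")) -> np.ndarray:
    """Return the vertical levels of `da` in hPa, whichever candidate dimension name it uses."""
    for name in candidates:
        if name in da.dims:
            levels = np.asarray(da[name].values, dtype=float)
            if str(da[name].attrs.get("units", "")).lower() == "pa":
                levels = levels / 100.0  # Pa -> hPa
            return levels
    raise KeyError(f"no vertical dimension among {list(candidates)}; found dims {da.dims}")


# Synthetic CMIP6-style array: vertical dimension "plev" in Pa.
ta = xr.DataArray(
    np.zeros((1, 2, 3, 4)),
    dims=("time", "plev", "lat", "lon"),
    coords={"plev": ("plev", [50000.0, 85000.0], {"units": "Pa"})},
)
levels = pressure_levels_hpa(ta)
```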
climate_learn.data.DataModule uses tqdm to display progress while loading the dataset. Unfortunately, this makes Jupyter and ipywidgets required dependencies; otherwise it throws an error.
A screenshot of the error is attached below.
I have also added the code snippet I tried to run, for reference.
from climate_learn.utils.datetime import Year, Days, Hours
from climate_learn.data import DataModule
def main():
    data_module = DataModule(
        dataset = "ERA5",
        task = "forecasting",
        root_dir = "/data0/datasets/weatherbench/data/weatherbench/era5/5.625deg/",
        in_vars = ["2m_temperature"],
        out_vars = ["2m_temperature"],
        train_start_year = Year(2005),
        val_start_year = Year(2015),
        test_start_year = Year(2016),
        end_year = Year(2016),
        pred_range = Days(3),
        subsample = Hours(6),
        batch_size = 128,
        num_workers = 1
    )


if __name__ == "__main__":
    main()
This issue is resolved by running pip install jupyter.
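Assuming the error comes from importing tqdm's notebook progress bar directly, one alternative is `tqdm.auto`. It uses the notebook widget only inside a Jupyter kernel with ipywidgets available, and otherwise falls back to the plain text bar. Below is a sketch with an extra fallback in case tqdm itself is missing.

```python
try:
    # tqdm.auto uses the notebook widget only inside a Jupyter kernel with ipywidgets
    # installed, and otherwise falls back to the plain text progress bar.
    from tqdm.auto import tqdm
except ImportError:
    # tqdm itself is missing: iterate without a progress bar.
    def tqdm(iterable, **kwargs):
        return iterable


loaded = [year for year in tqdm(range(2005, 2008), desc="loading years", disable=True)]
```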
Describe the bug
Forecasting does not work with multiple variables as input/output
To Reproduce
data_module = DataModule(
dataset = "ERA5",
task = "forecasting",
root_dir = path,
in_vars = ["2m_temperature", "total_cloud_cover"],
out_vars = ["2m_temperature", "total_cloud_cover"],
train_start_year = Year(2010), # change
val_start_year = Year(2015),
test_start_year = Year(2017),
end_year = Year(2018),
pred_range = Days(3),
subsample = Hours(6),
batch_size = 32,
num_workers= 64,
)
model_kwargs = {
"in_channels": len(data_module.hparams.in_vars),
"out_channels": len(data_module.hparams.out_vars),
"n_blocks": 4
}
optim_kwargs = {
"lr": 1e-4,
"weight_decay": 1e-5,
"warmup_epochs": 1,
"max_epochs": 5,
}
model_module = load_model(name = "resnet", task = "forecasting", model_kwargs = model_kwargs, optim_kwargs = optim_kwargs)
set_climatology(model_module, data_module)
fit_lin_reg_baseline(model_module, data_module, reg_hparam=0.0)
from climate_learn.training import Trainer
trainer = Trainer(
seed = 0,
accelerator = "gpu",
precision = 16,
max_epochs = 1,
)
trainer.fit(model_module, data_module)
Environment
Additional context
When the climatology baseline is removed from test_step, the code works as expected.
Hi there!
Thank you for sharing the repository!
During the processing of raw precipitation data, there are two lines where the mean values are set to zero (link below).
Could you please explain the reason behind this?
Best, Daria
Describe the bug
Running pytest tests/ gives an error for test_vit() with version 0.9.2, but works perfectly with version 0.6.12.
To Reproduce
Create a fresh conda environment with Python 3.7 and install all the dependencies.
Describe the bug
climate_learn.utils.visualize() plots the image upside down and with a flipped colormap.
To Reproduce
I am merely running the demo notebook found here - https://colab.research.google.com/drive/1WiNEK1BHsiGzo_bT9Fcm8lea2H_ghNfa
Expected behavior
I believe the map plots produced by visualize() should have the same orientation and colormap as the stock images in the above notebook, i.e., the northern hemisphere on top and, in the red-blue colormap, red corresponding to higher values and blue to lower values.
Environment
N/A
Additional context
N/A
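A frequent cause of both symptoms, though not confirmed for visualize() itself, is plotting a latitude-ascending array (south to north) with matplotlib's imshow, whose default origin="upper" draws the first row at the top, together with the "RdBu" colormap, which runs from red at low values to blue at high values. The sketch below uses a hypothetical helper, orient_north_up, and assumes the field has shape (lat, lon) with a matching 1-D latitude array:

```python
import numpy as np


def orient_north_up(data: np.ndarray, lat: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (data, lat) reordered so the northernmost row comes first.

    Hypothetical helper: assumes `data` has shape (n_lat, n_lon) and
    `lat` holds the matching 1-D latitude values.
    """
    if data.shape[0] != lat.shape[0]:
        raise ValueError("first axis of data must match len(lat)")
    if lat[0] < lat[-1]:
        # Latitudes ascend (south -> north): flip the rows so that north is
        # drawn on top by imshow's default origin="upper".
        return np.flipud(data), lat[::-1]
    return data, lat
```

The returned array can then be drawn with imshow(data, cmap="RdBu_r"), where the reversed colormap puts red at high values and blue at low values.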
Describe the documentation issue
As the title says, the data folder has almost no documentation.
Describe the solution you'd like
Detailed documentation, including but not limited to docstrings, type hints, and comments for complex pieces of code.
I am interested in using your datasets, but they are all too large to fit on my machine. One way around this would be to use the Hugging Face Datasets library, which can stream the dataset input. However, uploading them to the Hub seems to require having the files downloaded locally in the first place. Would it be possible for you to add these datasets to the Hub? Thanks in advance.
The set_denormalization function in climate_learn.models.modules.forecast.py has repeated lines. In particular, this piece of code is repeated three times:
mean_mean_denorm, mean_std_denorm = -mean / std, 1 / std
self.mean_denormalize = transforms.Normalize(mean_mean_denorm, mean_std_denorm)
std_mean_denorm, std_std_denorm = np.zeros_like(std), 1 / std
self.std_denormalize = transforms.Normalize(std_mean_denorm, std_std_denorm)
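Assuming transforms is torchvision.transforms, Normalize(m, s) computes (x - m) / s per channel. With m = -mean / std and s = 1 / std this gives x * std + mean, which undoes the (x - mean) / std normalization; with m = 0 and s = 1 / std it gives x * std, which rescales a standard deviation back to physical units. Since the parameters are identical each time, one way to remove the repetition is to compute them once in a helper. A NumPy sketch; denorm_params and apply_normalize are hypothetical names, not existing ClimateLearn functions:

```python
import numpy as np


def denorm_params(mean, std):
    """Hypothetical helper: (m, s) pairs for Normalize-style inverse transforms.

    A Normalize(m, s) transform computes (x - m) / s, so:
      - (-mean / std, 1 / std) maps a normalized x back to x * std + mean
      - (0, 1 / std) maps a normalized std back to physical units (x * std)
    """
    mean = np.asarray(mean, dtype=np.float64)
    std = np.asarray(std, dtype=np.float64)
    mean_params = (-mean / std, 1 / std)
    std_params = (np.zeros_like(std), 1 / std)
    return mean_params, std_params


def apply_normalize(x, m, s):
    # NumPy stand-in for transforms.Normalize: (x - m) / s
    return (x - m) / s


# Possible use inside set_denormalization (sketch):
#   (m_mean, s_mean), (m_std, s_std) = denorm_params(mean, std)
#   self.mean_denormalize = transforms.Normalize(m_mean, s_mean)
#   self.std_denormalize = transforms.Normalize(m_std, s_std)
```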
Describe the bug
As mentioned in the title, changing the order of the variables passed to the data module significantly affects forecasting performance.
To Reproduce
Steps to reproduce the behavior:
Instantiate data module as follows:
data_args = ERA5Args(
root_dir=f"{root}/data/{source}/{dataset}/{resolution}/",
variables=['geopotential', 'u_component_of_wind', 'v_component_of_wind', 'temperature', 'specific_humidity', '2m_temperature'],
years=years
)
forecasting_args = ForecastingArgs(
dataset_args=data_args,
in_vars=['geopotential', 'u_component_of_wind', 'v_component_of_wind', 'temperature', 'specific_humidity', '2m_temperature'],
out_vars=["temperature_850", "geopotential_500", "2m_temperature"],
pred_range=3*24
)
data_module_args = DataModuleArgs(
task_args=forecasting_args,
train_start_year=1979,
val_start_year=2015,
test_start_year=2017,
end_year=2018
)
data_module = DataModule(
data_module_args=data_module_args,
batch_size=128,
num_workers=1
)
The code snippet was taken from the Model_Training_Evaluation notebook, and the rest of the code is identical to the notebook. Because of the poor performance, I swapped the order of the variables in data_args:
data_args = ERA5Args(
root_dir=f"{root}/data/{source}/{dataset}/{resolution}/",
variables=['temperature', 'geopotential', 'u_component_of_wind', 'v_component_of_wind', 'specific_humidity', '2m_temperature'],
years=years
)
Expected behavior
I would expect little difference in performance between the two experiments, but performance changed drastically.
Additional context
Is this expected behavior? I am not sure whether the old DataModule is being abandoned entirely in favor of IterDataModule; if so, I will close this issue.
Describe the bug
Got this error: TypeError: DataModule.__init__() got an unexpected keyword argument 'dataset', even though the docs list it as a parameter.
To Reproduce
Steps to reproduce the behavior:
I just ran this:
from climate_learn.utils.datetime import Year, Days, Hours
from climate_learn.data import DataModule
data_module = DataModule(
dataset = "ERA5",
task = "forecasting",
root_dir = "/content/drive/MyDrive/Climate/.climate_tutorial/data/weatherbench/era5/5.625/",
in_vars = ["2m_temperature"],
out_vars = ["2m_temperature"],
train_start_year = Year(1979),
val_start_year = Year(2015),
test_start_year = Year(2017),
end_year = Year(2018),
pred_range = Days(3),
subsample = Hours(6),
batch_size = 128,
num_workers = 1
)
Error traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-6ff523560500> in <cell line: 4>()
2 from climate_learn.data import DataModule
3
----> 4 data_module = DataModule(
5 dataset = "ERA5",
6 task = "forecasting",
TypeError: DataModule.__init__() got an unexpected keyword argument 'dataset'
Expected behavior
The code runs without errors.
Environment
Package Version
----------------------------- --------------------
absl-py 1.4.0
aiohttp 3.8.4
aiosignal 1.3.1
alabaster 0.7.13
albumentations 1.2.1
altair 4.2.2
anyio 3.6.2
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arviz 0.15.1
astropy 5.2.2
astunparse 1.6.3
async-timeout 4.0.2
attrs 23.1.0
audioread 3.0.0
autograd 1.5
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.11.2
bleach 6.0.0
blis 0.7.9
blosc2 2.0.0
bokeh 2.4.3
branca 0.6.0
CacheControl 0.12.11
cached-property 1.5.2
cachetools 5.3.0
catalogue 2.0.8
cdsapi 0.6.1
certifi 2022.12.7
cffi 1.15.1
cftime 1.6.2
chardet 4.0.0
charset-normalizer 2.0.12
chex 0.1.7
click 8.1.3
climate-learn 0.0.2
cloudpickle 2.2.1
cmake 3.25.2
cmdstanpy 1.1.0
colorcet 3.0.1
colorlover 0.3.0
community 1.0.0b1
confection 0.0.4
cons 0.4.5
contextlib2 0.6.0.post1
contourpy 1.0.7
convertdate 2.4.0
cryptography 40.0.2
cufflinks 0.17.3
cupy-cuda11x 11.0.0
cvxopt 1.3.0
cvxpy 1.3.1
cycler 0.11.0
cymem 2.0.7
Cython 0.29.34
dask 2022.12.1
datascience 0.17.6
db-dtypes 1.1.1
dbus-python 1.2.16
debugpy 1.6.6
decorator 4.4.2
defusedxml 0.7.1
distributed 2022.12.1
dlib 19.24.1
dm-tree 0.1.8
docker-pycreds 0.4.0
docutils 0.16
dopamine-rl 4.0.6
duckdb 0.7.1
earthengine-api 0.1.350
easydict 1.10
ecos 2.0.12
editdistance 0.6.2
en-core-web-sm 3.5.0
entrypoints 0.4
ephem 4.1.4
et-xmlfile 1.1.0
etils 1.2.0
etuples 0.3.8
exceptiongroup 1.1.1
fastai 2.7.12
fastcore 1.5.29
fastdownload 0.0.7
fastjsonschema 2.16.3
fastprogress 1.0.3
fastrlock 0.8.1
filelock 3.12.0
firebase-admin 5.3.0
Flask 2.2.4
flatbuffers 23.3.3
flax 0.6.9
folium 0.14.0
fonttools 4.39.3
frozendict 2.3.7
frozenlist 1.3.3
fsspec 2023.4.0
future 0.18.3
gast 0.4.0
GDAL 3.3.2
gdown 4.6.6
gensim 4.3.1
geographiclib 2.0
geopy 2.3.0
gin-config 0.5.0
gitdb 4.0.10
GitPython 3.1.31
glob2 0.7
google 2.0.3
google-api-core 2.11.0
google-api-python-client 2.84.0
google-auth 2.17.3
google-auth-httplib2 0.1.0
google-auth-oauthlib 1.0.0
google-cloud-bigquery 3.9.0
google-cloud-bigquery-storage 2.19.1
google-cloud-core 2.3.2
google-cloud-datastore 2.15.1
google-cloud-firestore 2.11.0
google-cloud-language 2.9.1
google-cloud-storage 2.8.0
google-cloud-translate 3.11.1
google-colab 1.0.0
google-crc32c 1.5.0
google-pasta 0.2.0
google-resumable-media 2.5.0
googleapis-common-protos 1.59.0
googledrivedownloader 0.4
graphviz 0.20.1
greenlet 2.0.2
grpcio 1.54.0
grpcio-status 1.48.2
gspread 3.4.2
gspread-dataframe 3.0.8
gym 0.25.2
gym-notices 0.0.8
h5netcdf 1.1.0
h5py 3.8.0
hijri-converter 2.3.1
holidays 0.23
holoviews 1.15.4
html5lib 1.1
httpimport 1.3.0
httplib2 0.21.0
huggingface-hub 0.14.1
humanize 4.6.0
hyperopt 0.2.7
idna 3.4
imageio 2.25.1
imageio-ffmpeg 0.4.8
imagesize 1.4.1
imbalanced-learn 0.10.1
imgaug 0.4.0
importlib-metadata 4.13.0
importlib-resources 5.12.0
imutils 0.5.4
inflect 6.0.4
iniconfig 2.0.0
intel-openmp 2023.1.0
ipykernel 5.5.6
ipython 7.34.0
ipython-genutils 0.2.0
ipython-sql 0.4.1
ipywidgets 7.7.1
itsdangerous 2.1.2
jax 0.4.8
jaxlib 0.4.7+cuda11.cudnn86
jieba 0.42.1
Jinja2 3.1.2
joblib 1.2.0
jsonpickle 3.0.1
jsonschema 4.3.3
jupyter-client 6.1.12
jupyter-console 6.1.0
jupyter_core 5.3.0
jupyter-server 1.24.0
jupyterlab-pygments 0.2.2
jupyterlab-widgets 3.0.7
kaggle 1.5.13
keras 2.12.0
kiwisolver 1.4.4
korean-lunar-calendar 0.3.1
langcodes 3.3.0
lazy_loader 0.2
libclang 16.0.0
librosa 0.10.0.post2
lightgbm 3.3.5
lightning-utilities 0.8.0
lit 16.0.2
llvmlite 0.39.1
locket 1.0.0
logical-unification 0.4.5
LunarCalendar 0.0.9
lxml 4.9.2
Markdown 3.4.3
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
matplotlib-venn 0.11.9
mdurl 0.1.2
miniKanren 1.0.3
missingno 0.5.2
mistune 0.8.4
mizani 0.8.1
mkl 2019.0
ml-dtypes 0.1.0
mlxtend 0.14.0
more-itertools 9.1.0
moviepy 1.0.3
mpmath 1.3.0
msgpack 1.0.5
multidict 6.0.4
multipledispatch 0.6.0
multitasking 0.0.11
murmurhash 1.0.9
music21 8.1.0
natsort 8.3.1
nbclient 0.7.4
nbconvert 6.5.4
nbformat 5.8.0
nest-asyncio 1.5.6
netCDF4 1.6.3
networkx 3.1
nibabel 3.0.2
nltk 3.8.1
notebook 6.4.8
numba 0.56.4
numexpr 2.8.4
numpy 1.22.4
oauth2client 4.1.3
oauthlib 3.2.2
opencv-contrib-python 4.7.0.72
opencv-python 4.7.0.72
opencv-python-headless 4.7.0.72
openpyxl 3.0.10
opt-einsum 3.3.0
optax 0.1.5
orbax-checkpoint 0.2.1
osqp 0.6.2.post8
packaging 23.1
palettable 3.3.3
pandas 1.5.3
pandas-datareader 0.10.0
pandas-gbq 0.17.9
pandocfilters 1.5.0
panel 0.14.4
param 1.13.0
parso 0.8.3
partd 1.4.0
pathlib 1.0.1
pathtools 0.1.2
pathy 0.10.1
patsy 0.5.3
pep517 0.13.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 8.4.0
pip 23.0.1
pip-tools 6.6.2
platformdirs 3.3.0
plotly 5.13.1
plotnine 0.10.1
pluggy 1.0.0
polars 0.17.3
pooch 1.6.0
portpicker 1.3.9
prefetch-generator 1.0.3
preshed 3.0.8
prettytable 0.7.2
proglog 0.1.10
progressbar2 4.2.0
prometheus-client 0.16.0
promise 2.3
prompt-toolkit 3.0.38
prophet 1.1.2
proto-plus 1.22.2
protobuf 3.20.3
psutil 5.9.5
psycopg2 2.9.6
ptyprocess 0.7.0
py-cpuinfo 9.0.0
py4j 0.10.9.7
pyarrow 9.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycocotools 2.0.6
pycparser 2.21
pyct 0.5.0
pydantic 1.10.7
pydata-google-auth 1.7.0
pydot 1.4.2
pydot-ng 2.0.0
pydotplus 2.0.2
PyDrive 1.3.1
pyerfa 2.0.0.3
pygame 2.3.0
Pygments 2.14.0
PyGObject 3.36.0
pymc 5.1.2
PyMeeus 0.5.12
pymystem3 0.2.0
PyOpenGL 3.1.6
pyparsing 3.0.9
pyrsistent 0.19.3
PySocks 1.7.1
pytensor 2.10.1
pytest 7.2.2
python-apt 0.0.0
python-dateutil 2.8.2
python-louvain 0.16
python-slugify 8.0.1
python-utils 3.5.2
pytorch-lightning 2.0.2
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
pyviz-comms 2.2.1
PyWavelets 1.4.1
PyYAML 6.0
pyzmq 23.2.1
qdldl 0.1.7
qudida 0.0.4
regex 2022.10.31
requests 2.27.1
requests-oauthlib 1.3.1
requests-unixsocket 0.2.0
rich 13.3.4
rpy2 3.5.5
rsa 4.9
scikit-image 0.19.3
scikit-learn 1.2.2
scipy 1.10.1
scs 3.2.3
seaborn 0.12.2
Send2Trash 1.8.0
sentry-sdk 1.22.1
setproctitle 1.3.2
setuptools 67.7.2
shapely 2.0.1
six 1.16.0
sklearn-pandas 2.2.0
smart-open 6.3.0
smmap 5.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soundfile 0.12.1
soupsieve 2.4.1
soxr 0.3.5
spacy 3.5.2
spacy-legacy 3.0.12
spacy-loggers 1.0.4
Sphinx 3.5.4
sphinxcontrib-applehelp 1.0.4
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.1
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
SQLAlchemy 2.0.10
sqlparse 0.4.4
srsly 2.4.6
statsmodels 0.13.5
sympy 1.11.1
tables 3.8.0
tabulate 0.8.10
tblib 1.7.0
tenacity 8.2.2
tensorboard 2.12.2
tensorboard-data-server 0.7.0
tensorboard-plugin-wit 1.8.1
tensorflow 2.12.0
tensorflow-datasets 4.8.3
tensorflow-estimator 2.12.0
tensorflow-gcs-config 2.12.0
tensorflow-hub 0.13.0
tensorflow-io-gcs-filesystem 0.32.0
tensorflow-metadata 1.13.1
tensorflow-probability 0.19.0
tensorstore 0.1.36
termcolor 2.3.0
terminado 0.17.1
text-unidecode 1.3
textblob 0.17.1
tf-slim 1.1.0
thinc 8.1.9
threadpoolctl 3.1.0
tifffile 2023.4.12
timm 0.6.13
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 2.0.0+cu118
torchaudio 2.0.1+cu118
torchdata 0.6.0
torchmetrics 0.11.4
torchsummary 1.5.1
torchtext 0.15.1
torchvision 0.15.1+cu118
tornado 6.2
tqdm 4.65.0
traitlets 5.7.1
triton 2.0.0
tweepy 4.13.0
typer 0.7.0
typing_extensions 4.5.0
tzdata 2023.3
tzlocal 4.3
uritemplate 4.1.1
urllib3 1.26.15
vega-datasets 0.9.0
wandb 0.15.2
wasabi 1.1.1
wcwidth 0.2.6
webcolors 1.13
webencodings 0.5.1
websocket-client 1.5.1
Werkzeug 2.3.0
wheel 0.40.0
widgetsnbextension 3.6.4
wordcloud 1.8.2.2
wrapt 1.14.1
xarray 2022.12.0
xarray-einstats 0.5.1
xgboost 1.7.5
xlrd 2.0.1
yarl 1.9.2
yellowbrick 1.5
yfinance 0.2.18
zict 3.0.0
zipp 3.15.0
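The traceback means the installed DataModule.__init__ has no dataset parameter, and the environment above lists climate-learn 0.0.2, which may be a different release from the one the docs describe. A standard-library sketch for checking which keyword arguments a class actually accepts and which version is installed; list_init_params and installed_version are hypothetical helpers, shown here on a stand-in class rather than on DataModule:

```python
import inspect
from importlib import metadata
from typing import List, Optional


def list_init_params(cls) -> List[str]:
    """Return the parameter names of cls.__init__, excluding self."""
    sig = inspect.signature(cls.__init__)
    return [name for name in sig.parameters if name != "self"]


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

In the notebook, list_init_params(DataModule) and installed_version("climate-learn") would show whether the installed DataModule expects a different set of arguments than the documented one.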
Hi,
Currently, the ERA5 class inherits from torch.utils.data.Dataset. For tasks involving many input and output variables, it becomes impossible to load them all into RAM at the same time. Would it be possible to support IterableDataset? It would allow loading only part of the data into RAM at a time.
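A minimal sketch of the idea, assuming the data is split into per-shard files and that a caller-supplied load_shard function (hypothetical) reads one shard into memory. torch.utils.data.IterableDataset only requires __iter__, so the same class could subclass it for use with a DataLoader; it is kept torch-free here so it runs standalone:

```python
from typing import Any, Callable, Iterator, Sequence


class ShardStream:
    """Yield samples shard by shard so that only one shard is held in memory.

    Hypothetical sketch: `shard_paths` and `load_shard` are assumptions,
    not ClimateLearn APIs. `load_shard(path)` must return an iterable of samples.
    """

    def __init__(self, shard_paths: Sequence[str], load_shard: Callable[[str], Any]):
        self.shard_paths = list(shard_paths)
        self.load_shard = load_shard

    def __iter__(self) -> Iterator[Any]:
        for path in self.shard_paths:
            shard = self.load_shard(path)  # only this shard is loaded now
            yield from shard
            del shard  # drop the reference before loading the next shard
```

With several DataLoader workers, each worker would iterate over every shard unless the shard list is split per worker (for example with torch.utils.data.get_worker_info()).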
The Forecasting class takes subsample as an argument, but ignores it when building inp_data and out_data.
One fix would be to apply the subsampling while building inp_data and out_data. The other way would be to keep the entire inp_data and out_data and implement the subsampling logic in the indexing.
The latter seems straightforward but wastes some extra space. The former would require some effort to handle cases where window and pred_range are not multiples of subsample.
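A sketch of the second option (keeping the full arrays and subsampling in the indexing). It assumes a sample at time index t uses inputs from t - window through t and a target at t + pred_range, all counted in the native time steps of the full arrays; this window convention and the function names are assumptions, not the Forecasting class's actual code. Because only the sample start times are subsampled, window and pred_range need not be multiples of subsample:

```python
from typing import List, Sequence


def subsampled_indices(n_steps: int, window: int, pred_range: int, subsample: int) -> List[int]:
    """Valid input time indices, taken every `subsample` steps.

    Hypothetical convention: a sample at t needs inputs back to t - window
    and a target at t + pred_range inside the full-resolution arrays.
    """
    if subsample < 1:
        raise ValueError("subsample must be >= 1")
    first = window                      # earliest t with a full look-back
    last_exclusive = n_steps - pred_range  # t + pred_range must be < n_steps
    return list(range(first, last_exclusive, subsample))


def get_item(inp_data: Sequence, out_data: Sequence, indices: List[int], i: int,
             window: int, pred_range: int):
    """Return (inputs from t - window to t, target at t + pred_range) for sample i."""
    t = indices[i]
    return inp_data[t - window : t + 1], out_data[t + pred_range]
```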
Is your feature request related to a problem? Please describe.
Currently, ShardDataset loads a chunk and loads the next one only after exhausting the current one. It would be great to hide this latency by implementing some form of prefetching.
Describe the solution you'd like
I am not sure how to achieve this, but would love to hear others' thoughts.
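One generic way to hide the load latency is to load the next shard on a background thread while the current one is being consumed. A standard-library sketch, assuming a hypothetical load_shard(path) function; it is not tied to ShardDataset's actual internals:

```python
import queue
import threading
from typing import Any, Callable, Iterator, Sequence


def prefetch_shards(
    shard_paths: Sequence[str],
    load_shard: Callable[[str], Any],
    depth: int = 1,
) -> Iterator[Any]:
    """Yield loaded shards in order while the next ones load on a background thread.

    `load_shard` is a hypothetical function that reads one shard into memory.
    The bounded queue limits how far ahead the loader runs (about `depth` shards).
    """
    q = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for path in shard_paths:
                q.put(("shard", load_shard(path)))  # blocks while the queue is full
        except Exception as exc:  # hand loader errors to the consumer
            q.put(("error", exc))
            return
        q.put(("done", None))

    # Daemon thread: it will not keep the process alive if iteration stops early.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload
```

Samples can then be produced with: for shard in prefetch_shards(paths, load_shard): yield from shard. Threads work well here when loading is dominated by I/O or by NumPy/netCDF code that releases the GIL.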
Describe the bug
I am trying to run the extreme-events processing script in src/climate_learn/data/processing/era5_extreme.py, but map_dataset.setup() keeps crashing: it uses more than 12 GB of RAM and then crashes.
Please suggest a solution.
Thank you in advance!
Environment
Is your feature request related to a problem? Please describe.
Currently, the ERA5 module takes a list of variables and then, based on the contents of root_dir, treats each input variable as either a time-varying variable or a constant.
Describe the solution you'd like
The ERA5 __init__ arguments should accept input constants explicitly and assert that the input variables are not constants.
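The proposed argument check can be sketched as follows. ERA5Inputs and available_constants are hypothetical names (the latter standing for the constants discovered in root_dir), and only the validation is shown, not data loading. It raises ValueError rather than using assert, since assert statements are stripped under python -O:

```python
from typing import Iterable, Sequence


class ERA5Inputs:
    """Hypothetical ERA5-style arguments with explicitly listed constants."""

    def __init__(
        self,
        variables: Sequence[str],
        constants: Sequence[str] = (),
        available_constants: Iterable[str] = (),
    ):
        self.variables = list(variables)
        self.constants = list(constants)
        available = set(available_constants)
        # Input variables must be time-varying, never constants.
        misplaced = [v for v in self.variables if v in available]
        if misplaced:
            raise ValueError(f"constants passed as variables: {misplaced}")
        # Every requested constant must actually exist.
        unknown = [c for c in self.constants if c not in available]
        if unknown:
            raise ValueError(f"unknown constants: {unknown}")
```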
Is your feature request related to a problem? Please describe.
The feature request is motivated by the following problems:
climate-learn downscaling against my models and wanted to see if there is a pattern in the errors, e.g., whether predictions are more erroneous during certain hours of the day. trainer.test(model, dm) provides a nice summary of metrics but does not save the predictions along with lat, lon, and time.
Describe the solution you'd like
A possible approach is a flow similar to the cl.utils.visualize_at_index function (climate-learn/src/climate_learn/utils/visualize.py, lines 10 to 12 in 1a46b08).
There could be a function cl.utils.save_nc, which might look like:
def save_nc(mm, dm, in_transform, out_transform, variable, src, save_dir):
...
This function would save the predictions in exactly the same format as the data nc files, with lat, lon, and time coordinates. It should be able to retrieve lat, lon, and time from the data module dm.
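A rough sketch of the core of such a function, assuming the predictions are already a NumPy array of shape (time, lat, lon) and that lat, lon, and time have already been pulled out of the data module (how to get them from dm is left open). It uses xarray's Dataset and to_netcdf; predictions_to_dataset and save_predictions_nc are hypothetical names, not ClimateLearn APIs:

```python
import numpy as np
import xarray as xr


def predictions_to_dataset(preds, lat, lon, time, variable: str) -> xr.Dataset:
    """Wrap a (time, lat, lon) prediction array with its coordinates."""
    preds = np.asarray(preds)
    expected = (len(time), len(lat), len(lon))
    if preds.shape != expected:
        raise ValueError(f"expected shape {expected}, got {preds.shape}")
    return xr.Dataset(
        {variable: (("time", "lat", "lon"), preds)},
        coords={"time": time, "lat": lat, "lon": lon},
    )


def save_predictions_nc(preds, lat, lon, time, variable: str, path: str) -> None:
    # to_netcdf needs a netCDF backend such as netCDF4 or scipy installed.
    predictions_to_dataset(preds, lat, lon, time, variable).to_netcdf(path)
```

Keeping the Dataset construction separate from the write makes it easy to inspect or post-process predictions (e.g. grouping errors by hour of day with ds.groupby("time.hour")) before saving.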
Willingness to work on a PR
I'll be happy to work on a PR to make this happen!
Additional context
This feature may also be useful to climate researchers who want to produce time-lapse videos of predictions with other libraries and additional geolayers, similar to these examples in the geemap library.
I am attempting to downscale the ERA5 sea surface temperature variable, following along with your NeurIPS 2022 CCAI tutorial. I noticed that the tutorial used 5-degree and 2-degree resolutions for 2m_temperature. Why is this done? It is not very clear. For sea surface temperature I have data at 0.25-degree resolution. Do I need a coarser resolution to get this code to work for my chosen variable?
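For context, downscaling trains a model to map a field on a coarse grid to the same field on a finer grid, so it needs pairs of the same variable at two resolutions. If only 0.25-degree data is available, one simple way to build a coarse input from it is block averaging. This is a sketch of one reasonable preprocessing step under that assumption, not the tutorial's pipeline; block_coarsen is a hypothetical helper, and it only supports integer factors (e.g. factor 4 turns 0.25 degrees into 1 degree):

```python
import warnings

import numpy as np


def block_coarsen(field: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a (lat, lon) field.

    Trailing rows/columns that do not fill a whole block are dropped
    (e.g. the 721st latitude row of a 721 x 1440 0.25-degree grid).
    NaNs, such as SST over land, are ignored; all-NaN blocks stay NaN.
    """
    n_lat = (field.shape[0] // factor) * factor
    n_lon = (field.shape[1] // factor) * factor
    trimmed = field[:n_lat, :n_lon]
    blocks = trimmed.reshape(n_lat // factor, factor, n_lon // factor, factor)
    with warnings.catch_warnings():
        # nanmean warns on all-NaN blocks (e.g. entirely over land).
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(blocks, axis=(1, 3))
```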
Bug Description
I am encountering an issue when attempting to replicate results for a downscaling problem. During the execution of the code, I encounter the following error:
TypeError: load_model_module() got an unexpected keyword argument 'preset'
This error occurs in the code found at this link:
When I replace the argument "preset" with "architecture", the test produces no results, outputting an empty array: [{}].
Could you provide guidance on how to accurately reproduce the results from the paper?
Hello, I get the following error when I try to download the 2.8125-degree geopotential_500 data with the following code:
cl.data.download_weatherbench(
f"{root_directory}/geopotential",
dataset="era5",
variable="geopotential_500",
resolution=2.8125
)
The error message is:
climate_learn/data/download.py:85, in download_weatherbench(dst, dataset, variable, resolution)
83 file.write(chunk)
84 if ext == ".zip":
---> 85 with ZipFile(local_fn) as myzip:
86 myzip.extractall(dst)
87 os.unlink(local_fn)
BadZipFile: File is not a zip file
Could you help me to figure out the problem? Thanks!
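BadZipFile means the downloaded file is not a valid ZIP archive. A possible, unconfirmed cause is that the server returned something else (for example an HTML error page) or that the download was cut short. A standard-library sketch for inspecting the file before extracting; describe_download is a hypothetical helper, and its path argument stands for the local_fn that download_weatherbench writes to:

```python
import os
import zipfile


def describe_download(path: str, n_bytes: int = 200) -> str:
    """Summarize a downloaded file: size, whether it is a ZIP, and its first bytes."""
    size = os.path.getsize(path)
    if zipfile.is_zipfile(path):
        return f"{path}: valid ZIP archive ({size} bytes)"
    with open(path, "rb") as fh:
        head = fh.read(n_bytes)
    # The first bytes often reveal an HTML or XML error message from the server.
    return f"{path}: NOT a ZIP archive ({size} bytes); starts with {head!r}"
```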