Comments (12)
Found the reason: before, the network had ~200,000 trainable parameters... now it has ~8 million...
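A back-of-the-envelope check makes the jump plausible. The shapes and layer sizes below are made up for illustration (this is not the actual climate-learning architecture), but they show how removing pooling inflates a flatten-then-dense head:

```python
# Illustrative shapes only, NOT the actual climate-learning architecture:
# count the parameters of a Dense layer applied to a flattened conv feature map.
def dense_head_params(h, w, c, units):
    """Trainable parameters of Dense(units) on a flattened (h, w, c) map."""
    return (h * w * c) * units + units   # weights + biases

h, w, c = 64, 128, 32                                      # conv feature map
with_pool = dense_head_params(h // 4, w // 4, c, units=2)  # after two 2x2 MaxPools
without_pool = dense_head_params(h, w, c, units=2)         # MaxPools removed
```

Each 2x2 MaxPool shrinks the flattened size, and hence the dense weights, by a factor of 4, so losing two of them multiplies the head's parameter count by roughly 16, which is the right order of magnitude for a jump from ~200k to ~8M.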
from climate-learning.
I will re-implement the MaxPool for backward compatibility
from climate-learning.
- Using tensorflow pipelines, which should take care of our custom stratified k-fold cross-validation that doesn't mix different years
- When computing $A(t)$ we only need a small portion of the globe, so no need to load all the fields for that purpose
- Balancing folds should be possible based just on $A(t)$, but perhaps we don't need to actually shuffle the data; we just need to tell tensorflow the map to the year labels in the pipelines
- Consider tensorflow pipelines which train from the disk rather than from RAM
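The no-year-mixing constraint above could be sketched as a fold assignment over whole years. This is a hypothetical helper, not code from Learn2_new.py; `years` is assumed to map each sample index to its year label:

```python
# Hypothetical sketch: assign whole years to folds, so no fold ever mixes
# samples from the same year with another fold's samples.
from collections import defaultdict

def year_folds(years, n_folds):
    """Return one index list per fold; whole years stay together."""
    by_year = defaultdict(list)
    for idx, y in enumerate(years):
        by_year[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    for k, y in enumerate(sorted(by_year)):
        folds[k % n_folds].extend(by_year[y])   # round-robin whole years
    return folds

years = [0, 0, 1, 1, 2, 2, 3, 3]                # two samples per year
folds = year_folds(years, n_folds=2)
```

The resulting index lists could then be handed to something like `tf.data.Dataset.from_tensor_slices` plus a `map` that reads samples from disk, so only indices are ever shuffled, never the fields themselves.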
from climate-learning.
I should note that there are extra memory leaks when using the current tensorflow version (which is no longer tensorflow-gpu and is installed with pip rather than conda)
from climate-learning.
Check for easy improvements when loading data, especially if we don't want to use the whole dataset, in particular inside the Plasim_Field object
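One easy win could be lazy loading: defer the expensive read until the data is actually requested. A minimal sketch of the pattern (`LazyField` and `_load` are hypothetical names, not the actual Plasim_Field code):

```python
# Hypothetical pattern: load the field on first access instead of in __init__,
# so constructing many field objects costs nothing until they are used.
class LazyField:
    def __init__(self, path):
        self.path = path
        self._data = None            # nothing read yet

    @property
    def data(self):
        if self._data is None:       # first access triggers the expensive load
            self._data = self._load()
        return self._data

    def _load(self):
        return [0.0] * 4             # stand-in for reading a NetCDF file

f = LazyField("fields.nc")
```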
from climate-learning.
At the moment I create a separate dataset with cdo that contains just the subset, and everything works much faster. But this feels ad hoc. Alternatively, we could always work with smaller datasets that are concatenated in Learn2_new.py if needed, but that is also a bit annoying
from climate-learning.
Yeah, indeed. I was planning to do a line-by-line evaluation of the code, monitoring the RAM to see exactly when we load the data into memory and whether we can do something about it, but I cannot guarantee I'll have time for that
from climate-learning.
I was just doing some runs and I noticed that now, with 1000 years of data, training a CNN uses 1.3TB of virtual memory... Did you change anything? If yes, it is not going in the right direction 😆
from climate-learning.
And that is because the MaxPool layers have disappeared
from climate-learning.
Yes, sorry, I forgot to re-implement them when I modified the way create_model works. I was using strides instead.
from climate-learning.
Although virtual memory normally doesn't matter.
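Right: virtual memory counts reserved address space, not physical RAM in use. A quick stdlib-only way to see the difference (the on-demand zero-filling described in the comments is the Linux behaviour; other platforms differ in detail):

```python
# An anonymous mapping of 256 MiB inflates the process's virtual size
# immediately, but physical pages are only committed when actually written.
import mmap

RESERVED = 1 << 28              # 256 MiB of address space
m = mmap.mmap(-1, RESERVED)     # anonymous mapping, zero-filled on demand
m[0] = 1                        # touching a page commits just that page
```

So a huge virtual size can coexist with modest resident memory; it is the resident set that actually competes for RAM.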
from climate-learning.
My view is that we will need to implement tensorflow datasets. Operations such as shuffling (balancing) and the train/validation split have to be done virtually, by permuting only indices. The dataset has to somehow know which portions of the full dataset it has to provide for the next batch. In pytorch it is quite easy to control how data is extracted. I don't like that we have to copy-paste data in memory. Normalization is another step, but it could be achieved by a layer implementation rather than doing it explicitly like we do.
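A minimal sketch of the index-permutation idea, with plain-Python stand-ins rather than the repo's code (`X`, `batches` and the sizes are all illustrative):

```python
# Sketch: shuffle and split by permuting indices only; the big data array
# is never copied, and each batch is gathered on demand.
import random

random.seed(0)
n_samples, batch_size = 10, 4
X = [[i, i + 1, i + 2] for i in range(n_samples)]   # stand-in for the fields

indices = list(range(n_samples))
random.shuffle(indices)                  # cheap: permutes indices, not data
train_idx, val_idx = indices[:8], indices[8:]       # virtual train/val split

def batches(idx, size):
    for start in range(0, len(idx), size):
        yield [X[i] for i in idx[start:start + size]]  # gather this batch only

first = next(batches(train_idx, batch_size))
```

For the normalization step, a `tf.keras.layers.Normalization` layer adapted on the training data would fold the statistics into the model itself, instead of normalizing the arrays in place.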
from climate-learning.
Related Issues (20)
- Using `lon` and `lat` as inputs to `X` when classifying heatwaves
- `Learn2_new.py` trainer inheritence
- Queue file
- dict arguments are not dealt with correctly
- Transfer Learning from outside folder fails
- It is a bad idea to write to home directory commit #d1ba925cbddd69d00bd297c4cdfc90efc8a5f1f5
- ERA_Fields.Plasim_Field.sort_lat doesn't do anything
- CESM France mask could be better
- `lsm2mask` should be handled better
- Support for cascading inheritance
- flexible hypop doesn't remove kwargs at default
- pass metrics from command line
- `probabilistic_regression`: put the activation function for sigma in the model architecture rather than in the loss function
- 'monitor' is ignored to compute the score
- cannot load kernels from composite in IIPR
- Allow to enable and disable mods
- Move useful unspecific things into separate modules
- For the gaussian approximation on CESM matrix `W` is 5.7GB
- import params from external folder
- Compatibility with newer tensorflow versions