
gher-uliege / dincae


DINCAE (Data-Interpolating Convolutional Auto-Encoder) is a neural network to reconstruct missing data in satellite observations.

License: GNU General Public License v3.0

Python 77.33% Makefile 0.24% Julia 22.43%
neural-network tensorflow interpolation oceanography satellite-observations remote-sensing python netcdf4

dincae's People

Contributors

alexander-barth, jmbeckers


dincae's Issues

How to use a trained network to reconstruct data?

  Hello, after reading your paper and code, I tried using DINCAE 1.0 to reconstruct remote sensing data. So far I have successfully trained the network and saved the 500th and 1000th training models as network parameter files (.ckpt).
  I have a question. As mentioned in the paper, the data is divided into training data and testing data. During training, additional Gaussian noise and extra cloud masks are added to the training data, while nothing extra is added to the testing data. In addition, 50 images were used as validation data, and these 50 images did not take part in model training. So, according to my understanding, should we use the trained model parameter file (.ckpt) to reconstruct these 50 validation images after training?
  My current approach is to add this line of code in the reconstruct function: saver.restore(sess, 'E:/DINCAE/DINCAE master/model/model-1000.ckpt'), placed before "loop over epochs". I then create the input data from the 50 validation images and run DINCAE. Can I understand this as using the trained network parameters to reconstruct the validation data? I only saw functions for training the network in your code, and did not find how to directly reconstruct the validation data with the trained network parameters in order to verify the effectiveness of the model.
  In summary, my problem is that I don't know how to use the already trained .ckpt parameter files. My current approach is to load these parameter files, then run DINCAE, and use the 1/1000 epoch output file as the reconstructed image. I did this because the code outputs the reconstruction of the test data, and nothing extra is added to the test data. Therefore, my input data is the 50 validation images, and running DINCAE then stores the reconstruction of those validation images.
  Sorry, I am a beginner and not very familiar with deep learning, and I have written too much here. I would sincerely appreciate your answer. Thank you very much!

AttributeError: 'OwnedIterator' object has no attribute 'string_handle'

Describe the bug

A description of what the bug is.

To Reproduce

I am trying to set up and run DINCAE but am getting an error.

Environment

  • operating system: Windows 10
  • Python version : 3.7
  • Tensorflow version : 2.0
  • DINCAE version
    DINCAE_error

Full output

varname SST (112, 112)
data shape: (100, 112, 112)
data range: 13.575001 30.225
sz (100, 112, 112)
sz (100, 112, 112)
Number of input variables: 10
regularization_L2_beta 0
enc_nfilter_internal [16, 24, 36, 54]
nvar 10
nepoch_keep_missing 0
Traceback (most recent call last):
  File "E:\D BackUp\PPL Works\UdaySir\DINCAE-master\run_DINCAE.py", line 14, in <module>
    DINCAE.reconstruct_gridded_nc(filename,varname,outdir)
  File "E:\D BackUp\PPL Works\UdaySir\DINCAE-master\DINCAE\__init__.py", line 741, in reconstruct_gridded_nc
    **kwargs)
  File "E:\D BackUp\PPL Works\UdaySir\DINCAE-master\DINCAE\__init__.py", line 405, in reconstruct
    train_iterator_handle = sess.run(train_iterator.string_handle())
AttributeError: 'OwnedIterator' object has no attribute 'string_handle'

Input file

Filename : " avhrr_sub_add_clouds_small.nc "
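
The traceback above fails on `train_iterator.string_handle()`, a TensorFlow 1.x graph-mode call, while the reported environment is TensorFlow 2.0, where the iterator is an `OwnedIterator`. A minimal sketch of a version guard that could run before calling `DINCAE.reconstruct_gridded_nc` follows. It assumes (this is not confirmed in the thread) that this Python version of DINCAE targets TensorFlow 1.x; `is_tf1` is a hypothetical helper name, not part of DINCAE or TensorFlow.

```python
def is_tf1(version: str) -> bool:
    """Return True if a TensorFlow version string such as '1.15.0' has major version 1."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) == 1


# Hypothetical usage: in a real script, pass tf.__version__ instead of these literals.
for reported in ("1.15.0", "2.0", "2.10.0"):
    if is_tf1(reported):
        status = "graph-mode iterator API expected to be available"
    else:
        status = "string_handle() is likely missing on the iterator"
    print(reported, "->", status)
```

The three version strings are the ones reported in the issues on this page; only the 1.15.0 environment passes the check.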

Failed to interpolate MODIS Chlorophyll-a Data

Describe the bug

A description of what the bug is.

To Reproduce

Please provide a minimal code example which reproduces the behavior (bug, performance regression, ...).

Environment

  • operating system: Ubuntu 16.04.7 LTS
  • Python version 3.7.12
  • Tensorflow version 1.15.0
  • DINCAE version v1.1.0 (Latest)

Full output

In case of an error, please paste the full error message and stack trace.
The full output is attached: record_modis_chl_3d.txt

Input file

Run the shell command ncdump -h myfile.nc and paste the output here.
netcdf input_file_python {
dimensions:
	lon = 445 ;
	lat = 459 ;
	time = UNLIMITED ; // (3 currently)
variables:
	float lon(lon) ;
	float lat(lat) ;
	float time(time) ;
		time:units = "days since 2021-01-01 00:00:00" ;
	float chl(time, lat, lon) ;
		chl:_FillValue = -9999.f ;
	int mask(lat, lon) ;
		mask:comment = "one means the data location is valid (e.g. sea for SST), zero the location is invalid (e.g. land for SST)" ;

// global attributes:
		:_NCProperties = "version=2,netcdf=4.8.1,hdf5=1.12.2" ;
}

some questions about the test dataset

In the paper, how were the last 50 images used to test the model?
My understanding is that the network is trained using 5266 images, then the cloud mask of the first 50 images is applied to the last 50 images, and the trained model is validated on these 50 images.
Is this how the testing process works?
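
As a rough illustration of the procedure described in this question (not a confirmed account of the paper's method), the sketch below copies the missing-data pattern of the first `k` images onto the last `k` images and scores a reconstruction only on the pixels that were withheld. It uses plain Python lists with `None` for cloudy pixels; `withhold` and `rms_on_withheld` are hypothetical helper names, and the "reconstruction" is a stand-in constant field, not network output.

```python
import math


def withhold(images, k):
    """Apply the missing-data pattern of the first k images to the last k images.

    images: list of 2D lists, with None marking a cloudy (missing) pixel.
    Returns (masked_images, withheld), where withheld maps
    (image index, row, column) -> the original value that was hidden.
    """
    masked = [[row[:] for row in img] for img in images]
    withheld = {}
    total = len(images)
    for offset in range(k):
        src = images[offset]
        dst = total - k + offset
        for j, row in enumerate(src):
            for i, value in enumerate(row):
                # hide a pixel only if the source is cloudy and the target was observed
                if value is None and images[dst][j][i] is not None:
                    withheld[(dst, j, i)] = images[dst][j][i]
                    masked[dst][j][i] = None
    return masked, withheld


def rms_on_withheld(reconstruction, withheld):
    """Root-mean-square error restricted to the withheld pixels."""
    squares = [(reconstruction[t][j][i] - v) ** 2 for (t, j, i), v in withheld.items()]
    return math.sqrt(sum(squares) / len(squares))


images = [
    [[None, 20.0], [21.0, None]],  # first image: two cloudy pixels
    [[19.0, 20.5], [None, 22.0]],
    [[18.0, 19.0], [20.0, 21.0]],  # last image: fully observed
]
masked, withheld = withhold(images, k=1)
# stand-in reconstruction: a constant field of 20.0 everywhere
reconstruction = [[[20.0, 20.0], [20.0, 20.0]] for _ in images]
rms = rms_on_withheld(reconstruction, withheld)
print(masked[2])
print(rms)
```

The original `images` list is left untouched, so the withheld values stay available as the reference for the error measure.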

How should I make the input data set for DINCAE?

Hello, I am a beginner. How should I make the input data set for DINCAE? I have several global SST NetCDF (.nc) data sets. Should I merge them into one .nc file and then recreate the data set required by DINCAE? I'm confused about this after looking at create_input_file.py in examples.
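
One way to picture the expected layout: the `ncdump -h` output for `input_file_python.nc` elsewhere on this page shows a single file with `lon`, `lat`, and `time` coordinates, one data variable with a `_FillValue`, and a 2D `mask`. The sketch below only builds a CDL header with that layout using the standard library. It is an illustration under that assumption and not the project's `create_input_file.py`; `cdl_header` is a hypothetical helper, and the grid size is a made-up example.

```python
def cdl_header(name, varname, nlon, nlat, fill_value=-9999.0):
    """Build CDL text describing one gridded variable plus a land/sea mask."""
    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"\tlon = {nlon} ;",
        f"\tlat = {nlat} ;",
        "\ttime = UNLIMITED ;",
        "variables:",
        "\tfloat lon(lon) ;",
        "\tfloat lat(lat) ;",
        "\tfloat time(time) ;",
        '\t\ttime:units = "days since 1900-01-01 00:00:00" ;',
        f"\tfloat {varname}(time, lat, lon) ;",
        f"\t\t{varname}:_FillValue = {fill_value}f ;",
        "\tint mask(lat, lon) ;",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical 1-degree global grid; replace with the real grid size.
header = cdl_header("input_file", "SST", nlon=360, nlat=180)
print(header)
```

Saved to a `.cdl` file, such a header could be turned into an empty NetCDF file with the `ncgen` tool (for example `ncgen -o input.nc input.cdl`), after which the data from the individual SST files would still need to be written along the `time` dimension.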

create input file for multi MODIS dataset

This is not a bug report, rather a newbie question.

I don't understand what to do next after create_input_file.jl, and how it works with the multiple MODIS files there.
Is it related to the append mode in NCDatasets.jl, used to aggregate the SST variables?

Thank you

How can I apply the trained model (.ckpt.meta) to the test dataset

Before I describe my problem: I would really appreciate a kind response from the author. Thank you.

I keep failing to apply the trained DINCAE model to the test dataset. When I run the pre-trained model on the test dataset, Python raises an error that I cannot identify. May I ask how I can apply the pre-trained model?
Has anyone succeeded with this?

import os
import tensorflow as tf

model_call_path="/share/ocean/SSS/SMAP_SSS_JPL_L2/G_2_stack_map_EA_3X3window/result/window5_RF_only/"
os.chdir(model_call_path)
sess = tf.compat.v1.Session()
# load the saved graph definition, then restore the latest checkpoint into this session
saver=tf.train.import_meta_graph('./best_model.ckpt.meta')
saver.restore(sess,tf.train.latest_checkpoint(model_call_path))

# test dataset without added clouds
# must be reinitializable
test_dataset = tf.data.Dataset.from_generator(
    test_datagen, (tf.float32,tf.float32),
    (tf.TensorShape([jmax,imax,nvar]),tf.TensorShape([jmax,imax,2]))).batch(batch_size)

if nprefetch > 0:
    # train_dataset = train_dataset.prefetch(nprefetch)
    test_dataset = test_dataset.prefetch(nprefetch)

test_iterator = tf.compat.v1.data.Iterator.from_structure(test_dataset.output_types,
                                                test_dataset.output_shapes)
test_iterator_init_op = test_iterator.make_initializer(test_dataset)

test_iterator_handle = sess.run(test_iterator.string_handle())

handle = tf.compat.v1.placeholder(tf.string, shape=[], name = "handle_name_iterator")

iterator = tf.compat.v1.data.Iterator.from_string_handle(
        handle, test_iterator.output_types, output_shapes = test_iterator.output_shapes)

inputs_,xtrue = iterator.get_next()

(the code that builds the network structure is skipped here)

        timestr = datetime.now().strftime("%Y-%m-%dT%H%M%S")
        fname = os.path.join(outdir,"data-{}.nc".format(timestr))

        # reset test iterator, so that we start from the beginning
        sess.run(test_iterator_init_op)

        for ii in range(ceil(test_len / batch_size)):
            summary, batch_cost,batch_RMS,batch_m_rec,batch_σ2_rec = sess.run(
                [merged, cost,RMS,m_rec,σ2_rec],
                feed_dict = { handle: test_iterator_handle,
                              mask_issea: mask })

            # time instances already written
            offset = ii*batch_size
            savesample(fname,batch_m_rec,batch_σ2_rec,meandata,lon,lat,e,ii,
                       offset, transfun = transfun)

Error message
FailedPreconditionError: 2 root error(s) found.
(0) Failed precondition: Attempting to use uninitialized value conv2d_3/kernel_1
[[node conv2d_3/kernel_1/read (defined at :506) ]]
[[Sqrt_2/_131]]
(1) Failed precondition: Attempting to use uninitialized value conv2d_3/kernel_1
[[node conv2d_3/kernel_1/read (defined at :506) ]]
0 successful operations.
0 derived errors ignored.

I really want to apply it to a separate cross-validation dataset.
image
From the paper:
image
"sess" was created as shown in the picture above, and the trained model is run using that session, as I mentioned at the start.
image

Running examples input_file_python.nc

Describe the bug

Error when running the original examples

To Reproduce

Please provide a minimal code example which reproduces the behavior (bug, performance regression, ...).

Environment

  • operating system: Ubuntu 16.04.7 LTS
  • Python version 3.9.7
  • Tensorflow version 2.10.0
  • DINCAE version v1.1.0 (Latest)

Full output

In case of an error, please paste the full error message and stack trace.
2023-02-04 11:40:35.989971: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-04 11:40:37.278685: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA

To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-04 11:40:37.279681: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2023-02-04 11:40:37.280860: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance

varname SST (3, 3)
data shape: (3, 3, 3)
data range: 0.0 0.0
sz (3, 3, 3)
sz (3, 3, 3)
Number of input variables: 10
regularization_L2_beta 0
enc_nfilter_internal [16, 24, 36, 54]
nvar 10
nepoch_keep_missing 0

Traceback (most recent call last):
  File "/home/ocean/DINCAE-master/run_DINCAE.py", line 10, in <module>
    DINCAE.reconstruct_gridded_nc(filename,varname,outdir)
  File "/home/ocean/DINCAE-master/DINCAE/__init__.py", line 733, in reconstruct_gridded_nc
    reconstruct(
  File "/home/ocean/DINCAE-master/DINCAE/__init__.py", line 404, in reconstruct
    train_iterator_handle = sess.run(train_iterator.string_handle())
AttributeError: 'OwnedIterator' object has no attribute 'string_handle'

Input file

Run the shell command ncdump -h myfile.nc and paste the output here.
ocean@ocean-HP-EliteDesk-800-G1-TWR:~/DINCAE-master/examples$ ncdump -h input_file_python.nc
netcdf input_file_python {
dimensions:
	lon = 3 ;
	lat = 3 ;
	time = UNLIMITED ; // (3 currently)
variables:
	float lon(lon) ;
	float lat(lat) ;
	float time(time) ;
		time:units = "days since 1900-01-01 00:00:00" ;
	float SST(time, lat, lon) ;
		SST:_FillValue = -9999.f ;
	int mask(lat, lon) ;
		mask:comment = "one means the data location is valid (e.g. sea for SST), zero the location is invalid (e.g. land for SST)" ;

// global attributes:
		:_NCProperties = "version=2,netcdf=4.8.1,hdf5=1.10.6" ;
}

"SYSTEM: show(lasterr) caused an error" when running `DINCAE.reconstruct_points`

Describe the bug

Trying to run DINCAE on CPR observations.

Stacktrace

Stacktrace:
  [1] throw_boundserror(A::NCDatasets.CFVariable{Union{Missing, Float64}, 1, NCDatasets.Variable{Float64, 1, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float64, Tuple{Float64}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, I::Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}})
    @ Base ./abstractarray.jl:703
  [2] checkbounds
    @ ./abstractarray.jl:668 [inlined]
  [3] _getindex
    @ ./multidimensional.jl:874 [inlined]
  [4] getindex(A::NCDatasets.CFVariable{Union{Missing, Float64}, 1, NCDatasets.Variable{Float64, 1, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float64, Tuple{Float64}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, I::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64})
    @ Base ./abstractarray.jl:1241
  [5] loadragged(ncvar::NCDatasets.CFVariable{Union{Missing, Float64}, 1, NCDatasets.Variable{Float64, 1, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float64, Tuple{Float64}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, index::Colon)
    @ NCDatasets ~/.julia/packages/NCDatasets/gTGnf/src/variable.jl:152
  [6] (::DINCAE.var"#131#132"{String})(ds::NCDataset{Nothing})
    @ DINCAE ~/.julia/packages/DINCAE/OlSY0/src/points.jl:442
  [7] NCDataset(f::DINCAE.var"#131#132"{String}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NCDatasets ~/.julia/packages/NCDatasets/gTGnf/src/dataset.jl:241
  [8] NCDataset
    @ ~/.julia/packages/NCDatasets/gTGnf/src/dataset.jl:238 [inlined]
  [9] loaddata
    @ ~/.julia/packages/DINCAE/OlSY0/src/points.jl:441 [inlined]
 [10] reconstruct_points(T::Type, Atype::Type, filename::String, varname::String, grid::Tuple{StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}}, fnames_rec::Vector{String}; epochs::Int64, batch_size::Int64, truth_uncertain::Bool, enc_nfilter_internal::StepRange{Int64, Int64}, skipconnections::UnitRange{Int64}, clip_grad::Float64, regularization_L1_beta::Int64, regularization_L2_beta::Float64, save_epochs::StepRange{Int64, Int64}, upsampling_method::Symbol, probability_skip_for_training::Float64, jitter_std_pos::Tuple{Float32, Float32}, ntime_win::Int64, learning_rate::Float64, learning_rate_decay_epoch::Int64, min_std_err::Float64, loss_weights_refine::Tuple{Float64}, auxdata_files::Vector{NamedTuple{(:filename, :varname, :errvarname), Tuple{String, String, String}}}, savesnapshot::Bool)
    @ DINCAE ~/.julia/packages/DINCAE/OlSY0/src/points.jl:545
 [11] top-level scope

So the problem occurs at the reading step, in the function DINCAE.loaddata(filename,varname):
https://github.com/gher-uliege/DINCAE.jl/blob/main/src/points.jl#L440-L457,
which uses loadragged.

Question

Why does the input have to be written in the contiguous ragged array representation?
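
For context, in the CF conventions' contiguous ragged array representation all observations are stored end to end along one sample dimension, and a separate integer count variable records how many samples belong to each feature (assumed here to be each time instance). The sketch below packs and unpacks such a layout with plain Python lists. `pack_ragged` and `unpack_ragged` are hypothetical helpers for illustration and do not reproduce `NCDatasets.loadragged` or DINCAE's own loader.

```python
def pack_ragged(groups):
    """Flatten a list of variable-length groups into (values, counts)."""
    values = [v for group in groups for v in group]
    counts = [len(group) for group in groups]
    return values, counts


def unpack_ragged(values, counts):
    """Rebuild the groups from the flat values and the per-group counts."""
    groups, start = [], 0
    for n in counts:
        groups.append(values[start:start + n])
        start += n
    if start != len(values):
        raise ValueError("counts do not add up to the number of samples")
    return groups


# three time instances with 2, 0 and 3 observations respectively
obs_per_time = [[1.5, 2.0], [], [0.0, 4.0, 3.5]]
values, counts = pack_ragged(obs_per_time)
print(values)
print(counts)
print(unpack_ragged(values, counts) == obs_per_time)
```

With this layout a reader needs the counts to know where each time instance starts and ends in the flat array, which is one plausible reason a loader would expect a count variable alongside the `obs` dimension.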

To Reproduce

Please provide a minimal code example which reproduces the behavior (bug, performance regression, ...).

Environment

  • operating system: Ubuntu

Input file

netcdf CPRdata {
dimensions:
	obs = 250021 ;
	depth = 1 ;
variables:
	double time(obs) ;
		time:_CoordinateAxisType = "Time" ;
		string time:actual_range = "1958-01-01", "2020-12-23" ;
		time:axis = "T" ;
		time:calendar = "Gregorian" ;
		time:ioos_category = "Time" ;
		time:long_name = "Valid Time GMT" ;
		time:standard_name = "time" ;
		time:time_origin = "01-JAN-1900 00:00:00" ;
		time:units = "days since 1900-01-01T00:00:00Z" ;
	double latitude(obs) ;
		latitude:_CoordinateAxisType = "Lat" ;
		latitude:_FillValue = -999. ;
		latitude:actual_range = 28.045f, 90.f ;
		latitude:axis = "Y" ;
		latitude:ioos_category = "Location" ;
		latitude:latitude_reference_datum = "geographical coordinates, WGS84 projection" ;
		latitude:long_name = "Latitude" ;
		latitude:missing_value = -999. ;
		latitude:standard_name = "latitude" ;
		latitude:units = "degrees_north" ;
		latitude:valid_max = 90. ;
		latitude:valid_min = -90. ;
	double longitude(obs) ;
		longitude:_CoordinateAxisType = "Lon" ;
		longitude:_FillValue = -999. ;
		longitude:actual_range = -15.4244, 180.002 ;
		longitude:axis = "X" ;
		longitude:ioos_category = "Location" ;
		longitude:long_name = "Longitude" ;
		longitude:longitude_reference_datum = "geographical coordinates, WGS84 projection" ;
		longitude:missing_value = -999. ;
		longitude:standard_name = "longitude" ;
		longitude:units = "degrees_east" ;
		longitude:valid_max = 180. ;
		longitude:valid_min = -180. ;
	double Calanus_Finmarchicus(obs) ;
		Calanus_Finmarchicus:_FillValue = -999. ;
		Calanus_Finmarchicus:actual_range = 0., 5012. ;
		Calanus_Finmarchicus:coordinates = "time" ;
		Calanus_Finmarchicus:long_name = "Abunance of Calanus Finmarchicus" ;
		Calanus_Finmarchicus:sdn_parameter_urn = "SDN:P01::Z302M01Z" ;
		Calanus_Finmarchicus:sdn_parameter_name = "Abundance of Calanus finmarchicus (ITIS: 85272: WoRMS 104464) per unit volume of the water body by optical microscopy" ;
		Calanus_Finmarchicus:sdn_uom_urn = "SDN:P06::UCPL" ;
		Calanus_Finmarchicus:sdn_uom_name = "Individuals per cubic meter" ;
		Calanus_Finmarchicus:AphiaID = "104464" ;
		Calanus_Finmarchicus:missing_value = -999. ;
		Calanus_Finmarchicus:units = "Individuals per cubic meter" ;
		Calanus_Finmarchicus:sample_dimension = "obs" ;
...
}
