Can't run the train script (palette-image-to-image-diffusion-models) [CLOSED]

janspiry commented on August 11, 2024

Can't run the train script

from palette-image-to-image-diffusion-models.

Comments (5)

hasan-sayeed commented on August 11, 2024

And here is the train log:

22-05-31 17:39:02.309 - INFO: Create the log file in directory experiments\train_inpainting_celebahq_220531_173900.

22-05-31 17:39:02.331 - INFO: Dataset [InpaintDataset() form data.dataset] is created.
22-05-31 17:39:02.332 - INFO: Dataset for train have 99 samples.
22-05-31 17:39:02.332 - INFO: Dataset for val have 2 samples.
22-05-31 17:39:02.672 - INFO: Network [Network() form models.network] is created.
22-05-31 17:39:02.672 - INFO: Network [Network] weights initialize using [kaiming] method.
22-05-31 17:39:02.967 - INFO: Config is a str, converts to a dict {'name': 'mae'}
22-05-31 17:39:03.195 - INFO: Metric [mae() form models.metric] is created.
22-05-31 17:39:03.195 - INFO: Config is a str, converts to a dict {'name': 'mse_loss'}
22-05-31 17:39:03.210 - INFO: Loss [mse_loss() form models.loss] is created.
22-05-31 17:39:03.211 - INFO: Optimizer [Adam() form default file] is created.
22-05-31 17:39:03.212 - INFO: Option is None when initialize Scheduler
22-05-31 17:39:03.674 - INFO: Loading pretrained model from [experiments/train_inpainting_celebahq/checkpoint/200_Network.pth] ...
22-05-31 17:39:04.662 - INFO: Loading training state for [experiments/train_inpainting_celebahq/checkpoint/200.state] ...
22-05-31 17:39:05.057 - INFO: Model [Palette() form models.model] is created.
22-05-31 17:39:05.057 - INFO: Begin model train.
22-05-31 17:39:26.918 - INFO: train/mse_loss: 0.002101995706845738	
22-05-31 17:39:26.918 - INFO: epoch: 201	
22-05-31 17:39:26.918 - INFO: iters: 933311	
22-05-31 17:39:43.346 - INFO: train/mse_loss: 0.0034099449520440294	
22-05-31 17:39:43.346 - INFO: epoch: 202	
22-05-31 17:39:43.346 - INFO: iters: 933344	
22-05-31 17:40:00.108 - INFO: train/mse_loss: 0.0033231936262878166	
22-05-31 17:40:00.108 - INFO: epoch: 203	
22-05-31 17:40:00.108 - INFO: iters: 933377	
22-05-31 17:40:17.151 - INFO: train/mse_loss: 0.0026962171268127295	
22-05-31 17:40:17.151 - INFO: epoch: 204	
22-05-31 17:40:17.151 - INFO: iters: 933410	
22-05-31 17:40:34.390 - INFO: train/mse_loss: 0.006467201443614833	
22-05-31 17:40:34.390 - INFO: epoch: 205	
22-05-31 17:40:34.390 - INFO: iters: 933443	
22-05-31 17:40:34.390 - INFO: 


------------------------------Validation Start------------------------------

Janspiry commented on August 11, 2024

There may be an error in your save_current_results function that causes the path to contain a subdirectory rather than just a filename.
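A minimal sketch of one way to guard against that kind of path problem. The real save_current_results code is not shown in this thread, so `result_path` and its arguments are hypothetical names used only for illustration:

```python
import os


def result_path(result_dir, name):
    """Hypothetical helper: build an output path from a dataset-provided name.

    If `name` carries sub-directories (e.g. "val/0001.jpg"), only the final
    component is kept, so the file lands directly in `result_dir`.
    """
    # Treat Windows-style separators like "/" so basename works on any platform.
    filename = os.path.basename(name.replace("\\", "/"))
    if not filename:
        raise ValueError(f"no filename component in {name!r}")
    # Create the output directory up front so the later write cannot fail on it.
    os.makedirs(result_dir, exist_ok=True)
    return os.path.join(result_dir, filename)
```

Whether dropping the subdirectory is the right behavior depends on how the dataset names its files; if two inputs share a basename, this sketch would map them to the same output file.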

hasan-sayeed commented on August 11, 2024

Thank you for the reply! We were able to solve the problem. It was related to the .fname file.

But now we're getting this error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
[enforce fail at C:\cb\pytorch_1000000000000\work\caffe2\serialize\inline_container.cc:300] . unexpected pos 321754496 vs 321754384
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 380, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 604, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)

During handling of the above exception, another exception occurred:

  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 260, in __exit__
    self.file_like.write_end_of_file()
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 381, in save
    return
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\base_model.py", line 124, in save_training_state
    torch.save(state, save_path)
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\models\model.py", line 211, in save_everything
    self.save_training_state([self.optG], self.schedulers)
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\base_model.py", line 51, in train
    self.save_everything()
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\run.py", line 69, in main_worker
    model.train()
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\run.py", line 103, in <module>
    main_worker(0, 1, opt)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
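The thread does not establish what caused this failure inside torch.save. One defensive pattern, sketched here under the assumption that the write was interrupted or the disk ran short of space, is to check free space and write to a temporary file before replacing the real checkpoint. `safe_save` and `min_free_bytes` are hypothetical names, not part of the Palette code or of PyTorch:

```python
import os
import shutil
import tempfile


def safe_save(write_fn, save_path, min_free_bytes=0):
    """Hypothetical helper: call write_fn(tmp_path), then move the file into place.

    A failed write leaves any existing file at save_path untouched.
    """
    target_dir = os.path.dirname(os.path.abspath(save_path))
    os.makedirs(target_dir, exist_ok=True)

    # Refuse to start if the target drive has less free space than requested.
    free = shutil.disk_usage(target_dir).free
    if free < min_free_bytes:
        raise OSError(f"only {free} bytes free in {target_dir}")

    # Write next to the target so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, save_path)
    except BaseException:
        # Clean up the partial file and re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With PyTorch installed, the call in save_training_state could be wrapped as `safe_save(lambda p: torch.save(state, p), save_path)`. This does not fix the underlying error; it only keeps the previous checkpoint from being overwritten by a partial file.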

Janspiry commented on August 11, 2024

I wasn't sure what the problem was for a while. You can use the latest code; I fixed some bugs.
I recommend running with the -d option first for quick debugging, so that errors during validation show up early.

sgbaird commented on August 11, 2024

@hasan-sayeed I think you never figured this out. Do you have the stack trace for the latest error?

I might try running on a Linux machine to verify.
