
Comments (5)

hasan-sayeed commented on August 11, 2024

And here is the training log:

22-05-31 17:39:02.309 - INFO: Create the log file in directory experiments\train_inpainting_celebahq_220531_173900.

22-05-31 17:39:02.331 - INFO: Dataset [InpaintDataset() form data.dataset] is created.
22-05-31 17:39:02.332 - INFO: Dataset for train have 99 samples.
22-05-31 17:39:02.332 - INFO: Dataset for val have 2 samples.
22-05-31 17:39:02.672 - INFO: Network [Network() form models.network] is created.
22-05-31 17:39:02.672 - INFO: Network [Network] weights initialize using [kaiming] method.
22-05-31 17:39:02.967 - INFO: Config is a str, converts to a dict {'name': 'mae'}
22-05-31 17:39:03.195 - INFO: Metric [mae() form models.metric] is created.
22-05-31 17:39:03.195 - INFO: Config is a str, converts to a dict {'name': 'mse_loss'}
22-05-31 17:39:03.210 - INFO: Loss [mse_loss() form models.loss] is created.
22-05-31 17:39:03.211 - INFO: Optimizer [Adam() form default file] is created.
22-05-31 17:39:03.212 - INFO: Option is None when initialize Scheduler
22-05-31 17:39:03.674 - INFO: Loading pretrained model from [experiments/train_inpainting_celebahq/checkpoint/200_Network.pth] ...
22-05-31 17:39:04.662 - INFO: Loading training state for [experiments/train_inpainting_celebahq/checkpoint/200.state] ...
22-05-31 17:39:05.057 - INFO: Model [Palette() form models.model] is created.
22-05-31 17:39:05.057 - INFO: Begin model train.
22-05-31 17:39:26.918 - INFO: train/mse_loss: 0.002101995706845738	
22-05-31 17:39:26.918 - INFO: epoch: 201	
22-05-31 17:39:26.918 - INFO: iters: 933311	
22-05-31 17:39:43.346 - INFO: train/mse_loss: 0.0034099449520440294	
22-05-31 17:39:43.346 - INFO: epoch: 202	
22-05-31 17:39:43.346 - INFO: iters: 933344	
22-05-31 17:40:00.108 - INFO: train/mse_loss: 0.0033231936262878166	
22-05-31 17:40:00.108 - INFO: epoch: 203	
22-05-31 17:40:00.108 - INFO: iters: 933377	
22-05-31 17:40:17.151 - INFO: train/mse_loss: 0.0026962171268127295	
22-05-31 17:40:17.151 - INFO: epoch: 204	
22-05-31 17:40:17.151 - INFO: iters: 933410	
22-05-31 17:40:34.390 - INFO: train/mse_loss: 0.006467201443614833	
22-05-31 17:40:34.390 - INFO: epoch: 205	
22-05-31 17:40:34.390 - INFO: iters: 933443	
22-05-31 17:40:34.390 - INFO: 


------------------------------Validation Start------------------------------

from palette-image-to-image-diffusion-models.

Janspiry commented on August 11, 2024

There may be an error in your save_current_results function that causes the path to contain a subdirectory rather than a filename.
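
To illustrate the kind of path problem described above, here is a minimal sketch. It assumes the name passed to the save step can carry subdirectory components, and it keeps only the final filename. `result_path` is a hypothetical helper for illustration, not a function from the Palette repository, and the real save_current_results may be structured differently.

```python
import os


def result_path(result_dir: str, name: str) -> str:
    """Build an output path for one result image (hypothetical helper)."""
    # Normalise Windows separators, then keep only the last path component,
    # so 'sub\\dir\\0001.jpg' and 'sub/dir/0001.jpg' both become '0001.jpg'.
    filename = os.path.basename(name.replace("\\", "/"))
    if not filename:
        # The name ended in a separator, so it points at a directory.
        raise ValueError(f"{name!r} does not end in a filename")
    # Make sure the output directory exists before anything is written to it.
    os.makedirs(result_dir, exist_ok=True)
    return os.path.join(result_dir, filename)
```

If the subdirectory layout should be preserved instead, the alternative is to create the parent directory of the full path with `os.makedirs(..., exist_ok=True)` before saving.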


hasan-sayeed commented on August 11, 2024

Thank you for the reply! We were able to solve the problem. It was an issue with the .fname file.

But now we're getting this error:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
[enforce fail at C:\cb\pytorch_1000000000000\work\caffe2\serialize\inline_container.cc:300] . unexpected pos 321754496 vs 321754384
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 380, in save
    _save(obj, opened_zipfile, pickle_module, pickle_protocol)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 604, in _save
    zip_file.write_record(name, storage.data_ptr(), num_bytes)

During handling of the above exception, another exception occurred:

  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 260, in __exit__
    self.file_like.write_end_of_file()
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\site-packages\torch\serialization.py", line 381, in save
    return
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\base_model.py", line 124, in save_training_state
    torch.save(state, save_path)
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\models\model.py", line 211, in save_everything
    self.save_training_state([self.optG], self.schedulers)
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\core\base_model.py", line 51, in train
    self.save_everything()
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\run.py", line 69, in main_worker
    model.train()
  File "C:\Users\Hasan Sayeed\Documents\hasan\SR3\Palette\run.py", line 103, in <module>
    main_worker(0, 1, opt)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Hasan Sayeed\anaconda3\Lib\runpy.py", line 194, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
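
The trace above fails inside torch.save while the training state is being written. One commonly reported cause of an `unexpected pos` failure is a write that gets cut short, for example when the disk fills up. That is an assumption here, not a diagnosis confirmed in this thread. The sketch below shows a defensive pattern under that assumption: check free space, write to a temporary file in the same directory, and only then move it into place, so a failed write never overwrites an existing checkpoint. `save_atomically`, `write_fn`, and `min_free_bytes` are hypothetical names, not part of PyTorch or the Palette code.

```python
import os
import shutil
import tempfile


def save_atomically(write_fn, save_path, min_free_bytes=0):
    """Call write_fn(tmp_path), then move the finished file to save_path."""
    target_dir = os.path.dirname(os.path.abspath(save_path))
    os.makedirs(target_dir, exist_ok=True)

    # Fail early if the target drive has less free space than requested.
    free = shutil.disk_usage(target_dir).free
    if free < min_free_bytes:
        raise OSError(f"only {free} bytes free in {target_dir}")

    # Create the temp file next to the target so the final move stays on
    # the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, save_path)
    except BaseException:
        # Remove the partial file; an existing save_path is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With PyTorch installed, the call at the failing line could then look like `save_atomically(lambda p: torch.save(state, p), save_path)`, with `state` and `save_path` as in the trace.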


Janspiry commented on August 11, 2024

I wasn't sure what the problem was for a while. You can try the latest code; I fixed some bugs.
I recommend using the -d option for quick debugging first, so that errors during validation show up early.


sgbaird commented on August 11, 2024

@hasan-sayeed I think you never figured this out. Do you have the stack trace for the latest error?

I might try running on a Linux machine to verify.

