Comments (8)
I've been seeing this too, looking into it right now.
from dreambooth-stable-diffusion.
Do you have "max_training_steps = 1000"?
I don't get mine to run at all, so :/
Nope, the max training steps is at 3000
I tried removing the --no-test param, but I still get this error:
Here comes the checkpoint...
Another one bites the dust...
Traceback (most recent call last):
  File "main.py", line 847, in <module>
    trainer.fit(model, data)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
    self.fit_loop.run()
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 266, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 205, in run
    self.on_advance_end()
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 255, in on_advance_end
    self._run_validation()
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 311, in _run_validation
    self.val_loop.run()
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 134, in advance
    self._on_evaluation_batch_end(output, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 267, in _on_evaluation_batch_end
    self.trainer._call_callback_hooks(hook_name, output, *kwargs.values())
  File "/opt/conda/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
    fn(self, self.lightning_module, *args, **kwargs)
  File "/workspace/Dreambooth-Stable-Diffusion/main.py", line 470, in on_validation_batch_end
    self.log_img(pl_module, batch, batch_idx, split="val")
  File "/workspace/Dreambooth-Stable-Diffusion/main.py", line 434, in log_img
    images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 1328, in log_images
    batch = batch[0]
KeyError: 0
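A note on reading that last frame: a KeyError (rather than an IndexError or TypeError) on `batch[0]` suggests that `batch` is a dict-like object with no key `0` when the validation hook runs, not a list or tuple. Below is a minimal Python sketch of that failure and of one defensive way to handle both shapes. The helper name `unwrap_batch` and the sample keys are hypothetical, made up for illustration; this is not the fix that ended up in the repository.

```python
from collections.abc import Mapping, Sequence


def unwrap_batch(batch):
    """Return a dict-like batch whether or not it arrives wrapped in a list/tuple.

    Hypothetical helper for illustration only; not code from the repository.
    """
    if isinstance(batch, Mapping):
        # Already a dict-like batch: indexing it with 0 would raise KeyError.
        return batch
    if isinstance(batch, Sequence) and not isinstance(batch, (str, bytes)) and len(batch) > 0:
        # Wrapped batch: the first element is the one we want.
        return batch[0]
    raise TypeError(f"Unsupported batch type: {type(batch).__name__}")


# Made-up batch contents; the real keys in the training code may differ.
dict_batch = {"image": [0.1, 0.2], "caption": ["a photo"]}
wrapped_batch = [dict_batch, {"image": [0.3], "caption": ["another photo"]}]

# Reproduce the error from the traceback: indexing a dict with 0.
try:
    dict_batch[0]
    reproduced = False
except KeyError:
    reproduced = True

safe_from_dict = unwrap_batch(dict_batch)
safe_from_list = unwrap_batch(wrapped_batch)
```

Both calls return the same dict here, so code after the unwrap can treat the batch uniformly regardless of which loader produced it.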
It's running fine here, past 1000 steps.
Pretty sure I found the issue. It happens when an epoch finishes, and the epoch size depends on your training_samples count and your regularization_images count, so you can trigger it by training for more than 1 epoch. I think I found the spot in the code and I'm testing it now.
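To make that concrete, here is a rough Python sketch of why one run can get past 1000 steps while another crashes. It assumes, purely for illustration, that one epoch is a single pass over the training images plus the regularization images; the actual epoch length in the repository may be computed differently. The function names and the image counts below are hypothetical.

```python
import math


def steps_per_epoch(training_samples, regularization_images, batch_size=1):
    """Estimate steps in one epoch.

    ASSUMPTION: an epoch is one pass over both image sets combined.
    The real training code may compute this differently.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return math.ceil((training_samples + regularization_images) / batch_size)


def runs_past_first_epoch(max_training_steps, training_samples,
                          regularization_images, batch_size=1):
    """True if the run would go over one epoch, the condition described above."""
    epoch_len = steps_per_epoch(training_samples, regularization_images, batch_size)
    return max_training_steps > epoch_len


# Made-up counts: 20 training images and 1500 regularization images.
epoch_len = steps_per_epoch(20, 1500)
short_run = runs_past_first_epoch(1000, 20, 1500)  # stays inside the first epoch
long_run = runs_past_first_epoch(3000, 20, 1500)   # goes over one epoch
```

Under these made-up counts, a 1000-step run never reaches the end of an epoch, while a 3000-step run does, which would be consistent with the mixed reports in this thread.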
At least my issue...