Comments (2)
```
  | Name           | Type                | Params
-------------------------------------------------------
0 | encoder        | DimeNetPlusPlusWrap | 2.2 M
1 | decoder        | GemNetTDecoder      | 2.3 M
2 | fc_mu          | Linear              | 65.8 K
3 | fc_var         | Linear              | 65.8 K
4 | fc_num_atoms   | Sequential          | 71.2 K
5 | fc_lattice     | Sequential          | 67.3 K
6 | fc_composition | Sequential          | 91.5 K
-------------------------------------------------------
4.9 M     Trainable params
123       Non-trainable params
4.9 M     Total params
19.682    Total estimated model params size (MB)

/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:631: UserWarning: Checkpoint directory /scratch/harsha.vasamsetti/hydra/singlerun/2023-05-26/perov exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
Validation sanity check: 0it [00:00, ?it/s]
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/torch_geometric/deprecation.py:13: UserWarning: 'data.DataLoader' is deprecated, use 'loader.DataLoader' instead
  warnings.warn(out)
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:116: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 40 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
/scratch/harsha.vasamsetti/cdvae/cdvae/common/data_utils.py:622: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  X = torch.tensor(X, dtype=torch.float)
/scratch/harsha.vasamsetti/cdvae/cdvae/common/data_utils.py:618: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  X = torch.tensor(X, dtype=torch.float)
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 10. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
  warning_cache.warn(
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:116: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 40 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:412: UserWarning: The number of training samples (23) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Epoch 0: 100%|█| 23/23 [00:09<00:00, 2.53it/s, loss=91.2, v_num=t2wv, train_loss_step=80.70, train_natom_loss_step=
Error executing job with overrides: ['data=perov', 'expname=perov']
Traceback (most recent call last):
  File "cdvae/run.py", line 167, in main
    run(cfg)
  File "cdvae/run.py", line 155, in run
    trainer.fit(model=model, datamodule=datamodule)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 738, in fit
    self._call_and_handle_interrupt(
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 773, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1195, in _run
    self._dispatch()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in run_stage
    return self._run_train()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1315, in _run_train
    self.fit_loop.run()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 151, in run
    output = self.on_run_end()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 303, in on_run_end
    self.update_lr_schedulers("epoch", update_plateau_schedulers=True)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 441, in update_lr_schedulers
    self._update_learning_rates(
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 486, in _update_learning_rates
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: ReduceLROnPlateau conditioned on metric val_loss which is not available. Available metrics are: ['train_loss', 'train_loss_step', 'train_natom_loss', 'train_natom_loss_step', 'train_lattice_loss', 'train_lattice_loss_step', 'train_coord_loss', 'train_coord_loss_step', 'train_type_loss', 'train_type_loss_step', 'train_kld_loss', 'train_kld_loss_step', 'train_composition_loss', 'train_composition_loss_step', 'train_loss_epoch', 'train_natom_loss_epoch', 'train_lattice_loss_epoch', 'train_coord_loss_epoch', 'train_type_loss_epoch', 'train_kld_loss_epoch', 'train_composition_loss_epoch']. Condition can be set using `monitor` key in lr scheduler dict
```
I am receiving this error when training the model.
Hi, I have the same issue. Did you solve it?
Hi, I changed the `strict` parameter in the scheduler dict to `False` (by default it is `True`), which solved the problem. Here is how I modified the `configure_optimizers` function:
```python
def configure_optimizers(self):
    opt = hydra.utils.instantiate(
        self.hparams.optim.optimizer, params=self.parameters(), _convert_="partial"
    )
    if not self.hparams.optim.use_lr_scheduler:
        return [opt]
    scheduler = hydra.utils.instantiate(
        self.hparams.optim.lr_scheduler, optimizer=opt
    )
    return {
        "optimizer": opt,
        "lr_scheduler": {
            "scheduler": scheduler,
            "monitor": "val_loss",
            "strict": False,
        },
    }
```