Comments (2)
```
  | Name           | Type                | Params
-------------------------------------------------------
0 | encoder        | DimeNetPlusPlusWrap | 2.2 M
1 | decoder        | GemNetTDecoder      | 2.3 M
2 | fc_mu          | Linear              | 65.8 K
3 | fc_var         | Linear              | 65.8 K
4 | fc_num_atoms   | Sequential          | 71.2 K
5 | fc_lattice     | Sequential          | 67.3 K
6 | fc_composition | Sequential          | 91.5 K
-------------------------------------------------------
4.9 M     Trainable params
123       Non-trainable params
4.9 M     Total params
19.682    Total estimated model params size (MB)

/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:631: UserWarning: Checkpoint directory /scratch/harsha.vasamsetti/hydra/singlerun/2023-05-26/perov exists and is not empty.
  rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
Validation sanity check: 0it [00:00, ?it/s]
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/torch_geometric/deprecation.py:13: UserWarning: 'data.DataLoader' is deprecated, use 'loader.DataLoader' instead
  warnings.warn(out)
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:116: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 40 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
/scratch/harsha.vasamsetti/cdvae/cdvae/common/data_utils.py:622: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  X = torch.tensor(X, dtype=torch.float)
/scratch/harsha.vasamsetti/cdvae/cdvae/common/data_utils.py:618: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  X = torch.tensor(X, dtype=torch.float)
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:59: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 10. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
  warning_cache.warn(
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:116: UserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 40 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:412: UserWarning: The number of training samples (23) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
Epoch 0: 100%|█| 23/23 [00:09<00:00, 2.53it/s, loss=91.2, v_num=t2wv, train_loss_step=80.70, train_natom_loss_step=
Error executing job with overrides: ['data=perov', 'expname=perov']
Traceback (most recent call last):
  File "cdvae/run.py", line 167, in main
    run(cfg)
  File "cdvae/run.py", line 155, in run
    trainer.fit(model=model, datamodule=datamodule)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 738, in fit
    self._call_and_handle_interrupt(
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 683, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 773, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1195, in _run
    self._dispatch()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1275, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1285, in run_stage
    return self._run_train()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1315, in _run_train
    self.fit_loop.run()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 151, in run
    output = self.on_run_end()
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 303, in on_run_end
    self.update_lr_schedulers("epoch", update_plateau_schedulers=True)
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 441, in update_lr_schedulers
    self._update_learning_rates(
  File "/home2/harsha.vasamsetti/miniconda3/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 486, in _update_learning_rates
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: ReduceLROnPlateau conditioned on metric val_loss which is not available. Available metrics are: ['train_loss', 'train_loss_step', 'train_natom_loss', 'train_natom_loss_step', 'train_lattice_loss', 'train_lattice_loss_step', 'train_coord_loss', 'train_coord_loss_step', 'train_type_loss', 'train_type_loss_step', 'train_kld_loss', 'train_kld_loss_step', 'train_composition_loss', 'train_composition_loss_step', 'train_loss_epoch', 'train_natom_loss_epoch', 'train_lattice_loss_epoch', 'train_coord_loss_epoch', 'train_type_loss_epoch', 'train_kld_loss_epoch', 'train_composition_loss_epoch']. Condition can be set using `monitor` key in lr scheduler dict
```
I am receiving this error when training the model.
Hi, I have the same issue. Did you solve it?
Hi, I changed the `strict` parameter in the scheduler dict to `False` (by default it is `True`), which solved the problem. Here is how I modified the `configure_optimizers` function:
```python
def configure_optimizers(self):
    opt = hydra.utils.instantiate(
        self.hparams.optim.optimizer, params=self.parameters(), _convert_="partial"
    )
    if not self.hparams.optim.use_lr_scheduler:
        return [opt]
    scheduler = hydra.utils.instantiate(
        self.hparams.optim.lr_scheduler, optimizer=opt
    )
    return {
        "optimizer": opt,
        "lr_scheduler": {
            "scheduler": scheduler,
            "monitor": "val_loss",
            "strict": False,
        },
    }
```