Hi I am new to MACE. I am encountering a problem that after training complete, why

Question related to Model generated after training complete about mace HOT 8 CLOSED

Vinceuwe commented on June 2, 2024

Question related to Model generated after training complete


Comments (8)

davkovacs commented on June 2, 2024

Hi! Thank you for reporting back. Could you please send us the input line for MACE?

When SWA is switched on, the loss function changes, so its numerical value will be different. This is not a problem.


ilyes319 commented on June 2, 2024

Hi @Vinceuwe,
Thanks for your interest in MACE!
The saved model is the one with the lowest overall loss, so which checkpoint is kept depends on how forces and energies are weighted in your loss. Because the default weights forces more heavily than energies, the model with the best force RMSE (epoch 176 in your case) was saved.
This may look confusing because of SWA. In your run it works as follows:

For the first 800 epochs, the loss was computed with a weight of 10 on the forces and 1 on the energies, hence your forces being much better.
After 800 epochs, the loss weights changed to 1000 on the energies and 1 on the forces, hence the better energies.

Because the model seems to struggle to learn the energies, the switch significantly degrades the accuracy of the forces. The best saved model was thus the one at epoch 176.
Could you please send us the input file for the trained model? Also, could you tell us your system size and check that you are using the correct E0s?
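To make the effect of the weight switch concrete, here is a minimal sketch of how a weighted energy/forces loss can rank two checkpoints differently under the two sets of weights mentioned above. It is a simplification and not MACE's actual loss code: the function names, the two checkpoints, and their error values are hypothetical, and errors are given in eV/atom and eV/A.

```python
def weighted_loss(rmse_e, rmse_f, energy_weight, forces_weight):
    """Simplified weighted loss: weights times squared errors.

    rmse_e is in eV/atom and rmse_f in eV/A. This is an illustrative
    stand-in, not the loss implemented in MACE.
    """
    return energy_weight * rmse_e**2 + forces_weight * rmse_f**2


# Hypothetical checkpoints (values invented for illustration only).
CHECKPOINTS = {
    "force_focused": {"rmse_e": 0.120, "rmse_f": 0.150},
    "energy_focused": {"rmse_e": 0.060, "rmse_f": 0.450},
}


def best_checkpoint(energy_weight, forces_weight):
    """Return the name of the checkpoint with the lowest weighted loss."""
    return min(
        CHECKPOINTS,
        key=lambda name: weighted_loss(
            CHECKPOINTS[name]["rmse_e"],
            CHECKPOINTS[name]["rmse_f"],
            energy_weight,
            forces_weight,
        ),
    )


# Stage one weights from the thread: 1 on energies, 10 on forces.
print(best_checkpoint(energy_weight=1.0, forces_weight=10.0))
# Stage two weights from the thread: 1000 on energies, 1 on forces.
print(best_checkpoint(energy_weight=1000.0, forces_weight=1.0))
```

With these invented numbers the force-focused checkpoint wins under the first set of weights and the energy-focused one wins under the second, which is why loss values from before and after the switch are not directly comparable.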


Vinceuwe commented on June 2, 2024

Hi, thanks for your reply. The information you provided definitely helps me understand MACE better. This is the submission script:
python /raven/u/hwan/mace/scripts/run_train.py \
--name="MACE_model" \
--train_file="atoms_training_32.xyz" \
--valid_fraction=0.05 \
--test_file="atoms_test_32.xyz" \
--config_type_weights='{"Default":1.0}' \
--energy_key="DFT_energy" \
--forces_key="DFT_forces" \
--model="MACE" \
--hidden_irreps='128x0e + 128x1o' \
--r_max=5.0 \
--batch_size=30 \
--max_num_epochs=1000 \
--swa \
--start_swa=800 \
--ema \
--ema_decay=0.99 \
--amsgrad \
--restart_latest \
--device=cuda

2022-09-20 10:27:01.622 INFO: CUDA version: 11.1, CUDA device: 0
2022-09-20 10:27:07.698 INFO: Using isolated atom energies from training file
2022-09-20 10:27:07.725 INFO: Loaded 931 training configurations from 'atoms_training_32.xyz'
2022-09-20 10:27:07.725 INFO: Using random 5.0% of training set for validation
2022-09-20 10:27:07.864 INFO: Loaded 207 test configurations from 'atoms_test_32.xyz'
2022-09-20 10:27:07.864 INFO: Total number of configurations: train=885, valid=46, tests=[Default: 131, slab_MD: 76]
2022-09-20 10:27:07.870 INFO: AtomicNumberTable: (8, 77)
2022-09-20 10:27:07.871 INFO: Atomic energies: [-0.08969644, -0.33524439]
2022-09-20 10:27:24.751 INFO: WeightedEnergyForcesLoss(energy_weight=1.000, forces_weight=10.000)
2022-09-20 10:27:24.908 INFO: Average number of neighbors: 39.096

My training set contains structures ranging from 4 to 200 atoms. They are quite diverse, as they come from a GAP workflow run over 30 iterations.


ilyes319 commented on June 2, 2024

Thanks for your reply. Your input script seems correct to me. However, I think you might have a problem with your atomic energies. Could you please try running again with --E0s="average" added to your input script? This will do a linear fit on your training data to compute your E0s.
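As a rough illustration of what a linear fit for E0s means, the sketch below solves the least-squares problem E_total ≈ n_a * E0_a + n_b * E0_b for a two-species system using the normal equations. It is not MACE's implementation: the function name, the atom counts, and the energies are all hypothetical, and the two-species restriction is only there to keep the example short.

```python
def fit_e0s_two_species(counts, energies):
    """Least-squares fit of per-species reference energies.

    counts:   list of (n_a, n_b) atom counts per configuration
    energies: list of total energies, one per configuration
    Solves the 2x2 normal equations with Cramer's rule.
    """
    s_aa = sum(n_a * n_a for n_a, _ in counts)
    s_ab = sum(n_a * n_b for n_a, n_b in counts)
    s_bb = sum(n_b * n_b for _, n_b in counts)
    t_a = sum(n_a * e for (n_a, _), e in zip(counts, energies))
    t_b = sum(n_b * e for (_, n_b), e in zip(counts, energies))

    det = s_aa * s_bb - s_ab * s_ab
    if det == 0:
        # All configurations share the same composition ratio,
        # so the two reference energies cannot be separated.
        raise ValueError("compositions are linearly dependent")

    e0_a = (t_a * s_bb - t_b * s_ab) / det
    e0_b = (s_aa * t_b - s_ab * t_a) / det
    return e0_a, e0_b


# Hypothetical data generated from E0_a = -5.0 and E0_b = -8.0 (eV).
counts = [(2, 1), (1, 3), (0, 4)]
energies = [-18.0, -29.0, -32.0]
print(fit_e0s_two_species(counts, energies))
```

Note that the fit needs configurations with different composition ratios; if every structure has the same stoichiometry, the per-species energies are not uniquely determined.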


Vinceuwe commented on June 2, 2024

Hi, I tested --E0s="average".
With isolated atoms in my training set:
2022-09-20 20:22:45.071 INFO: Epoch 994: loss=4.3130, RMSE_E_per_atom=68.1 meV, RMSE_F=450.6 meV / A
2022-09-20 20:23:09.699 INFO: Epoch 996: loss=4.1301, RMSE_E_per_atom=66.6 meV, RMSE_F=445.2 meV / A
2022-09-20 20:23:34.471 INFO: Epoch 998: loss=4.0964, RMSE_E_per_atom=66.3 meV, RMSE_F=447.6 meV / A
2022-09-20 20:23:46.774 INFO: Training complete
2022-09-20 20:23:46.775 INFO: Loading checkpoint: checkpoints/MACE_model_run-123_epoch-460.pt
2022-09-20 20:23:47.353 INFO: Loaded model from epoch 460
2022-09-20 20:23:47.353 INFO: Computing metrics for training, validation, and test sets
2022-09-20 20:24:03.228 INFO: Evaluating train ...
2022-09-20 20:24:19.352 INFO: Evaluating valid ...
2022-09-20 20:24:22.500 INFO: Evaluating Default ...
2022-09-20 20:24:24.432 INFO: Evaluating slab_MD ...
2022-09-20 20:24:24.914 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
|    train    |        110.3        |      211.1       |       1.49        |
|    valid    |        122.6        |      186.5       |       4.17        |
|   Default   |        103.0        |      101.5       |      3378.72      |
|   slab_MD   |        52.5         |      177.4       |       13.87       |
+-------------+---------------------+------------------+-------------------+
2022-09-20 20:24:24.914 INFO: Saving model to checkpoints/MACE_model_run-123.model

Without isolated atoms in my training set:
2022-09-20 20:44:35.017 INFO: Epoch 996: loss=219.9501, RMSE_E_per_atom=503.1 meV, RMSE_F=517.0 meV / A
2022-09-20 20:45:00.217 INFO: Epoch 998: loss=236.2850, RMSE_E_per_atom=521.5 meV, RMSE_F=523.2 meV / A
2022-09-20 20:45:12.731 INFO: Training complete
2022-09-20 20:45:12.732 INFO: Loading checkpoint: checkpoints/MACE_model_run-123_epoch-800.pt
2022-09-20 20:45:13.144 INFO: Loaded model from epoch 800
2022-09-20 20:45:13.144 INFO: Computing metrics for training, validation, and test sets
2022-09-20 20:45:28.456 INFO: Evaluating train ...
2022-09-20 20:45:44.218 INFO: Evaluating valid ...
2022-09-20 20:45:47.300 INFO: Evaluating Default ...
2022-09-20 20:45:49.142 INFO: Evaluating slab_MD ...
2022-09-20 20:45:49.613 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
|    train    |        68.4         |      120.9       |       0.85        |
|    valid    |        475.9        |      281.9       |       6.30        |
|   Default   |        307.2        |      174.9       |      5819.96      |
|   slab_MD   |        44.9         |      185.4       |       14.50       |
+-------------+---------------------+------------------+-------------------+
2022-09-20 20:45:49.613 INFO: Saving model to checkpoints/MACE_model_run-123.model

These results still do not look satisfactory. Do you have any suggestions?


ilyes319 commented on June 2, 2024

Could you please send me a link to your full log file and your training file?


Vinceuwe commented on June 2, 2024

Hi, could you provide your email address?


ilyes319 commented on June 2, 2024

Yes, it is [email protected] .

