Comments (8)

davkovacs commented on June 2, 2024

Hi! Thank you for reporting back. Could you please send us the input line for MACE?

When swa is switched on, the loss function changes, so its numerical value will be different; this is not a problem.

from mace.

ilyes319 commented on June 2, 2024

Hi @Vinceuwe,
Thanks for your interest in MACE!
The model that was saved corresponds to the model with the overall lowest loss. This means that it will depend on how you weight the forces and energies in your loss. As the default is using a larger weight on forces than energies, the model with the best RMSE on forces (at epoch 176 for you) was saved.
What might be confusing here is the use of swa. In your case it works as follows:

  • For the first 800 epochs, the loss was computed by putting a weight of 10 on the forces and 1 on the energies, hence your forces being much better.
  • After 800 epochs, the loss weights changed to 1000 on the energies and 1 on the forces. Hence the better energies.

Because the model seems to struggle to learn the energies, switching to an energy-dominated loss significantly degrades the accuracy of the forces. The best model saved was thus the one at epoch 176.
Could you please send us the input file for the trained model? Also, could you tell us your system size and check whether you are using the correct E0s?
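
The two-stage weighting described above can be sketched as follows. This is a simplified illustration, not MACE's actual loss code: the helper names `loss_weights` and `weighted_loss` are hypothetical, the error values are made up, and treating epoch 800 itself as the first epoch of the second stage is an assumption.

```python
def loss_weights(epoch, start_swa=800):
    """Return (energy_weight, forces_weight) for a given epoch.

    The weights are the values quoted in this thread; the exact epoch at
    which the switch happens is an assumption of this sketch.
    """
    if epoch < start_swa:
        return 1.0, 10.0      # stage one: forces dominate the loss
    return 1000.0, 1.0        # stage two (swa): energies dominate the loss


def weighted_loss(energy_mse, forces_mse, epoch, start_swa=800):
    """Combine energy and force errors into a single scalar loss."""
    energy_weight, forces_weight = loss_weights(epoch, start_swa)
    return energy_weight * energy_mse + forces_weight * forces_mse


# Made-up error values, kept identical in both stages.
energy_mse, forces_mse = 0.01, 0.20
before = weighted_loss(energy_mse, forces_mse, epoch=176)
after = weighted_loss(energy_mse, forces_mse, epoch=900)
print(before, after)
```

With identical underlying errors, the two stages give different loss values, so losses from before and after `start_swa` are not directly comparable.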

Vinceuwe commented on June 2, 2024

Hi, thanks for your reply. The information you provided definitely helps me understand MACE better. Here is the submission script:
python /raven/u/hwan/mace/scripts/run_train.py \
    --name="MACE_model" \
    --train_file="atoms_training_32.xyz" \
    --valid_fraction=0.05 \
    --test_file="atoms_test_32.xyz" \
    --config_type_weights='{"Default":1.0}' \
    --energy_key="DFT_energy" \
    --forces_key="DFT_forces" \
    --model="MACE" \
    --hidden_irreps='128x0e + 128x1o' \
    --r_max=5.0 \
    --batch_size=30 \
    --max_num_epochs=1000 \
    --swa \
    --start_swa=800 \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --restart_latest \
    --device=cuda

2022-09-20 10:27:01.622 INFO: CUDA version: 11.1, CUDA device: 0
2022-09-20 10:27:07.698 INFO: Using isolated atom energies from training file
2022-09-20 10:27:07.725 INFO: Loaded 931 training configurations from 'atoms_training_32.xyz'
2022-09-20 10:27:07.725 INFO: Using random 5.0% of training set for validation
2022-09-20 10:27:07.864 INFO: Loaded 207 test configurations from 'atoms_test_32.xyz'
2022-09-20 10:27:07.864 INFO: Total number of configurations: train=885, valid=46, tests=[Default: 131, slab_MD: 76]
2022-09-20 10:27:07.870 INFO: AtomicNumberTable: (8, 77)
2022-09-20 10:27:07.871 INFO: Atomic energies: [-0.08969644, -0.33524439]
2022-09-20 10:27:24.751 INFO: WeightedEnergyForcesLoss(energy_weight=1.000, forces_weight=10.000)
2022-09-20 10:27:24.908 INFO: Average number of neighbors: 39.096

My training set contains configurations ranging from 4 to 200 atoms. They are quite diverse, as they are the result of a GAP workflow run over 30 iterations.

ilyes319 commented on June 2, 2024

Thanks for your reply. Your input script seems correct to me. However, I think you might have a problem with your atomic energies. Could you please run again after adding --E0s="average" to your input script? This will do a linear fit on your training data to compute your E0s.
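
For illustration, a linear fit of this kind can be written as a least-squares problem: each configuration's total energy is approximated by the sum of per-element reference energies times the element counts. The sketch below is not MACE's implementation. The function name `fit_average_e0s` is hypothetical and the energies are toy numbers; only the atomic numbers 8 and 77 come from the thread.

```python
def fit_average_e0s(compositions, energies):
    """Least-squares fit of per-element reference energies (E0s).

    compositions: list of {atomic_number: count} dicts, one per configuration
    energies:     list of total energies in the same order
    Returns {atomic_number: fitted E0}.
    """
    elements = sorted({z for comp in compositions for z in comp})
    n = len(elements)
    # Count matrix: one row per configuration, one column per element.
    rows = [[comp.get(z, 0) for z in elements] for comp in compositions]

    # Normal equations (A^T A) x = A^T b, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * e for r, e in zip(rows, energies))]
        for i in range(n)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("compositions do not determine the E0s uniquely")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(col + 1, n):
            factor = aug[k][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[k][j] -= factor * aug[col][j]

    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return dict(zip(elements, x))


# Toy data: energies generated from E0(8) = -1.0 and E0(77) = -2.0.
compositions = [{8: 2, 77: 1}, {8: 1, 77: 1}, {8: 4, 77: 2}]
energies = [-4.0, -3.0, -8.0]
e0s = fit_average_e0s(compositions, energies)
print(e0s)
```

In practice a library routine such as `numpy.linalg.lstsq` would replace the hand-written solver; it is spelled out here only to keep the example dependency-free.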

Vinceuwe commented on June 2, 2024

Hi, I tested --E0s="average".
With isolated atoms in my training set:
2022-09-20 20:22:45.071 INFO: Epoch 994: loss=4.3130, RMSE_E_per_atom=68.1 meV, RMSE_F=450.6 meV / A
2022-09-20 20:23:09.699 INFO: Epoch 996: loss=4.1301, RMSE_E_per_atom=66.6 meV, RMSE_F=445.2 meV / A
2022-09-20 20:23:34.471 INFO: Epoch 998: loss=4.0964, RMSE_E_per_atom=66.3 meV, RMSE_F=447.6 meV / A
2022-09-20 20:23:46.774 INFO: Training complete
2022-09-20 20:23:46.775 INFO: Loading checkpoint: checkpoints/MACE_model_run-123_epoch-460.pt
2022-09-20 20:23:47.353 INFO: Loaded model from epoch 460
2022-09-20 20:23:47.353 INFO: Computing metrics for training, validation, and test sets
2022-09-20 20:24:03.228 INFO: Evaluating train ...
2022-09-20 20:24:19.352 INFO: Evaluating valid ...
2022-09-20 20:24:22.500 INFO: Evaluating Default ...
2022-09-20 20:24:24.432 INFO: Evaluating slab_MD ...
2022-09-20 20:24:24.914 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
|    train    |        110.3        |      211.1       |       1.49        |
|    valid    |        122.6        |      186.5       |       4.17        |
|   Default   |        103.0        |      101.5       |      3378.72      |
|   slab_MD   |        52.5         |      177.4       |       13.87       |
+-------------+---------------------+------------------+-------------------+
2022-09-20 20:24:24.914 INFO: Saving model to checkpoints/MACE_model_run-123.model

Without isolated atoms in my training set:
2022-09-20 20:44:35.017 INFO: Epoch 996: loss=219.9501, RMSE_E_per_atom=503.1 meV, RMSE_F=517.0 meV / A
2022-09-20 20:45:00.217 INFO: Epoch 998: loss=236.2850, RMSE_E_per_atom=521.5 meV, RMSE_F=523.2 meV / A
2022-09-20 20:45:12.731 INFO: Training complete
2022-09-20 20:45:12.732 INFO: Loading checkpoint: checkpoints/MACE_model_run-123_epoch-800.pt
2022-09-20 20:45:13.144 INFO: Loaded model from epoch 800
2022-09-20 20:45:13.144 INFO: Computing metrics for training, validation, and test sets
2022-09-20 20:45:28.456 INFO: Evaluating train ...
2022-09-20 20:45:44.218 INFO: Evaluating valid ...
2022-09-20 20:45:47.300 INFO: Evaluating Default ...
2022-09-20 20:45:49.142 INFO: Evaluating slab_MD ...
2022-09-20 20:45:49.613 INFO:
+-------------+---------------------+------------------+-------------------+
| config_type | RMSE E / meV / atom | RMSE F / meV / A | relative F RMSE % |
+-------------+---------------------+------------------+-------------------+
|    train    |        68.4         |      120.9       |       0.85        |
|    valid    |        475.9        |      281.9       |       6.30        |
|   Default   |        307.2        |      174.9       |      5819.96      |
|   slab_MD   |        44.9         |      185.4       |       14.50       |
+-------------+---------------------+------------------+-------------------+
2022-09-20 20:45:49.613 INFO: Saving model to checkpoints/MACE_model_run-123.model

These results still do not look satisfactory. Do you have any suggestions?

ilyes319 commented on June 2, 2024

Could you please send me your full log file and your training file?

Vinceuwe commented on June 2, 2024

Hi, can you provide your email?

ilyes319 commented on June 2, 2024

Yes, it is [email protected] .
