materialsvirtuallab / matgl
Graph deep learning library for materials
License: BSD 3-Clause "New" or "Revised" License
Each pre-trained model directory should contain a README with key information about the models. I have written an outline for the M3GNet PES. Particularly important are the training datasets and the actual performance metrics (MAE in energies, forces, etc.). It would also be helpful if an actual script were provided to demonstrate the training protocol.
Please complete it and add similar files for the other models.
If I install pre-commit hooks and then try to commit, mypy aborts with 19 pre-existing errors:
- hook id: mypy
- exit code: 1
examples/trainer_beta/train.py:15: error: Module "matgl.utils" has no attribute "utils" [attr-defined]
examples/trainer_beta/train.py:43: error: "Tuple[Any, ...]" has no attribute "z_mean" [attr-defined]
examples/trainer_beta/train.py:43: error: "Tuple[Any, ...]" has no attribute "num_bond_mean" [attr-defined]
examples/trainer_beta/train.py:47: error: "Tuple[Any, ...]" has no attribute "mean" [attr-defined]
examples/trainer_beta/train.py:47: error: "Tuple[Any, ...]" has no attribute "std" [attr-defined]
examples/trainer_beta/train.py:56: error: "int" has no attribute "cpu" [attr-defined]
examples/trainer_beta/train.py:66: error: Function "collections.namedtuple" is not valid as a type [valid-type]
examples/trainer_beta/train.py:66: note: Perhaps you need "Callable[...]" or a callback protocol?
examples/trainer_beta/train.py:67: error: Function "collections.namedtuple" is not valid as a type [valid-type]
examples/trainer_beta/train.py:67: note: Perhaps you need "Callable[...]" or a callback protocol?
examples/trainer_beta/train.py:74: error: namedtuple? has no attribute "__iter__" (not iterable) [attr-defined]
examples/trainer_beta/train.py:80: error: namedtuple? has no attribute "z_mean" [attr-defined]
examples/trainer_beta/train.py:80: error: namedtuple? has no attribute "num_bond_mean" [attr-defined]
examples/trainer_beta/train.py:84: error: namedtuple? has no attribute "mean" [attr-defined]
examples/trainer_beta/train.py:84: error: namedtuple? has no attribute "std" [attr-defined]
examples/trainer_beta/train.py:90: error: "int" has no attribute "cpu" [attr-defined]
examples/trainer_beta/train.py:99: error: Function "collections.namedtuple" is not valid as a type [valid-type]
examples/trainer_beta/train.py:99: note: Perhaps you need "Callable[...]" or a callback protocol?
examples/trainer_beta/train.py:101: error: namedtuple? has no attribute "train" [attr-defined]
examples/trainer_beta/train.py:105: error: namedtuple? has no attribute "z_mean" [attr-defined]
examples/trainer_beta/train.py:105: error: namedtuple? has no attribute "num_bond_mean" [attr-defined]
examples/trainer_beta/train.py:181: error: Argument 1 to "run" has incompatible type "Namespace"; expected "ArgumentParser" [arg-type]
Found 19 errors in 1 file (checked 1 source file)
It would be a nicer developer experience if the linters passed out of the box.
v0.8.5 and v0.7.1
First off, thank you all so far for your help with getting me started on this. I think at this point my issue has become more appropriate for a bug fix.
I am trying to train a model using multiple GPUs and opened the discussion in #188.
I first had an issue using MatGL 0.8.5 and PyTorch 2.1.1 (for CUDA 11.8), where I encountered the same error as in #149:
Traceback (most recent call last):
File "/home/u2019/work/ml/GPU-test/train.py", line 45, in <module>
dataset = M3GNetDataset(
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/matgl/graph/data.py", line 255, in __init__
super().__init__(name=name)
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/dgl/data/dgl_dataset.py", line 112, in __init__
self._load()
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/dgl/data/dgl_dataset.py", line 203, in _load
self.process()
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/matgl/graph/data.py", line 278, in process
line_graph = create_line_graph(graph, self.threebody_cutoff) # type: ignore
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/matgl/graph/compute.py", line 146, in create_line_graph
l_g, triple_bond_indices, n_triple_ij, n_triple_i, n_triple_s = compute_3body(graph_with_three_body)
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/matgl/graph/compute.py", line 24, in compute_3body
first_col = g.edges()[0].numpy().reshape(-1, 1)
File "/home/u2019/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/_device.py", line 62, in __torch_function__
return func(*args, **kwargs)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
This error happens when the actual dataset is being created using M3GNetDataset. It was implemented in this commit: e825b75
Does this step even need to run on a GPU? Since the graph construction is hard-coded to run on the CPU, it seems that any torch.set_default_device('cuda') call should come after creating the dataset itself. However, looking through issue #94, Prof. Ong states that torch.device('cuda') must come first.
Nevertheless, it was suggested that I downgrade PyTorch and its dependencies to 2.0.1 (I also downgraded matgl to 0.7.1), and following the training example given by SmallBearC, I was able to get a single GPU to run.
However, when attempting to set up multi-GPU use, I got an error in the MGLDataLoader:
Traceback (most recent call last):
File "/home/myless/Potential_Training/V-Cr-Ti/Test_1/train.py", line 77, in <module>
train_loader, val_loader, test_loader = MGLDataLoader(
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/matgl/graph/data.py", line 78, in MGLDataLoader
train_loader = GraphDataLoader(
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1451, in __init__
self.dist_sampler = _create_dist_sampler(
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1281, in _create_dist_sampler
return DistributedSampler(dataset, **dist_sampler_kwargs)
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/utils/data/distributed.py", line 68, in __init__
num_replicas = dist.get_world_size()
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1196, in get_world_size
return _get_group_size(group)
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 576, in _get_group_size
default_pg = _get_default_group()
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 707, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
A maintainer suggested changing torch/utils/data/distributed.py from:
def __iter__(self) -> Iterator[T_co]:
    if self.shuffle:
        # deterministically shuffle based on epoch and seed
        g = torch.Generator()
        g.manual_seed(self.seed + self.epoch)
        indices = torch.randperm(len(self.dataset), generator=g).tolist()  # type: ignore[arg-type]
    else:
        indices = list(range(len(self.dataset)))  # type: ignore[arg-type]
To:
def __iter__(self) -> Iterator[T_co]:
    if self.shuffle:
        # deterministically shuffle based on epoch and seed
        # workaround added to make matgl work
        if torch.cuda.is_available():
            device = "cuda"
        else:
            device = "cpu"
        g = torch.Generator(device=device)
        g.manual_seed(self.seed + self.epoch)
        indices = torch.randperm(len(self.dataset), generator=g).tolist()  # type: ignore[arg-type]
    else:
        indices = list(range(len(self.dataset)))  # type: ignore[arg-type]
but the same error occurred. I did some digging and found that the error is raised when the GraphDataLoader tries to use DDP for the training data. It is likely an issue with how the model is being split and sent to the GPUs.
There seem to be two bugs or issues here. In version 0.8.5, I believe there is an issue with _compute_3body() in matgl/src/matgl/graph/compute.py:
n_atoms = g.num_nodes()
first_col = g.edges()[0].cpu().numpy().reshape(-1, 1)
all_indices = np.arange(n_atoms).reshape(1, -1)
Could this perhaps be fixed by moving things onto the GPU only after the dataset has been created?
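A minimal sketch of the device-safe conversion being suggested (the helper name is hypothetical; matgl's actual patch may differ):

import numpy as np
import torch

def edges_first_col(edges_src: torch.Tensor) -> np.ndarray:
    """Convert a DGL graph's edge source tensor to a NumPy column vector,
    copying it to host memory first if it lives on a GPU."""
    return edges_src.detach().cpu().numpy().reshape(-1, 1)

# works whether the tensor lives on the CPU or on cuda:0
src = torch.tensor([0, 1, 2], device="cuda" if torch.cuda.is_available() else "cpu")
first_col = edges_first_col(src)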
The bug with 0.7.1 seems to be related to the DDP strategy for loading the dataset onto multiple GPUs for training. This seems more difficult for the matgl team to fix, and since it is happening with an older package version, perhaps the bug in 0.8.5 is the better one to focus on?
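For reference, the RuntimeError above is raised because GraphDataLoader with use_ddp=True builds a DistributedSampler, which requires an existing default process group. A minimal sketch of initializing one by hand (assumptions: a single node and a single process; under a proper DDP launcher, or inside Lightning's ddp strategy, this is normally done for you):

import os
import torch
import torch.distributed as dist

if not dist.is_initialized():
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend, rank=0, world_size=1)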
I have pasted the code that I was using on matgl==0.7.1 and the output from slurm.
Thank you,
Myles
from __future__ import annotations
import os, json
import shutil
import warnings
import numpy as np
import pytorch_lightning as pl
from dgl.data.utils import split_dataset
from pytorch_lightning.loggers import CSVLogger
from pymatgen.io.vasp.outputs import Vasprun
import matgl
from matgl.ext.pymatgen import Structure2Graph, get_element_list
from matgl.graph.data import M3GNetDataset, MGLDataLoader, collate_fn_efs
from matgl.models import M3GNet
from matgl.utils.training import PotentialLightningModule
# To suppress warnings for clearer output
warnings.simplefilter("ignore")
import torch
torch.set_default_device('cuda')
AVAIL_GPUS = torch.cuda.device_count()
folder_path = './test_xml_data'
#folder_path = './data'
xml_files = [f for f in os.listdir(folder_path) if f.endswith(".xml")]
#initialize empty arrays
structures = []
energies = []
forces = []
stresses = []
errors = []
for xml_file in xml_files:
    xml_file_path = os.path.join(folder_path, xml_file)
    try:
        vrun = Vasprun(xml_file_path)
        # print(f"File: {xml_file} loaded")
        for i in range(len(vrun.ionic_steps)):
            structures.append(vrun.ionic_steps[i]['structure'])
            energies.append(vrun.ionic_steps[i]['e_fr_energy'])
            forces.append(vrun.ionic_steps[i]['forces'])
            stresses.append(vrun.ionic_steps[i]['stress'])
    except Exception as e:
        error_message = f"Error parsing {xml_file}: {str(e)}"
        errors.append(error_message)
        print(error_message)
with open('./bad_vasprun.txt', 'w') as file:
    for error_message in errors:
        file.write(f"{error_message}\n")
labels = {
    "energies": energies,
    "forces": forces,
    "stresses": stresses,
}
print(f"{len(structures)} downloaded from MP.")
#formatted_data = json.dumps(labels, indent=4)
#with open("labels.json","w") as json_file:
#json_file.write(formatted_data)
element_types = get_element_list(structures)
converter = Structure2Graph(element_types=element_types, cutoff=5.0)
dataset = M3GNetDataset(
    threebody_cutoff=4.0,
    structures=structures,
    converter=converter,
    energies=energies,
    forces=forces,
    stresses=stresses,  # changed when downgrading to 0.7.1
    # labels=labels,
)
train_data, val_data, test_data = split_dataset(
    dataset,
    frac_list=[0.8, 0.1, 0.1],
    shuffle=True,
    random_state=42,
)
train_loader, val_loader, test_loader = MGLDataLoader(
    train_data=train_data,
    val_data=val_data,
    test_data=test_data,
    collate_fn=collate_fn_efs,
    batch_size=16,
    num_workers=0,
    use_ddp=True,
    pin_memory=True,
    generator=torch.Generator("cuda"),
)
model = M3GNet(
    element_types=element_types,
    is_intensive=False,
    use_smooth=True,
)
lr = 1e-4
lit_module = PotentialLightningModule(model=model,lr=lr)
# If you wish to disable GPU or MPS (M1 mac) training, use the accelerator="cpu" kwarg.
logger = CSVLogger("logs", name="M3GNet_training")
# Inference mode = False is required for calculating forces, stress in test mode and prediction mode
trainer = pl.Trainer(max_epochs=1, accelerator="cuda", devices=4, strategy="ddp", logger=logger, inference_mode=False)
#trainer = pl.Trainer(max_epochs = 1, accelerator='auto', logger=logger, inference_mode=False)
trainer.fit(model=lit_module, train_dataloaders=train_loader, val_dataloaders=val_loader)
trainer.test(dataloaders=test_loader)
model_export_path = './trained_model/mgl.m3g_out'
model.save(model_export_path)
model = matgl.load_model(path = model_export_path)
Traceback (most recent call last):
File "/home/myless/Potential_Training/V-Cr-Ti/Test_2/old_train.py", line 85, in <module>
train_loader, val_loader, test_loader = MGLDataLoader(
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/matgl/graph/data.py", line 78, in MGLDataLoader
train_loader = GraphDataLoader(
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1451, in __init__
self.dist_sampler = _create_dist_sampler(
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/dgl/dataloading/dataloader.py", line 1281, in _create_dist_sampler
return DistributedSampler(dataset, **dist_sampler_kwargs)
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/utils/data/distributed.py", line 68, in __init__
num_replicas = dist.get_world_size()
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1196, in get_world_size
return _get_group_size(group)
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 576, in _get_group_size
default_pg = _get_default_group()
File "/home/myless/.mambaforge/envs/matgl-gpu/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 707, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
(each DDP task prints the same traceback)
srun: error: gpu-rtx6000-02: task 2: Exited with exit code 1
srun: error: gpu-rtx6000-02: tasks 0-1: Exited with exit code 1
Hi, does this codebase include anything for doing material generation, as in the latest paper?
Thanks!
Dear everyone,
First of all, thank you for taking the time to convert m3gnet to PyTorch. I was wondering if there are any aspirations to build a PyTorch Lightning module for m3gnet, which would trivialize multi-GPU/multi-node and mixed-precision training?
best wishes,
Jonathan
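For what it's worth, matgl does ship Lightning wrappers (PotentialLightningModule and ModelLightningModule appear elsewhere on this page). A minimal sketch of the multi-GPU, mixed-precision setup being asked about (flag spellings assume pytorch_lightning 2.x; the element list here is only a placeholder):

import pytorch_lightning as pl
from matgl.models import M3GNet
from matgl.utils.training import PotentialLightningModule

model = M3GNet(element_types=("V", "Cr", "Ti"), is_intensive=False)  # placeholder elements
lit_module = PotentialLightningModule(model=model, lr=1e-4)
trainer = pl.Trainer(
    max_epochs=10,
    accelerator="gpu",
    devices=2,              # multi-GPU on one node
    strategy="ddp",         # the same strategy scales to multi-node with a launcher
    precision="16-mixed",   # mixed-precision training
)
# trainer.fit(lit_module, train_dataloaders=train_loader, val_dataloaders=val_loader)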
Hi,
Thanks for your great work! I see you are using seed 42 to split the MPF 2021 dataset. May I ask if this is the same split used in the paper [1]? I couldn't find this information in the paper, so I'm asking just in case.
Thank you and looking forward to your reply!
Best Regards
[1] Chen, Chi, and Shyue Ping Ong. "A universal graph deep learning interatomic potential for the periodic table." Nature Computational Science 2.11 (2022): 718-728.
Hi,
I am looking to extract the gradient with respect to atom positions from the band gap prediction model "MEGNet-MP-2019.4.1-BandGap-mfi".
By unpacking the wrapper around the MEGNet.predict_structure function, I can extract the gradients with respect to the edge attributes. Is there a way to convert this information into a gradient with respect to the atom positions in the structure (i.e. g.ndata["pos"] in the code below)?
Below I provide the unpacked method used to access the gradient within the predict_structure() function.
I can do the same within the forward method, but I am unclear which representation of the graph will provide me with the information I require for translation back to the structure.
Thank you for your help.
Code to reproduce:
import matgl
import torch
from mp_api.client import MPRester
from pymatgen.io.ase import AseAtomsAdaptor
from matgl.ext.pymatgen import Structure2Graph
from matgl.graph.compute import compute_pair_vector_and_distance
if __name__ == '__main__':
    # loading arbitrary structure from Materials Project
    with MPRester() as mpr:
        structure = mpr.get_structure_by_material_id("mp-1840", final=True)
    atoms_2 = AseAtomsAdaptor.get_atoms(structure)
    # selecting a band gap method
    state_feats = torch.tensor([3])
    band_gap_model_wrapper = matgl.load_model("MEGNet-MP-2019.4.1-BandGap-mfi")
    # unpacking forward method of MEGNet.predict_structure to access gradient information
    graph_converter = Structure2Graph(
        element_types=band_gap_model_wrapper.model.element_types,
        cutoff=band_gap_model_wrapper.model.cutoff,
    )
    g, state_feats_default = graph_converter.get_graph(structure)
    if state_feats is None:
        state_feats = torch.tensor(state_feats_default)
    bond_vec, bond_dist = compute_pair_vector_and_distance(g)
    g.edata["edge_attr"] = band_gap_model_wrapper.model.bond_expansion(bond_dist)
    # adding requires_grad to the edges to access gradients
    g.edata["edge_attr"].requires_grad_()
    model_output = band_gap_model_wrapper.model(g, g.edata["edge_attr"], g.ndata["node_type"], state_feats)
    gradient_wrt_edge_attr = torch.autograd.grad(
        model_output, g.edata["edge_attr"], create_graph=True, retain_graph=True,
    )
    model_output = model_output.detach()
    model_output_converted = band_gap_model_wrapper.transformer.inverse_transform(model_output)
    # asserting that the unpacked method gives the same answer as the provided wrapper method
    band_gap_value = band_gap_model_wrapper.predict_structure(
        structure=structure,
        state_feats=state_feats,
    )
    assert model_output_converted == band_gap_value
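One possible route to the position gradient (my assumption, mirroring how matgl's Potential.forward obtains forces by differentiating with respect to g.ndata["pos"]): mark the positions as requiring gradients before the bond vectors are computed, so autograd can reach them through compute_pair_vector_and_distance:

# hypothetical variation of the snippet above, differentiating w.r.t. positions
g.ndata["pos"].requires_grad_()  # must happen before building bond vectors
bond_vec, bond_dist = compute_pair_vector_and_distance(g)  # now depends on pos
g.edata["edge_attr"] = band_gap_model_wrapper.model.bond_expansion(bond_dist)
model_output = band_gap_model_wrapper.model(
    g, g.edata["edge_attr"], g.ndata["node_type"], state_feats
)
grad_wrt_pos = torch.autograd.grad(model_output, g.ndata["pos"], retain_graph=True)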
I don't see a consistent choice of primitive datatypes. My guess is that the default choice is intended to be torch.float32 and torch.int32. But different datatypes appear throughout the code, and in particular having some tensors be float64 leads to graph data taking an unwieldy amount of storage space.
For example, when working with atom graphs I find the following datatypes in their edge data:
bond_vec torch.float32
lattice torch.float64
pbc_offshift torch.float64
bond_dist torch.float32
pbc_offset torch.float64
And node data:
volume torch.float32
pos torch.float64
node_type torch.int64
I think primitive datatypes should be kept consistent.
Additionally, I was wondering if there are plans to implement a way to easily allow a global configuration of datatypes (similar to the config file in the original m3gnet code)?
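A minimal sketch of what a global default could look like with plain PyTorch (an assumption on my part; the matgl version discussed here exposes no such switch):

import torch

# newly created floating-point tensors become float32 process-wide
torch.set_default_dtype(torch.float32)

# existing graph data still has to be downcast explicitly, e.g.:
# g.ndata["pos"] = g.ndata["pos"].float()
# g.edata["pbc_offshift"] = g.edata["pbc_offshift"].float()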
This fails because the class MEGNetTrainer is defined in examples/trainer_beta/megnet.py, which is not included in the matgl package namespace. The class should be moved into the matgl directory.
I'm interested in setting up a PyTorch M3GNet model with the exact same architecture as MP-2021.
I tried potential = Potential(M3GNet(DEFAULT_ELEMENT_TYPES, is_intensive=False)). I then compared the parameters with the weights from the TF model: potential = Potential(model=M3GNet.load('MP-2021.2.8-EFS')). However, I obtain a different number of weights/parameters, and their shapes also do not match exactly.
>>> potential = Potential(model=M3GNet.load('MP-2021.2.8-EFS'))
>>> for weight in potential.weights:
...     print(weight.name, '|', weight.shape[::-1])
m3g_net/graph_featurizer/atom_embedding/atom_embedding/embeddings:0 | (64, 95)
m3g_net/graph_update_func/mlp/dense/kernel:0 | (64, 3)
m3g_net/three_d_interaction/mlp_1/dense_1/kernel:0 | (9, 64)
m3g_net/three_d_interaction/mlp_1/dense_1/bias:0 | (9,)
m3g_net/three_d_interaction/gated_mlp/dense_2/kernel:0 | (64, 9)
m3g_net/three_d_interaction/gated_mlp/dense_3/kernel:0 | (64, 9)
m3g_net/three_d_interaction_1/mlp_2/dense_4/kernel:0 | (9, 64)
m3g_net/three_d_interaction_1/mlp_2/dense_4/bias:0 | (9,)
m3g_net/three_d_interaction_1/gated_mlp_1/dense_5/kernel:0 | (64, 9)
m3g_net/three_d_interaction_1/gated_mlp_1/dense_6/kernel:0 | (64, 9)
m3g_net/three_d_interaction_2/mlp_3/dense_7/kernel:0 | (9, 64)
m3g_net/three_d_interaction_2/mlp_3/dense_7/bias:0 | (9,)
m3g_net/three_d_interaction_2/gated_mlp_2/dense_8/kernel:0 | (64, 9)
m3g_net/three_d_interaction_2/gated_mlp_2/dense_9/kernel:0 | (64, 9)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_15/kernel:0 | (64, 192)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_15/bias:0 | (64,)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_16/kernel:0 | (64, 64)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_16/bias:0 | (64,)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_17/kernel:0 | (64, 192)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_17/bias:0 | (64,)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_18/kernel:0 | (64, 64)
m3g_net/graph_network_layer/concat_atoms/gated_mlp_4/dense_18/bias:0 | (64,)
m3g_net/graph_network_layer/concat_atoms/dense_19/kernel:0 | (64, 3)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_10/kernel:0 | (64, 192)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_10/bias:0 | (64,)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_11/kernel:0 | (64, 64)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_11/bias:0 | (64,)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_12/kernel:0 | (64, 192)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_12/bias:0 | (64,)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_13/kernel:0 | (64, 64)
m3g_net/graph_network_layer/gated_atom_update/gated_mlp_3/dense_13/bias:0 | (64,)
m3g_net/graph_network_layer/gated_atom_update/dense_14/kernel:0 | (64, 3)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_25/kernel:0 | (64, 192)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_25/bias:0 | (64,)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_26/kernel:0 | (64, 64)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_26/bias:0 | (64,)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_27/kernel:0 | (64, 192)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_27/bias:0 | (64,)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_28/kernel:0 | (64, 64)
m3g_net/graph_network_layer_1/concat_atoms_1/gated_mlp_6/dense_28/bias:0 | (64,)
m3g_net/graph_network_layer_1/concat_atoms_1/dense_29/kernel:0 | (64, 3)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_20/kernel:0 | (64, 192)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_20/bias:0 | (64,)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_21/kernel:0 | (64, 64)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_21/bias:0 | (64,)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_22/kernel:0 | (64, 192)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_22/bias:0 | (64,)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_23/kernel:0 | (64, 64)
m3g_net/graph_network_layer_1/gated_atom_update_1/gated_mlp_5/dense_23/bias:0 | (64,)
m3g_net/graph_network_layer_1/gated_atom_update_1/dense_24/kernel:0 | (64, 3)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_35/kernel:0 | (64, 192)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_35/bias:0 | (64,)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_36/kernel:0 | (64, 64)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_36/bias:0 | (64,)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_37/kernel:0 | (64, 192)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_37/bias:0 | (64,)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_38/kernel:0 | (64, 64)
m3g_net/graph_network_layer_2/concat_atoms_2/gated_mlp_8/dense_38/bias:0 | (64,)
m3g_net/graph_network_layer_2/concat_atoms_2/dense_39/kernel:0 | (64, 3)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_30/kernel:0 | (64, 192)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_30/bias:0 | (64,)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_31/kernel:0 | (64, 64)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_31/bias:0 | (64,)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_32/kernel:0 | (64, 192)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_32/bias:0 | (64,)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_33/kernel:0 | (64, 64)
m3g_net/graph_network_layer_2/gated_atom_update_2/gated_mlp_7/dense_33/bias:0 | (64,)
m3g_net/graph_network_layer_2/gated_atom_update_2/dense_34/kernel:0 | (64, 3)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_40/kernel:0 | (64, 64)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_40/bias:0 | (64,)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_41/kernel:0 | (64, 64)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_41/bias:0 | (64,)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_42/kernel:0 | (1, 64)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_42/bias:0 | (1,)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_43/kernel:0 | (64, 64)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_43/bias:0 | (64,)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_44/kernel:0 | (64, 64)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_44/bias:0 | (64,)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_45/kernel:0 | (1, 64)
m3g_net/pipe_24/graph_network_layer_3/graph_update_func_1/gated_mlp_9/dense_45/bias:0 | (1,)
>>> potential = Potential(M3GNet(DEFAULT_ELEMENT_TYPES, is_intensive=False))
>>> for parameter in potential.named_parameters():
...     print(parameter[0], '|', tuple(parameter[1].shape))
model.embedding.layer_node_embedding.weight | (89, 64)
model.embedding.layer_edge_embedding.layers.0.weight | (64, 9)
model.embedding.layer_edge_embedding.layers.0.bias | (64,)
model.three_body_interactions.0.update_network_atom.layers.0.weight | (9, 64)
model.three_body_interactions.0.update_network_atom.layers.0.bias | (9,)
model.three_body_interactions.0.update_network_bond.layers.0.weight | (64, 9)
model.three_body_interactions.0.update_network_bond.gates.0.weight | (64, 9)
model.three_body_interactions.1.update_network_atom.layers.0.weight | (9, 64)
model.three_body_interactions.1.update_network_atom.layers.0.bias | (9,)
model.three_body_interactions.1.update_network_bond.layers.0.weight | (64, 9)
model.three_body_interactions.1.update_network_bond.gates.0.weight | (64, 9)
model.three_body_interactions.2.update_network_atom.layers.0.weight | (9, 64)
model.three_body_interactions.2.update_network_atom.layers.0.bias | (9,)
model.three_body_interactions.2.update_network_bond.layers.0.weight | (64, 9)
model.three_body_interactions.2.update_network_bond.gates.0.weight | (64, 9)
model.graph_layers.0.conv.edge_update_func.layers.0.weight | (64, 192)
model.graph_layers.0.conv.edge_update_func.layers.0.bias | (64,)
model.graph_layers.0.conv.edge_update_func.layers.2.weight | (64, 64)
model.graph_layers.0.conv.edge_update_func.layers.2.bias | (64,)
model.graph_layers.0.conv.edge_update_func.layers.4.weight | (64, 64)
model.graph_layers.0.conv.edge_update_func.layers.4.bias | (64,)
model.graph_layers.0.conv.edge_update_func.gates.0.weight | (64, 192)
model.graph_layers.0.conv.edge_update_func.gates.0.bias | (64,)
model.graph_layers.0.conv.edge_update_func.gates.2.weight | (64, 64)
model.graph_layers.0.conv.edge_update_func.gates.2.bias | (64,)
model.graph_layers.0.conv.edge_update_func.gates.4.weight | (64, 64)
model.graph_layers.0.conv.edge_update_func.gates.4.bias | (64,)
model.graph_layers.0.conv.edge_weight_func.weight | (64, 9)
model.graph_layers.0.conv.node_update_func.layers.0.weight | (64, 192)
model.graph_layers.0.conv.node_update_func.layers.0.bias | (64,)
model.graph_layers.0.conv.node_update_func.layers.2.weight | (64, 64)
model.graph_layers.0.conv.node_update_func.layers.2.bias | (64,)
model.graph_layers.0.conv.node_update_func.layers.4.weight | (64, 64)
model.graph_layers.0.conv.node_update_func.layers.4.bias | (64,)
model.graph_layers.0.conv.node_update_func.gates.0.weight | (64, 192)
model.graph_layers.0.conv.node_update_func.gates.0.bias | (64,)
model.graph_layers.0.conv.node_update_func.gates.2.weight | (64, 64)
model.graph_layers.0.conv.node_update_func.gates.2.bias | (64,)
model.graph_layers.0.conv.node_update_func.gates.4.weight | (64, 64)
model.graph_layers.0.conv.node_update_func.gates.4.bias | (64,)
model.graph_layers.0.conv.node_weight_func.weight | (64, 9)
model.graph_layers.1.conv.edge_update_func.layers.0.weight | (64, 192)
model.graph_layers.1.conv.edge_update_func.layers.0.bias | (64,)
model.graph_layers.1.conv.edge_update_func.layers.2.weight | (64, 64)
model.graph_layers.1.conv.edge_update_func.layers.2.bias | (64,)
model.graph_layers.1.conv.edge_update_func.layers.4.weight | (64, 64)
model.graph_layers.1.conv.edge_update_func.layers.4.bias | (64,)
model.graph_layers.1.conv.edge_update_func.gates.0.weight | (64, 192)
model.graph_layers.1.conv.edge_update_func.gates.0.bias | (64,)
model.graph_layers.1.conv.edge_update_func.gates.2.weight | (64, 64)
model.graph_layers.1.conv.edge_update_func.gates.2.bias | (64,)
model.graph_layers.1.conv.edge_update_func.gates.4.weight | (64, 64)
model.graph_layers.1.conv.edge_update_func.gates.4.bias | (64,)
model.graph_layers.1.conv.edge_weight_func.weight | (64, 9)
model.graph_layers.1.conv.node_update_func.layers.0.weight | (64, 192)
model.graph_layers.1.conv.node_update_func.layers.0.bias | (64,)
model.graph_layers.1.conv.node_update_func.layers.2.weight | (64, 64)
model.graph_layers.1.conv.node_update_func.layers.2.bias | (64,)
model.graph_layers.1.conv.node_update_func.layers.4.weight | (64, 64)
model.graph_layers.1.conv.node_update_func.layers.4.bias | (64,)
model.graph_layers.1.conv.node_update_func.gates.0.weight | (64, 192)
model.graph_layers.1.conv.node_update_func.gates.0.bias | (64,)
model.graph_layers.1.conv.node_update_func.gates.2.weight | (64, 64)
model.graph_layers.1.conv.node_update_func.gates.2.bias | (64,)
model.graph_layers.1.conv.node_update_func.gates.4.weight | (64, 64)
model.graph_layers.1.conv.node_update_func.gates.4.bias | (64,)
model.graph_layers.1.conv.node_weight_func.weight | (64, 9)
model.graph_layers.2.conv.edge_update_func.layers.0.weight | (64, 192)
model.graph_layers.2.conv.edge_update_func.layers.0.bias | (64,)
model.graph_layers.2.conv.edge_update_func.layers.2.weight | (64, 64)
model.graph_layers.2.conv.edge_update_func.layers.2.bias | (64,)
model.graph_layers.2.conv.edge_update_func.layers.4.weight | (64, 64)
model.graph_layers.2.conv.edge_update_func.layers.4.bias | (64,)
model.graph_layers.2.conv.edge_update_func.gates.0.weight | (64, 192)
model.graph_layers.2.conv.edge_update_func.gates.0.bias | (64,)
model.graph_layers.2.conv.edge_update_func.gates.2.weight | (64, 64)
model.graph_layers.2.conv.edge_update_func.gates.2.bias | (64,)
model.graph_layers.2.conv.edge_update_func.gates.4.weight | (64, 64)
model.graph_layers.2.conv.edge_update_func.gates.4.bias | (64,)
model.graph_layers.2.conv.edge_weight_func.weight | (64, 9)
model.graph_layers.2.conv.node_update_func.layers.0.weight | (64, 192)
model.graph_layers.2.conv.node_update_func.layers.0.bias | (64,)
model.graph_layers.2.conv.node_update_func.layers.2.weight | (64, 64)
model.graph_layers.2.conv.node_update_func.layers.2.bias | (64,)
model.graph_layers.2.conv.node_update_func.layers.4.weight | (64, 64)
model.graph_layers.2.conv.node_update_func.layers.4.bias | (64,)
model.graph_layers.2.conv.node_update_func.gates.0.weight | (64, 192)
model.graph_layers.2.conv.node_update_func.gates.0.bias | (64,)
model.graph_layers.2.conv.node_update_func.gates.2.weight | (64, 64)
model.graph_layers.2.conv.node_update_func.gates.2.bias | (64,)
model.graph_layers.2.conv.node_update_func.gates.4.weight | (64, 64)
model.graph_layers.2.conv.node_update_func.gates.4.bias | (64,)
model.graph_layers.2.conv.node_weight_func.weight | (64, 9)
model.final_layer.gated.layers.0.weight | (64, 64)
model.final_layer.gated.layers.0.bias | (64,)
model.final_layer.gated.layers.2.weight | (64, 64)
model.final_layer.gated.layers.2.bias | (64,)
model.final_layer.gated.layers.4.weight | (64, 64)
model.final_layer.gated.layers.4.bias | (64,)
model.final_layer.gated.layers.6.weight | (1, 64)
model.final_layer.gated.layers.6.bias | (1,)
model.final_layer.gated.gates.0.weight | (64, 64)
model.final_layer.gated.gates.0.bias | (64,)
model.final_layer.gated.gates.2.weight | (64, 64)
model.final_layer.gated.gates.2.bias | (64,)
model.final_layer.gated.gates.4.weight | (64, 64)
model.final_layer.gated.gates.4.bias | (64,)
model.final_layer.gated.gates.6.weight | (1, 64)
model.final_layer.gated.gates.6.bias | (1,)
v0.8.6
This doesn't look right: it skips the release step even when it shouldn't. You probably want if: github.event_name == 'release' && needs.tests.result == 'success'.
matgl/.github/workflows/testing.yml
Line 51 in d9f2665
Hence https://github.com/materialsvirtuallab/matgl/releases/tag/v0.8.6 didn't make it to PyPI.
Hi, thanks for the repository -
I was wondering if there is any example yet of how training could be performed given a list of structures, forces, and energies, using a trainer object as in the original m3gnet repository? Or is this feature still in development?
Thanks in advance, Han.
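A condensed sketch of the Lightning-based workflow used elsewhere on this page (assuming the 0.7-era API shown in the scripts above; signatures vary between matgl versions):

import pytorch_lightning as pl
from dgl.data.utils import split_dataset
from matgl.ext.pymatgen import Structure2Graph, get_element_list
from matgl.graph.data import M3GNetDataset, MGLDataLoader, collate_fn_efs
from matgl.models import M3GNet
from matgl.utils.training import PotentialLightningModule

# structures: list of pymatgen Structures; energies/forces: matching label lists
element_types = get_element_list(structures)
converter = Structure2Graph(element_types=element_types, cutoff=5.0)
dataset = M3GNetDataset(threebody_cutoff=4.0, structures=structures,
                        converter=converter, energies=energies, forces=forces)
train_data, val_data, test_data = split_dataset(dataset, frac_list=[0.8, 0.1, 0.1],
                                                shuffle=True, random_state=42)
train_loader, val_loader, test_loader = MGLDataLoader(
    train_data=train_data, val_data=val_data, test_data=test_data,
    collate_fn=collate_fn_efs, batch_size=16, num_workers=0)
lit_module = PotentialLightningModule(
    model=M3GNet(element_types=element_types, is_intensive=False), lr=1e-4)
pl.Trainer(max_epochs=10, inference_mode=False).fit(
    lit_module, train_dataloaders=train_loader, val_dataloaders=val_loader)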
I am puzzled why this code assumes there is a labels.json file somewhere, when the other three files have explicit file names given. Also, in the preceding save method, labels.json is written using a file.write call. This is not how proper JSON is written: JSON should be written with json.dump to ensure it is in valid JSON format.
Also, there is no unit test for the labels.json loading.
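A minimal sketch of the suggested fix (file name and label contents illustrative only):

import json

labels = {"energies": [-1.23, -4.56], "forces": [[[0.0, 0.0, 0.0]], [[0.1, 0.0, 0.0]]]}

# write: serialize with json.dump rather than file.write(str(labels))
with open("labels.json", "w") as f:
    json.dump(labels, f)

# read back: round-trips as valid JSON
with open("labels.json") as f:
    assert json.load(f) == labels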
I would like to use MatGL on a system with periodic boundary conditions, but have been unable to figure out how to do this. The issue is that even when periodic boundaries are specified in the system definitions, MatGL doesn't seem to recognize them.
To illustrate the issue, here are two python scripts that simulate 4 atoms of copper at 2400K. One uses the standard ASE package (with the EMT potential); one uses MatGL.
(1) ASE script: (produces a .traj file)
from asap3 import EMT
from ase import units
from ase.io.trajectory import Trajectory
from ase.lattice.cubic import FaceCenteredCubic
from ase.md.langevin import Langevin
from pymatgen.core import Lattice, Structure
from pymatgen.io.ase import AseAtomsAdaptor
import warnings
warnings.simplefilter("ignore")
# Define the lattice geometry and atom coordinates
lattice = Lattice.cubic(3.61, (True, True, True)) # lattice constant in units of Angstrom ; (True, True, True) sets periodic boundaries
fractcoords = [[0, 0, 0], [0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
struct = Structure(lattice, ["Cu", "Cu", "Cu", "Cu"], fractcoords)
ase_adaptor = AseAtomsAdaptor()
atoms = ase_adaptor.get_atoms(struct)
# Describe the interatomic interactions with the Effective Medium Theory
atoms.calc = EMT()
T = 2400 # Kelvin
dyn = Langevin(atoms, 5 * units.fs, T * units.kB, 0.002)
def printenergy(a=atoms):  # store a reference to atoms in the definition
    """Function to print the potential, kinetic and total energy."""
    epot = a.get_potential_energy() / len(a)
    ekin = a.get_kinetic_energy() / len(a)
    print('Energy per atom: Epot = %.3feV  Ekin = %.3feV (T=%3.0fK)  '
          'Etot = %.3feV' % (epot, ekin, ekin / (1.5 * units.kB), epot + ekin))
dyn.attach(printenergy, interval=50)
traj = Trajectory('Cu_ASE.traj', 'w', atoms)
dyn.attach(traj.write, interval=50)
# Now run the dynamics
printenergy()
dyn.run(2000)
(2) MatGL script: (produces .traj and .log files)
from __future__ import annotations
import warnings
from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from pymatgen.core import Lattice, Structure
from pymatgen.io.ase import AseAtomsAdaptor
import matgl
from matgl.ext.ase import M3GNetCalculator, MolecularDynamics, Relaxer
warnings.simplefilter("ignore")
pot = matgl.load_model("M3GNet-MP-2021.2.8-PES")
# Define the lattice geometry and atom coordinates
lattice = Lattice.cubic(3.61, (True, True, True)) # lattice constant in units of Angstrom ; (True, True, True) sets periodic boundaries
fractcoords = [[0, 0, 0], [0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
struct = Structure(lattice, ["Cu", "Cu", "Cu", "Cu"], fractcoords)
# Prepare atoms for molecular dynamics
ase_adaptor = AseAtomsAdaptor()
atoms = ase_adaptor.get_atoms(struct)
# Initiate temperature distribution
MaxwellBoltzmannDistribution(atoms, temperature_K=2400)
# Define molecular dynamics settings
driver = MolecularDynamics(
    atoms,
    potential=pot,  # uses the M3GNet interatomic potential
    temperature=2400,
    timestep=1,  # 1 fs
    logfile="Cu_MatGL.log",
    trajectory="Cu_MatGL.traj",  # save trajectory
    loginterval=50,  # interval for recording the log
    ensemble='nvt',  # NVT ensemble
)
# Run molecular dynamics
driver.run(2000)
Also, to convert one of the .traj files to human-readable format, here is a little script: (produces .xyz trajectory file)
import ase
from ase.io import read, write
from ase.io.trajectory import Trajectory
traj = Trajectory("Cu_MatGL.traj")
#traj = Trajectory("Cu_ASE.traj")
atoms=traj[:]
writeme = ase.io.write("Cu_MatGL.xyz", atoms, "xyz")
#writeme = ase.io.write("Cu_ASE.xyz", atoms, "xyz")
Looking at the .xyz files, you'll notice that in the basic ASE simulation, none of the atom positions are larger than the specified lattice constant (3.61 Angstroms). This is as expected with periodic boundary conditions. However, in the MatGL simulation, the atom positions readily exceed the lattice constant value, and the atoms appear to drift through space.
For example, here are excerpts of the last recorded frame from the .xyz trajectory files:
(1) ASE trajectory excerpt
4
Cu 0.344162968499945 -0.148546966782993 -0.009870944387788
Cu -0.177349810618174 1.722229170919916 1.765656855202022
Cu 1.736425681826978 0.345596431729686 1.793462024164234
Cu 1.727657879060445 1.651865872484211 0.104391311992427
(2) MatGL trajectory excerpt
4
Cu 10.754732275176222 1.779757546738823 7.457469460754528
Cu 10.395401224745564 3.745542174112556 9.486116801335781
Cu 12.303262940663885 1.970363284378915 9.334171650233468
Cu 12.388181989458412 3.671734565914415 7.549590636835549
Do you have any recommendations for how to resolve this issue? How should one go about implementing periodic boundary conditions in a molecular dynamics simulation that leverages MatGL? Please advise, thank you.
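One thing worth checking (an observation, not a confirmed diagnosis): ASE reports unwrapped Cartesian positions, so atoms drifting past the lattice constant does not by itself prove the potential ignored the periodic boundaries. The coordinates can be folded back into the cell when converting the trajectory:

import ase.io
from ase.io.trajectory import Trajectory

traj = Trajectory("Cu_MatGL.traj")
frames = []
for atoms in traj:
    atoms.wrap()  # map positions back into the unit cell using the pbc flags
    frames.append(atoms)
ase.io.write("Cu_MatGL_wrapped.xyz", frames, "xyz")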
v0.8.5 and v0.7.1
Dear developers,
I'm trying to train an M3GNet potential using the same code as in the tutorial (https://matgl.ai/tutorials%2FTraining%20a%20M3GNet%20Potential%20with%20PyTorch%20Lightning.html).
Training the potential on a CPU went smoothly without any issues. However, when I switched to a GPU node for training, I ran into several errors.
I made the following adjustments to the code to enable training on a GPU node.
trainer = pl.Trainer(max_epochs=1, accelerator="gpu", devices=[0], logger=logger, inference_mode=False)
trainer.fit(model=lit_module_finetune, train_dataloaders=train_loader, val_dataloaders=val_loader)
Then the following error occurs:
I also tried to set the default device to one specific GPU, but I encountered another error:
Do you have any suggestions for fixing these errors? Thanks in advance.
Hi,
thank you for the great work. Do you plan to provide TorchScript support in the future?
Best regards!
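As a quick way to probe this in the meantime (an experiment, not a supported path; scripting will fail wherever the model relies on Python-only constructs):

import torch
import matgl

model = matgl.load_model("M3GNet-MP-2021.2.8-PES").model
try:
    scripted = torch.jit.script(model)  # attempt TorchScript compilation
    scripted.save("m3gnet_scripted.pt")
except Exception as exc:
    print(f"TorchScript compilation failed: {exc}")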
Dear developers,
Considering that Molecular Dynamics with matgl supports pressure, is it possible to add external pressure to the Relaxer? It would be of great benefit for studying systems over a range of pressures.
add an option to relaxer for external pressure
VASP supports setting PSTRESS in INCAR
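In the meantime, a sketch of one way to relax under an external pressure using ASE directly (an assumption, bypassing matgl's Relaxer; ExpCellFilter's scalar_pressure adds a p*V term, analogous in spirit to VASP's PSTRESS):

from ase.build import bulk
from ase.constraints import ExpCellFilter
from ase.optimize import FIRE
from ase.units import GPa
import matgl
from matgl.ext.ase import M3GNetCalculator

atoms = bulk("Cu", "fcc", a=3.61)
atoms.calc = M3GNetCalculator(potential=matgl.load_model("M3GNet-MP-2021.2.8-PES"))
ecf = ExpCellFilter(atoms, scalar_pressure=10 * GPa)  # relax cell + positions at 10 GPa
FIRE(ecf).run(fmax=0.05)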
Suppose we have a situation where we want to predict an atom-specific property, for example, the spectrum of a single absorbing site on a material. It would be nice to be able to label that atom as distinct.
For example, given a material with symmetrically unique elements Ti, Ti, O, O, O, O, perhaps we would like to distinguish the first Ti as "absorber" or "special" in some way. Thus it does not get the standard Ti label; it gets index len(element_types) + 1 or something like this.
In #164, I've allowed for an extra index as a "catch-all" for atoms not specified in the elements_list. I'd like to add an option for yet another index, a "special" atom. In other words, the following two materials would produce the same atom label featurization (see the sketch below):
- Ti, Ti, O, O, O, O with atom index 0 indicated as "special"
- Mg, Ti, O, O, O, O with the standard use of the code
I'm happy to do the coding on this.
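A minimal sketch of the proposed indexing (hypothetical names, not matgl API):

element_types = ("O", "Ti")
UNKNOWN = len(element_types)      # catch-all index from #164
SPECIAL = len(element_types) + 1  # proposed absorber/"special" index

def atom_index(symbol: str, is_special: bool = False) -> int:
    """Map an element symbol to its atom-label featurization index."""
    if is_special:
        return SPECIAL
    return element_types.index(symbol) if symbol in element_types else UNKNOWN

# first Ti flagged as the absorber
labels = [atom_index(s, is_special=(i == 0))
          for i, s in enumerate(["Ti", "Ti", "O", "O", "O", "O"])]
print(labels)  # [3, 1, 0, 0, 0, 0]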
Thanks for providing such wonderful work.
I am trying to use the pre-trained model "M3GNet-MP-2021.2.8-PES" to run an MD simulation.
I wanted to run it on a GPU. I placed the model on the GPU, but I got a conflicting-device issue.
Could you help me resolve it?
0.7.1
Train:
model = matgl.load_model('M3GNet-MP-2021.2.8-PES')
lit_model = PotentialLightningModule(model=model.model, lr=0.00005, force_weight=1)
trainer = pl.Trainer(max_epochs=10, accelerator='cuda', devices=1, precision='32')
trainer.fit(model=lit_model, train_dataloaders=train_loader, val_dataloaders=val_loader)
model_export_path = "./trained_model/"
model.save(model_export_path)
Prediction:
if __name__ == '__main__':
    DB = Get_db()
    strus, e, f, sr = DB.get_stru_energy_forces_stress("vasprun.xml")
    print(f"{len(strus)} structures found !!! .")
    model = matgl.load_model("../train/trained_model").model
    pre_e = e
    plt.scatter(range(1, len(strus)+1), pre_e, c='r')
    plt.scatter(range(1, len(strus)+1), energy, c='b')
    plt.savefig('res.png')
    plt.show()
Hello,
Thank you for making the code accessible. I want to report that with M3GNet I was able to relax the structure, but when I switch to the MatGL module the structure does not relax and instead displays large forces.
I am testing this potential on HfNbTaTiZr high-entropy alloys to study the core structure of a screw dislocation in the BCC structure. With M3GNet the relaxation finished within 200 steps, but with MatGL it takes 1000 steps, the forces on the structure keep increasing, and then the terminal hangs. Is the module still in a testing stage?
Thanks
I am able to train normally using the CPU. When I use the GPU, it keeps failing. I don't know where the problem is, so I hope someone can help me. I would be very grateful.
My script is as follows:
from __future__ import annotations
import os
import shutil
import numpy as np
import pytorch_lightning as pl
from dgl.data.utils import split_dataset
from pymatgen.core import Structure
from matgl.ext.pymatgen import Structure2Graph, get_element_list
from matgl.graph.data import M3GNetDataset, MEGNetDataset, MGLDataLoader, collate_fn, collate_fn_efs
from matgl.models import M3GNet, MEGNet
from matgl.utils.training import ModelLightningModule, PotentialLightningModule
import torch
if __name__ == '__main__':
    stru0 = Structure.from_file("../../strus/0/POSCAR")
    stru1 = Structure.from_file("../../strus/1/POSCAR")
    structures = [stru0, stru1] * 10
    energies = np.zeros(len(structures))
    forces = [np.zeros((len(s), 3)).tolist() for s in structures]
    stresses = [np.zeros((3, 3)).tolist()] * len(structures)
    element_types = get_element_list([stru0, stru1])
    converter = Structure2Graph(element_types=element_types, cutoff=5.0)
    dataset = M3GNetDataset(
        threebody_cutoff=4.0,
        structures=structures,
        converter=converter,
        energies=energies,
        forces=forces,
        stresses=stresses,
    )
    train_data, val_data, test_data = split_dataset(
        dataset,
        frac_list=[0.8, 0.1, 0.1],
        shuffle=True,
        random_state=42,
    )
    train_loader, val_loader, test_loader = MGLDataLoader(
        train_data=train_data,
        val_data=val_data,
        test_data=test_data,
        collate_fn=collate_fn_efs,
        batch_size=32,
        num_workers=8,
    )
    model = M3GNet(
        element_types=element_types,
        is_intensive=False,
    )
    lit_model = PotentialLightningModule(model=model)
    torch.set_default_device('cuda')
    torch.multiprocessing.set_start_method('spawn', force=True)
    trainer = pl.Trainer(max_epochs=10)
    trainer.fit(model=lit_model, train_dataloaders=train_loader, val_dataloaders=val_loader)
The error message I received is as follows:
Traceback (most recent call last):
File "/home/lycui/test/mgl/test/train.py", line 59, in
trainer.fit(model=lit_model, train_dataloaders=train_loader, val_dataloaders=val_loader)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 531, in fit
call._call_and_handle_interrupt(
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 570, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 975, in _run
results = self._run_stage()
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1016, in _run_stage
self._run_sanity_check()
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1045, in _run_sanity_check
val_loop.run()
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/loops/utilities.py", line 177, in _decorator
return loop_run(self, *args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 115, in run
self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 375, in _evaluation_step
output = call._call_strategy_hook(trainer, hook_name, *step_kwargs.values())
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 287, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 379, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/utils/training.py", line 59, in validation_step
results, batch_size = self.step(batch) # type: ignore
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/utils/training.py", line 329, in step
e, f, s, _ = self(g=g, state_attr=state_attr, l_g=l_g)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/utils/training.py", line 317, in forward
e, f, s, h = self.model(g=g, l_g=l_g, state_attr=state_attr)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/apps/pes.py", line 75, in forward
total_energies = self.data_std * self.model(g=g, state_attr=state_attr, l_g=l_g) + self.data_mean
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/models/_m3gnet.py", line 227, in forward
expanded_dists = self.bond_expansion(g.edata["bond_dist"])
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/layers/_bond.py", line 65, in forward
bond_basis = self.rbf(bond_dist)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/layers/_basis.py", line 104, in call
return self._call_sbf(r)
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/matgl/layers/_basis.py", line 120, in _call_sbf
func(r[:, None] * root[None, :] / self.cutoff) * factor / torch.abs(func_add1(root[None, :]))
File "/home/lycui/anaconda3/envs/mgl/lib/python3.9/site-packages/torch/utils/_device.py", line 62, in torch_function
return func(*args, **kwargs)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
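A more conventional Lightning pattern worth trying (an assumption about the cause, not a confirmed fix): let the Trainer handle device placement instead of calling torch.set_default_device('cuda'), which mixes CPU-built graph tensors with CUDA-default tensors created later:

# replace the torch.set_default_device('cuda') / spawn lines with:
trainer = pl.Trainer(max_epochs=10, accelerator="gpu", devices=1, inference_mode=False)
trainer.fit(model=lit_model, train_dataloaders=train_loader, val_dataloaders=val_loader)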
Could you please add the pretrained MEGNet models for the QM9 dataset under the pretrained_models directory? Only the MP pretrained models are present there.
I'm comparing results between the pretrained m3gnet in this repo and in the original m3gnet repo, and for some of my structures I am finding pretty large discrepancies. Is this expected? For example, for this structure:
Full Formula (Ti2 Nb3)
Reduced Formula: Ti2Nb3
abc : 2.862214 2.862214 11.801210
angles: 85.362850 94.637150 70.528779
pbc : True True True
Sites (5)
# SP a b c
--- ---- --- --- ---
0 Ti 0 0 0
1 Ti 0.6 0.4 0.2
2 Nb 0.2 0.8 0.4
3 Nb 0.8 0.2 0.6
4 Nb 0.4 0.6 0.8
I get a difference of 40 meV/atom in expected energy, and different atomic positions as well.
Hi, I am using the MEGNet model for band gap prediction from crystal structures (MEGNet-MP-2019.4.1-BandGap-mfi), and I am trying to understand the purpose of using two zeros as a placeholder for the global state feature, and the way it is processed. Running the following command
mgl predict -m MEGNet-MP-2019.4.1-BandGap-mfi --infile crys.cif
with default arguments, I get an error at matgl/matgl/layers/_graph_convolution.py, line 62 (commit 4961e1c):
File "/network/scratch/p/prashant.govindarajan/crystal_design_project/crystal-design/crystal_design/matgl/matgl/layers/_graph_convolution.py", line 62, in _edge_udf
inputs = torch.hstack([vi, vj, eij, u])
RuntimeError: Tensors must have same number of dimensions: got 2 and 3
The state feature of size [2,] passes through the embedding layer to give an output of shape [2, 16]. Because of this, the state feature becomes 3-dimensional once broadcast across the nodes in the graph. This causes the dimension-mismatch error while concatenating features in the edge update function of the graph convolution layer. So, does the placeholder zero tensor that represents the state feature need to pass through the embedding layer, i.e. matgl/matgl/layers/_embedding.py, line 78 (commit 4961e1c)?
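A tiny self-contained reproduction of the shape mismatch being described (illustrative shapes only):

import torch

vi = torch.randn(5, 16)                # per-edge node features: 2-D
u = torch.zeros(2, 16)                 # state feature after embedding: [2, 16]
u_b = u.unsqueeze(0).expand(5, 2, 16)  # broadcast across 5 edges: now 3-D
torch.hstack([vi, u_b])                # RuntimeError: ... got 2 and 3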
0.8.5
I'm attempting to run the code in the snippet below to compute the stress on a siliceous CHA unit cell with the ASE M3GNetCalculator (CIF file available here, but I think these issues arise with any ASE Atoms object). I did two things that seemed to resolve the issue:
- In matgl/layers/_three_body.py, changed the indexing to weights = three_cutoff[torch.stack(list(line_graph.edges()), dim=1).long()].view(-1, 2)
- In matgl/layers/_atom_ref.py, changed the one-hot construction to one_hot = torch.eye(num_elements)[g.ndata["node_type"].long()]
Thanks in advance for your help! It looks like there were just a few spots where the tensors needed to be cast to .long().
EDIT: I made different changes to the code than I had in my original bug report; apologies for the mistake.
import matgl
from matgl.ext.ase import M3GNetCalculator
from ase.io import read

atoms = read("CHA.cif")  # the siliceous CHA cell mentioned above (path illustrative)
potential = matgl.load_model("M3GNet-MP-2021.2.8-PES")
calculator = M3GNetCalculator(potential=potential)
atoms.calc = calculator
atoms.get_stress()
# === First Error ===
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[1], line 13
11 calculator = M3GNetCalculator(potential=potential)
12 atoms.calc = calculator
---> 13 atoms.get_stress()
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/ase/atoms.py:820, in Atoms.get_stress(self, voigt, apply_constraint, include_ideal_gas)
817 if self._calc is None:
818 raise RuntimeError('Atoms object has no calculator.')
--> 820 stress = self._calc.get_stress(self)
821 shape = stress.shape
823 if shape == (3, 3):
824 # Convert to the Voigt form before possibly applying
825 # constraints and adding the dynamic part of the stress
826 # (the "ideal gas contribution").
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/ase/calculators/abc.py:26, in GetPropertiesMixin.get_stress(self, atoms)
25 def get_stress(self, atoms=None):
---> 26 return self.get_property('stress', atoms)
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/ase/calculators/calculator.py:737, in Calculator.get_property(self, name, atoms, allow_calculation)
735 if not allow_calculation:
736 return None
--> 737 self.calculate(atoms, [name], system_changes)
739 if name not in self.results:
740 # For some reason the calculator was not able to do what we want,
741 # and that is OK.
742 raise PropertyNotImplementedError('{} not present in this '
743 'calculation'.format(name))
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/ext/ase.py:178, in M3GNetCalculator.calculate(self, atoms, properties, system_changes)
176 energies, forces, stresses, hessians = self.potential(graph, self.state_attr)
177 else:
--> 178 energies, forces, stresses, hessians = self.potential(graph, state_attr_default)
179 self.results.update(
180 energy=energies.detach().numpy(),
181 free_energy=energies.detach().numpy(),
182 forces=forces.detach().numpy(),
183 )
184 if self.compute_stress:
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/apps/pes.py:76, in Potential.forward(self, g, state_attr, l_g)
73 if self.calc_forces:
74 g.ndata["pos"].requires_grad_(True)
---> 76 predictions = self.model(g, state_attr, l_g)
77 if isinstance(predictions, tuple) and len(predictions) > 1:
78 total_energies, site_wise = predictions
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/models/_m3gnet.py:252, in M3GNet.forward(self, g, state_attr, l_g)
250 node_feat, edge_feat, state_feat = self.embedding(node_types, g.edata["rbf"], state_attr)
251 for i in range(self.n_blocks):
--> 252 edge_feat = self.three_body_interactions[i](
253 g,
254 l_g,
255 three_body_basis,
256 three_body_cutoff,
257 node_feat,
258 edge_feat,
259 )
260 edge_feat, node_feat, state_feat = self.graph_layers[i](g, edge_feat, node_feat, state_feat)
261 g.ndata["node_feat"] = node_feat
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/layers/_three_body.py:61, in ThreeBodyInteractions.forward(self, graph, line_graph, three_basis, three_cutoff, node_feat, edge_feat)
59 print(three_cutoff)
60 print(torch.stack(list(line_graph.edges()), dim=1).view(-1, 2))
---> 61 weights = three_cutoff[torch.stack(list(line_graph.edges()), dim=1)].view(-1, 2) # type: ignore
62 weights = torch.prod(weights, dim=-1) # type: ignore
63 basis = basis * weights[:, None]
IndexError: tensors used as indices must be long, byte or bool tensors
# === Second Error ===
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[1], line 13
11 calculator = M3GNetCalculator(potential=potential)
12 atoms.calc = calculator
---> 13 atoms.get_stress()
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/ase/atoms.py:820, in Atoms.get_stress(self, voigt, apply_constraint, include_ideal_gas)
817 if self._calc is None:
818 raise RuntimeError('Atoms object has no calculator.')
--> 820 stress = self._calc.get_stress(self)
821 shape = stress.shape
823 if shape == (3, 3):
824 # Convert to the Voigt form before possibly applying
825 # constraints and adding the dynamic part of the stress
826 # (the "ideal gas contribution").
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/ase/calculators/abc.py:26, in GetPropertiesMixin.get_stress(self, atoms)
25 def get_stress(self, atoms=None):
---> 26 return self.get_property('stress', atoms)
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/ase/calculators/calculator.py:737, in Calculator.get_property(self, name, atoms, allow_calculation)
735 if not allow_calculation:
736 return None
--> 737 self.calculate(atoms, [name], system_changes)
739 if name not in self.results:
740 # For some reason the calculator was not able to do what we want,
741 # and that is OK.
742 raise PropertyNotImplementedError('{} not present in this '
743 'calculation'.format(name))
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/ext/ase.py:178, in M3GNetCalculator.calculate(self, atoms, properties, system_changes)
176 energies, forces, stresses, hessians = self.potential(graph, self.state_attr)
177 else:
--> 178 energies, forces, stresses, hessians = self.potential(graph, state_attr_default)
179 self.results.update(
180 energy=energies.detach().numpy(),
181 free_energy=energies.detach().numpy(),
182 forces=forces.detach().numpy(),
183 )
184 if self.compute_stress:
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/apps/pes.py:84, in Potential.forward(self, g, state_attr, l_g)
82 total_energies = self.data_std * total_energies + self.data_mean
83 if self.element_refs is not None:
---> 84 property_offset = torch.squeeze(self.element_refs(g))
85 total_energies += property_offset
87 forces = torch.zeros(1)
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
1190 # If we don't have any hooks, we want to skip the rest of the logic in
1191 # this function, and just call forward.
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []
File ~/mambaforge/envs/htvs/lib/python3.9/site-packages/matgl/layers/_atom_ref.py:64, in AtomRef.forward(self, g, state_attr)
52 """Get the total property offset for a system.
53
54 Args:
(...)
59 offset_per_graph
60 """
61 num_elements = (
62 self.property_offset.size(dim=1) if self.property_offset.ndim > 1 else self.property_offset.size(dim=0)
63 )
---> 64 one_hot = torch.eye(num_elements)[g.ndata["node_type"]]
65 if self.property_offset.ndim > 1:
66 offset_batched_with_state = []
IndexError: tensors used as indices must be long, byte or bool tensors
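Both tracebacks end with the same complaint about index dtypes, which suggests the graph's index tensors are int32 rather than the int64 (long) that torch indexing requires. A minimal diagnostic sketch, assuming `graph` stands for the DGL graph built by matgl's converter (the variable name is illustrative, not from the report):
print(graph.idtype)                     # DGL graphs may use int32 or int64 ids
print(graph.ndata["node_type"].dtype)   # torch indexing requires torch.int64 (long)
graph = graph.long()                    # converts an int32 graph to int64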
Thanks for providing such a wonderful work.
I'm trying to run "train_mp_eform.py", but I'm getting an error:
trainer.train(
TypeError: train() got an unexpected keyword argument 'n_epochs'
How should I modify it to avoid this error?
An interface to LAMMPS for MatGL is needed. This can be done in two ways:
I am trying to reproduce the results in
Chen, C.; Zuo, Y.; Ye, W.; Li, X.; Ong, S. P. Learning Properties of Ordered and Disordered Materials from Multi-Fidelity Data. Nature Computational Science, 2021, 1, 46–53.
for the extended QM7b energy data set (Extended data Fig 4b of the paper).
Following the instructions and looking at the code for the band gap example in the paper (provided on GitHub), I set the 'state' variable/key in the structures to either 1 for low fidelity or 2 for high fidelity. This information is then passed as a global feature, setting nfeat_global=1 and global_embedding_dim=None.
However, I cannot reproduce the results using the default MEGNet with 3 blocks, 3 message-passing steps, and graph_converter=CrystalGraph with the Gaussian distance method. Very little information was provided in the paper on this example, and there is none on the GitHub page.
Could you please upload the code for that example, or else let me know the precise details of how to implement the multi-fidelity version for it? Much appreciated.
If possible, could you upload the code for generating the results of Fig. 4b?
Hello, I am using the following script to check for smoothness of the M3GNet PES. It's a very simple scenario where two particles are pulled apart inside a large box, i.e. no three-body contributions are involved.
import numpy as np
import matplotlib.pyplot as plt
from ase import Atoms
import torch

import matgl
from matgl.ext.ase import M3GNetCalculator

elements = list(range(1, 4))
r_min = 4.8
r_max = 5.2

# The model and calculator only need to be created once.
model = matgl.load_model('M3GNet-MP-2021.2.8-PES')
model.to(torch.float32)
calc = M3GNetCalculator(potential=model)

for i, element1 in enumerate(elements):
    for element2 in elements[i:]:
        atoms = Atoms([element1, element2],
                      positions=[(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)],
                      cell=np.eye(3, dtype=np.float32) * 1000, pbc=True)
        atoms.calc = calc
        distances = np.linspace(r_min, r_max, 100)
        potential_energies = []
        for distance in distances:
            atoms.set_distance(0, 1, distance)  # set the distance between the two atoms
            potential_energies.append(atoms.get_potential_energy())
        plt.plot(distances, potential_energies)
        plt.xlabel('Distance (Å)')
        plt.ylabel('Potential Energy (eV)')
        plt.title(f'Potential Energy of {atoms.get_chemical_symbols()}')
        plt.grid(True)
        plt.show()
Looking at the plots, it seems that the energy curve is not as smooth around the cutoff as I would expect, but I might be wrong. What are your thoughts on this?
Respected Authors
First of all, thank you for creating this wonderful library.
I am facing the same issue as the previous issue opened, but for MEGNet. I installed matgl via pip.
On the website, it is mentioned that pre-trained MEGNet models are now available for formation energies and band gaps. I am trying to predict the band gaps, but got stuck in the implementation.
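For reference, a sketch of how the pretrained multi-fidelity band-gap model is typically exercised; the model name MEGNet-MP-2019.4.1-BandGap-mfi, the state_feats keyword, and the fidelity labels (0: PBE, 1: GLLB-SC, 2: HSE, 3: SCAN) are recalled from the matgl README and may differ across versions:
import torch
import matgl
from pymatgen.core import Lattice, Structure

struct = Structure(Lattice.cubic(4.1437), ["Cs", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]])
model = matgl.load_model("MEGNet-MP-2019.4.1-BandGap-mfi")
# 0 selects the PBE fidelity; 1, 2, 3 select GLLB-SC, HSE and SCAN respectively.
bandgap = model.predict_structure(structure=struct, state_feats=torch.tensor([0]))
print(f"Predicted PBE band gap for CsCl: {float(bandgap):.3f} eV")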
Hello,
When I install via pip, it only shows version 0.7.1, not the latest.
When I try to load the pretrained m3gnet potential, I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/kuner/opt/anaconda3/envs/atomate2/lib/python3.9/site-packages/matgl/models/../../pretrained/MP-2021.2.8-EFS/m3gnet.pt'
It seems like all of the necessary files are currently not being distributed with the package (note I installed this via 'pip install matgl'). Any help would be appreciated!
Even when I load the element types, I still get an error indicating that the positional argument 'element_types' is missing. I think there might be a bug with how the model is loaded in utils/io. Do you have any suggestions on how to fix this?
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 11
9 model_dict = json.load(open(os.path.join(potential_path,"model.json")))
10 m3gnet = M3GNet(tuple(model_dict["kwargs"]["model"]["init_args"]["element_types"]))
---> 11 model, d = m3gnet.load(potential_path, include_json=True)
12 # print(d)
13 potential = Potential(model)
File ~/.conda/envs/matgl/lib/python3.8/site-packages/matgl/utils/io.py:132, in IOMixIn.load(cls, path, include_json, **kwargs)
130 d = {k: v for k, v in d.items() if not k.startswith("@")}
131 print(d)
--> 132 model = cls(**d)
133 model.load_state_dict(state) # type: ignore
135 if include_json:
TypeError: __init__() missing 1 required positional argument: 'element_types'
Hello,
I am working through the tutorial to train an M3GNet model (https://matgl.ai/tutorials%2FTraining%20a%20M3GNet%20Potential%20with%20PyTorch%20Lightning.html), and it appears that the interface for M3GNetDataset no longer takes the inputs energies, forces, and stresses.
My matgl version is 0.8.3. Could I be advised on how to proceed? Thank you in advance.
I created a conda-forge package: https://anaconda.org/conda-forge/matgl
So you can now install matgl using:
conda install -c conda-forge matgl
Hello. I would like to train a new model with the pre-trained m3gnet and my own AIMD trajectories data. Could you share an example of how to do it? Thank you so much!
When trying to obtain the reference energies of single atoms or of non-periodic systems, m3gnet gives an error.
single atom:
>>> import ase
>>> import matgl
>>> import matgl.ext.ase
>>> potential = matgl.load_model("M3GNet-MP-2021.2.8-PES")
>>> calc_new = matgl.ext.ase.M3GNetCalculator_new(potential)  # user-patched calculator
>>> atoms3 = ase.Atoms('H', [[0,0,0]], cell=[100,100,100])
>>> atoms3.calc = calc_new
>>> atoms3.get_potential_energy()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
[<ipython-input-9-47bee463321d>](https://localhost:8080/#) in <cell line: 3>()
1 atoms3 = Atoms('H', [[0,0,0]], cell = [100,100,100])
2 atoms3.calc = calc_new
----> 3 atoms3.get_potential_energy()
6 frames
[/usr/local/lib/python3.10/dist-packages/numpy/core/shape_base.py](https://localhost:8080/#) in stack(arrays, axis, out)
420 arrays = [asanyarray(arr) for arr in arrays]
421 if not arrays:
--> 422 raise ValueError('need at least one array to stack')
423
424 shapes = {arr.shape for arr in arrays}
ValueError: need at least one array to stack
non-periodic: (not expected to work)
>>> import ase
>>> import matgl
>>> import matgl.ext.ase
>>> potential = matgl.load_model("M3GNet-MP-2021.2.8-PES")
>>> calc_new = matgl.ext.ase.M3GNetCalculator_new(potential)
>>> atoms3 = ase.Atoms('H', [[0,0,0]])
>>> atoms3.calc = calc_new
>>> atoms3.get_potential_energy()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
[<ipython-input-10-2b31f4d7515d>](https://localhost:8080/#) in <cell line: 3>()
1 atoms3 = Atoms('H', [[0,0,0]])
2 atoms3.calc = calc_new
----> 3 atoms3.get_potential_energy()
5 frames
[/usr/local/lib/python3.10/dist-packages/ase/atoms.py](https://localhost:8080/#) in get_volume(self)
1919 """Get volume of unit cell."""
1920 if self.cell.rank != 3:
-> 1921 raise ValueError(
1922 'You have {0} lattice vectors: volume not defined'
1923 .format(self.cell.rank))
ValueError: You have 0 lattice vectors: volume not defined
A workaround could be
atoms.positions.max(axis=0) - atoms.positions.min(axis=0) + calc.potential.model.cutoff
to obtain a bounding box around your atoms that places periodic images at least one cutoff away.
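A minimal sketch of that workaround, assuming atoms and the calc_new calculator from the snippets above; it pads the cell so periodic images sit at least one model cutoff apart:
import numpy as np

extent = atoms3.positions.max(axis=0) - atoms3.positions.min(axis=0)
atoms3.set_cell(np.diag(extent + calc_new.potential.model.cutoff))
atoms3.center()     # place the atoms in the middle of the new cell
atoms3.pbc = True
energy = atoms3.get_potential_energy()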
For analysis, single atoms or molecules were extracted from a simulation box in order to obtain their energy contribution.
The new way to obtain the pretrained model is very convenient, thank you for the update!
Hi, I tried to repeat the example "Training a M3GNet Formation Energy Model with PyTorch Lightning.ipynb", but I want to train this model to predict spectra as a vector. When I try to train the M3GNet model, I get the error below, even though I set the ntarget parameter.
import pytorch_lightning as pl
from pytorch_lightning.loggers import CSVLogger
from matgl.models import M3GNet
from matgl.utils.training import ModelLightningModule

# Set up the architecture of the M3GNet model.
model = M3GNet(
    element_types=elem_list,
    is_intensive=True,
    readout_type="set2set",
    ntarget=66,
)
# Set up the Lightning module and trainer.
lit_module = ModelLightningModule(model=model)
logger = CSVLogger("logs", name="M3GNet_training")
trainer = pl.Trainer(max_epochs=20, accelerator="gpu", logger=logger)
trainer.fit(model=lit_module, train_dataloaders=train_loader, val_dataloaders=val_loader)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/usr/local/lib/python3.10/dist-packages/matgl/graph/data.py", line 31, in collate_fn
labels = torch.tensor([next(iter(d.values())) for d in labels], dtype=matgl.float_th) # type: ignore
ValueError: only one element tensors can be converted to Python scalars
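The last frame calls torch.tensor() on the raw label values; if each label is itself a multi-element tensor (e.g. a 66-point spectrum), that call reproduces exactly this error. A minimal illustration (the labels' type is an assumption, not taken from the report):
import torch

labels = [torch.zeros(66), torch.zeros(66)]  # vector labels stored as tensors
torch.tensor(labels)  # ValueError: only one element tensors can be converted to Python scalars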
https://colab.research.google.com/drive/1L05611HYB6UMb380xYWXp9nBZL51iHYc#scrollTo=6crRrc29Dawl
Ideally, matgl should be materials-code agnostic. The only materials-code-specific stuff should be in the ext and apps packages. Use in tests is OK as well.
I have removed the unnecessary dependency on pymatgen in _megnet.py. The only other problem I see is that AtomRef now contains a get_feature_matrix that optionally takes an input of list(Structures). Based on my reading, this seems unnecessary, given that it is always created from a list of graphs by the internal code itself. @kenko911 should verify and remove the option to pass a list of structures/molecules if it is not used in that manner.
The general idea is that all graph-architecture-based stuff should be completely agnostic to materials codes. So anything in layers, utils, graphs, data, and models should have no reference to any materials code (pymatgen, ASE or otherwise).
Once this is done, the ext and apps packages should do imports such that these two packages are made optional/extras.
munch is imported but not specified as an optional dependency in pyproject.toml/setup.py.
matgl/examples/trainer_beta/qm9_utils.py
Line 13 in d754da8
matgl/examples/trainer_beta/train.py
Line 12 in d754da8
Just running pip install -e ./matgl and executing these scripts raises:
ModuleNotFoundError: No module named 'munch'
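Until the dependency is declared, installing it manually works around the error:
pip install munch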
I see the following code repeated almost verbatim everywhere that an atomic graph is used:
graph, state_attr = converter.get_graph(structure)
bond_vec, bond_dist = compute_pair_vector_and_distance(graph)
graph.edata["bond_vec"] = bond_vec
graph.edata["bond_dist"] = bond_dist
Would it make sense to have those computed in the converter itself, since almost all use cases of an atomic graph will want those edge attributes? A possible wrapper is sketched below.
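One possible shape for that convenience API, written as a standalone helper (get_graph_with_vectors is a hypothetical name, not part of matgl):
from matgl.graph.compute import compute_pair_vector_and_distance

def get_graph_with_vectors(converter, structure):
    # Hypothetical wrapper: builds the graph and attaches the edge
    # attributes that nearly every downstream use of an atomic graph needs.
    graph, state_attr = converter.get_graph(structure)
    bond_vec, bond_dist = compute_pair_vector_and_distance(graph)
    graph.edata["bond_vec"] = bond_vec
    graph.edata["bond_dist"] = bond_dist
    return graph, state_attr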
0.8.5
matgl is holding up the whole MP stack with this error:
ValueError: Bad serialized model or bad model name. It is possible that you have an older model cached. Please clear your cache by running
python -c "import matgl; matgl.clear_cache()"
/opt/hostedtoolcache/Python/3.11.6/x64/lib/python3.11/site-packages/matgl/utils/io.py:213:
It's affecting pymatgen, atomate2, matcalc, emmet, ...
Please add tests for whatever broke so this doesn't happen again!
0.9.1
I retrained the M3GNet-MP-2021.2.8-PES potential following the tutorial at https://matgl.ai/tutorials%2FTraining%20a%20M3GNet%20Potential%20with%20PyTorch%20Lightning.html, using structures, energies, forces and stresses collected from our structure relaxations.
When I attempt to relax any structure using that potential, I get the error AttributeError: 'M3GNet' object has no attribute 'calc_stresses'.
from __future__ import annotations

import warnings

from ase.md.velocitydistribution import MaxwellBoltzmannDistribution
from pymatgen.core import Lattice, Structure
from pymatgen.io.ase import AseAtomsAdaptor

import matgl
from matgl.ext.ase import M3GNetCalculator, MolecularDynamics, Relaxer

warnings.simplefilter("ignore")
pot = matgl.load_model("MyRetrainedPotential") # this was put into ~/.cache/matgl after training
relaxer = Relaxer(potential=pot) # this produces following error:
Cell In[14], line 1
----> 1 relaxer = Relaxer(potential=pot)
File ~/miniconda3/envs/mgl/lib/python3.9/site-packages/matgl/ext/ase.py:211, in Relaxer.__init__(self, potential, state_attr, optimizer, relax_cell, stress_weight)
200 """
201 Args:
202 potential (Potential): a M3GNet potential, a str path to a saved model or a short name for saved model
(...)
208 stress_weight (float): conversion factor from GPa to eV/A^3.
209 """
210 self.optimizer: Optimizer = OPTIMIZERS[optimizer.lower()].value if isinstance(optimizer, str) else optimizer
--> 211 self.calculator = M3GNetCalculator(
212 potential=potential,
213 state_attr=state_attr,
214 stress_weight=stress_weight, # type: ignore
215 )
216 self.relax_cell = relax_cell
217 self.potential = potential
File ~/miniconda3/envs/mgl/lib/python3.9/site-packages/matgl/ext/ase.py:146, in M3GNetCalculator.__init__(self, potential, state_attr, stress_weight, **kwargs)
144 super().__init__(**kwargs)
145 self.potential = potential
--> 146 self.compute_stress = potential.calc_stresses
147 self.compute_hessian = potential.calc_hessian
148 self.stress_weight = stress_weight
File ~/miniconda3/envs/mgl/lib/python3.9/site-packages/torch/nn/modules/module.py:1695, in Module.__getattr__(self, name)
1693 if name in modules:
1694 return modules[name]
-> 1695 raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'M3GNet' object has no attribute 'calc_stresses'
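The traceback indicates the loaded object is a bare M3GNet model, while M3GNetCalculator reads calc_stresses/calc_hessian from a Potential wrapper. A minimal sketch of a possible fix, under the assumption that the retrained checkpoint deserializes as an M3GNet instance:
from matgl.apps.pes import Potential

model = matgl.load_model("MyRetrainedPotential")  # assumed to return a bare M3GNet
pot = Potential(model=model)  # the wrapper that defines calc_stresses etc.
relaxer = Relaxer(potential=pot)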
I would like to request the inclusion of an example demonstrating the M3GNet training for property prediction. Currently, there's no dedicated example available.
I attempted to adapt the MEGNet example (examples/Training a MEGNet Formation Energy Model with PyTorch Lightning.ipynb) for M3GNet, but I encountered errors and unexpected issues in the process.
Please add an example notebook (or concise code snippet) demonstrating how to train M3GNet for property prediction. This would greatly assist users like me who are looking to use M3GNet for materials property prediction tasks.
I am trying to run the script here:
https://github.com/materialsvirtuallab/matgl/blob/main/examples/training/MEGNet/MP-2018.6.1-Eform/train_mp_eform.py
And I am getting the following error.
Traceback (most recent call last):
File "/home/trial/matgl-try/matgl/examples/training/MEGNet/MP-2018.6.1-Eform/train_mp_eform.py", line 130, in
trainer.train(
TypeError: train() got an unexpected keyword argument 'n_epochs'
Here is how to reproduce this issue assuming you have Conda or Miniconda installed:
Create a new directory:
mkdir matgl-try
Create a conda environment with Python 3.9:
conda create -n matgl-trial python=3.9
Activate the Conda environment:
conda activate matgl-trial
Clone the repo:
git clone https://github.com/materialsvirtuallab/matgl.git
cd into the repo:
cd matgl/
Install the package using the editable option:
pip install -e .
Run the script:
python examples/training/MEGNet/MP-2018.6.1-Eform/train_mp_eform.py
In fact, running help(trainer.train) gives the following:
Help on method train in module torch.nn.modules.module:
train(mode: bool = True) -> ~T method of matgl.utils.training.ModelTrainer instance
    Sets the module in training mode. This has any effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`, etc.
    Args: mode (bool): whether to set training mode (``True``) or evaluation mode (``False``). Default: ``True``.
    Returns: Module: self
So the train being resolved is torch.nn.Module.train, whose arguments do not match the ones the script provides. I think this error is due to updates that rolled out this month for both torch and pytorch-lightning, but I am not sure how to fix it.
Dear developers,
Hello there! I hope you're doing well.
I have a quick question regarding the current state of GPU training support in your matgl package.
As of now, does it support GPU training, or is it limited to CPU only?
When using the default dgl in matgl, it fails:
DGLError: [09:37:23] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
However, when the cu102 build of dgl is used, it fails to load split_dataset from dgl.data.utils:
OSError Traceback (most recent call last)
Cell In[1], line 13
11 import pytorch_lightning as pl
12 import torch
---> 13 from dgl.data.utils import split_dataset
......
OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
I appreciate any insights you can provide on this matter. Thank you!
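For what it's worth, DGL publishes CUDA-specific wheels on its own index; a build matching the local CUDA runtime can usually be installed like this (the cu118 tag below is only an example; substitute the local CUDA version):
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html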
0.0.3
The calc.results["energy"] property from the M3GNetCalculator should be a float but is a numpy array of length 1. This behavior is not observed in CHGNet, for what it's worth.
import matgl
from matgl.ext.ase import M3GNetCalculator
from ase.build import bulk
atoms = bulk("Cu")
potential = matgl.load_model("M3GNet-MP-2021.2.8-DIRECT-PES")
atoms.calc = M3GNetCalculator(potential)
e = atoms.get_potential_energy() # or atoms.calc.results["energy"]
print(e)
You can compare this with:
from ase.build import bulk
from ase.calculators.emt import EMT
atoms = bulk("Cu")
atoms.calc = EMT()
e = atoms.get_potential_energy()
print(e)
The output is:
array(-32.750034, dtype=float32)
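A minimal interim workaround on the caller's side (an assumption, not an official fix) is to unwrap the value explicitly:
e = float(atoms.get_potential_energy())  # cast the length-1 numpy array to a Python float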
Hi, thank you for these amazing codes.
I tried to follow the example notebook to train an M3GNet potential. I added trainer.test(model=lit_module, dataloaders=test_loader) to test the model, but I got: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn. From searching, I found this can be due to requires_grad=False on a tensor, but the error did not occur during training and validation, so that might not be the cause. Could you help me solve this? Thanks!
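One hedged guess, not a confirmed fix: force/stress predictions need autograd, and Lightning runs test loops in inference mode by default, which disables grad tracking. Turning that off may avoid the error:
import pytorch_lightning as pl

# inference_mode=False keeps autograd enabled during trainer.test(), which
# heads that backpropagate through atomic positions require.
trainer = pl.Trainer(max_epochs=20, accelerator="gpu", logger=logger, inference_mode=False)
trainer.test(model=lit_module, dataloaders=test_loader)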