
hse-lambda / ai4material_design


Code for Kazeev, N., Al-Maeeni, A.R., Romanov, I. et al. Sparse representation for machine learning the properties of defects in 2D materials. npj Comput Mater 9, 113 (2023).

Home Page: https://doi.org/10.1038/s41524-023-01062-z

License: Apache License 2.0

2d-materials bandgap energy-prediction graph-neural-networks sparse structure-property


Sparse representation for machine learning the properties of defects in 2D materials

Quickstart

Open in Constructor Research Platform (a cloud service for scientific computations)



Summary

In the paper, we propose sparse representation as a way to reduce the computational cost and improve the accuracy of machine learning the properties of defects in 2D materials. The code in this project implements the method and a rigorous comparison of its performance to a set of baselines.

Two-dimensional materials offer a promising platform for the next generation of (opto-)electronic devices and other high-technology applications. One of the most exciting characteristics of 2D crystals is the ability to tune their properties via the controllable introduction of defects. However, the search space for such structures is enormous, and ab initio computations are prohibitively expensive. We propose a machine learning approach for rapid estimation of the properties of 2D materials given the lattice structure and defect configuration. The method provides a representation of the configuration of 2D materials with defects that allows a neural network to train quickly and accurately. We compare our methodology with state-of-the-art approaches and demonstrate at least a 3.7-fold reduction in energy prediction error. Our approach is also an order of magnitude more resource-efficient than its contenders for both training and inference.

The main idea of our method is to use a point cloud of defects as the input to the predictive model, as opposed to the usual point cloud of atoms or an expertly crafted feature vector.

[Figure: Sparse representation construction]
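The idea of a defect point cloud can be illustrated with a minimal sketch. It assumes that the unrelaxed defective structure keeps the pristine supercell's site positions (true for vacancies and substitutions placed on lattice sites), matches sites by fractional coordinates with periodic wrap-around, and ignores interstitials. The `Site` and `DefectSite` types and the `sparse_representation` function are hypothetical; this is not the implementation used in the paper or in MEGNetSparse.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


@dataclass(frozen=True)
class Site:
    species: str
    frac_coords: Coord


@dataclass(frozen=True)
class DefectSite:
    frac_coords: Coord
    was: str            # species on this site in the pristine supercell
    now: Optional[str]  # species in the defective structure; None marks a vacancy


def same_position(a: Coord, b: Coord, tol: float) -> bool:
    """Compare fractional coordinates, treating 0 and 1 as the same point."""
    for x, y in zip(a, b):
        d = abs(x - y) % 1.0
        if min(d, 1.0 - d) > tol:
            return False
    return True


def sparse_representation(pristine: Sequence[Site], defective: Sequence[Site],
                          tol: float = 1e-3) -> List[DefectSite]:
    """Keep only the sites that differ from the pristine supercell (hypothetical helper)."""
    defects = []
    for ref in pristine:
        match = next((s for s in defective
                      if same_position(ref.frac_coords, s.frac_coords, tol)), None)
        if match is None:
            # No atom left on this lattice site: vacancy
            defects.append(DefectSite(ref.frac_coords, ref.species, None))
        elif match.species != ref.species:
            # A different element on this lattice site: substitution
            defects.append(DefectSite(ref.frac_coords, ref.species, match.species))
    return defects


# Toy three-site cell: Mo replaced by W, one S removed
pristine = [Site("Mo", (0.0, 0.0, 0.5)), Site("S", (1/3, 2/3, 0.6)), Site("S", (1/3, 2/3, 0.4))]
defective = [Site("W", (0.0, 0.0, 0.5)), Site("S", (1/3, 2/3, 0.6))]
print(sparse_representation(pristine, defective))
```

The toy example yields two defect points instead of three atoms. In a realistic supercell with hundreds of atoms and only a few defects, the input shrinks far more, which is where the computational savings of a sparse input come from.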

We compare our approach to the state-of-the-art generic structure-property prediction algorithms GemNet, SchNet, MEGNet, and matminer+CatBoost.

As the dataset, we use 2DMD. It consists of the most popular 2D materials: MoS2, WSe2, h-BN, GaSe, InSe, and black phosphorus (BP), with point defect densities in the range of 2.5% to 12.5%. We use DFT to relax the structures and compute the defect formation energy and HOMO-LUMO gap. The ML algorithms predict these quantities, taking unrelaxed structures as input.
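For reference, the two targets are conventionally defined as follows. This is the standard textbook form for neutral defects, given here as an assumption; the paper's exact normalization may differ (the checkpoint paths in this README refer to a `formation_energy_per_site` target).

```latex
E_f = E_{\text{defect}} - E_{\text{pristine}} - \sum_i n_i \mu_i ,
\qquad
E_{\text{gap}} = \varepsilon_{\text{LUMO}} - \varepsilon_{\text{HOMO}}
```

Here $n_i$ is the number of atoms of species $i$ added to ($n_i > 0$) or removed from ($n_i < 0$) the pristine supercell, and $\mu_i$ is the chemical potential of that species. For a single sulfur vacancy in MoS2, $n_{\mathrm{S}} = -1$, so $E_f = E_{\text{defect}} - E_{\text{pristine}} + \mu_{\mathrm{S}}$.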

Using the pre-trained models

Library

Use the MEGNetSparse library: https://github.com/HSE-LAMBDA/MEGNetSparse/

This repository

  1. Clone the repository
  2. Set up the environment
  3. Download the weights and data:

       dvc pull datasets/checkpoints/combined_mixed_all_train/formation_energy_per_site/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45/0.pth.dvc datasets/checkpoints/combined_mixed_all_train/homo_lumo_gap_min/megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496/0.pth.dvc csv-cif-low-density-8x8 csv-cif-no-spin-500-data csv-cif-spin-500-data train-only-split

     The data are not needed for predictions; they are only used to generate new structures in the example notebook.

  4. Open the notebook. It contains the prediction code, along with the generation of new structures with defects and example processing of user-uploaded data.

Citation

Please cite the following two papers if you use the code or the data:

Kazeev, N., Al-Maeeni, A.R., Romanov, I. et al. Sparse representation for machine learning the properties of defects in 2D materials. npj Comput Mater 9, 113 (2023). https://doi.org/10.1038/s41524-023-01062-z
Huang, P., Lukin, R., Faleev, M. et al. Unveiling the complex structure-property correlation of defects in 2D materials based on high throughput datasets. npj 2D Mater Appl 7, 6 (2023). https://doi.org/10.1038/s41699-023-00369-1

Internal links

  • The overall design is documented in an obsolete flowchart
  • Some design decisions are outlined in an obsolete RFC
  • Project log is in Notion
  • Paper in Overleaf

Contributors

abdalazizrashid, anaderi, gingeard, implausibledeniability, kazeevn, marts2007, pengru2021, romanovignat, ruskinkot1


Issues

Combined Dataset

Hi, thanks for providing the dataset.
I downloaded the combined dataset from the Constructor Research Platform and found NaN values in formation energy, band_gap and total_mag. In addition, the formation energy is always -7.0, and band gap takes only the values 0.0, 1.0 and 2.0. I also checked the other datasets and they don't have these errors, so I guess there might have been a mistake when combining them. Could you update it, or did I download the wrong file?
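The symptoms in this report (NaNs, a constant column, a continuous target with only a few distinct values) can be caught automatically before training. A minimal sketch, assuming rows have already been parsed into dicts of floats; `audit_columns` is a hypothetical helper, and `formation_energy` is a hypothetical column name (`band_gap` and `total_mag` are the names used in the report).

```python
import math
from typing import Dict, Iterable, List, Mapping


def is_missing(value) -> bool:
    return value is None or (isinstance(value, float) and math.isnan(value))


def audit_columns(rows: Iterable[Mapping[str, float]], columns: List[str],
                  max_distinct: int = 3) -> Dict[str, dict]:
    """Flag target columns that contain NaNs or suspiciously few distinct values."""
    rows = list(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        n_missing = sum(is_missing(v) for v in values)
        distinct = {v for v in values if not is_missing(v)}
        report[col] = {
            "n_missing": n_missing,
            "n_distinct": len(distinct),
            # A regression target with <= max_distinct values is likely corrupted
            "suspicious": n_missing > 0 or len(distinct) <= max_distinct,
        }
    return report


rows = [
    {"formation_energy": -7.0, "band_gap": 0.0, "total_mag": 0.0},
    {"formation_energy": -7.0, "band_gap": 1.0, "total_mag": float("nan")},
]
print(audit_columns(rows, ["formation_energy", "band_gap", "total_mag"]))
```

Rows can come from `csv.DictReader` after converting the target fields with `float()`. The `max_distinct` threshold is arbitrary and only meaningful for datasets with many rows.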

EOS error for hBN and BP

/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:08<00:00, 56.96it/s]
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
Traceback (most recent call last):
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 288, in <module>
    main()
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 204, in main
    unit_cells[material] = eos.get_augmented_struct(unit_cells[material])
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 95, in get_augmented_struct
    _struct = self.remove_other_species(Structure.from_sites(shells_sites), site)
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 56, in remove_other_species
    return Structure.from_sites([site for site in structure if site.properties['center_index'] is not None] + [center])
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 56, in <listcomp>
    return Structure.from_sites([site for site in structure if site.properties['center_index'] is not None] + [center])
KeyError: 'center_index'
(2d-defects-potential-learning-pYjw2mkT-py3.10) [kna@badang ai4material_design]$  python scripts/parse_csv_cif.py --input-name=high_density_defects/BP_spin_500 --normalize-homo-lumo 
  0%|                                                                                                                                 | 0/500 [00:00<?, ?it/s]/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:09<00:00, 52.39it/s]
Traceback (most recent call last):
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 288, in <module>
    main()
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 204, in main
    unit_cells[material] = eos.get_augmented_struct(unit_cells[material])
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 95, in get_augmented_struct
    _struct = self.remove_other_species(Structure.from_sites(shells_sites), site)
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 56, in remove_other_species
    return Structure.from_sites([site for site in structure if site.properties['center_index'] is not None] + [center])
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 56, in <listcomp>
    return Structure.from_sites([site for site in structure if site.properties['center_index'] is not None] + [center])
KeyError: 'center_index'
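The failing line indexes `site.properties['center_index']` directly. The MoS2 and WSe2 logs in the next issue show pymatgen filling a partially missing `center_index` with `None`, so a plausible (unverified) explanation for hBN and BP is that none of the sites passed to `Structure.from_sites` carry the property, in which case the key is absent altogether. A defensive variant could look like the sketch below. `sites_with_center_index` and its `strict` flag are hypothetical; silently returning an empty list would only hide the upstream problem, so by default the sketch raises an explicit error instead.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def sites_with_center_index(sites: Iterable[Any], strict: bool = True) -> List[Any]:
    """Return the sites whose 'center_index' property is set (hypothetical helper).

    Works with any object exposing a ``properties`` dict, such as a pymatgen site.
    """
    sites = list(sites)
    if strict and not any("center_index" in s.properties for s in sites):
        # If no site has the key at all, plain indexing would raise KeyError
        raise ValueError("No site carries 'center_index'; check how the neighbour "
                         "shells were built for this material.")
    # .get() returns None for a missing key instead of raising KeyError
    return [s for s in sites if s.properties.get("center_index") is not None]


# Stand-ins for sites: one with the property set, one without it
demo = [SimpleNamespace(properties={"center_index": 0}), SimpleNamespace(properties={})]
print(len(sites_with_center_index(demo)))
```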

Wrong sparse representation for MoS2 and WSe2

[kna@badang ai4material_design]$ . /home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/bin/activate
(2d-defects-potential-learning-pYjw2mkT-py3.10) [kna@badang ai4material_design]$ python scripts/parse_csv_cif.py --input-name=high_density_defects/MoS2_500 --normalize-homo-lumo --fill-missing-band-properties
  0%|                                                                                                                                 | 0/500 [00:00<?, ?it/s]/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:13<00:00, 36.22it/s]
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/core/structure.py:744: UserWarning: Not all sites have property center_index. Missing values are set to None.
  warnings.warn(f"Not all sites have property {k}. Missing values are set to None.")
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/core/structure.py:744: UserWarning: Not all sites have property shells. Missing values are set to None.
  warnings.warn(f"Not all sites have property {k}. Missing values are set to None.")
Traceback (most recent call last):
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 288, in <module>
    main()
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 243, in main
    assert structures.apply(lambda row: len(row[COLUMNS["structure"]["sparse_unrelaxed"]]) == len(
AssertionError
(2d-defects-potential-learning-pYjw2mkT-py3.10) [kna@badang ai4material_design]$ python scripts/parse_csv_cif.py --input-name=high_density_defects/WSe2_500 --normalize-homo-lumo --fill-missing-band-properties
  0%|                                                                                                                                 | 0/500 [00:00<?, ?it/s]/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:14<00:00, 33.35it/s]
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/io/cif.py:1153: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
  warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/core/structure.py:744: UserWarning: Not all sites have property center_index. Missing values are set to None.
  warnings.warn(f"Not all sites have property {k}. Missing values are set to None.")
/home/kna/.cache/pypoetry/virtualenvs/2d-defects-potential-learning-pYjw2mkT-py3.10/lib64/python3.10/site-packages/pymatgen/core/structure.py:744: UserWarning: Not all sites have property shells. Missing values are set to None.
  warnings.warn(f"Not all sites have property {k}. Missing values are set to None.")
Traceback (most recent call last):
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 288, in <module>
    main()
  File "/home/kna/ai4material_design/scripts/parse_csv_cif.py", line 243, in main
    assert structures.apply(lambda row: len(row[COLUMNS["structure"]["sparse_unrelaxed"]]) == len(
AssertionError

Multiprocessing leak

When running

WANDB_ENTITY=hse_lambda python run_experiments.py --experiments MoS2-plain-cv --trials megnet_pytorch-sparse megnet_pytorch-sparse-z megnet_pytorch-sparse-z-were --processes-per-gpu=8 --gpus 0

I get the following at the end. The calculations finish fine, though.

Predictions has been saved! /home/kna/ai4material_design/datasets/predictions/MoS2-plain-cv/homo/megnet_pytorch-sparse-z-were.csv.gz
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 97 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/usr/lib/python3.8/multiprocessing/resource_tracker.py:229: UserWarning: resource_tracker: '/mp-u3gb9qz0': [Errno 2] No such file or directory
  warnings.warn('resource_tracker: %r: %s' % (name, e))

Might be related to the HPC problems, or might not be.
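Leaked-semaphore warnings from `resource_tracker` typically mean that some multiprocessing objects were still registered when the interpreter shut down. Without seeing how `run_experiments.py` manages its workers, one generic mitigation to try (an assumption, not a confirmed fix) is to shut worker pools down explicitly with `close()` and `join()`, as the Python documentation recommends, rather than relying on interpreter shutdown. `run_trial` and `run_all` below are hypothetical stand-ins for the experiment runner.

```python
import multiprocessing as mp
from typing import List


def run_trial(seed: int) -> int:
    """Hypothetical stand-in for one training/prediction job."""
    return seed * seed


def run_all(seeds: List[int], processes: int = 2) -> List[int]:
    # The log shows Linux, where "fork" is the default start method in Python 3.8
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        results = pool.map(run_trial, seeds)
        pool.close()  # no further tasks will be submitted
        pool.join()   # wait for the workers to exit cleanly
    # Leaving the with-block calls terminate(), which is harmless after join()
    return results


if __name__ == "__main__":
    print(run_all([1, 2, 3]))
```

If the warnings persist with explicit shutdown, the leak more likely comes from objects created inside the workers themselves (for example, data-loading processes), which this sketch does not address.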
