materialsdatasets's Introduction

Materials datasets for ML

Dr. Hu (University of South Carolina, Department of Computer Science and Engineering)
Machine learning and Evolution Laboratory

Database review papers

Datasets and Databases

MaterialsProject database (144,595+more)
https://materialsproject.org/

Aflowlib dataset (3,541,633)
http://aflowlib.org

OQMD dataset (815,654)
http://www.oqmd.org

ICSD crystal materials dataset
242,828 crystal structures

Carolina Materials Database (Hypothetical new materials)
http://www.carolinamatdb.org/

NREL materials database
https://materials.nrel.gov/queryStd

CGCNN benchmark datasets (3). MP and perovskite
https://github.com/txie-93/cgcnn/tree/master/data/material-data

Roost benchmark 4 datasets (composition only ML)
https://github.com/CompRhys/roost/tree/master/data/datasets

5 datasets for structure based property prediction
MatBench: Benchmarking graph neural networks for materials chemistry
https://github.com/vxfung/MatDeepLearn

18 datasets for property prediction
https://hackingmaterials.lbl.gov/automatminer/datasets.html#down-loading-datasets
Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet

28 datasets for composition only ML
https://github.com/kaaiian/mse_datasets https://github.com/anthony-wang/CrabNet

Scientific data journal datasets
https://www.nature.com/search?q=materials&journal=sdata&order=relevance

2D materials databases
C2DB, 2DMatPedia, V2DB (virtual 2D materials), 2D semiconductor database

Computational Materials Repository (CMR)
https://wiki.fysik.dtu.dk/cmr/index.html

Thermal conductivity dataset
https://tedesignlab.org/database/ (2700 samples)

Materials synthesis data
https://github.com/CederGroupHub/text-mined-synthesis_public

Lithium-Ion Battery Electrolyte (LIBE) dataset
computed properties of over 17,000 molecules relevant to electrolyte and interphase chemistry

Superconductor database
SuperCon
https://supercon.nims.go.jp/en/

Topological insulator database
https://www.topologicalquantumchemistry.com/#/
https://www.materialscloud.org/discover/topomat 31000

Database of Two-Dimensional Hybrid Perovskite Materials
2D Perovskites Database

property Net
https://github.com/materialsintelligence/propnet

Mendeley materials data
https://www.journals.elsevier.com/materials-science-and-engineering-a/mendeley-datasets

Figshare public materials datasets
https://figshare.com/search?q=materials+dataset or sorted by hits

Materials data on data.world
https://data.world/datasets/materials

NIST Materials data facility
https://www.materialsdatafacility.org/

Northwestern Center for Hierarchical Materials Design
https://chimad.northwestern.edu/research/databases.html

MOF materials structures
https://mof-international.org/mof-structures/
MOFDB (Northwestern, 163,000)
QMOF Database

Molecule datasets
QM9, Stanford Open Graph Challenge datasets, ZINC

Drug molecule database
ChEMBL: bioactive molecules with drug-like properties

Phonon and topological phononic materials
http://www.phonon.synl.ac.cn:8080/home
zhuqiang: Topological Phononic Materials Database

4,914 perovskite oxides
https://figshare.com/articles/dataset/Wolverton_Oxides_Data/7250417

https://github.com/blaiszik/Materials-Databases

JARVIS-dataset
https://jarvis-materials-design.github.io/dbdocs/thedownloads/

JARVIS-DFT
3D-materials curated data
The dataset contains metadata for JARVIS-DFT data for 3D materials. Property keys include: 'jid', 'atoms', 'formation_energy_peratom', 'optb88vdw_bandgap', 'elastic_tensor', 'effective_masses_300K', 'kpoint_length_unit', 'encut', 'optb88vdw_total_energy', 'mbj_bandgap', 'epsx', 'mepsx', 'epsy', 'mepsy', 'epsz', 'mepsz', 'kpoints_array', 'bulk_modulus_kv', 'shear_modulus_gv', 'modes', 'magmom_outcar', 'magmom_oszicar', 'icsd', 'spillage', 'slme', 'dfpt_piezo_max_eij', 'dfpt_piezo_max_dij', 'dfpt_piezo_max_dielectric', 'dfpt_piezo_max_dielectric_electronic', 'dfpt_piezo_max_dielectric_ionic', 'max_ir_mode', 'min_ir_mode', 'n-Seebeck', 'p-Seebeck', 'exfoliation_energy', 'n-powerfact', 'p-powerfact', 'ehull', 'dfpt_piezo_max_dielectric_ioonic'.
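
As a rough illustration of how the property keys listed above might be used once the metadata has been loaded into a list of dictionaries, the sketch below filters records on 'optb88vdw_bandgap' and 'formation_energy_peratom'. The sample records, their values, the helper names `is_number` and `select_records`, and the idea that a missing value is stored as the string "na" are all assumptions made for this example; check them against the actual download before relying on them.

```python
from typing import Any, Dict, List

# Hypothetical records that mimic the key layout described above.
# The values are invented for illustration and are NOT real JARVIS-DFT data.
SAMPLE_RECORDS: List[Dict[str, Any]] = [
    {"jid": "JVASP-A", "formation_energy_peratom": -0.5, "optb88vdw_bandgap": 1.2},
    {"jid": "JVASP-B", "formation_energy_peratom": 0.3, "optb88vdw_bandgap": 0.0},
    # Assumption: a missing value is encoded as the string "na".
    {"jid": "JVASP-C", "formation_energy_peratom": -1.1, "optb88vdw_bandgap": "na"},
]


def is_number(value: Any) -> bool:
    """Return True for real numeric values, False for placeholders such as "na"."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def select_records(
    records: List[Dict[str, Any]],
    min_gap: float,
    max_formation_energy: float,
) -> List[Dict[str, Any]]:
    """Keep records whose band gap is at least min_gap (eV) and whose
    formation energy per atom is at most max_formation_energy (eV/atom)."""
    selected = []
    for rec in records:
        gap = rec.get("optb88vdw_bandgap")
        e_form = rec.get("formation_energy_peratom")
        # Skip records where either property is missing or non-numeric.
        if not (is_number(gap) and is_number(e_form)):
            continue
        if gap >= min_gap and e_form <= max_formation_energy:
            selected.append(rec)
    return selected


if __name__ == "__main__":
    hits = select_records(SAMPLE_RECORDS, min_gap=0.5, max_formation_energy=0.0)
    print([rec["jid"] for rec in hits])
```

Checking for numeric values before comparing keeps the filter from failing on placeholder entries, which matters because not every property is computed for every material.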

2D-materials curated data
The dataset contains metadata for JARVIS-DFT data for 2D materials. Similar properties as 3D materials.

SCF Optimization input/output files
The dataset contains raw input/output files from the geometric optimization of crystals. Files included are: POSCAR, INCAR, vasprun.xml, KPOINTS, OSZICAR. Similar files are provided for the datasets below.

Optoelectronic data
The dataset contains raw input/output files for bandgap, DOS, and frequency-dependent dielectric function of crystals using the OptB88vdW functional.

OptB88vdW optoelectronic calculation files
The dataset contains raw input/output files for bandgap, DOS, and frequency-dependent dielectric function of crystals using the OptB88vdW functional.

TBmBJ optoelectronic calculation files
The dataset contains raw input/output files for bandgap, DOS, and frequency-dependent dielectric function of crystals using the TBmBJ potential.

Finite Difference Elastic Constants Input/Output files
The dataset contains raw input/output files for finite difference based elastic constant and gamma-point phonon calculations.

DFPT raw input/output files
The dataset contains raw input/output files for DFPT based gamma-point phonon and piezoelectric, dielectric calculations.

Computational scanning tunneling microscopy images
The dataset contains STM images for 2D materials.

Wannier Tight-binding Hamiltonians
The dataset contains Wannier tight-binding Hamiltonians for 3D and 2D materials.

JARVIS-FF
Elastic constants data
The dataset contains elastic constants of materials with several potentials.

Surface and vacancy-formation energies data
The dataset contains surface and vacancy formation energies of materials with several potentials.

JARVIS-ML
JARVIS-CFID descriptors datasets for JARVIS-DFT, MP, OQMD, QM9, AFLOW
The dataset contains CFID descriptor of materials with several properties.

Data-driven Discovery of 3D and 2D Thermoelectric Materials
The dataset contains thermoelectric properties of 3D and 2D materials.

Table. A brief summary of datasets available in the JARVIS-DFT.

Material classes	Numbers
3D-bulk	33482
2D-bulk	2293
1D-bulk	235
0D-bulk	413
2D-monolayer	1105
2D-bilayer	102
Molecules	12
Heterostructure	3
Total DFT calculated systems	37646

Table. A brief summary of functionals used in optimizing crystal geometry in the JARVIS-DFT.

Functionals	Numbers
vdW-DF-OptB88 (OPT)	37646
vdW-DF-OptB86b (MK)	109
vdW-DF-OptPBE (OR)	111
PBE	99
LDA	92

Table. A brief summary of material-properties available in the JARVIS-DFT. The database is continuously expanding.

JARVIS-DFT Properties	Numbers
Optimized crystal-structure (OPT)	37646
Formation-energy (OPT)	37646
Bandgap (OPT)	37646
Exfoliation energy (OPT)	819
Bandgap (TBmBJ)	15655
Bandgap (HSE06)	40
Bandgap (PBE0)	40
Bandgap (G0W0)	15
Bandgap (DMFT)	11
Frequency dependent dielectric tensor (OPT)	34045
Frequency dependent dielectric tensor (TBmBJ)	15655
Elastic-constants (OPT)	15500
Finite-difference phonons at Γ-point (OPT)	15500
Work-function, electron-affinity (OPT)	1105
Theoretical solar-cell efficiency (SLME) (TBmBJ)	5097
Topological spin-orbit spillage (PBE+SOC)	11500
Wannier tight-binding Hamiltonians (PBE+SOC)	1771
Seebeck coefficient (OPT, BoltzTrap)	22190
Power factor (OPT, BoltzTrap)	22190
Effective mass (OPT, BoltzTrap)	22190
Magnetic moment (OPT)	37528
Piezoelectric constant (OPT, DFPT)	5015
Dielectric tensor (OPT, DFPT)	5015
Infrared intensity (OPT, DFPT)	5015
DFPT phonons at Γ-point (OPT)	5015
Electric field gradient (OPT)	15187
Non-resonant Raman intensity (OPT, DFPT)	250
Scanning tunneling microscopy images (PBE+SOC)	770
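
To get a feel for how sparsely some properties are covered, the counts above can be turned into fractions of the total number of optimized structures. The sketch below parses a few tab-separated rows copied from the table and divides each count by the reference row. The helper names `parse_counts` and `coverage` are made up for this example, and the counts are a snapshot that will go stale as the database expands.

```python
from typing import Dict

# A few rows copied from the property table above (tab-separated).
TABLE_TEXT = """\
Optimized crystal-structure (OPT)\t37646
Bandgap (TBmBJ)\t15655
Elastic-constants (OPT)\t15500
Piezoelectric constant (OPT, DFPT)\t5015
"""


def parse_counts(text: str) -> Dict[str, int]:
    """Parse 'name<TAB>count' rows into a dictionary of integer counts."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the last tab so that names containing commas stay intact.
        name, _, number = line.rpartition("\t")
        counts[name.strip()] = int(number)
    return counts


def coverage(counts: Dict[str, int], reference_key: str) -> Dict[str, float]:
    """Return each count as a fraction of the reference row's count."""
    total = counts[reference_key]
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    counts = parse_counts(TABLE_TEXT)
    fractions = coverage(counts, "Optimized crystal-structure (OPT)")
    for name, frac in fractions.items():
        print(f"{name}: {frac:.1%}")
```

A model trained on one of the rarer properties sees only the subset of structures for which that property was computed, so these fractions give a quick estimate of the usable training set size.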
