Dr. Hu (University of South Carolina, Department of Computer Science and Engineering)
Machine Learning and Evolution Laboratory
- Machine Learning for Materials Scientists: An introductory guide towards best practices
- Property-Oriented Material Design Based on Data-Driven Machine Learning Technique
- A Critical Review of Machine Learning of Energy Materials (Ong)
- A compact review of molecular property prediction with graph neural networks
- GitHub: The Collection of Database and Dataset Resources in Materials Science
- https://bio.tools/ (dataset and web app discovery engine)
Materials Project database (144,595+ materials)
https://materialsproject.org/
Aflowlib dataset (3,541,633)
http://aflowlib.org
OQMD dataset (815,654)
http://www.oqmd.org
ICSD crystal materials dataset
242,828 crystal structures
Carolina Materials Database (Hypothetical new materials)
http://www.carolinamatdb.org/
NREL materials database
https://materials.nrel.gov/queryStd
CGCNN benchmark datasets (3): MP and perovskites
https://github.com/txie-93/cgcnn/tree/master/data/material-data
Roost benchmark datasets (4) for composition-only ML
https://github.com/CompRhys/roost/tree/master/data/datasets
5 datasets for structure-based property prediction
Benchmarking graph neural networks for materials chemistry (MatDeepLearn)
https://github.com/vxfung/MatDeepLearn
18 datasets for property prediction
https://hackingmaterials.lbl.gov/automatminer/datasets.html#down-loading-datasets
Robust model benchmarking and bias-imbalance in data-driven materials science: a case study on MODNet
28 datasets for composition-only ML
https://github.com/kaaiian/mse_datasets
https://github.com/anthony-wang/CrabNet
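Composition-only models such as Roost and CrabNet learn from the chemical formula alone, without a crystal structure. As a rough illustration of that kind of input, the sketch below turns a simple formula string into atomic fractions. It is a minimal sketch, assuming formulas without parentheses, charges, or hydrate dots; the function name `fractional_composition` is hypothetical, and this is not the featurization used by either repository.

```python
import re
from collections import defaultdict
from typing import Dict

# One element symbol plus an optional integer or decimal count, e.g. "Fe2", "O", "Li0.5".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")
# A whole formula must be a sequence of such tokens (no parentheses, charges, or dots).
FORMULA = re.compile(r"(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+")


def fractional_composition(formula: str) -> Dict[str, float]:
    """Map each element in a simple formula to its atomic fraction (hypothetical helper)."""
    if not FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts: Dict[str, float] = defaultdict(float)
    for element, amount in TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0  # a missing count means 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"formula has zero total amount: {formula!r}")
    return {element: n / total for element, n in counts.items()}


print(fractional_composition("Fe2O3"))  # {'Fe': 0.4, 'O': 0.6}
```

Repeated symbols are summed, so a formula such as "LiFePO4" yields fractions of 1/7 for Li, Fe, and P and 4/7 for O; real pipelines typically rely on a dedicated composition parser that also handles nested groups.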
Scientific Data journal datasets
https://www.nature.com/search?q=materials&journal=sdata&order=relevance
2D materials databases
C2DB
2DMatPedia
V2DB (virtual 2D materials)
2D semiconductor database
Computational Materials Repository (CMR)
https://wiki.fysik.dtu.dk/cmr/index.html
Thermal conductivity dataset
https://tedesignlab.org/database/ (2700 samples)
Materials synthesis data
https://github.com/CederGroupHub/text-mined-synthesis_public
Lithium-Ion Battery Electrolyte (LIBE) dataset
Computed properties of over 17,000 molecules relevant to electrolyte and interphase chemistry
Superconductor database
SuperCon
https://supercon.nims.go.jp/en/
Topological insulator database
https://www.topologicalquantumchemistry.com/#/
https://www.materialscloud.org/discover/topomat (31,000)
Database of Two-Dimensional Hybrid Perovskite Materials
2D Perovskites Database
Propnet
https://github.com/materialsintelligence/propnet
Mendeley materials data
https://www.journals.elsevier.com/materials-science-and-engineering-a/mendeley-datasets
Figshare public materials datasets
https://figshare.com/search?q=materials+dataset (results can also be sorted by hits)
Materials data on data.world
https://data.world/datasets/materials
NIST Materials Data Facility
https://www.materialsdatafacility.org/
Northwestern Center for Hierarchical Materials Design
https://chimad.northwestern.edu/research/databases.html
MOF materials structures
https://mof-international.org/mof-structures/
MOFDB (Northwestern, 163,000)
QMOF Database
Molecule datasets
QM9
Stanford Open Graph Benchmark (OGB) datasets
ZINC
Drug molecule database
ChEMBL: bioactive molecules with drug-like properties
Phonon and topological phononic materials
http://www.phonon.synl.ac.cn:8080/home
zhuqiang: Topological Phononic Materials Database
4,914 perovskite oxides
https://figshare.com/articles/dataset/Wolverton_Oxides_Data/7250417
https://github.com/blaiszik/Materials-Databases
JARVIS datasets
https://jarvis-materials-design.github.io/dbdocs/thedownloads/
JARVIS-DFT
3D-materials curated data
The dataset contains metadata for JARVIS-DFT data for 3D materials. Property keys include: 'jid', 'atoms', 'formation_energy_peratom', 'optb88vdw_bandgap', 'elastic_tensor','effective_masses_300K', 'kpoint_length_unit', 'encut','optb88vdw_total_energy', 'mbj_bandgap', 'epsx', 'mepsx', 'epsy','mepsy', 'epsz', 'mepsz', 'kpoints_array', 'bulk_modulus_kv', 'shear_modulus_gv', 'modes', 'magmom_outcar','magmom_oszicar', 'icsd', 'spillage', 'slme', 'dfpt_piezo_max_eij','dfpt_piezo_max_dij', 'dfpt_piezo_max_dielectric','dfpt_piezo_max_dielectric_electronic', 'dfpt_piezo_max_dielectric_ionic', 'max_ir_mode', 'min_ir_mode', 'n-Seebeck', 'p-Seebeck', 'exfoliation_energy', 'n-powerfact', 'p-powerfact', 'ehull', 'dfpt_piezo_max_dielectric_ioonic'.
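Because each JARVIS-DFT 3D entry is a record keyed by the property names above, screening reduces to filtering a list of dictionaries. The sketch below assumes the records are already loaded as Python dicts (jarvis-tools offers `jarvis.db.figshare.data("dft_3d")` for this; the toy records, ids, and values here are hypothetical placeholders) and that missing values appear as the string 'na', an assumption to verify against the actual download. The helper names `to_float` and `select_candidates` are illustrative only.

```python
from typing import Any, Dict, List, Optional

# Hypothetical toy records that mimic a few JARVIS-DFT 3D keys; ids and values are made up.
records: List[Dict[str, Any]] = [
    {"jid": "JVASP-A", "optb88vdw_bandgap": 1.1, "mbj_bandgap": 1.6, "ehull": 0.0},
    {"jid": "JVASP-B", "optb88vdw_bandgap": 0.0, "mbj_bandgap": "na", "ehull": 0.02},
    {"jid": "JVASP-C", "optb88vdw_bandgap": 2.5, "mbj_bandgap": 3.4, "ehull": 0.3},
]


def to_float(value: Any) -> Optional[float]:
    """Return value as a float, or None for missing markers such as 'na' or None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def select_candidates(
    rows: List[Dict[str, Any]],
    gap_min: float,
    gap_max: float,
    ehull_max: float,
    gap_key: str = "mbj_bandgap",
) -> List[str]:
    """Return jids with gap_min <= band gap <= gap_max and ehull <= ehull_max."""
    selected = []
    for row in rows:
        gap = to_float(row.get(gap_key))
        ehull = to_float(row.get("ehull"))
        if gap is None or ehull is None:
            continue  # skip entries that lack either property
        if gap_min <= gap <= gap_max and ehull <= ehull_max:
            selected.append(row["jid"])
    return selected


if __name__ == "__main__":
    # Band gap window of 1.0-2.0 eV (TBmBJ) and ehull of at most 0.1 eV/atom (assumed units).
    print(select_candidates(records, 1.0, 2.0, 0.1))  # ['JVASP-A']
```

Passing `gap_key="optb88vdw_bandgap"` screens on the OptB88vdW gap instead, which covers more entries than the TBmBJ gap (see the property counts in the table below).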
2D-materials curated data
The dataset contains metadata for JARVIS-DFT data for 2D materials. Similar properties as 3D materials.
SCF Optimization input/output files
The dataset contains raw input/output files from the geometric optimization of crystals. Files included are: POSCAR, INCAR, vasprun.xml, KPOINTS, OSZICAR. Similar files are provided for the datasets below.
Optoelectronic data
OptB88vdW optoelectronic calculation files
The dataset contains raw input/output files for the bandgap, DOS, and frequency-dependent dielectric function of crystals using the OptB88vdW functional.
TBmBJ optoelectronic calculation files
The dataset contains raw input/output files for the bandgap, DOS, and frequency-dependent dielectric function of crystals using the TBmBJ potential.
Finite Difference Elastic Constants Input/Output files
The dataset contains raw input/output files for finite difference based elastic constant and gamma-point phonon calculations.
DFPT raw input/output files
The dataset contains raw input/output files for DFPT-based gamma-point phonon, piezoelectric, and dielectric calculations.
Computational scanning tunneling microscopy images
The dataset contains STM images for 2D materials.
Wannier Tight-binding Hamiltonians
The dataset contains Wannier tight-binding Hamiltonians for 3D and 2D materials.
JARVIS-FF
Elastic constants data
The dataset contains elastic constants of materials computed with several interatomic potentials.
Surface and vacancy-formation energies data
The dataset contains vacancy-formation energies of materials computed with several interatomic potentials.
JARVIS-ML
JARVIS-CFID descriptors datasets for JARVIS-DFT, MP, OQMD, QM9, AFLOW
The dataset contains CFID descriptors of materials together with several properties.
Data-driven Discovery of 3D and 2D Thermoelectric Materials
The dataset contains thermoelectric properties of 3D and 2D materials
Table. A brief summary of datasets available in JARVIS-DFT.
Material class                  Number
3D-bulk                         33482
2D-bulk                         2293
1D-bulk                         235
0D-bulk                         413
2D-monolayer                    1105
2D-bilayer                      102
Molecules                       12
Heterostructure                 3
Total DFT-calculated systems    37646
Table. A brief summary of functionals used in optimizing crystal geometry in JARVIS-DFT.
Functional              Number
vdW-DF-OptB88 (OPT)     37646
vdW-DF-OptB86b (MK)     109
vdW-DF-OptPBE (OR)      111
PBE                     99
LDA                     92
Table. A brief summary of material properties available in JARVIS-DFT. The database is continuously expanding.
JARVIS-DFT property                               Number
Optimized crystal structure (OPT)                 37646
Formation energy (OPT)                            37646
Bandgap (OPT)                                     37646
Exfoliation energy (OPT)                          819
Bandgap (TBmBJ)                                   15655
Bandgap (HSE06)                                   40
Bandgap (PBE0)                                    40
Bandgap (G0W0)                                    15
Bandgap (DMFT)                                    11
Frequency-dependent dielectric tensor (OPT)       34045
Frequency-dependent dielectric tensor (TBmBJ)     15655
Elastic constants (OPT)                           15500
Finite-difference phonons at Γ-point (OPT)        15500
Work function, electron affinity (OPT)            1105
Theoretical solar-cell efficiency (SLME) (TBmBJ)  5097
Topological spin-orbit spillage (PBE+SOC)         11500
Wannier tight-binding Hamiltonians (PBE+SOC)      1771
Seebeck coefficient (OPT, BoltzTraP)              22190
Power factor (OPT, BoltzTraP)                     22190
Effective mass (OPT, BoltzTraP)                   22190
Magnetic moment (OPT)                             37528
Piezoelectric constant (OPT, DFPT)                5015
Dielectric tensor (OPT, DFPT)                     5015
Infrared intensity (OPT, DFPT)                    5015
DFPT phonons at Γ-point (OPT)                     5015
Electric field gradient (OPT)                     15187
Non-resonant Raman intensity (OPT, DFPT)          250
Scanning tunneling microscopy images (PBE+SOC)    770