sp8rks / materialsinformatics

MSE5540/6640 Materials Informatics course at the University of Utah

License: MIT License

Jupyter Notebook 99.93% Python 0.07%

materialsinformatics's Introduction

MaterialsInformatics

This GitHub repo contains coursework content such as class slides, code notebooks, homework assignments, literature, and more for MSE 5540/6640 "Materials Informatics", taught at the University of Utah in the Materials Science & Engineering department.

Below you'll find the approximate calendar for Spring 2024. Videos of the lectures are being posted to the following YouTube playlist: https://youtube.com/playlist?list=PLL0SWcFqypCl4lrzk1dMWwTUrzQZFt7y0

| Month | Day | Subject to cover | Assignment | Link |
|---|---|---|---|---|
| Jan | 9 | Syllabus. What is machine learning? How are materials discovered? | Install software packages together in class | |
| Jan | 11 | Machine Learning vs Materials Informatics. In-class example of fitting Hall-Petch data with a linear model | Read 5 High Impact Research Areas in ML for MSE (paper1). Read ISLP Chapter 3, but especially Section 3.1 | paper1, ISLP |
| Jan | 16 | Materials data repositories, get pymatgen running for everybody, examples of MP API, MDF, NOMAD, others | Create a new env and make sure you can get the notebooks in the "worked examples/MP_API_example" and "worked examples/foundry" folders running. | Materials Project API |
| Jan | 18 | Machine Learning Tasks and Types, Featurization in ML, Composition-based feature vector | Read Is domain knowledge necessary for MI (paper1). Make sure you can get the CBFV_example notebook running in the "worked examples/CBFV_example" folder | paper1 |
| Jan | 23 | Classification and cross-validation | Read ISLP Sections 4.1-4.5 and Section 5.1. Run through classification notebook | ISLP |
| Jan | 25 | Structure-based feature vector, crystal graph networks, SMILES vs SELFIES, 2pt statistics | Read SELFIES (paper1), two-point statistics (paper2), and intro to graph networks (blog1) | paper1, paper2, blog1 |
| Jan | 30 | Simple linear/nonlinear models. Test/train/validation/metrics | Read linear vs non-linear (blog1), best practices (paper1), benchmark dataset (paper2), and loco-cv (paper3). | blog1, paper1, paper2, paper3 |
| Feb | 1 | In-class examples of featurization | Run through 2pt statistics, GridRDF, CBFV notebooks. HW1 due! | |
| Feb | 6 | Ensemble models, ensemble learning | Read ensemble (blog1) and ensemble learning (paper1) | blog1, paper1 |
| Feb | 8 | Extrapolation, support vector machines, clustering | Read extrapolation to extraordinary materials (paper1), clustering (blog1), SVMs (blog2) | paper1, blog1, blog2 |
| Feb | 13 | Artificial neural networks | Read the introduction to neural networks (blog1, blog2) | blog1, blog2 |
| Feb | 15 | Advanced deep learning (CNNs, RNNs) | HW2 due. Read… | blog1, blog2 |
| Feb | 20 | Transformers | Read the introduction to transformers (blog1, blog2) | blog1, blog2 |
| Feb | 22 | Generative ML: Generative Adversarial Networks and variational autoencoders | Read about VAEs (blog1, blog2, repo1) and GANs () | blog1, blog2, repo1 |
| Feb | 27 | Diffusion models and image segmentation | Read U-net (paper1) and nuclear forensics (paper2) | CrysTens repo |
| Feb | 29 | Image segmentation part 2 and in-class coding examples | Download CrysTens GitHub repo, read Segment Anything Model (paper3) | paper1, paper2, paper3 |
| Mar | 5 | NO CLASS, spring break | | |
| Mar | 7 | NO CLASS, spring break | | |
| Mar | 12 | Bayesian Inference | Read the introduction to Bayesian (blog1), go through Naive Bayes notebook | blog1 |
| Mar | 14 | Gaussian Processes and Bayesian Optimization | | |
| Mar | 19 | Case study: Superhard materials, structure prediction | Read superhard (paper1) and structure prediction papers (paper2) | paper1, paper2 |
| Mar | 21 | Case study: CGCNN vs MEGNet vs SchNet | Read CGCNN (paper1), MEGNet (paper2), SchNet (paper3) | paper1, paper2, paper3 |
| Mar | 26 | Case study: CrabNet vs Roost | Read CrabNet (paper1) and Roost (paper2) | paper1, paper2 |
| Mar | 28 | Case study: Cococrab, BRDA | HW4 due. Read Cococrab (paper1) and BRDA (paper2) | paper1, paper2 |
| Apr | 2 | Large Language Models part 1 | TBD | TBD |
| Apr | 4 | Large Language Models part 2 | TBD | TBD |
| Apr | 9 | Case study: Element Mover’s Distance, Mat2Vec | Read Element Mover’s Distance (paper1) and Mat2Vec (paper2) | paper1, paper2 |
| Apr | 11 | Case study: Discover algorithm, Robocrystallographer | TBD | TBD |
| Apr | 16 | Final project presentation day 1 | Final Project due | |
| Apr | 18 | Final project presentation day 2 | Final Project due | |

I can recommend the book An Introduction to Statistical Learning (the "ISLP" readings in the calendar), found here: https://www.statlearning.com/

materialsinformatics's People

Contributors

andrewfalkowski, sgbaird, sp8rks

materialsinformatics's Issues

HW2, task 2: supposed to say "using the arbitrary cut-off value of 10^-2 Ω cm for *electrical resistivity*"?

The task currently says conductivity, but Ω cm is a unit of resistivity:

(b) task 2. Using a support vector machine classifier and the composition-based feature vector (magpie descriptor set), construct a model that will categorize materials as metals or insulators using the arbitrary cut-off value of 10^-2 Ω cm for electrical conductivity. The model should take chemical formula as an input and take into account temperature as a feature (300, 400, 700, and 1000K).
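
The labeling step behind this task can be sketched as follows. This is a minimal illustration, not the course's solution: the function names are hypothetical, the sample values are made-up placeholders rather than measured data, and it assumes that materials *below* the 10^-2 Ω cm resistivity cutoff count as metals (the task text does not state which side is which). It also shows why the unit matters: a conductivity in S/cm has to be inverted before it can be compared against a cutoff in Ω cm.

```python
# Hypothetical helpers for building metal/insulator class labels.
# Cutoff value taken from the task text: 10^-2 ohm*cm (a resistivity).
RESISTIVITY_CUTOFF_OHM_CM = 1e-2


def conductivity_to_resistivity(sigma_s_per_cm: float) -> float:
    """Convert electrical conductivity (S/cm) to resistivity (ohm*cm)."""
    if sigma_s_per_cm <= 0:
        raise ValueError("conductivity must be positive")
    return 1.0 / sigma_s_per_cm


def label_from_resistivity(
    rho_ohm_cm: float, cutoff: float = RESISTIVITY_CUTOFF_OHM_CM
) -> str:
    """Assumption: resistivity below the cutoff means 'metal'."""
    return "metal" if rho_ohm_cm < cutoff else "insulator"


# Placeholder rows: (sample name, temperature in K, conductivity in S/cm).
# These numbers are invented for illustration only.
rows = [
    ("sample_a", 300, 5.0e3),
    ("sample_b", 300, 2.0e-4),
]

for name, temperature_k, sigma in rows:
    rho = conductivity_to_resistivity(sigma)
    print(name, temperature_k, label_from_resistivity(rho))
```

The resulting labels would then serve as the classification target, with the composition-based features and temperature as inputs to the classifier.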

Suggestion for hw 1, pb 1 next time course is taught: use of Zotero shared group folder and shared annotations for literature extraction

To help make the literature data extraction FAIR, a Zotero group library, where PDF files, annotations, etc. are shared between group members, can help by:

  1. keeping all the references in one place
  2. sharing highlights/annotations and making them easily searchable, plus "click link to go directly to annotation" (makes curation easier)
  3. making it easy to publish the list of references and to share the copyrighted files, along with the annotations, upon request

(planning to update later with some examples/images)

See also: https://github.com/sparks-baird/auto-paper#annotations

HW 1 general suggestions - `webplotdigitizer` and `MPRester` tips

Problem 1

  • when you have variables in the chemical formula, pay extra attention to which formulas correspond to which values of x. For example, in some plots, x=0 --> x=1.0 might go from the top to the bottom, whereas in others x=0 starts from the bottom.

  • pay attention to units, e.g. Kelvin vs. Celsius, 10^4 S/m vs. S/m vs. S/cm, make sure units are converted correctly based on what's listed on the spreadsheet e.g. electrical conductivity: S*cm^-1.

  • I found it useful to add all images (where the image includes figure caption) to a single session in webplotdigitizer, and use "Point Groups" corresponding to each of the chemical formulas if grabbing multiple traces from an image. Additionally, if multiple types of data were in the same figure, I made copies of each figure and named them e.g. fig5-electrical-conductivity.png, fig5-thermal-conductivity.png ... even though they're the exact same figure. This makes it easier to retain the caption and separate calibrations for each dataset. Rename your dataset appropriately, e.g. electrical-conductivity to make it easier to keep track and so that when you export the CSV it auto-populates the name.

  • a trick to using "Point Groups" (not the crystallographic kind) is to add a group for each composition (e.g. Cu0.98GaTe2, Cu0.985GaTe2, Cu0.99GaTe2, CuGaTe2) and then select the points in order for a given temperature. For example, click points in the following order:

    1. Cu0.98GaTe2@300K
    2. Cu0.985GaTe2@300K
    3. Cu0.99GaTe2@300K
    4. CuGaTe2@300K
    5. Cu0.98GaTe2@400K
    6. Cu0.985GaTe2@400K
    ...
    ...
    16. CuGaTe2@800K

Then click on your dataset, View Data, and Sort By --> Groups (dropdown). You can also export to CSV from this interface.

  • I suggest saving your images, raw CSV data, and your webplotdigitizer project (JSON and TAR format) data organized into folders based on the article, or at least save a copy of your data somewhere other than Google Sheets (e.g. your local computer) for data redundancy.
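
The unit conversions mentioned in the tips above can be sketched in a few lines. This is only an illustration: the function names and the unit label strings (such as "1e4 S/m" for an axis scaled by 10^4 S/m) are hypothetical, and the target unit of S/cm follows the spreadsheet example given above.

```python
# Multipliers that convert a digitized value into S/cm.
# The keys are assumed labels; add entries for other axis units as needed.
_TO_S_PER_CM = {
    "S/cm": 1.0,
    "S/m": 1e-2,     # 1 S/m = 0.01 S/cm
    "1e4 S/m": 1e2,  # axis in units of 10^4 S/m: value * 1e4 S/m * 0.01
}


def to_s_per_cm(value: float, unit: str) -> float:
    """Convert an electrical conductivity reading to S/cm."""
    try:
        return value * _TO_S_PER_CM[unit]
    except KeyError:
        raise ValueError(f"unknown conductivity unit: {unit!r}") from None


def celsius_to_kelvin(t_celsius: float) -> float:
    """Convert a temperature from degrees Celsius to Kelvin."""
    return t_celsius + 273.15


# A point read as 3.0 on an axis labeled "10^4 S/m", at 27 degrees Celsius.
print(to_s_per_cm(3.0, "1e4 S/m"), celsius_to_kelvin(27.0))
```

Raising an error on an unknown unit, rather than passing the value through, makes it harder to silently mix S/m and S/cm in the same column.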

Problem 2

One of the best resources for getting an intro to MPRester is the Materials Project workshop tutorial.

On YouTube, there is Taylor's prerecorded lecture and (what I'm pretty sure is) the corresponding video for the workshop tutorial mentioned above.

In addition to the customized examples given by Taylor in this repository, here are some additional examples "in practice" at RoboCrab (archived repo) and mat_discover.

Task 5

grp_df.hist("count", bins=100, log=True)

In this case, the large majority of compounds have fewer than 20 polytopes, but there is one chemical formula with 200 repeats!?
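
As a rough sketch of what sits behind a histogram like this, the per-formula counts can be computed with the standard library alone. This is not the notebook's code: `grp_df` is not defined in the text above, so the sketch assumes it holds one row per chemical formula with a "count" column, and the formula list here is a toy placeholder rather than a real query result.

```python
from collections import Counter


def count_entries_per_formula(formulas):
    """Count how many entries share each chemical formula."""
    return Counter(formulas)


def bin_counts(counts, bin_width=20):
    """Map each bin's lower edge to the number of formulas in that bin."""
    hist = Counter((c // bin_width) * bin_width for c in counts.values())
    return dict(sorted(hist.items()))


# Toy placeholder input: one formula string per database entry.
formulas = ["SiO2"] * 3 + ["NaCl"] + ["TiO2"] * 2

counts = count_entries_per_formula(formulas)
print(counts.most_common(1))            # formula with the most repeats
print(bin_counts(counts, bin_width=2))  # coarse histogram of the counts
```

Checking `most_common(1)` is a quick way to find the single formula responsible for an outlier bar.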
