
cedergrouphub / matbert-synthesis-classifier



License: MIT License



MatBERT synthesis classifier

A synthesis paragraph classifier built by fine-tuning MatBERT. Built by the text-mining team at the CEDER group.

Usage

```python
from synthesis_classifier import get_model, get_tokenizer, run_batch

model = get_model()
tokenizer = get_tokenizer()

paragraphs = [...]  # list of paragraph strings to classify

# Split the paragraphs into fixed-size batches before classification.
# The batch size is arbitrary; tune it to the available memory.
batch_size = 32
batches = [paragraphs[i:i + batch_size]
           for i in range(0, len(paragraphs), batch_size)]

for batch in batches:
    result = run_batch(batch, model, tokenizer)
    print(result)

# Output:
# [{
# 'text': '10.1063/1.3676216: The raw materials were BaCO3, ZnO, Nb2O5, 
# and Ta2O5 powders with purity of more than 99.5%. Ba[Zn1/3 (Nb1−xTax)2/3]O3 
# (BZNT, x\u2009=\u20090.0, 0.2, 0.4, 0.6, 0.8, 1.0) solid solutions were 
# synthesized by conventional solid-state sintering technique. Oxide compounds 
# were mixed for 12\u2009h in polyethylene jars with zirconia balls and then 
# dried and calcined at 1100 °C for 2\u2009h. After remilling, the powders were 
# dried and pressed into discs of 15\u2009mm\u2009×\u20091\u2009mm and next 
# sintered at 1500 °C for 3\u2009h.', 
# 
# 'tokens': ['10', '.', '106', '##3', '/', '1', '.', '367', '##62', '##16', ':', 
# 'The', 'raw', 'materials', 'were', 'Ba', '##CO3', ',', 'ZnO', ',', 'Nb2O5', ',', 
# 'and', 'Ta2O5', 'powders', 'with', 'purity', 'of', 'more', 'than', '99', '.', 
# '5', '%', '.', 'Ba', '[', 'Zn', '##1', '/', '3', '(', 'Nb', '##1−x', '##Ta', 
# '##x', ')', '2', '/', '3', ']', 'O3', '(', 'BZ', '##NT', ',', 'x', '=', '0', 
# '.', '0', ',', '0', '.', '2', ',', '0', '.', '4', ',', '0', '.', '6', ',', '0', 
# '.', '8', ',', '1', '.', '0', ')', 'solid', 'solutions', 'were', 'synthesized', 
# 'by', 'conventional', 'solid', '-', 'state', 'sintering', 'technique', '.', 
# 'Oxid', '##e', 'compounds', 'were', 'mixed', 'for', '12', 'h', 'in', 'polyethylene', 
# 'j', '##ars', 'with', 'zirconia', 'balls', 'and', 'then', 'dried', 'and', 'calcined', 
# 'at', '1100', '°C', 'for', '2', 'h', '.', 'After', 'rem', '##illing', ',', 'the', 
# 'powders', 'were', 'dried', 'and', 'pressed', 'into', 'discs', 'of', '15', 'mm', 
# '×', '1', 'mm', 'and', 'next', 'sintered', 'at', '1500', '°C', 'for', '3', 'h', '.'], 
#
# 'scores': {
# 'solid_state_ceramic_synthesis': 0.9992626309394836, 
# 'sol_gel_ceramic_synthesis': 0.00024707740521989763, 
# 'hydrothermal_ceramic_synthesis': 8.356467151315883e-05, 
# 'precipitation_ceramic_synthesis': 8.224111661547795e-05, 
# 'something_else': 0.00032462377566844225}}]
```
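Each result is a dictionary with `text`, `tokens`, and `scores`. In the example above the five scores sum to roughly 1, so they can plausibly be read as class probabilities. The sketch below picks the highest-scoring label for each paragraph and keeps only paragraphs of one synthesis type. `top_label`, `select_paragraphs`, and the 0.5 cut-off are hypothetical helpers written for illustration; they are not part of `synthesis_classifier`.

```python
from typing import Dict, List, Tuple


def top_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


def select_paragraphs(results: List[dict], label: str,
                      min_score: float = 0.5) -> List[str]:
    """Keep the text of every paragraph whose top label is `label`.

    `min_score` is an arbitrary cut-off chosen for this sketch.
    """
    selected = []
    for result in results:
        best_label, best_score = top_label(result["scores"])
        if best_label == label and best_score >= min_score:
            selected.append(result["text"])
    return selected


# One result in the shape printed above (text shortened, tokens omitted).
example = {
    "text": "10.1063/1.3676216: The raw materials were BaCO3, ZnO, ...",
    "scores": {
        "solid_state_ceramic_synthesis": 0.9992626309394836,
        "sol_gel_ceramic_synthesis": 0.00024707740521989763,
        "hydrothermal_ceramic_synthesis": 8.356467151315883e-05,
        "precipitation_ceramic_synthesis": 8.224111661547795e-05,
        "something_else": 0.00032462377566844225,
    },
}

print(top_label(example["scores"]))
print(select_paragraphs([example], "solid_state_ceramic_synthesis"))
```

In practice you would pass the list returned by `run_batch` in place of `[example]`.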

Citing

While we are working on a new paper, please consider citing our previous paper:

@article{huo2019semi,
  title={Semi-supervised machine-learning classification of materials synthesis procedures},
  author={Huo, Haoyan and Rong, Ziqin and Kononova, Olga and Sun, Wenhao and Botari, Tiago and He, Tanjin and Tshitoyan, Vahe and Ceder, Gerbrand},
  journal={npj Computational Materials},
  volume={5},
  number={1},
  pages={1--7},
  year={2019},
  publisher={Nature Publishing Group}
}


Issues

Documentation

Is there any documentation or paper available for the MatBERT classifier, similar to "Semi-supervised machine-learning classification of materials synthesis procedures", which describes the LDA classifier?

Error when running "predict_json.py" from the examples folder

When I ran this script, I got the following error:

```
Traceback (most recent call last):
  File "/home/dicp2020/Desktop/BERT-project/MatBERT-synthesis-classifier/examples/predict_json.py", line 20, in <module>
    result = run_batch(batch, model, tokenizer)
  File "/home/dicp2020/anaconda3/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/dicp2020/anaconda3/lib/python3.7/site-packages/synthesis_classifier-0.0.1-py3.7.egg/synthesis_classifier/pclassifier.py", line 67, in run_batch
    scores = get_classification_scores(outputs)
  File "/home/dicp2020/anaconda3/lib/python3.7/site-packages/synthesis_classifier-0.0.1-py3.7.egg/synthesis_classifier/pclassifier.py", line 34, in get_classification_scores
    outputs = model_output.cpu().numpy()
AttributeError: 'str' object has no attribute 'cpu'
```

I am looking forward to your reply, thank you very much!
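One plausible cause, not confirmed for this repository: starting with transformers 4.0, models return a `ModelOutput` object by default (`return_dict=True`) instead of a plain tuple. `ModelOutput` subclasses `OrderedDict`, so code written for the older tuple return that unpacks or iterates the model output receives the key names, which are strings. The sketch below reproduces that failure with a hypothetical `FakeModelOutput` stand-in so it runs without transformers or a downloaded model; `unpack_old_style` and `read_logits` are likewise illustrative names.

```python
from collections import OrderedDict


class FakeModelOutput(OrderedDict):
    """Hypothetical stand-in for transformers' ModelOutput (an OrderedDict subclass)."""


def unpack_old_style(output):
    # Code written for the tuple return of older transformers versions.
    # Unpacking an OrderedDict iterates over its keys, so this yields the
    # string "logits" rather than the logits themselves.
    (logits,) = output
    return logits


def read_logits(output):
    # Reading the field by name returns the stored value instead.
    return output["logits"]


output = FakeModelOutput(logits=[[2.0, -1.0]])

wrong = unpack_old_style(output)
try:
    wrong.cpu()  # a str has no .cpu(), mirroring the traceback above
except AttributeError as err:
    print(err)

print(read_logits(output))
```

If this is indeed the cause, the usual workarounds are passing `return_dict=False` to the model call, reading the `logits` field explicitly, or installing a transformers version older than 4.0. Which one applies depends on how `pclassifier.py` calls the model, which is not shown here.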
