# Text Classification-Based Approach for Evaluating and Enhancing Machine Interpretability of Building Codes

- Author: zhengzhe
- Date: 2022.10.26
- Description: Code and dataset for the paper "Text Classification-Based Approach for Evaluating and Enhancing Machine Interpretability of Building Codes".
- This is a PyTorch-based BERT approach to Chinese text classification.
- BERT models: thanks to Bert-Chinese-Text-Classification-Pytorch
- Other models: thanks to Chinese-Text-Classification-Pytorch
- python 3.7
- torch 1.12.1+cu116
- boto3 1.24.28
- matplotlib 3.5.3
- tqdm
- sklearn
- tensorboardX
- Description: A Chinese rule dataset with seven categories is established to classify the interpretability level of each rule in a building code.
- The original labeled dataset can be found in CivilRules/dataset
- The training, validation, and test datasets can be found in CivilRules/data
Category | Definition | Interpretability
---|---|---
direct | The required information is explicitly available in the BIM model. | Easy
indirect | The required information is implicitly stored in the BIM model; a set of derivations and calculations must be performed. | Easy
method | An extended data structure and domain-specific knowledge are required. | Medium
reference | External information is required, including pictures, formulas, tables, and other rules or appendices in the current code or other codes. | Medium
general | The rule provides macro-level design guidance. | Hard
term | The rule defines the terms used in the codes. | Hard
other | The rule does not belong to any of the above six categories. | Hard
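The seven categories map onto three interpretability levels, which is useful for profiling a whole building code. Below is a minimal sketch of that mapping; the tab-separated `text\tlabel` line format is an assumption based on the Chinese-Text-Classification-Pytorch convention, not something stated in this repo.

```python
# Minimal sketch: map the seven rule categories to their interpretability
# levels and count rules per level. The "text\tlabel" line format is an
# ASSUMPTION following the Chinese-Text-Classification-Pytorch convention.
from collections import Counter

CATEGORY_TO_LEVEL = {
    "direct": "Easy",
    "indirect": "Easy",
    "method": "Medium",
    "reference": "Medium",
    "general": "Hard",
    "term": "Hard",
    "other": "Hard",
}

def interpretability_profile(lines):
    """Count rules per interpretability level from 'text\\tlabel' lines."""
    counts = Counter()
    for line in lines:
        text, label = line.rstrip("\n").split("\t")
        counts[CATEGORY_TO_LEVEL[label]] += 1
    return dict(counts)

sample = ["some rule text\tdirect", "a term definition\tterm"]
print(interpretability_profile(sample))  # {'Easy': 1, 'Hard': 1}
```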
Model | Weighted F1 score
---|---
TextCNN | 86.3%
TextRNN | 72.2%
TextRNN-Att | 81.5%
Transformers | 74.0%
BERT | 88.04%
RuleBERT | 93.68%
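The weighted F1 score averages per-class F1 weighted by class support, which suits this imbalanced seven-category dataset. A short illustration with sklearn (the labels below are invented for demonstration and are not from the dataset):

```python
# Illustration of the evaluation metric: weighted F1 averages per-class
# F1 scores weighted by each class's support. Labels are made up.
from sklearn.metrics import f1_score

y_true = ["direct", "indirect", "method", "term", "direct", "other"]
y_pred = ["direct", "indirect", "method", "term", "indirect", "other"]

score = f1_score(y_true, y_pred, average="weighted")
print(f"{score:.4f}")  # 0.8333
```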
- The original BERT model can be found on Google Drive.
- Please put the original BERT model in ./bert_pretrain
- The further-pretrained domain-specific BERT model (RuleBERT) can be found on Google Drive.
- Please put the RuleBERT model in ./bert_pretraindc
- The well-trained BERT models (.ckpt files) can be found on Google Drive.
- Please put these models in ./CivilRules/save_dict
- The well-trained models (TextCNN, TextRNN, TextRNN-Att, Transformers) can be found on Google Drive.
- To reproduce the results, use the code from Chinese-Text-Classification-Pytorch.
- Make sure the BERT models and the fine-tuned models have been put in the right places.
- Put the test dataset (test.txt) into ./CivilRules/data
```shell
# validate the BERT model's weighted F1 score
python test.py --model bert
# validate the RuleBERT model's weighted F1 score
python test.py --model bertDC
```
- Prepare your own test dataset in ./CivilRules/data
- Modify the dataset path, learning_rates, and batch_sizes in grid_search.py
```shell
# fine-tune the BERT model
python grid_search.py --model bert
# fine-tune the RuleBERT model
python grid_search.py --model bertDC
```
- Prepare your own prediction dataset (predict.txt), rename it to dev.txt, and put it into ./CivilRules/data
- Modify the dataset in application.py
- Put the well-trained BERT model in ./CivilRules/save_dict
```shell
python application.py --model bert
python application.py --model bertDC
```
- The results will be saved in ./CivilRules/predict
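The classifier emits class indices, which then need to be mapped back to the seven category names. A small sketch of that decoding step; the index order below is an assumption (such repos usually read it from a class-list file), so check the actual order used by application.py.

```python
# Sketch of decoding predicted class indices into category names.
# ASSUMPTION: this index order matches the repo's class-list file;
# verify against the order actually used by application.py.
CLASSES = ["direct", "indirect", "method", "reference", "general", "term", "other"]

def decode(pred_ids):
    """Map a list of predicted class indices to category names."""
    return [CLASSES[i] for i in pred_ids]

print(decode([0, 3, 6]))  # ['direct', 'reference', 'other']
```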