This repo contains the source code and dataset for the following papers:
- Ling Luo, Zhihao Yang, Lei Wang, Yin Zhang, Hongfei Lin and Jian Wang. KeSACNN: a protein-protein interaction article classification approach based on deep neural network. International Journal of Data Mining and Bioinformatics, 2019, 22(2): 131-148.
- Ling Luo, Zhihao Yang, Lei Wang, Yin Zhang, Hongfei Lin, Jian Wang, Liang Yang, Kan Xu and Yijia Zhang. Protein-Protein Interaction Article Classification: A Knowledge-enriched Self-Attention Convolutional Neural Network Approach. Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018.
KeSACNN uses the following dependencies:
-
corpus
- BioCreative II corpus
- BioCreative III corpus
-
src
- attention_keras.py: the self-attention layer
- BiLSTM.py: a BiLSTM baseline
- CNN-Kim.py: a CNN baseline proposed by Kim
- SACNN.py: our Self-Attention CNN model
- KeSACNN-arc1.py: our KeSACNN-I model
- KeSACNN-arc2.py: our KeSACNN-II model
The trained feature embeddings can be downloaded from https://www.kaggle.com/lingluodlut/kesacnn.
To train a basic SACNN model, provide the training set, test set and word embedding files, then run the SACNN.py script:
python SACNN.py
To train our KeSACNN-I model, provide the training set, test set, word embedding, BEN embedding and CUI embedding files, then run the KeSACNN-arc1.py script:
python KeSACNN-arc1.py
To train our KeSACNN-II model, provide the training set, test set, word embedding, BEN embedding and CUI embedding files, then run the KeSACNN-arc2.py script:
python KeSACNN-arc2.py
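Each training script above expects a particular set of input files, so it can help to confirm they are present before starting a run. The following is a minimal sketch of such a check, not part of this repo: the helper `missing_inputs`, the input labels (`train`, `test`, `word_embedding`, `ben_embedding`, `cui_embedding`) and the example file paths are all hypothetical, and the actual locations are whatever you set for each script.

```python
from pathlib import Path

# Inputs each script needs, as listed in the instructions above.
# The labels are hypothetical names chosen for this sketch.
BASE_INPUTS = ["train", "test", "word_embedding"]
KNOWLEDGE_INPUTS = BASE_INPUTS + ["ben_embedding", "cui_embedding"]

REQUIRED_INPUTS = {
    "SACNN.py": BASE_INPUTS,
    "KeSACNN-arc1.py": KNOWLEDGE_INPUTS,
    "KeSACNN-arc2.py": KNOWLEDGE_INPUTS,
}


def missing_inputs(script, paths):
    """Return the labels of inputs required by `script` that are not existing files.

    `paths` maps an input label to a file path; labels absent from the
    mapping are reported as missing.
    """
    missing = []
    for label in REQUIRED_INPUTS[script]:
        path = paths.get(label)
        if path is None or not Path(path).is_file():
            missing.append(label)
    return missing


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own files.
    example_paths = {
        "train": "corpus/train.txt",
        "test": "corpus/test.txt",
        "word_embedding": "embeddings/word.vec",
    }
    for script in REQUIRED_INPUTS:
        print(script, "missing:", missing_inputs(script, example_paths))
```

An empty list for a script means every file it needs was found, and the corresponding `python` command above can be run.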