PyRBP: A Python Framework for Reliable Identification and Characterization of High-Throughput RNA-Binding Protein Events
PyRBP is a Python library for quickly generating characterization matrices and for feature selection, model evaluation, feature analysis and performance visualization of circRNA or linear RNA sequence data. PyRBP currently includes more than 10 RNA sequence characterization methods, covering three classes of characterization views: dynamic and static semantic information, RNA secondary structure information, and RNA physicochemical properties.
PyRBP is built on several RNA-RBP binding semantic models that we developed (RBPBERT, FastText, GloVe, Word2Vec, Doc2Vec). It provides four key features:
- Unified, easy-to-use APIs, detailed documentation and examples.
- Out-of-the-box RNA-RBP binding event characterization and downstream experiments.
- Powerful, customizable visualizer for performance and feature analysis.
- Full compatibility with other popular packages such as scikit-learn and yellowbrick.
```python
from sklearn.svm import SVC

from PyRBP.Features import generateBPFeatures
from PyRBP.featureSelection import cife
from PyRBP.metricsPlot import shap_interaction_scatter

# `sequences` (list of RNA sequences) and `label` (one binary label per
# sequence) are assumed to be defined already.

# Generate RNA physicochemical properties
bp_features = generateBPFeatures(sequences, PGKM=True)

# Filter the original features
refined_features = cife(bp_features, label, num_features=10)

# Performance visualization of SVM using PyRBP
clf = SVC(probability=True)
shap_interaction_scatter(refined_features, label, clf=clf, sample_size=(0, 100), feature_size=(0, 10), image_path='./')
```
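The example above assumes that `sequences` (a list of RNA strings) and `label` (one binary label per sequence) already exist. One way to build them, assuming positive and negative samples are stored in two separate FASTA files, is the standard-library sketch below. The helpers `read_fasta` and `load_dataset` and the two-file layout are hypothetical and not part of PyRBP.

```python
from typing import List, Tuple


def read_fasta(path: str) -> List[str]:
    """Return the sequences in a FASTA file, one upper-case string per record."""
    sequences: List[str] = []
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if chunks:
                    sequences.append("".join(chunks).upper())
                    chunks = []
            else:
                chunks.append(line)  # a sequence may span several lines
    if chunks:
        sequences.append("".join(chunks).upper())
    return sequences


def load_dataset(pos_path: str, neg_path: str) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: label positive records 1 and negative records 0."""
    pos = read_fasta(pos_path)
    neg = read_fasta(neg_path)
    return pos + neg, [1] * len(pos) + [0] * len(neg)
```

Usage (file names are placeholders): `sequences, label = load_dataset("positive.fa", "negative.fa")`. If a downstream function expects an array, wrap the labels with `numpy.asarray(label)`.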
It is recommended to use git for installation.
```shell
$ conda create -n PyRBP python=3.7.6 # create a virtual environment named PyRBP
$ conda activate PyRBP # activate the environment
$ git clone https://github.com/no-banana/PyRBP.git # clone this repository
$ cd PyRBP
$ pip install -r requirement.txt # install the dependencies of PyRBP
```
After this, PyTorch must be installed separately to match the CUDA version of your device. For example, for CUDA 10.2:

```shell
$ pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
```
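After installing a CUDA build of PyTorch, it is worth checking that the wheel matches your driver before running the BERT-based characterization. The sketch below uses only standard PyTorch attributes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`); the helper name `torch_cuda_report` is hypothetical, and it returns a message instead of failing when torch is not installed.

```python
import importlib.util


def torch_cuda_report() -> str:
    """Summarize the installed PyTorch build and whether CUDA is usable."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check also runs without torch

    # torch.version.cuda is None for CPU-only wheels.
    build = f"CUDA {torch.version.cuda}" if torch.version.cuda else "CPU only"
    usable = torch.cuda.is_available()
    return f"torch {torch.__version__} ({build}), CUDA available: {usable}"


if __name__ == "__main__":
    print(torch_cuda_report())
```

With the command above on a working CUDA 10.2 setup, the report should mention `1.8.1+cu102`; `CUDA available: False` usually points to a driver or wheel mismatch.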
scikit-feature can be installed into the virtual environment with the following commands:

```shell
git clone https://github.com/jundongl/scikit-feature.git
cd scikit-feature
python setup.py install
```
The language models used in PyRBP can be downloaded from figshare.
Note for OSX users: because glove-python-binary uses OpenMP, it does not compile under Clang. To install it, you will need a reasonably recent version of gcc (from Homebrew, for instance), which should be picked up by `setup.py`:

```shell
git clone https://github.com/maciejkula/glove-python.git
cd glove-python
python setup.py install
```
PyRBP requires the following dependencies:
- Python (>=3.6)
- gensim (>=3.8.3)
- GloVe (>=0.2.0)
- numpy (>=1.19.5)
- pandas (>=1.3.5)
- scipy (>=0.19.1)
- joblib (>=0.11)
- scikit-learn (>=0.24.2)
- matplotlib (>=3.5.3)
- seaborn (>=0.11.2)
- shap (>=0.41.0)
- skfeature (>=1.0.0)
- tensorflow-gpu (>=2.4.0)
- torch (>=1.8.1)
- transformers (==4.12.5)
- yellowbrick (>=1.3)
- tqdm (>=4.64.0)
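Several of these packages are imported under a different name than the one you install (for example, scikit-learn is imported as `sklearn` and tensorflow-gpu as `tensorflow`). The sketch below checks that each import name can be resolved in the active environment. It checks presence only, not the minimum versions listed above, and the mapping in `REQUIRED_MODULES` is our assumption about the import names (in particular `glove` for GloVe).

```python
import importlib.util
from typing import Dict, List

# Assumed mapping from the dependency names above to their import names.
REQUIRED_MODULES: Dict[str, str] = {
    "gensim": "gensim",
    "GloVe": "glove",
    "numpy": "numpy",
    "pandas": "pandas",
    "scipy": "scipy",
    "joblib": "joblib",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "shap": "shap",
    "skfeature": "skfeature",
    "tensorflow-gpu": "tensorflow",
    "torch": "torch",
    "transformers": "transformers",
    "yellowbrick": "yellowbrick",
    "tqdm": "tqdm",
}


def missing_dependencies(modules: Dict[str, str]) -> List[str]:
    """Return the package names whose top-level module cannot be found."""
    # find_spec locates a module without importing (executing) it.
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies(REQUIRED_MODULES)
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```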
- Unified, easy-to-use APIs.
  Each module in PyRBP exposes its functions through a unified API.
- Extended functionalities, wider application scenarios.
  PyRBP provides interfaces for conducting downstream RNA-RBP binding event experiments, including feature selection, model cross-validation, and visualization of feature and performance analyses.
- Detailed training log, quick intuitive visualization.
  The characterization functions accept additional parameters that let users control the window used to capture information from the different views they want to monitor during sequence encoding. We also implement a `metricsPlot` module to quickly visualize the results of feature analysis or model evaluation, providing further information or enabling comparisons. See an example here.
- Wide compatibility.
  PyRBP is designed to be compatible with scikit-learn (sklearn) and other projects such as yellowbrick. Users can therefore take advantage of the various utilities from the sklearn community for cross-validation, result visualization, etc.
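Among the feature selection interfaces, the quick example uses `cife`. Assuming it follows the standard Conditional Infomax Feature Extraction (CIFE) criterion, as implemented for instance in scikit-feature (which the installation steps set up), features are selected greedily: each candidate feature $X_k$ is scored against the already selected set $S$ and the label $Y$ as

```latex
J_{\mathrm{CIFE}}(X_k) = I(X_k; Y) - \sum_{X_j \in S} I(X_j; X_k) + \sum_{X_j \in S} I(X_j; X_k \mid Y)
```

The first term rewards relevance to the label, the second penalizes redundancy with features already chosen, and the third adds back information that $X_j$ and $X_k$ share conditionally on the class. The highest-scoring feature is added to $S$ at each step, presumably until `num_features` features have been selected; see the API reference for PyRBP's exact behavior.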
Currently (v0.1.0, 2023/03), 13 RNA-RBP binding event characterization methods have been implemented:
- RNA-RBP binding semantic based
  - Dynamic global semantic information
  - Static local semantic information
    - FastText
    - GloVe
    - Word2Vec
    - Doc2Vec
- RNA secondary structure based
- RNA physicochemical properties
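The semantic methods above rely on language models (RBPBERT, FastText, GloVe, Word2Vec, Doc2Vec) that treat an RNA sequence as text. A common way to turn a sequence into "words" for such models is to split it into overlapping k-mers. The sketch below shows only that generic step: the value of k and the exact tokenization used by PyRBP's pretrained models are not stated here, so `k=3`, the `stride` parameter and the helper name `kmer_tokenize` are illustrative assumptions.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a sequence into k-mers starting every `stride` positions."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    sequence = sequence.upper()
    # Sequences shorter than k yield an empty list.
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]


# Each tokenized sequence becomes one "sentence" for a word-embedding model.
corpus = [kmer_tokenize(s) for s in ["ACGUAC", "GGAUU"]]
```

With gensim installed, a corpus like this can be passed to `gensim.models.Word2Vec(sentences=corpus)`; the hyperparameters of PyRBP's own models are not documented in this section.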
Note: PyRBP is still under development; please see the API reference for the latest list.
Thank you for using PyRBP! Any questions, suggestions or advice are welcome.
Email: [email protected], [email protected]