
santiagoahl / rna-taxonomy-prediction


We worked with an open CSV dataset consisting of RNA sequences labeled with several taxonomies. Using Python, we built an XGBoost model that classifies a sequence into 1 of 19 different taxonomies. We also used Markov chains to preprocess the data.

Python 0.19% Jupyter Notebook 99.81%
rna-seq random-forest markov-chain scikit-learn cross-validation gridsearchcv machine-learning normalization xgboost smote

rna-taxonomy-prediction's Introduction


WHR
RNA Taxonomy Classification

An XGBoost multiclass classifier built with scikit-learn, using Markov chains.

scikit-learn Numpy joblib json

Key Features • How To Use • Credits • License


Key Features

  • This machine learning model takes an RNA sequence and predicts which class it belongs to. Each class is a taxonomy. The 19 available taxonomies are:

    • Orthomyxoviridae
    • Rhabdoviridae
    • Arteriviridae
    • Coronaviridae
    • Reoviridae
    • Caliciviridae
    • Phenuiviridae
    • Hantaviridae
    • Picornaviridae
    • Betaflexiviridae
    • Astroviridae
    • Closteroviridae
    • Flaviviridae
    • Potyviridae
    • Retroviridae
    • Togaviridae
    • Paramyxoviridae
    • Hepeviridae
    • Pneumoviridae
  • Before prediction, the model builds a Markov chain whose states are the 64 possible codons over the nucleotides A, C, G, T, and then computes 18 metrics over its associated adjacency matrix: 8 are matrix norms, and the remaining 10 are the moduli (complex norms) of the first ten eigenvalues, sorted in ascending order. Namely:

    • Frobenius Norm
    • Nuclear Norm
    • Infty Norm
    • Neg Infty Norm
    • Neg L1 Norm
    • L1 Norm
    • Neg L2 Norm
    • L2 Norm
    • eig 1
    • eig 2
    • eig 3
    • eig 4
    • eig 5
    • eig 6
    • eig 7
    • eig 8
    • eig 9
    • eig 10
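
The feature extraction described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function names `codon_transition_matrix` and `matrix_features` are hypothetical, codons are read as non-overlapping triplets, the matrix is row-normalised, and "first ten eigenvalues in ascending order" is read as the ten smallest moduli. The original notebook may make different choices on each of these points.

```python
from itertools import product

import numpy as np

# The 64 codons over the alphabet used in this README (A, C, G, T).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}


def codon_transition_matrix(sequence: str) -> np.ndarray:
    """Row-normalised 64x64 matrix of transitions between consecutive codons.

    Assumptions: the sequence is read in non-overlapping triplets starting at
    position 0, U is treated as T, and triplets containing any other symbol
    (e.g. N) are skipped.
    """
    seq = sequence.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    states = [CODON_INDEX[c] for c in codons if c in CODON_INDEX]

    counts = np.zeros((64, 64))
    for src, dst in zip(states, states[1:]):
        counts[src, dst] += 1

    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of codons that never occur as a source stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def matrix_features(matrix: np.ndarray) -> dict:
    """Compute the 8 matrix norms and 10 eigenvalue moduli listed above."""
    norm_orders = {
        "Frobenius Norm": "fro",
        "Nuclear Norm": "nuc",
        "Infty Norm": np.inf,
        "Neg Infty Norm": -np.inf,
        "Neg L1 Norm": -1,
        "L1 Norm": 1,
        "Neg L2 Norm": -2,
        "L2 Norm": 2,
    }
    features = {
        name: float(np.linalg.norm(matrix, ord=order))
        for name, order in norm_orders.items()
    }
    # Assumption: keep the ten smallest eigenvalue moduli, in ascending order.
    moduli = np.sort(np.abs(np.linalg.eigvals(matrix)))
    for k in range(10):
        features[f"eig {k + 1}"] = float(moduli[k])
    return features


example = matrix_features(codon_transition_matrix("ATGGCCATGGCCATGAAA"))
print(len(example))  # 18 features per sequence
```

Each sequence is thereby reduced to a fixed-length vector of 18 numbers, regardless of its length, which is what makes it usable as a row in a tabular dataset.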

With these new metrics, we built a new dataset, which produced this scatter plot: screenshot

  • We implemented a Random Forest model whose training data is taken from the new dataset. screenshot
  • We achieved an F1 score of 96.9% on the validation set.
  • The confusion matrix is the following:

screenshot

  • The learning curve is the following: screenshot
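
As a rough illustration of the training and evaluation steps listed above, the sketch below fits a scikit-learn `RandomForestClassifier` and computes an F1 score and a confusion matrix on a held-out validation split. Everything specific here is an assumption: the data is synthetic (three made-up classes with 18 features each, not the project's dataset), the hyperparameters are defaults, and the `weighted` F1 average is one plausible choice, since the README does not say which average was reported.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 18-feature dataset (NOT the project's data).
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 3, 60, 18
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Hold out a stratified validation set so every class is represented.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_val)
val_f1 = f1_score(y_val, y_pred, average="weighted")  # averaging is an assumption
cm = confusion_matrix(y_val, y_pred)  # rows: true class, columns: predicted class

print(f"validation F1: {val_f1:.3f}")
print(cm)
```

With 19 classes of unequal size, the choice of F1 average (`macro`, `micro`, or `weighted`) can change the reported number noticeably, so it is worth stating alongside the score.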

How To Use

To clone and run this application, follow these steps:

# Clone this repository
$ git clone https://github.com/santiagoahl/rna-taxonomy-prediction.git

# Go into the repository
$ cd rna-taxonomy-prediction

# Launch Jupyter Notebook
$ jupyter-notebook

# Run the Libraries & Modules cell
# Run the Model Import cell

Credits

This software uses the following packages:

License

MIT


Web Site santiagoal.super.site  ·  GitHub @santiagoahl  ·  Twitter @sahumadaloz
