alexsweeten / rule-based-learning

Repo for the hackseq19 project "Rule Based Learning for Transcriptional Regulation"

Jupyter Notebook 41.93% Python 58.07%

rule-based-learning's Introduction

Rule Based Learning for Transcriptional Regulation

This is the GitHub repo for the hackseq19 project: Rule Based Learning for Transcriptional Regulation!

Rationale

Gene regulatory sites, such as Transcription Factor Binding Sites (TFBS) and promoters, are extremely important regions within both eukaryotic and prokaryotic genomes. Predicting whether or not a site acts as a regulatory element is an important, yet surprisingly difficult task. In recent years, much effort has gone into building machine learning (ML) approaches that detect these genomic regions automatically. In this hackathon, we hope to experiment with some of these tools.

Goals

Our goals during hackseq19 are to:

  • a) Build an accurate classifier for a given gene regulation dataset.
  • b) Build an interpretable classifier that outputs useful rules describing each dataset.

We will experiment with many different classifiers, including decision trees, random forests, support vector machines, and neural networks. Accuracy is measured using F1 score, which we can visualize on our leaderboard (see below). Interpretability is measured by how clearly we can deduce rules from our dataset. An example rule:

IF Position[2] == "G" AND Position[3] == "C" THEN Class == "TFBS"
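As a minimal sketch of how such a rule could be applied and scored, the snippet below encodes the example rule as a plain Python function and computes the F1 score by hand. The sequences, the labels, the fallback class "other", and the choice of 0-based indexing for Position are all assumptions made for illustration; they are not taken from the project's datasets or code.

```python
def rule_predict(seq: str) -> str:
    """Apply the example rule to one sequence.

    Assumption: Position[i] is a 0-based index into the sequence,
    and "other" is a hypothetical label for everything the rule rejects.
    """
    if len(seq) > 3 and seq[2] == "G" and seq[3] == "C":
        return "TFBS"
    return "other"


def f1_score(y_true, y_pred, positive="TFBS") -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this example only.
sequences = ["ATGCA", "CCGCT", "ATGGA", "TTGCA", "GGACT"]
labels = ["TFBS", "TFBS", "TFBS", "other", "other"]

predictions = [rule_predict(s) for s in sequences]
score = f1_score(labels, predictions)
print(predictions, round(score, 3))
```

On this toy data the rule finds two of the three true sites and raises one false alarm, so precision and recall are both 2/3. A single hand-written rule like this is easy to read but usually less accurate than a full classifier, which is the trade-off the two goals above describe.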

Data

Our leaderboard page is available here. You are required to sign in using your Google account. Once signed in, you can choose your username and submit files to the leaderboard. The leaderboard is based on a hacked version of my Natural Language Processing course professor's website.

Datasets:

  • 1 Human Chromosome #1 TFBS
  • 1 E. coli K12 TFBS
  • 2 E. coli K12 Promoter Region
  • 1 Pokemon

These come from a variety of sources, including gene regulation databases and previous Kaggle competitions.

Results

The following graph shows our progress in improving classifier accuracy over the course of hackseq19. The x-axis is measured in hours since the start of the hackathon, and the y-axis is the F1 score. We annotated the times when we noticeably improved our position on the leaderboard. Dashed lines represent our "oracle", the highest recorded accuracy in the literature. As you can see, we beat the oracle score for Human SP1 TFBS!

Team Members

Team Lead:
Alex Sweeten

Participants:
Aris Grout

Chahat Upreti

Jade Chen

Kate Gibson

Oriol Fornes

Priyanka Mishra

Shawn Hsueh

Zakhar Krekhno

rule-based-learning's Issues

Peer review of README.md

Your readme looks great! Some details about the methods by which you are processing the data and getting your end-result would be appreciated, but overall it is fantastic. Very visually pleasing - looking forward to seeing the end result of your project!

Viz

The second deliverable for hackseq19 is to present our project for 2 minutes. A cool visualization of our project would help.

We should assign 1 or 2 people to work on the visualization side of things.

Add to the README

One of our deliverables for hackseq19 is to "peer-review" other projects' READMEs. We should make sure that ours is in top shape :)

One person should be in charge of posting our progress on the README. Another should volunteer to review other projects' READMEs.
