GithubHelp home page GithubHelp logo

tb-tad's Introduction

TB-TAD

Collaborators

Project goals and descriptions

Our goal is to create an unsupervised learning algorithm to detect topologically associating domains (TADs) in chromosomes. TADs are regions with localized highly interacting units of heredity known as genes and have been extensively researched over the last 5 years. Furthermore, sub units in these regions tend to interact more frequently with each other than neighboring regions, thus demarcating boundary lines. Data is generated using chromatin conformation capture (3C) and is stored as matrixes of interaction counts. With this data, the spatial organization of chromosomes can be investigated.

There are a number of published tools already. ClusterTAD has Java and MATLAB implementations to detected TADs through an unsupervised method. PredTAD classifies regions as boundary or non boundary regions by looking at intra-chromosomal and epigenetic data, such as where methylation or protein interactions occur. Using epigenetic markers also helps to expand available inputs (add data from multiple experiments) to train the model, instead of relying entirely on interaction counts found from one single experiment. We think the latter paper will be a good place to start; they use a gradient boosting machine to classify regions as boundary or non boundary areas, which demarcates where TADs exist. In their results, they successfully identify TADs and compete with other models that use Bayesian Additive Regression Trees and Position Specific Linear Models. Additionally, they are able to determine which features are most important for determining TADs in cancer data.

Our goal is to reimplement this approach and compare our results against other published models. Our metrics of success will include finding TAD boundaries and identifying top features to determine boundaries. We will also use Random Forests to classify chromosomal regions as boundary or non-boundary of TADs. We will use data from GEO and ENCODE databases to train our model, which will include contact matrices and epigentic markers. We then hope to use actual data collected by our mentor in a separate project to see how well it predicts TADs.

tb-tad's People

Contributors

bgpalmer avatar tcollins2011 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.