greenelab / computational-reagents

Rigor, Reproducibility, Transparency, and Reagent Validity for Computational Biologists

License: Creative Commons Attribution 4.0 International

Topics: slides, continuous-integration, reproducibility, discussion

computational-reagents's Introduction

Reagent validation in computational biology

Introduction to this repository

I was asked to put together a one-hour guided discussion of reproducibility and transparency in the computational sciences for Penn GCB students. The resulting presentation uses remark and is visible wherever the web is available. The work is licensed CC-BY. Feel free to contribute any improvements via GitHub issues.

How to modify

This document is HTML with Markdown inside it. If you want to make changes (for example, to add a new reason to a list), find the appropriate slide's Markdown inside index.html and edit it. If you've never made a pull request before, GitHub has some documentation on the process.

Here's a screenshot of where you might want to edit: [image: Where to edit]

computational-reagents's People

Contributors

apexamodi, bemert, cgreene, dhimmel, gwaybio, oryoruk, sklasfeld, zz327

computational-reagents's Issues

KnitR/Jupyter

Knitr and IPython notebook allow scientists to compose documents with inline code and figures.

https://rpubs.com/marschmi/105639
https://www.r-bloggers.com/reproducible-research-training-wheels-and-knitr/
http://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html

These tools can help computational scientists write more readable and reproducible code and produce more transparent figures and results. On their own, however, they are by no means sufficient for reproducible computational biology research.

benchling.com

Benchling is an online lab notebook made specifically for people doing biology. I like it because it is the first electronic lab notebook (of all the ones I have tried) that I can add code to and that people who do not code also like to use. It also connects to my Google account, which is nice. You can share your entries with collaborators, and it has a function for creating a lab group. Sometimes the formatting can be weird, and the protocol section is limited to bullets only two levels deep, so I use the notebook for everything. Since it is a new site, the people who work there are very quick to help if you have any questions. However, one time they messaged me because they had noticed that I was having formatting issues, which is creepy.

Paper examples for discussion

Share the papers that you've identified as relevant to our reproducibility discussion here. You do not need to note in this issue which ones fall into each category.

GitHub:)

GitHub can help with:

  • transparency
      • e.g. making scripts and code public in a public GitHub repo.
  • reproducibility
      • e.g. having the commit history allows others to use the exact version of a program when reproducing an analysis.

It addresses some of the issues with transparency and reproducibility, but not all of them, so it is not sufficient on its own.
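As a small illustration of the commit-history point, an analysis script can record the exact commit it ran from next to its results. This is only a sketch of one possible approach: it assumes `git` is installed and the script runs inside a repository, and the helper names `current_commit` and `stamp_results` are hypothetical, not part of any GitHub tooling.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the commit hash of repo_dir, or "unknown" if git cannot report one."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or repo_dir is not inside a repository with commits
        return "unknown"
    return out.stdout.strip()


def stamp_results(results, commit):
    """Return a copy of the results dict with the code version attached."""
    stamped = dict(results)
    stamped["code_commit"] = commit
    return stamped


# Hypothetical analysis output, stamped with the version of the code that made it.
stamped = stamp_results({"n_genes": 120}, current_commit())
print(stamped)
```

Someone reproducing the analysis could then check out the recorded commit before re-running it. Uncommitted local changes would not be captured by this sketch.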

Docker

Docker is a tool to containerize and version an entire base compute environment. It acts much like a lightweight virtual machine (containers share the host's kernel rather than emulating hardware) that can be shipped to users together with code/software to ensure that the underlying base image is consistent.

It is not sufficient to ensure reproducibility, however, since the image will often require external packages that need to be downloaded with each pipeline run, and those packages can change between runs. It is still extremely helpful and can be used with any analysis across operating systems.
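One way to reduce the risk from packages that are downloaded at run time is to pin exact versions and verify them when the pipeline starts. The following is a minimal sketch in Python, not a Docker feature: the function name `check_pins` and the pinned package name are hypothetical, and it only covers Python packages visible to `importlib.metadata`.

```python
from importlib import metadata


def check_pins(pins):
    """Compare installed package versions against a dict of pinned versions.

    Returns {package: (expected, found)} for every mismatch; found is None
    when the package is not installed.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Hypothetical pin; in practice the pins would come from a lock or requirements file.
pins = {"example-package-that-is-not-installed": "1.0.0"}
print(check_pins(pins))
```

A pipeline could refuse to run when the returned dict is not empty, so that a silently upgraded dependency is noticed before results are produced.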

Bioconductor

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data.
It helps:
- Everyone can download and re-run the programs in the R language.
- Packages usually come with test examples and a Reference Manual.
- Anybody who has a problem with a package can submit a report, and Bioconductor will directly contact the author of the package.
However, not all of the code is open source, which is quite disappointing.

Continuous Analysis

I think that Continuous Analysis (thanks Brett and Casey!) is helpful and necessary, but not sufficient, for reproducing computational biology experiments. It addresses issues like version control and the development environment by making one researcher's computational environment easy to transfer to other researchers.

http://biorxiv.org/content/early/2016/08/11/056473
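To make the idea of automatically re-running an analysis more concrete, here is a rough sketch of one check such a setup could perform: fingerprint the output files of two runs and report which ones changed. This is not the implementation from the preprint; the function names `digest_outputs` and `changed_files` and the demo file are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_outputs(output_dir):
    """Map each file under output_dir (by relative path) to its SHA-256 digest."""
    root = Path(output_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Return the sorted paths whose digest differs, or that exist in only one run."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))


# Demo with a temporary directory standing in for an analysis output folder.
with tempfile.TemporaryDirectory() as tmp:
    result = Path(tmp) / "result.txt"
    result.write_text("p = 0.03\n")
    first_run = digest_outputs(tmp)
    result.write_text("p = 0.04\n")  # simulate a re-run that produced a different value
    second_run = digest_outputs(tmp)

print(changed_files(first_run, second_run))
```

Byte-level digests are strict: outputs that embed timestamps or depend on unseeded random numbers will be flagged even when the conclusions are unchanged.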

CoGe (genomevolution.org)

This is a website for genome visualization (and phylogenetic analysis). It includes a JBrowse instance to visualize genomic and epigenomic features. Users can import their own genomes and other data files as well as share them. However, there are still a lot of bugs to work out. For example, there may be redundant data, which could cause confusion. Some scientists could map their data to one genome version on CoGe while others map theirs to another version on CoGe, yet the two may actually be the same genome uploaded by different users. There are also no requirements for the information one must provide about the data being uploaded. Therefore, people could upload data that may not be reproducible.
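The duplicate-genome problem could in principle be detected by comparing the sequence content rather than the names users give their uploads. The sketch below is not a CoGe feature; `genome_fingerprint` is a hypothetical helper, and the two tiny FASTA strings are made-up examples.

```python
import hashlib


def genome_fingerprint(fasta_text):
    """Hash the sequences in a FASTA string, ignoring record names, case and line wrapping."""
    sequences = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; store the previous one if it had sequence.
            if current:
                sequences.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        sequences.append("".join(current))
    # Sort so that the order of records in the file does not matter.
    joined = "\n".join(sorted(sequences))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Two made-up uploads of the same sequence with different names and formatting.
upload_a = ">chr1 uploaded by user A\nACGTAC\nGTTA\n"
upload_b = ">chromosome_1\nacgtacgtta\n"
print(genome_fingerprint(upload_a) == genome_fingerprint(upload_b))  # True
```

This only catches exact sequence matches; it would not recognize reverse complements or assemblies that differ by even one base.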

Galaxy

Galaxy is an online tool that can perform analyses on large genomic data sets. Various genomic tools are integrated into the platform, and the parameters used to run each tool can be saved and shared, allowing for greater reproducibility and transparency.

The tool is helpful as an introduction to doing large-scale genomic analyses; however, you are limited by the capacity of the server in terms of how much data can be analyzed and how fast analyses can be run. Additionally, parameters in certain tools available on Galaxy cannot be fully customized.
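The same habit of saving parameters can be followed outside Galaxy. As a hedged sketch (this does not use Galaxy's API, and the tool name, version and parameters are invented for illustration), a script can write each tool invocation to a JSON record that can be shared and reloaded later:

```python
import json


def save_run_record(tool, version, params):
    """Serialize one tool invocation as a JSON string with a stable key order."""
    record = {"tool": tool, "version": version, "params": params}
    return json.dumps(record, sort_keys=True, indent=2)


def load_run_record(text):
    """Parse a record produced by save_run_record."""
    return json.loads(text)


# Hypothetical tool name, version and parameters, for illustration only.
text = save_run_record("example_aligner", "0.0.1", {"max_mismatches": 2, "seed": 42})
record = load_run_record(text)
print(record["params"])
```

Sorting the keys keeps the text identical for identical settings, which makes records easy to compare or keep under version control.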
