irzamsarfraz / snakemake-workshop

This project forked from brite-reu/snakemake-workshop

Workflow managers are good mkay

Shell 4.08% Python 82.02% R 13.90%

snakemake-workshop's Introduction

snakemake-workshop

This workshop serves as an introduction to both Snakemake and git + GitHub.

readthedocs

Workshop Overview

In this workshop you will use snakemake to create a workflow to download, process, and plot single-cell data.

Getting Started

  1. Confirm that you have read through the pre-workshop material and are familiar with the following concepts (if not, please do this now):
  2. Log on to the SCC
    • Hint: Start an interactive node
  3. Fork the workshop repository to your own account: https://github.com/BRITE-REU/snakemake-workshop
  4. Run each line of the install script line-by-line on the command line
    • Check: Is the workshop environment activated? (try 'conda info')
  5. Make the pipeline work
    • Hint: Navigate to the Issues board on your forked repository. The issues are numbered in the order in which you should work on them. When you finish an issue, mark it as closed. You can then view your progress in the Projects board.
    • Hint: Use git to commit and push your changes as you work

Some Hints

  • To test your work as you go, we suggest you comment out all rules that follow the rule you are currently implementing. Then, when you call the Snakefile, specify which output file you want to create.
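
As a rough illustration of that hint, here is a toy Snakefile that has nothing to do with the workshop scripts. The rule names and the `results/` paths are made up for the example; only the pattern matters.

```snakemake
# Toy Snakefile (hypothetical rules, not part of the workshop pipeline).

# The rule currently being developed stays active.
rule step_one:
    output:
        "results/one.txt"
    shell:
        "echo one > {output}"

# Rules that come later are commented out until step_one works.
# rule step_two:
#     input:
#         "results/one.txt"
#     output:
#         "results/two.txt"
#     shell:
#         "cp {input} {output}"

# From the shell, ask for the specific file you want built:
#   snakemake --cores 1 results/one.txt
# Adding -n gives a dry run that only lists the jobs that would execute:
#   snakemake --cores 1 -n results/one.txt
```

Naming the target file on the command line means Snakemake only runs the rules needed to produce that file, which keeps each test short.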

snakemake-workshop's People

Contributors

dakota-hawkins, ebriars

snakemake-workshop's Issues

2. Implement the preprocess_data rule

Hint: This rule uses the preprocess.py step

This rule should do the following:

  • Take the output of the download_data rule as input
  • Specify parameters to filter the data such that the minimum number of cells is 3, the minimum number of genes is 200, the maximum percent of mitochondrial reads is 5%, and the number of highly variable genes to retain is 2000
  • Output a processed '.h5ad' file. You should put this in some type of data directory. (Hint: can you specify different directories for raw and processed data?)
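
One possible shape for this rule is sketched below. Everything not named in the issue is an assumption: the `data/raw` and `data/processed` directories, the `scripts/` location, the parameter names, and the idea that preprocess.py reads its settings from the `snakemake` object supplied by the `script:` directive. Check the script itself before copying any of it.

```snakemake
# Sketch only: paths and parameter names are hypothetical.
rule preprocess_data:
    input:
        "data/raw/pbmc3k.h5ad"        # assumed output of download_data
    output:
        "data/processed/pbmc3k.h5ad"  # separate directory for processed data
    params:
        min_cells=3,
        min_genes=200,
        pct_mito=5,   # whether the script expects 5 or 0.05 depends on preprocess.py
        n_hvgs=2000,
    script:
        "scripts/preprocess.py"       # assumed location of the script
```

If the script is run through `script:`, values are available inside it as `snakemake.input[0]`, `snakemake.output[0]`, and `snakemake.params.min_cells`. If preprocess.py takes command-line arguments instead, a `shell:` directive would be the better fit.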

4. Implement plot_clusters rule

Hint: This uses the plot_cells.R script

This rule should do the following to make plots of the clusters found in the single-cell data:

  • Take as input the three output files from the cluster_cells rule
  • Specify which attribute to color the data by (Hint: an attribute is usually a piece of metadata or categorical label for a data point. In this example, the attribute is the column titled "louvain")
  • Output a '.png' plot of clustered data
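
A minimal sketch of how such a rule could look, assuming the three clustered files are CSVs under `data/clustered/` and that plot_cells.R lives in `scripts/`. The file names, the `plots/` directory, and the parameter name `color_by` are all invented for illustration.

```snakemake
# Sketch only: file names and the parameter name are hypothetical.
rule plot_clusters:
    input:
        counts="data/clustered/counts.csv",
        cells="data/clustered/cell_metadata.csv",
        genes="data/clustered/gene_metadata.csv",
    output:
        "plots/clusters.png"
    params:
        color_by="louvain"        # column used to color the points
    script:
        "scripts/plot_cells.R"    # assumed location of the script
```

When an R script is run through `script:`, Snakemake exposes an S4 object named `snakemake`, so named entries can be read with, for example, `snakemake@input[["counts"]]` and `snakemake@params[["color_by"]]`.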

6. Generate a snakemake report

Now that the pipeline is working, add functionality to the snakefile to generate an html report. To do this you will need to:

  • Look at the snakemake documentation for the report feature
  • Add a report flag to output(s) you want to display in the report (e.g. a plot)
  • Specify categories for the output(s) in the report (i.e. sections)
  • Run snakemake with the report flag

Hint: snakemake documentation
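
The steps above might come together roughly as follows. This is a sketch that reuses hypothetical paths and an invented category name; only the `report()` flag and the `--report` command-line option are taken from Snakemake itself.

```snakemake
# Sketch only: paths and the category label are hypothetical.
rule plot_clusters:
    input:
        "data/clustered/counts.csv",
        "data/clustered/cell_metadata.csv",
        "data/clustered/gene_metadata.csv",
    output:
        # report() marks this file for inclusion in the HTML report;
        # category decides which section of the report it appears in.
        report("plots/clusters.png", category="Clustering")
    params:
        color_by="louvain"
    script:
        "scripts/plot_cells.R"

# After the workflow has run successfully, build the report from the shell:
#   snakemake --report report.html
```

`report()` also accepts a `caption` argument pointing to a text file that describes the output, which is worth adding once the basic report works.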

3. Implement the cluster_cells rule

Hint: This uses the cluster_cells.py step

This rule should do the following to cluster the data and output it in an R-readable format:

  • Take the output of the preprocess_data rule as input
  • Specify parameters such that the number of clusters is 15 and the resolution is 1
  • Create three output files for the count matrix, cell metadata, and gene metadata. (Tip: think about directory structure when choosing where this clustered data should save to)
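
Here is one way the rule could be laid out, as a sketch under several assumptions: CSV is used as the R-readable format, the outputs go to a `data/clustered/` directory, and the parameter names match whatever cluster_cells.py expects. None of these details come from the issue text.

```snakemake
# Sketch only: paths, file format, and parameter names are hypothetical.
rule cluster_cells:
    input:
        "data/processed/pbmc3k.h5ad"   # assumed output of preprocess_data
    output:
        # Named outputs make it clear which file is which.
        counts="data/clustered/counts.csv",
        cells="data/clustered/cell_metadata.csv",
        genes="data/clustered/gene_metadata.csv",
    params:
        n_clusters=15,
        resolution=1,
    script:
        "scripts/cluster_cells.py"     # assumed location of the script
```

Giving each output a name lets a Python script refer to `snakemake.output.counts` rather than relying on list positions, which is easier to keep straight with three files.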

5. Create a config.yaml file to specify the parameters

Up until now, you have hardcoded the parameters needed into each rule. A more elegant solution is to have the Snakefile use a config.yaml file that specifies these parameters. In this step you should do the following:

  • Create a config.yaml file
  • Specify the parameters in the config.yaml file
  • Import the config.yaml file into the Snakefile
  • Have the Snakefile use the parameters specified in the config.yaml file
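
To make the four steps concrete, here is a small sketch for the preprocessing parameters only. The key names in config.yaml and the file paths are hypothetical; `configfile:` and the global `config` dictionary are standard Snakemake features.

```snakemake
# config.yaml (hypothetical key names) could contain:
#
#   preprocess:
#     min_cells: 3
#     min_genes: 200
#     pct_mito: 5
#     n_hvgs: 2000

# Load the YAML file; its contents become the global dictionary `config`.
configfile: "config.yaml"

rule preprocess_data:
    input:
        "data/raw/pbmc3k.h5ad"
    output:
        "data/processed/pbmc3k.h5ad"
    params:
        # Values are looked up in config instead of being hardcoded.
        min_cells=config["preprocess"]["min_cells"],
        min_genes=config["preprocess"]["min_genes"],
        pct_mito=config["preprocess"]["pct_mito"],
        n_hvgs=config["preprocess"]["n_hvgs"],
    script:
        "scripts/preprocess.py"
```

Grouping keys by rule name is just one convention. A flat config.yaml works equally well, as long as the Snakefile lookups match the file.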

1. Implement download_data rule

Hint: This rule uses the script download_data.py

This rule should do the following:

  • Download the dataset 'pbmc3k'
  • Output a '.h5ad' file. You should output this file to some type of data directory
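
A possible starting point is sketched below. The output directory, the `scripts/` location, and the parameter name `dataset` are assumptions, as is the idea that download_data.py reads them from the `snakemake` object.

```snakemake
# Sketch only: the path and the parameter name are hypothetical.
rule download_data:
    output:
        "data/raw/pbmc3k.h5ad"        # raw data kept in its own directory
    params:
        dataset="pbmc3k"              # name of the dataset to download
    script:
        "scripts/download_data.py"    # assumed location of the script
```

This rule has no `input:` because nothing needs to exist before the download. Snakemake creates the `data/raw/` directory automatically if it is missing.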
