crispr_tools

Collection of tools for CRISPR-Cas9 genetic screen analysis.

Introduction

These scripts pipeline read counting from FASTQ files and wrap batch differential analysis with DrugZ and MAGeCK.

Pipeline overview

CRISPR screen analysis using crispr_tools proceeds by the following steps:

  1. From pre-aligned FASTQ, determine read counts per protospacer and map them to the reference CRISPR library.
  2. Specify replicate-to-condition mappings in the experimental design and select differential methods to run.
  3. Run the analysis in batch.

Tutorial

A tutorial is provided in the doc directory.

Reading counts

The count_reads.py script handles read counting from pre-aligned FASTQ files and mapping to a reference library, if supplied.

count_reads.py [FASTQ_FILES] -s SLICE --merge-samples --just-go -p OUTDIR --library=LIB

The slice (SLICE) is a range that specifies the position of the protospacer along the reads. It is supplied as zero-based half-open indices: the start position is included and the end position is excluded. FASTQ files must be pre-aligned so that the protospacer occupies the same slice in every read of every run.
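As a minimal sketch of what zero-based half-open indexing means in practice, the Python snippet below extracts a protospacer from each read and tallies the results. The reads, the function names, and the slice 4 to 24 are invented for illustration. This is not the count_reads.py implementation, and the exact syntax expected by -s should be checked against the script's own help.

```python
from collections import Counter


def extract_protospacer(read, start, end):
    """Return read[start:end]: start is zero-based and included, end is excluded."""
    return read[start:end]


def count_protospacers(reads, start, end):
    """Tally how often each protospacer sequence occurs across the reads."""
    return Counter(extract_protospacer(read, start, end) for read in reads)


# Hypothetical pre-aligned reads: 4 nt of leading sequence, a 20 nt
# protospacer, then 4 nt of trailing constant sequence.
reads = [
    "ACCG" + "GATTACAGATTACAGATTAC" + "GTTT",
    "ACCG" + "GATTACAGATTACAGATTAC" + "GTTT",
    "ACCG" + "CCCCGGGGAAAATTTTCCGG" + "GTTT",
]

# Positions 4 to 23 are taken; position 24 is excluded (half-open).
counts = count_protospacers(reads, 4, 24)
for sequence, n in counts.items():
    print(sequence, n)
```

With a half-open range, the length of the protospacer is simply end minus start (here 24 - 4 = 20).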

The slice can be determined by inspecting nucleotide frequencies as a function of cycle. It appears as a region of uniform probability across the four bases. The ntByCycle.R script in the exorcise package (https://github.com/SimonLammmm/exorcise/) can generate such nucleotide traces.

ntByCycle.R -f [FASTQ_FILE] -o OUTDIR
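The idea behind inspecting nucleotide frequencies per cycle can be sketched in Python as follows. This is an illustration only and not how ntByCycle.R works internally. The toy reads, the function names, and the 0.6 threshold are assumptions chosen to make the example small.

```python
from collections import Counter


def nucleotide_frequencies(reads):
    """Per-cycle frequency of A, C, G and T across reads of equal length."""
    freqs = []
    for cycle in range(len(reads[0])):
        tally = Counter(read[cycle] for read in reads)
        total = sum(tally[base] for base in "ACGT")
        freqs.append({base: tally[base] / total if total else 0.0 for base in "ACGT"})
    return freqs


def variable_cycles(freqs, max_freq=0.6):
    """Cycles where no single base dominates (the threshold is arbitrary)."""
    return [i for i, f in enumerate(freqs) if max(f.values()) <= max_freq]


# Toy reads: constant "AC" prefix, a 4 nt variable region, constant "GT" suffix.
reads = [
    "AC" + "ACGT" + "GT",
    "AC" + "CGTA" + "GT",
    "AC" + "GTAC" + "GT",
    "AC" + "TACG" + "GT",
]

freqs = nucleotide_frequencies(reads)
cycles = variable_cycles(freqs)

# Convert the run of variable cycles into a zero-based half-open slice.
start, end = cycles[0], cycles[-1] + 1
print(start, end)
```

In real data the protospacer region is only approximately uniform, so a plot of the traces is usually easier to judge by eye than a fixed threshold.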

Specify --merge-samples if you have technical replicates. You'll know this is the case if you have FASTQ files with _L001 and _L002 or similar.
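Conceptually, merging technical replicates amounts to summing the counts of files that differ only in their lane token. The sketch below shows that idea under stated assumptions: the file names, the _L followed by three digits pattern, and the function names are hypothetical, and the actual behaviour of --merge-samples may differ.

```python
import re
from collections import Counter, defaultdict

# Assumed lane token: "_L" followed by three digits, e.g. _L001.
LANE_PATTERN = re.compile(r"_L\d{3}")


def sample_name(fastq_name):
    """Strip the lane token so that lanes of one sample share a name."""
    return LANE_PATTERN.sub("", fastq_name, count=1)


def merge_lanes(counts_by_file):
    """Sum per-protospacer counts over files that belong to the same sample."""
    merged = defaultdict(Counter)
    for fastq_name, counts in counts_by_file.items():
        merged[sample_name(fastq_name)].update(counts)  # update() adds counts
    return dict(merged)


# Hypothetical per-file counts for one sample sequenced on two lanes.
counts_by_file = {
    "ctrl_S1_L001_R1_001.fastq.gz": Counter({"AAA": 3, "CCC": 1}),
    "ctrl_S1_L002_R1_001.fastq.gz": Counter({"AAA": 2, "GGG": 4}),
}

merged = merge_lanes(counts_by_file)
print(merged)
```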

Specify the path of the reference library (LIB). It must have seq, gene, and guide columns, containing the protospacer sequence, gene symbol, and unique guide ID, respectively.
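Before counting, it can help to confirm that a library file meets these requirements. The following is a minimal sketch, assuming a tab-delimited file; the delimiter, the function name, and the example rows are assumptions, and the sequences are invented.

```python
import csv
import io
from collections import Counter

REQUIRED_COLUMNS = ("seq", "gene", "guide")


def validate_library(handle, delimiter="\t"):
    """Check for the required columns and unique guide IDs, then return the rows."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    fieldnames = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in fieldnames]
    if missing:
        raise ValueError(f"library is missing columns: {missing}")
    rows = list(reader)
    guide_tally = Counter(row["guide"] for row in rows)
    duplicates = sorted(guide for guide, n in guide_tally.items() if n > 1)
    if duplicates:
        raise ValueError(f"guide IDs are not unique: {duplicates}")
    return rows


# Invented two-guide library held in memory for the example.
library_text = (
    "seq\tgene\tguide\n"
    "GATTACAGATTACAGATTAC\tTP53\tTP53_1\n"
    "CCCCGGGGAAAATTTTCCGG\tTP53\tTP53_2\n"
)
rows = validate_library(io.StringIO(library_text))
print(len(rows), "guides loaded")
```

For a file on disk, pass an open file handle in place of the io.StringIO object.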

Running count_reads.py writes raw counts per protospacer for each FASTQ file to OUTDIR. If a reference library is supplied, it also generates counts per gene symbol.

Design matrix

Populate the Excel workbook with experimental details pertaining to the run.

In the “Experimental details” sheet, you must supply the analysis version and file prefix. Results will be saved at file_prefix/analysis_version/. All other fields are for your own convenience should you need to look at this analysis again in the future.

In the “Sample details” sheet, ensure that entries in the “Replicate” column correspond to column names in the counts file. Biological replicates should be given the same value in the “Sample” column. All other columns are optional but are here for your convenience.

In the “Control groups” sheet, ensure that control and test samples defined here match those in the “Sample” column of the “Sample details” sheet. In the “Group” column, use names to group different comparisons together. You can define different settings for each group in the “Analyses” sheet.

In the “Analyses” sheet, ensure that the “Control group” entries match those in the “Group” column of the “Control groups” sheet. For “Paired”, set TRUE if the replicates for the control and test samples with the same index are meaningfully paired; otherwise set FALSE. In “Method”, use drugz, mageck, or drugz,mageck according to which analyses you want to conduct. In “Counts file”, supply the absolute or relative path of the counts file. In “Add pseudocount”, use 5 unless you have a reason to use a different number. The other fields can be left blank.
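The cross-references between sheets described above can be checked mechanically before running the pipeline. The sketch below represents each sheet as a list of dictionaries and reports mismatches. It is an illustration, not part of crispr_tools: the function name is hypothetical, and the keys "Control sample" and "Test sample" are assumed names for the control and test columns of the “Control groups” sheet.

```python
ALLOWED_METHODS = {"drugz", "mageck"}


def check_design(sample_details, control_groups, analyses, count_columns):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    samples = {row["Sample"] for row in sample_details}
    groups = {row["Group"] for row in control_groups}

    for row in sample_details:
        if row["Replicate"] not in count_columns:
            problems.append(f"replicate {row['Replicate']!r} is not a counts column")

    for row in control_groups:
        for key in ("Control sample", "Test sample"):  # assumed column names
            if row[key] not in samples:
                problems.append(f"{key.lower()} {row[key]!r} is not a known sample")

    for row in analyses:
        if row["Control group"] not in groups:
            problems.append(f"control group {row['Control group']!r} is not defined")
        methods = {m.strip() for m in row["Method"].split(",")}
        if not methods <= ALLOWED_METHODS:
            problems.append(f"unknown method in {row['Method']!r}")

    return problems


# Invented design with two replicates per sample.
sample_details = [
    {"Replicate": "ctrl_1", "Sample": "ctrl"},
    {"Replicate": "ctrl_2", "Sample": "ctrl"},
    {"Replicate": "drug_1", "Sample": "drug"},
    {"Replicate": "drug_2", "Sample": "drug"},
]
control_groups = [{"Group": "main", "Control sample": "ctrl", "Test sample": "drug"}]
analyses = [{"Control group": "main", "Paired": True, "Method": "drugz,mageck"}]
count_columns = {"guide", "gene", "ctrl_1", "ctrl_2", "drug_1", "drug_2"}

problems = check_design(sample_details, control_groups, analyses, count_columns)
print(problems or "design is consistent")
```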

Run batch differential analysis

The final step uses crispr_pipeline.py.

crispr_pipeline.py DET

Pass the experimental details Excel workbook (DET) as the positional argument. The standard results files from the DrugZ and/or MAGeCK analysis, depending on your selection, are written to file_prefix/analysis_version.

License

crispr_tools is open-source software licensed under the Creative Commons Zero v1.0 Universal license.

Contributors

johncthomas, simonlammmm
