Barcoded pdmH1N1 influenza virus single-cell sequencing

Single-cell sequencing of barcoded pdmH1N1 influenza virus; David Bacsik and Jesse Bloom.

The pre-print of the results is titled "Influenza virus transcription and progeny production are poorly correlated in single cells" and is available at https://www.biorxiv.org/content/10.1101/2022.08.30.505828v1.

Repository version

A static version of the repository used to generate the figures in this pre-print is tagged at: https://github.com/jbloomlab/barcoded_flu_pdmH1N1/releases/tag/bioRxiv_v1.

Data availability

All data used in this study is available in GEO under accession number GSE214938.

Summary of workflow and results

The workflow for this project has two main steps. First, the Snakemake pipeline is run; it takes raw sequencing data as input and generates a CSV containing information about viral transcription and progeny production in single influenza-infected cells. Then, the final_analysis.py.ipynb notebook is run manually to visualize the results.

For a summary of the Snakemake pipeline, see the report.html file that is placed in the ./results/ subdirectory.

Organization of repository

This repository is organized as follows (based loosely on this example snakemake repository):

Snakefile is the snakemake file that runs the analysis.
environment.yml and environment_unpinned.yml give the version-pinned and unpinned conda environments used to run the Snakemake pipeline.
config.yaml contains the configuration for the analysis.
cluster.yaml contains the cluster configuration for running the analysis on the Fred Hutch cluster.
./rules/ contains snakemake rules.
./notebooks/ contains Jupyter notebooks that are run by Snakefile using the snakemake notebook functionality.
./scripts/ contains scripts used by Snakefile.
./pymodules/ contains Python modules with some functions used by Snakefile.
./report/ contains workflow description and captions used to create the snakemake report.
./data/ contains the input data, specifically:
- ./data/flu_sequences/ gives the flu sequences used in the experiment. See the README in that subdirectory for details.
- ./data/flu_sequences/pacbio_amplions gives the amplicon sequences generated for PacBio sequencing. See the README in that subdirectory for details.
./results/ is a created directory with all results, most of which are not tracked in this repository.
./results/figures/ contains the figures generated for the manuscript.
./results/viral_fastq10x/ contains two CSV files containing key processed data:
- integrate_data.csv contains viral transcription and genotype information for all cells in the dataset.
- complete_measurement_cells_data.csv contains progeny production information, viral transcription information, and genotype information for the set of cells with complete sequencing and progeny production measurements.

Running the analysis

Installing software

The conda environment for the pipeline in this repo is specified in environment.yml; note also that an unpinned version of this environment is specified in environment_unpinned.yml. If you are on the Hutch cluster and set up to use the BloomLab conda installation, then this environment is already built and you can activate it simply with:

conda activate barcoded_flu_pdmH1N1

Otherwise, first build the conda environment from environment.yml (for example, with conda env create -f environment.yml) and then activate it as above.

In addition to building and activating the conda environment, you also need to install cellranger and bcl2fastq into the current path; the current analysis uses cellranger version 4.0.0 and bcl2fastq version 2.20.
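
Before launching the pipeline, it can help to confirm that both tools are discoverable. The following is a minimal sketch, not part of the repository: the helper name missing_tools is hypothetical, and the executable names cellranger and bcl2fastq are assumed to match the software named above. It only checks that the programs are on the path; it does not verify their versions.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that are not found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Executable names are assumptions based on the software named in this README.
    missing = missing_tools(["cellranger", "bcl2fastq"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("cellranger and bcl2fastq are both on PATH")
```

If either name is reported as missing, add the corresponding installation directory to your path before starting the pipeline.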

Run the Snakemake pipeline

Once the barcoded_flu_pdmH1N1 conda environment has been activated and the other software is on your path, run Snakefile and then generate the snakemake report at ./results/report.html. The commands for doing this with the Fred Hutch cluster configuration are in the shell script run_Hutch_cluster.bash. You probably want to submit the script itself via sbatch, using:

sbatch run_Hutch_cluster.sbatch

Run the final analysis and generate plots

When the Snakemake pipeline has finished running, the processed output data is exported to a CSV file at results/viral_fastq10x/{expt}_integrate_data.csv. A stable version of this file is available at https://github.com/jbloomlab/barcoded_flu_pdmH1N1/blob/main/results/viral_fastq10x/scProgenyProduction_trial3_integrate_data.csv and can be used to re-analyze the data without running the Snakemake pipeline.
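
For a quick look at the exported CSV outside the notebook, something like the following could be used. This is a minimal sketch using only the Python standard library; the helper name load_rows is hypothetical, and because the column names are not listed in this README, the example only reports the row count and whatever header the file contains.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Path follows the {expt}_integrate_data.csv pattern, using the
    # experiment name that appears in the stable file's URL.
    path = Path(
        "results/viral_fastq10x/scProgenyProduction_trial3_integrate_data.csv"
    )
    if path.exists():
        rows = load_rows(path)
        print(f"{len(rows)} rows")
        if rows:
            print("columns:", ", ".join(rows[0].keys()))
    else:
        print(f"{path} not found; run the pipeline or download the file first")
```

All values are read as strings, so numeric columns need converting before any calculation.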

In this repo, the CSV file is used to perform the final analysis and generate figures in the final_analysis.py.ipynb notebook. This notebook is run manually, and it must be run with the barcoded_flu_pdmH1N1_final_analysis conda environment activated.

To activate this environment, first build it from envs/barcoded_flu_pdmH1N1_final_analysis.yml and then activate it with:

conda activate barcoded_flu_pdmH1N1_final_analysis

Development

Linting the code

Ideally, before you commit a new branch, you should run the linting in lint.bash with the command:

bash ./lint.bash

This script runs:

snakemake linting
a snakemake dry run
a flake8 analysis of the Python code
a flake8_nb analysis of the Jupyter notebooks.

For the Jupyter notebook linting, it may be easiest to lint while you are still developing the notebook with run cells, rather than just before you put the empty notebook in ./notebooks/, as the linting results are labeled by cell run number.
