NanoSwe is a preliminary analysis toolkit for experiments that involve sequencing data from ONT's PromethION device. It has also been used for other long-read SweGen data (e.g. PacBio).
Purpose | Program |
---|---|
Quality Control | NanoPlot and NanoComp |
Mapping to the reference | Minimap2-2.14 |
Sorting, Indexing, and calculating statistics | Samtools 1.9 |
Subsampling | Sambamba 0.7.1 |
BAM QC Statistics | Qualimap 2.2.1 |
Structural Variant Calling | Sniffles 1.0.10 |
Data Extraction (VCF Files only) | bcftools 1.9 |
Finding intersection in genomic regions | Survivor 1.0.7 |
Evaluation of SVs | Survivor 1.0.7 and surpyvor 0.5.0 |
Removing control DNA sequences | NanoLyse |
Trimming Short Reads | BBMap/BBTools |
Homology Detection | Blast 2.7.1+ |
Data Visualisation | R version 3.5.3. See the scripts directory for information on libraries/packages used. |
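The core of the table above (mapping, sorting and indexing, structural variant calling) can be sketched as a short command chain. This is a minimal illustration rather than the repository's actual pipeline: the helper name `build_commands`, the output file names, and the thread count are assumptions, and the commands actually used are recorded in commands.md. The flags shown are the common ones for minimap2, Samtools and Sniffles 1.x, where `--MD` is assumed to be needed so that Sniffles can read the MD tag.

```python
import shlex


def build_commands(reference, fastq, sample, threads=4):
    """Return shell command strings for a minimal map -> sort -> index -> call chain.

    The function only builds the strings; nothing is executed here.
    Output names (<sample>.sorted.bam, <sample>.vcf) are hypothetical.
    """
    ref = shlex.quote(reference)
    reads = shlex.quote(fastq)
    bam = shlex.quote(f"{sample}.sorted.bam")
    vcf = shlex.quote(f"{sample}.vcf")
    return [
        # Align ONT reads and pipe the SAM output straight into a coordinate sort.
        f"minimap2 -ax map-ont --MD -t {threads} {ref} {reads} "
        f"| samtools sort -@ {threads} -o {bam} -",
        # Index the sorted BAM so downstream tools can access regions.
        f"samtools index {bam}",
        # Call structural variants with Sniffles 1.x (-m input BAM, -v output VCF).
        f"sniffles -m {bam} -v {vcf}",
    ]


if __name__ == "__main__":
    for command in build_commands("reference_genome.fna", "fastq_0.fastq", "sample"):
        print(command)
```

Printing the commands instead of running them makes it easy to check them against the installed tool versions before executing anything.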
Example tree structure of nanopore sequencing data files
├── /basecalled/<sample>/<flowcell>/
│   ├── fastq_0.fastq
│   ├── fastq_850.fastq
│   ├── sequencing_summary_0.txt
│   ├── sequencing_summary_850.txt
│   └── reads (1)
│       ├── 0 (2)
│       │   ├── file_read_1_ch_90_strand.fast5
│       │   ├── file_read_41_ch_40_strand2.fast5
│       │   └── file_read_300_ch_40_strand2.fast5
│       └── 850
│           ├── file_read_1000_ch_200_strand.fast5
│           ├── file_read_9000_ch_100_strand.fast5
│           └── file_read_95000_ch_1000_strand2.fast5
└── /bin/
(1) Each folder contains ~8000 fast5 files
(2) fast5 files are named e.g. PCT0001_YYYYMMDD_0001A20B002222C_{flowcell}_sequencing_run_{library_full_name}__read_{number}_ch_{number}_strand.fast5
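The naming scheme in note (2) can be parsed to recover the read and channel numbers. The sketch below assumes that every fast5 name ends in `_read_{number}_ch_{number}_strand.fast5`, optionally with digits after `strand` as in the `strand2` examples in the tree; the function name `parse_fast5_name` is hypothetical.

```python
import re

# Matches the tail of a fast5 file name, e.g. "..._read_41_ch_40_strand2.fast5".
FAST5_NAME = re.compile(r"_read_(?P<read>\d+)_ch_(?P<channel>\d+)_strand\d*\.fast5$")


def parse_fast5_name(filename):
    """Return (read_number, channel_number), or None if the name does not match."""
    match = FAST5_NAME.search(filename)
    if match is None:
        return None
    return int(match.group("read")), int(match.group("channel"))


if __name__ == "__main__":
    print(parse_fast5_name("file_read_41_ch_40_strand2.fast5"))
```

Returning `None` for non-matching names lets a caller skip unrelated files, such as the sequencing summaries, without raising an error.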
Example tree structure of data organisation
├── /basecalled/<sample>/<flowcell>/
│   ├── FASTQ_files
│   │   ├── fastq_0.fastq
│   │   └── fastq_850.fastq
│   ├── sequencing_summary
│   │   ├── sequencing_summary_0.txt
│   │   └── sequencing_summary_850.txt
│   ├── reads *
│   │   ├── 0 *
│   │   │   ├── file_read_1_ch_90_strand.fast5
│   │   │   ├── file_read_41_ch_40_strand2.fast5
│   │   │   └── file_read_300_ch_40_strand2.fast5
│   │   └── 850
│   │       ├── file_read_1000_ch_200_strand.fast5
│   │       ├── file_read_9000_ch_100_strand.fast5
│   │       └── file_read_95000_ch_1000_strand2.fast5
│   └── <sample>_analysis
│       ├── reference_genome.fna
│       ├── reference_genome.fna.fai
│       ├── Snakefile
│       ├── /bam_files/
│       ├── /vcf_files/
│       └── /logs/
└── /bin/
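Moving from the raw basecaller output to the organised layout above mostly means sorting loose FASTQ and summary files into their own folders. The following is a minimal sketch of that step, assuming the file name patterns `fastq_*.fastq` and `sequencing_summary_*.txt` from the example trees; the function name `organise_flowcell` is hypothetical, and the `reads` and `<sample>_analysis` folders are left untouched.

```python
import shutil
from pathlib import Path


def organise_flowcell(flowcell_dir):
    """Move loose FASTQ and summary files into FASTQ_files/ and sequencing_summary/.

    Returns the list of new file paths. Only regular files directly inside
    flowcell_dir are moved; sub-folders such as reads/ are not touched.
    """
    flowcell_dir = Path(flowcell_dir)
    rules = {
        "FASTQ_files": "fastq_*.fastq",
        "sequencing_summary": "sequencing_summary_*.txt",
    }
    moved = []
    for folder, pattern in rules.items():
        target = flowcell_dir / folder
        target.mkdir(exist_ok=True)
        for path in sorted(flowcell_dir.glob(pattern)):
            if path.is_file():
                destination = target / path.name
                shutil.move(str(path), str(destination))
                moved.append(destination)
    return moved


if __name__ == "__main__":
    # Hypothetical path; replace <sample> and <flowcell> with real names.
    for new_path in organise_flowcell("basecalled/<sample>/<flowcell>"):
        print(new_path)
```

It is worth trying this on a copy of a flowcell directory first, since the files are moved rather than copied.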
./scRipts
- R scripts created for visualisation of long-read data.
commands.md
- Tool commands used for different analyses.
- SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population
- De novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data.
- Multi-platform discovery of haplotype-resolved structural variation in human genomes
- Which human reference genome to use?
- Structural variants identified by Oxford Nanopore PromethION sequencing of the human genome
- The thesis
- Evaluating nanopore sequencing data processing pipelines for structural variation identification
If you use this repository as a guide, please mention the link https://github.com/Nazeeefa/NanoSwe as acknowledgment. To cite our publication, use one of the formats shown below, or visit citeas.org to choose a different format. Thank you.
Fatima N, Petri A, Gyllensten U, Feuk L, Ameur A. Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes. Genes. 2020; 11(12):1444.
Fatima, Nazeefa; Petri, Anna; Gyllensten, Ulf; Feuk, Lars; Ameur, Adam. 2020. "Evaluation of Single-Molecule Sequencing Technologies for Structural Variant Detection in Two Swedish Human Genomes." Genes 11, no. 12: 1444.