This workflow aligns sequence data provided as fastq files against a genomic reference using bwa (burrows-wheeler-aligner). Prior to alignment, there are options to remove 5' umi sequence and to trim off 3' sequencing adapter. Readgroup information to be injected into the bam header needs to be provided. The workflow can also split the input data into a requested number of chunks, align each separately then merge the separate alignments into a single bam file. This decreases the workflow run time. Optional bwa mem parameters can be provided to the workflow.
- bwa 0.7.17
- samtools 1.9
- cutadapt 1.8.3
- slicer 0.3.0
- python 3.7
- barcodex-rs 0.1.2
- rust 1.2
- gsi software modules : samtools 1.9 bwa 0.7.17
- gsi hg38 modules : hg38-bwa-index-with-alt 0.7.17
- gsi hg19 modules : hg19-bwa-index 0.7.17
- gsi mm10 modules :mm10-bwa-index 0.7.17
java -jar cromwell.jar run bwaMem.wdl --inputs inputs.json
Parameter | Value | Description |
---|---|---|
fastqR1 |
File | Fastq file for read 1 |
outputFileNamePrefix |
String | Prefix for output files |
reference |
String | The genome reference build. For example: hg19, hg38, mm10 |
runBwaMem.readGroups |
String | The readgroup information to be injected into the bam header |
Parameter | Value | Default | Description |
---|---|---|---|
fastqR2 |
File? | None | Fastq file for read 2 |
numChunk |
Int | 1 | Number of chunks to split fastq file [1, no splitting] |
doUMIextract |
Boolean | false | If true, UMI will be extracted before alignment [false] |
doTrim |
Boolean | false | If true, adapters will be trimmed before alignment [false] |
numReads |
Int? | None | Number of reads |
Parameter | Value | Default | Description |
---|---|---|---|
countChunkSize.modules |
String | "python/3.7" | Required environment modules |
countChunkSize.jobMemory |
Int | 16 | Memory allocated for this job |
countChunkSize.timeout |
Int | 48 | Hours before task timeout |
slicerR1.modules |
String | "slicer/0.3.0" | Required environment modules |
slicerR1.jobMemory |
Int | 16 | Memory allocated for this job |
slicerR1.timeout |
Int | 48 | Hours before task timeout |
slicerR2.modules |
String | "slicer/0.3.0" | Required environment modules |
slicerR2.jobMemory |
Int | 16 | Memory allocated for this job |
slicerR2.timeout |
Int | 48 | Hours before task timeout |
extractUMIs.umiList |
String | "umiList" | Reference file with valid UMIs |
extractUMIs.outputPrefix |
String | "extractUMIs_output" | Specifies the start of the output files |
extractUMIs.pattern1 |
String | "(?P<umi_1>^[ACGT]{3}[ACG])(?P<discard_1>T) | (?P<umi_2>^[ACGT]{3})(?P<discard_2>T)" |
extractUMIs.pattern2 |
String | "(?P<umi_1>^[ACGT]{3}[ACG])(?P<discard_1>T) | (?P<umi_2>^[ACGT]{3})(?P<discard_2>T)" |
extractUMIs.modules |
String | "barcodex-rs/0.1.2 rust/1.45.1" | Required environment modules |
extractUMIs.jobMemory |
Int | 24 | Memory allocated for this job |
extractUMIs.timeout |
Int | 12 | Time in hours before task timeout |
adapterTrimming.modules |
String | "cutadapt/1.8.3" | Required environment modules |
adapterTrimming.doUMItrim |
Boolean | false | If true, do umi trimming |
adapterTrimming.umiLength |
Int | 5 | The number of bases to trim when doUMItrim is true. If the given length is positive, the bases are removed from the beginning of each read. If it is negative, the bases are removed from the end |
adapterTrimming.trimMinLength |
Int | 1 | Minimum length of reads to keep |
adapterTrimming.trimMinQuality |
Int | 0 | Minimum quality of read ends to keep |
adapterTrimming.adapter1 |
String | "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" | Adapter sequence to trim from read 1 |
adapterTrimming.adapter2 |
String | "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT" | Adapter sequence to trim from read 2 |
adapterTrimming.addParam |
String? | None | Additional cutadapt parameters |
adapterTrimming.jobMemory |
Int | 16 | Memory allocated for this job |
adapterTrimming.timeout |
Int | 48 | Hours before task timeout |
runBwaMem.addParam |
String? | None | Additional BWA parameters |
runBwaMem.threads |
Int | 8 | Requested CPU threads |
runBwaMem.jobMemory |
Int | 32 | Memory allocated for this job |
runBwaMem.timeout |
Int | 96 | Hours before task timeout |
bamMerge.jobMemory |
Int | 32 | Memory allocated indexing job |
bamMerge.modules |
String | "samtools/1.9" | Required environment modules |
bamMerge.timeout |
Int | 72 | Hours before task timeout |
indexBam.jobMemory |
Int | 12 | Memory allocated indexing job |
indexBam.modules |
String | "samtools/1.9" | Modules for running indexing job |
indexBam.timeout |
Int | 48 | Hours before task timeout |
adapterTrimmingLog.jobMemory |
Int | 12 | Memory allocated indexing job |
adapterTrimmingLog.timeout |
Int | 48 | Hours before task timeout |
Output | Type | Description | Labels |
---|---|---|---|
bwaMemBam |
File | Output Alignment BAM file | vidarr_label: bwaMemBam |
bwaMemIndex |
File | Index of the Output Alignment file, BAI | vidarr_label: bwaMemIndex |
log |
File? | Optional log file | vidarr_label: log |
cutAdaptAllLogs |
File? | Optional log file for cutAdapt | vidarr_label: cutAdaptAllLogs |
This section lists command(s) run by bwaMem workflow
Split the fastq files into chunks to parallelize the alignment (optional). If requested, subsequent steps will be run on each fastq chunk
if [ -z "~{numReads}" ]; then
totalLines=$(zcat ~{fastqR1} | wc -l)
else totalLines=$((~{numReads}*4))
fi
python3 -c "from math import ceil; print (int(ceil(($totalLines/4.0)/~{numChunk})*4))"
slicer -i ~{fastqR} -l ~{chunkSize} --gzip
barcodex-rs --umilist ~{umiList} --prefix ~{outputPrefix} --separator "__" inline \
--pattern1 '~{pattern1}' --r1-in ~{fastq1} \
~{if (defined(fastq2)) then "--pattern2 '~{pattern2}' --r2-in ~{fastq2} " else ""}
cat ~{outputPrefix}_UMI_counts.json > umiCounts.txt
tr [,] ',\n' < umiCounts.txt | sed 's/[{}]//' > tmp.txt
echo "{$(sort -i tmp.txt)}" > new.txt
tr '\n' ',' < new.txt | sed 's/,$//' > ~{outputPrefix}_UMI_counts.json
cutadapt -q ~{trimMinQuality} \
-m ~{trimMinLength} \
-a ~{adapter1} \
-o ~{resultFastqR1} \
~{if (defined(fastqR2)) then "-A ~{adapter2} -p ~{resultFastqR2} " else ""} \
~{if (doUMItrim) then "-u ~{umiLength} -U ~{umiLength} " else ""} \
~{addParam} \
~{fastqR1} \
~{fastqR2} > ~{resultLog}
mkdir -p ~{tmpDir}
bwa mem -M \
-t ~{threads} ~{addParam} \
-R ~{readGroups} \
~{bwaRef} \
~{read1s} \
~{read2s} \
| \
samtools sort -O bam -T ~{tmpDir} -o ~{resultBam} -
samtools merge \
-c \
~{resultMergedBam} \
~{sep=" " bams}
samtools index ~{inputBam} ~{resultBai}
COMMANDS NOT SHOWN, see WDL for details
For support, please file an issue on the Github project or send an email to [email protected] .
Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)