I encountered an error while running the assemble workflow in Atlas (version 1.0.19). According to counts_per_region_.log, the error was:
Warning: Unknown annotation format: gtf. GTF format is used.
ERROR: invalid parameter: '−−minOverlap'
The issue seems to be the encoding of the two dashes in the minimum overlap parameter: they are non-ASCII characters rather than plain ASCII hyphens ('--').
import chardet
s_encoding = chardet.detect('−−minOverlap')['encoding']
print s_encoding
utf-8
A possible fix is to retype the dashes in the "−−minOverlap" parameter as plain ASCII hyphens:
s_retyped = chardet.detect('--minOverlap')['encoding']
print s_retyped
ascii
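The same check can be done with the standard library alone, which also names the exact character involved. This is only a minimal sketch: it assumes the offending character is U+2212 (MINUS SIGN), written as an escape so the example does not depend on how this page is encoded. The helper names `describe_non_ascii` and `ascii_dashes` are hypothetical and are not part of Atlas.

```python
import unicodedata

# Assumption: the bad dashes are U+2212 (MINUS SIGN). They are written as
# escapes so the example does not depend on copy/paste encoding.
bad_flag = "\u2212\u2212minOverlap"
good_flag = "--minOverlap"


def describe_non_ascii(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


def ascii_dashes(text):
    """Replace a few common dash look-alikes with the ASCII hyphen-minus."""
    lookalikes = "\u2212\u2010\u2011\u2012\u2013\u2014"
    return text.translate({ord(ch): "-" for ch in lookalikes})


print(describe_non_ascii(bad_flag))   # non-ASCII characters found in the flag
print(describe_non_ascii(good_flag))  # empty list for the retyped flag
print(ascii_dashes(bad_flag) == good_flag)
```

Running `describe_non_ascii` over each line of a rule file would be one way to find any other look-alike characters before rerunning the workflow.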
I also added a space before the backslash on lines 684, 685, and 689 of assemble.snakefile. After both changes, the assemble workflow completed successfully.
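The affected snakefile lines are not shown in this report, so the following is only a generic illustration of why a space before a trailing backslash can matter. Inside a normal Python string, a backslash at the end of a line removes the newline, so if the next line starts without whitespace the two pieces are joined into one token. The command name `tool` and the flag `--second-flag` are hypothetical.

```python
import shlex

# Hypothetical command text, not copied from assemble.snakefile.
# The trailing backslash joins the two source lines with nothing in between.
fused = "tool --minOverlap 1\
--second-flag 2"

# Same command with a space before the backslash.
spaced = "tool --minOverlap 1 \
--second-flag 2"

# shlex.split tokenizes the text roughly the way a POSIX shell would.
print(shlex.split(fused))
print(shlex.split(spaced))
```

In the first case the value and the next flag fuse into a single argument (`1--second-flag`); in the second they stay separate.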
Below is the complete log:
Building DAG of jobs...
Creating conda environment /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/optional_genome_binning.yaml...
Environment for ../../../../../home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/optional_genome_binning.yaml created (location: .snakemake/conda/9596cb25)
Creating conda environment /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/required_packages.yaml...
Environment for ../../../../../home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/required_packages.yaml created (location: .snakemake/conda/212a2e89)
Provided cores: 24
Rules claiming more threads will be scaled down.
Unlimited resources: mem
Job counts:
count jobs
1 QC_report
1 add_contig_metadata
1 align_reads_to_final_contigs
1 all
1 build_decontamination_db
2 calculate_contigs_stats
1 calculate_insert_size
1 calculate_prefiltered_contig_coverage_stats
1 combine_insert_stats
1 combine_read_counts
1 combine_read_length_stats
1 convert_gff_to_gtf
1 convert_sam_to_bam
1 decontamination
1 deduplicate
1 error_correction
1 filter_by_coverage
1 finalize_QC
1 finalize_contigs
1 find_counts_per_region
1 init_QC
1 initialize_checkm
1 make_maxbin_abundance_file
1 merge_pairs
1 merge_sample_tables
1 normalize_coverage_across_kmers
1 parse_blastp
1 pileup
1 postprocess_after_decontamination
1 quality_filter
5 read_stats
1 rename_contigs
1 rename_megahit_output
1 run_checkm_lineage_wf
1 run_checkm_tree_qa
1 run_diamond_blastp
1 run_maxbin
1 run_megahit
1 run_prokka_annotation
1 sort_munged_blast_hits
1 update_prokka_tsv
46
rule init_QC:
input: /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/data/PHH12_O-8024.3.89990.GGTAGC.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_init.log
jobid: 32
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
priority: 80
threads: 24
resources: mem=40
reformat.sh in=/media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/data/PHH12_O-8024.3.89990.GGTAGC.fastq.gz interleaved=t out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz qout=33 overwrite=true verifypaired=t addslash=t trimreaddescription=t threads=24 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_init.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 32.
1 of 46 steps (2%) done
rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 13
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=raw
priority: 30
threads: 24
resources: mem=40
Finished job 13.
2 of 46 steps (4%) done
rule deduplicate:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_deduplicate.log
jobid: 33
benchmark: logs/benchmarks/deduplicate/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
clumpify.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz overwrite=true dedupe=t dupesubs=2 optical=f threads=24 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_deduplicate.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz.
Finished job 33.
3 of 46 steps (7%) done
rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 17
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=deduplicated
priority: 30
threads: 24
resources: mem=40
Finished job 17.
4 of 46 steps (9%) done
rule quality_filter:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filter.log
jobid: 19
benchmark: logs/benchmarks/quality_filter/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
bbduk2.sh in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz outs=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz rref=/home/william/dbs/atlas_db.v2/adapters.fa lref=/home/william/dbs/atlas_db.v2/adapters.fa mink=8 qout=33 stats=PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt hdist=1 k=27 trimq=10 qtrim=rl threads=24 minlength=51 trd=t minbasefrequency=0.05 interleaved=t overwrite=true ecco=t -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filter.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz.
Finished job 19.
5 of 46 steps (11%) done
rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 15
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=filtered
priority: 30
threads: 24
resources: mem=40
Finished job 15.
6 of 46 steps (13%) done
rule build_decontamination_db:
output: ref/genome/1/summary.txt
log: logs/build_decontamination_db.log
jobid: 31
threads: 24
resources: mem=40
bbsplit.sh -Xmx40G ref_PhiX=/home/william/dbs/atlas_db.v2/phiX174_virus.fa ref_rRNA=/home/william/dbs/atlas_db.v2/silva_rfam_all_rRNAs.fa threads=24 k=13 local=t 2> logs/build_decontamination_db.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 31.
7 of 46 steps (15%) done
rule decontamination:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz, ref/genome/1/summary.txt
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log
jobid: 12
benchmark: logs/benchmarks/decontamination/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
if [ "true" = true ] ; then
bbsplit.sh in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz outu1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz outu2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz basename="PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/%_R#.fastq.gz" maxindel=20 minratio=0.65 minhits=1 ambiguous=best refstats=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt threads=24 k=13 local=t -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log
fi
bbsplit.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz outu=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz basename="PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/%_se.fastq.gz" maxindel=20 minratio=0.65 minhits=1 ambiguous=best refstats=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt append interleaved=f threads=24 k=13 local=t -Xmx40G 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz.
Finished job 12.
8 of 46 steps (17%) done
rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 14
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=clean
priority: 30
threads: 24
resources: mem=40
Finished job 14.
9 of 46 steps (20%) done
localrule postprocess_after_decontamination:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz
jobid: 11
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
localrule initialize_checkm:
output: logs/checkm_init.txt
log: logs/initialize_checkm.log
jobid: 29
python /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/initialize_checkm.py /home/william/dbs/atlas_db.v2/checkm logs/checkm_init.txt logs/initialize_checkm.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz.
Finished job 11.
10 of 46 steps (22%) done
Finished job 29.
11 of 46 steps (24%) done
rule read_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv
log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log
jobid: 16
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=QC
priority: 30
threads: 24
resources: mem=40
Finished job 16.
12 of 46 steps (26%) done
rule normalize_coverage_across_kmers:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
jobid: 45
benchmark: logs/benchmarks/normalization/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
if [ in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz != "null" ];
then
bbnorm.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz extra=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz out=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz k=21 t=100 interleaved=f minkmers=15 prefilter=t threads=24 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
fi
if [ t = "t" ];
then
bbnorm.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz extra=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz out=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz k=21 t=100 interleaved=f minkmers=15 prefilter=t threads=24 -Xmx40G 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
fi
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 45.
13 of 46 steps (28%) done
rule merge_pairs:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_merge_pairs.log
jobid: 44
benchmark: logs/benchmarks/merge_pairs/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, previous_steps=normalized
threads: 24
resources: mem=40
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz.
Finished job 44.
14 of 46 steps (30%) done
rule error_correction:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_error_correction.log
jobid: 43
benchmark: logs/benchmarks/error_correction/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, previous_steps=normalized.merged
threads: 24
resources: mem=40
tadpole.sh -Xmx40G prealloc=1 in1=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz out1=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz mode=correct threads=24 ecc=t ecco=t 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_error_correction.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz.
Finished job 43.
15 of 46 steps (33%) done
rule run_megahit:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_megahit.log
jobid: 40
benchmark: logs/benchmarks/assembly/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 8
resources: mem=50
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz.
Finished job 40.
16 of 46 steps (35%) done
localrule rename_megahit_output:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta
jobid: 34
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
cp PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa.
Finished job 34.
17 of 46 steps (37%) done
rule rename_contigs:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta
jobid: 23
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
rename.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta out=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta ow=t prefix=PHH12-O-8024.3.89990.GGTAGC
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta.
Finished job 23.
18 of 46 steps (39%) done
rule calculate_prefiltered_contig_coverage_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log
jobid: 35
benchmark: logs/benchmarks/calculate_prefiltered_contig_coverage_stats/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
bbwrap.sh nodisk=t ref=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz,null fast=t interleaved=auto threads=24 -Xmx40G append out=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log
pileup.sh ref=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta in=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam threads=24 -Xmx40G covstats=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt physcov 2>> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam.
Finished job 35.
19 of 46 steps (41%) done
rule filter_by_coverage:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt
output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_discarded_contigs.fasta
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/filter_by_coverage.log
jobid: 24
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
resources: mem=40
filterbycoverage.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta cov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt out=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta outd=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_discarded_contigs.fasta minc=5 minp=40 minr=0 minl=2200 trim=100 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/filter_by_coverage.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
rule calculate_contigs_stats:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_contig_stats.txt
jobid: 4
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, assembly_step=prefilter
resources: mem=40
stats.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta format=3 -Xmx40G > PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_contig_stats.txt
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 4.
20 of 46 steps (43%) done
Finished job 24.
21 of 46 steps (46%) done
rule calculate_contigs_stats:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/final_contig_stats.txt
jobid: 5
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, assembly_step=final
resources: mem=40
stats.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta format=3 -Xmx40G > PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/final_contig_stats.txt
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
localrule finalize_contigs:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
jobid: 36
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
cp PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
Finished job 36.
22 of 46 steps (48%) done
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta.
Finished job 5.
23 of 46 steps (50%) done
rule align_reads_to_final_contigs:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_se.fastq.gz
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
jobid: 38
benchmark: logs/benchmarks/align_reads_to_filtered_contigs/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
bbwrap.sh nodisk=t ref=PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz,null trimreaddescriptions=t outm=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam outu1=PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_se.fastq.gz outu2=PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R2.fastq.gz,null threads=24 pairlen=1000 pairedonly=t mdtag=t xstag=fs nmtag=t sam=1.3 local=t ambiguous=best secondary=t ssao=t maxsites=10 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 38.
24 of 46 steps (52%) done
rule pileup:
input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam
output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_histogram.txt, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_binned.txt
log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
jobid: 42
benchmark: logs/benchmarks/align_reads_to_filtered_contigs/PHH12-O-8024.3.89990.GGTAGC_pileup.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
pileup.sh ref=PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta in=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam threads=24 -Xmx40G covstats=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt hist=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_histogram.txt basecov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz concise=t physcov=t secondary=f bincov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_binned.txt 2>> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz.
Finished job 42.
25 of 46 steps (54%) done
rule convert_sam_to_bam:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam
output: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam
jobid: 27
wildcards: file=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC
threads: 24
samtools view -@ 24 -bSh1 PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam | samtools sort -m 1536M -@ 24 -T /tmp/PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC_tmp -o PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam -O bam -
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam.
Finished job 27.
26 of 46 steps (57%) done
rule run_prokka_annotation:
input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.err, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.ffn, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.fna, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.fsa, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gbk, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.log, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.sqn, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.tbl, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.tsv, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.txt
jobid: 25
benchmark: logs/benchmarks/prokka/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
prokka --outdir PHH12-O-8024.3.89990.GGTAGC/annotation/prokka --force --prefix PHH12-O-8024.3.89990.GGTAGC --locustag PHH12-O-8024.3.89990.GGTAGC --kingdom Bacteria --metagenome --cpus 24 PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 25.
27 of 46 steps (59%) done
rule calculate_insert_size:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log
jobid: 18
benchmark: logs/benchmarks/merge_pairs/PHH12-O-8024.3.89990.GGTAGC_insert_size.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
resources: mem=40
bbmerge.sh -Xmx40G threads=24 in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz loose ecct k=62 extend2=50 ihist=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt merge=f mininsert0=35 minoverlap0=8 prealloc=t prefilter=t minprob=0.8 2> >(tee PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log)
readlength.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz out=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt 2> >(tee PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log)
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 18.
28 of 46 steps (61%) done
rule convert_gff_to_gtf:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff
output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf
jobid: 28
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
localrule combine_insert_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt
output: stats/insert_stats.tsv
jobid: 21
localrule combine_read_length_stats:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt
output: stats/read_length_stats.tsv
jobid: 22
localrule finalize_QC:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt, PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt
output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/read_counts.tsv
jobid: 2
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
rule update_prokka_tsv:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff
output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC_plus.tsv
jobid: 6
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
rule make_maxbin_abundance_file:
input: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt
output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv
jobid: 39
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
atlas gff2tsv PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC_plus.tsv
Finished job 28.
29 of 46 steps (63%) done
Finished job 39.
30 of 46 steps (65%) done
Finished job 21.
31 of 46 steps (67%) done
Finished job 22.
32 of 46 steps (70%) done
Touching output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv.
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv.
Finished job 2.
33 of 46 steps (72%) done
localrule combine_read_counts:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/read_counts.tsv
output: stats/read_counts.tsv
jobid: 20
Finished job 6.
34 of 46 steps (74%) done
Finished job 20.
35 of 46 steps (76%) done
rule run_diamond_blastp:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa, /home/william/dbs/atlas_db.v2/refseq.dmnd
output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv
jobid: 41
benchmark: logs/benchmarks/run_diamond_blastp/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
diamond blastp --threads 24 --outfmt 6 --out PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv --query PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa --db /home/william/dbs/atlas_db.v2/refseq.dmnd --top 2 --evalue 1e-06 --id 50 --query-cover 50 --gapopen 11 --gapextend 1 --tmpdir /tmp --block-size 2 --index-chunks 4
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Finished job 41.
36 of 46 steps (78%) done
rule add_contig_metadata:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff
output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv
jobid: 37
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
localrule QC_report:
input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC, stats/read_counts.tsv, stats/insert_stats.tsv, stats/read_length_stats.tsv
output: finished_QC
jobid: 3
atlas munge-blast PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv
if [ -d ref ]; then
rm -r ref
fi
Touching output file finished_QC.
Finished job 3.
37 of 46 steps (80%) done
Finished job 37.
38 of 46 steps (83%) done
rule sort_munged_blast_hits:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv
output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus_sorted.tsv
jobid: 26
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
sort -k1,1 -k2,2 -k13,13rn PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv > PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus_sorted.tsv
Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv.
Finished job 26.
39 of 46 steps (85%) done
rule run_maxbin:
input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv
output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.summary, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.marker
log: PHH12-O-8024.3.89990.GGTAGC/logs/maxbin2.log
jobid: 30
benchmark: logs/benchmarks/maxbin2/PHH12-O-8024.3.89990.GGTAGC.txt
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
run_MaxBin.pl -contig PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta -abund PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv -out PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC -min_contig_length 200 -thread 24 -prob_threshold 0.9 -max_iteration 50 > PHH12-O-8024.3.89990.GGTAGC/logs/maxbin2.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25.
Finished job 30.
40 of 46 steps (87%) done
rule run_checkm_lineage_wf:
input: logs/checkm_init.txt, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.marker
output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm/completeness.tsv
jobid: 10
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
rm -r PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm && checkm lineage_wf --file PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm/completeness.tsv --tab_table --quiet --extension fasta --threads 24 PHH12-O-8024.3.89990.GGTAGC/genomic_bins PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25.
Finished job 10.
41 of 46 steps (89%) done
rule find_counts_per_region:
input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam
output: PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt.summary, PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
jobid: 9
wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
threads: 24
featureCounts -p −−minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam 2> PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Error in rule find_counts_per_region:
jobid: 9
output: PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt.summary, PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt
log: PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
RuleException:
CalledProcessError in line 683 of /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/assemble.snakefile:
Command 'source activate /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89; set -euo pipefail; featureCounts -p −−minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam 2> PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log ' returned non-zero exit status 255.
File "/home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/assemble.snakefile", line 683, in __rule_find_counts_per_region
File "/home/william/miniconda3/envs/atlas.72/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/log/2018-01-10T094504.256226.snakemake.log
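For anyone hitting the same failure, here is a minimal sketch of how the offending characters could be located and replaced. It assumes Python 3 and that the dashes in the failing command are U+2212 (MINUS SIGN), which is what they look like in the log above. The helper names `find_non_ascii` and `normalize_dashes` are hypothetical and not part of Atlas or Snakemake.

```python
import unicodedata

# Dash look-alikes that often replace ASCII '-' (U+002D) after copy/paste.
# This list is an assumption; extend it if other characters turn up.
DASH_LOOKALIKES = ["\u2212", "\u2010", "\u2011", "\u2013", "\u2014"]
DASH_TABLE = str.maketrans({ch: "-" for ch in DASH_LOOKALIKES})


def find_non_ascii(text):
    """Return (line, column, char, unicode_name) for each non-ASCII character.

    Line and column numbers are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


def normalize_dashes(text):
    """Replace known dash look-alikes with the ASCII hyphen-minus."""
    return text.translate(DASH_TABLE)


# The start of the failing command, with the two suspect characters escaped.
sample = "featureCounts -p \u2212\u2212minOverlap 1 -B"
hits = find_non_ascii(sample)
fixed = normalize_dashes(sample)
```

To check the snakefile itself, the same functions can be applied to its contents, e.g. `find_non_ascii(open(path, encoding="utf-8").read())`, where `path` points at the installed `assemble.snakefile`. Reporting line and column first, rather than replacing blindly, makes it easier to confirm that only the `minOverlap` flag is affected.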