gunzivan28 / rmap

Bacterial analysis toolbox for full ESKAPE pathogen characterization and profiling the resistome, mobilome, virulome & phylogenomics using WGS

License: GNU General Public License v3.0

Shell 82.94% Perl 4.51% Python 12.55%

bacteria resistome eskape wgs illumina phylogenomics mobilome

rmap's Introduction

Thorough, easy-to-use resistome profiling bioinformatics pipeline for ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens using Illumina whole-genome sequencing (WGS) paired-end reads.

🎬 Introduction

The genomics era has led to the generation of sequencing data at an unprecedented rate. Many bioinformatics tools have been created to analyze this data; however, very few can be used by individuals without prior bioinformatics training.

rMAP (Rapid Microbial Analysis Pipeline) was built from existing tools to automate the analysis of WGS Illumina paired-end data for the clinically significant ESKAPE group pathogens. It exhaustively decodes their resistomes while hiding the technical impediments faced by inexperienced users. Installation is fast and straightforward. A successful run generates a .html report that can be easily interpreted by non-bioinformatics personnel to guide decision making.

🏷️ Pipeline Features

The rMAP pipeline toolbox is able to perform:

Download raw sequences from NCBI-SRA archive
Run quality control checks
Adapter and poor quality read trimming
De-novo assembly using shovill or megahit
Contig and scaffold annotation using prokka
Variant calling using freebayes and annotation using snpEff
SNP-based phylogeny inference using Maximum-Likelihood methods using iqtree
Antimicrobial resistance genes, plasmid, virulence factors and MLST profiling
Insertion sequences detection
Pangenome analysis
Interactive visual .HTML report generation using R packages and Markdown language

⚙️ Installation

Install Miniconda by running the following commands:
For Linux Users: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

For MacOS Users: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
bash Miniconda3-latest-MacOSX-x86_64.sh

Add Miniconda to your PATH and reload your shell configuration:
export PATH=~/miniconda3/bin:$PATH
source ~/.bashrc
git clone https://github.com/GunzIvan28/rMAP.git
cd rMAP
conda update -n base -y -c defaults conda

Select the appropriate installer for your computer (either rMAP-1.0-Linux-installer.yml or rMAP-1.0-macOs-installer.yml)

For Linux Users: conda env create -n rMAP-1.0 --file rMAP-1.0-Linux-installer.yml
For MacOS Users: conda env create -n rMAP-1.0 --file rMAP-1.0-macOs-installer.yml

conda activate rMAP-1.0
bash setup.sh
cd && bash clean.sh
rm -rf clean.sh
rMAP -h

This is rMAP 1.0
Developed and maintained by Ivan Sserwadda & Gerald Mboowa

SYPNOSIS:
    Bacterial analysis Toolbox for profiling the Resistome of ESKAPE pathogens using WGS paired-end reads

USAGE:
    rMAP [options] --input <DIR> --output <OUTDIR> --reference <REF>

GENERAL:
    -h/--help       Show this help menu
    -v/--version    Print version and exit
    -x/--citation   Show citation and exit

OBLIGATORY OPTIONS:
    -i/--input      Location of the raw sequences to be analyzed by the pipeline [either .fastq or .fastq.gz]

    -o/--output     Path and name of the output directory

    -r/--reference  Path to reference genome(.gbk). Provide '.gbk' to get annotated vcf files and insertion
                    sequences  [default="REF.gbk"]

    -c/--config     Install and configure full software dependencies

OTHER OPTIONS:
    -d/--download   Download sequences from NCBI-SRA. Requires 'list.txt' of  sample ids saved at $HOME
                    directory

    -f/--quality    Generate .html reports with quality statistics for the samples

    -q/--trim       Trims adapters off raw reads to a phred quality score[default=27]

    -a/--assembly   Perform De novo assembly [default=megahit] Choose either 'shovill' or 'megahit'

    -vc/--varcall   Generates SNPs for each sample and a merged 'all-sample ID' VCF file to be used to infer
                    phylogeny in downstream analysis

    -t/--threads    Number of threads to use <integer> [default=4]

    -m/--amr        Profiles any existing antimicrobial resistance genes, virulence factors, mlsts and plasmids
                    present within each sample id.

    -p/--phylogeny  Infers phylogeny using merged all-sample ID VCF file to determine diversity and evolutionary
                    relationships using Maximum Likelihood(ML) in 1000 Bootstraps

    -s/--pangenome  Perform pangenome analysis. A minimum of 3 samples should be provided to run this option

    -g/--gen-ele    Interrogates and profiles for mobile genomic elements(MGE) and insertion sequences(IS) that
                    may exist in the sequences

For further explanation please visit: https://github.com/GunzIvan28/rMAP

Before starting the pipeline, run the command below to install the remaining dependencies and enable the full functionality of the software. This is done only once:
rMAP -t 8 --config
or, in short form:
rMAP -t 8 -c

📀 Snippets of commandline arguments

Using a sample-ID 'list.txt' saved at $HOME, use rMAP to download sequences from NCBI-SRA

rMAP -t 8 --download
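As a minimal sketch of preparing that file: the snippet below writes the three Acinetobacter accessions used in the tutorial further down to $HOME/list.txt. The one-ID-per-line layout is an assumption; the help text only says the file holds the sample IDs.

```shell
# Write the sample IDs to the location rMAP looks in ($HOME/list.txt).
# Assumption: one accession per line. Note this overwrites any existing list.txt.
list_file="$HOME/list.txt"
printf '%s\n' ERR1989084 ERR1989100 ERR1989115 > "$list_file"

# Quick sanity check of what will be requested
cat "$list_file"

# Then, inside the activated rMAP-1.0 environment:
# rMAP -t 8 --download
```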

Perform a full run of rMAP:
rMAP -t 8 --reference full_genome.gbk --input dir_name --output dir_name --quality --assembly shovill --amr --varcall --trim --phylogeny --pangenome --gen-ele

The short notation for the code above can be run as follows:
rMAP -t 8 -r full_genome.gbk -i dir_name -o dir_name -f -a shovill -m -vc -q -p -s -g

🚀 Arguments

⚡ Mandatory

-c | --config Installs the R packages and other dependencies required for downstream analysis. This step is mandatory, is run only once, and must be performed before any analysis
-i | --input Location of the sequences to be analyzed, in either .fastq or .fastq.gz format. If the reads are not gzipped, rMAP will compress them for the user
-o | --output Name of the directory for the results. rMAP creates the specified folder if it does not exist
-r | --reference The reference genome required for variant calling. The recommended format is GenBank with the extension .gbk, e.g. reference_name.gbk. A reference in FASTA format, e.g. reference_name.fasta or reference_name.fa, can be used but will not produce annotated vcf files

🎨 Other options

-d | --download Downloads sequences from the NCBI Sequence Read Archive. Create a text file 'list.txt' containing the IDs of the samples to be downloaded and save it in the $HOME directory. The downloaded samples will be saved in $HOME/SRA_READS
-f | --quality Generates quality metrics for the input sequences visualized as .html reports
-q | --trim Identifies and trims Illumina library adapters off the raw reads and removes poor quality reads. The software defaults are a minimum phred quality score of 27 and a minimum read length of 80bp
-a | --assembly Performs De-novo assembly for the trimmed reads. Two assemblers are available for this step: shovill or megahit. Selecting "shovill" will perform genome mapping and several polishing rounds with removal of 'inter-contig' gaps to produce good quality contigs and scaffolds but is SLOW. Selecting "megahit" produces contigs with relatively lower quality assembly metrics but is much FASTER
-vc | --varcall Maps reads to the reference genome and calls SNPs saved in vcf format. A merged 'all-sample ID' VCF file to be used to infer phylogeny in downstream analysis is also generated at this stage
-t | --threads Specifies the number of cores to use as an integer. Default cores are set to 4
-m | --amr Provides a snapshot of the existing resistome (antimicrobial resistance genes, virulence factors, mlsts and plasmids) present in each sample id
-p | --phylogeny Uses the vcf file containing SNPs for all of the samples combined as input, converts it into a multiple alignment fasta file and infers a phylogeny using the Maximum-Likelihood method. The trees are generated with 1000 bootstrap replicates
-s | --pangenome Performs pangenome analysis for the samples using Roary. A minimum of 3 samples is required for this step
-g | --gen-ele Searches for Insertion Sequences that may have been inserted anywhere within the genomes of the samples. These sequences are compared against a database of the commonly reported Insertion Sequences found in organisms from the ESKAPE group
-h | --help Shows the main menu
-v | --version Prints software version and exits
-x | --citation Shows citation and exits

📗 Report visualization

A sample of the interactive HTML report generated by the pipeline can be viewed at this link. The pipeline also retains the intermediate files and their folders within the reports directory, so experienced users can interrogate them further for any particular genes of interest.

📝 Information

How to cite

When using rMAP, please cite as:

Sserwadda, I., & Mboowa, G. (2021). rMAP: the Rapid Microbial Analysis Pipeline for ESKAPE bacterial group whole-genome sequence data. Microbial genomics, 7(6), 10.1099/mgen.0.000583. https://doi.org/10.1099/mgen.0.000583. PMID: 34110280

🎞️ Credits

This pipeline was written by Ivan Sserwadda (GunzIvan28) and Gerald Mboowa (gmboowa). If you want to contribute, please open an issue or a pull request and ask to be added to the project. Everyone is welcome to contribute.

✍️ Authors

💡 Tutorial

rMAP was built on the philosophy of universal usability. Installing and successfully running a pipeline can be a nightmare for individuals without command-line experience. The authors created this short basic tutorial as a reference for mainstream analysis and troubleshooting.

1. Installation

Follow the installation procedures using the Miniconda installation instructions above by copying and pasting line-by-line in your terminal.

2. Downloading sample datasets

A dataset of 3 paired-end Acinetobacter whole-genome sequences and a reference genome can be downloaded from the rMAP_datasets link.

3. Preparing files for rMAP run

Open your command line terminal and run the following commands

Change to your home directory and create a folder named "rMAP_datasets":
cd && mkdir rMAP_datasets
Unzip the downloaded datasets from the link. Copy the fastq.gz sequences of ERR1989084, ERR1989100 and ERR1989115 into rMAP_datasets. Copy the reference genome acinetobacter.gbk to the $HOME directory.
Activate the rMAP environment using conda activate rMAP-1.0 and install the pipeline's remaining packages using rMAP -t 8 -c.
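Before launching the run, it can help to confirm that the files are where the tutorial expects them. The helper below is a minimal sketch, not part of rMAP: `check_inputs` is a hypothetical name, and it only assumes that each sample's files start with the sample ID and end in `.fastq.gz` (the exact read-pair suffix is left open).

```shell
# check_inputs DIR REF ID...: report missing inputs before a run.
# Hypothetical helper for illustration; rMAP itself does not ship it.
check_inputs() {
    dir="$1"
    ref="$2"
    shift 2
    missing=0
    if [ ! -f "$ref" ]; then
        echo "missing reference: $ref" >&2
        missing=1
    fi
    for id in "$@"; do
        # Any file that starts with the sample ID and ends in .fastq.gz counts;
        # the read-pair naming scheme is deliberately not assumed.
        if ! ls "$dir/$id"*.fastq.gz >/dev/null 2>&1; then
            echo "no .fastq.gz files for $id in $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage with the tutorial's layout:
# check_inputs "$HOME/rMAP_datasets" "$HOME/acinetobacter.gbk" ERR1989084 ERR1989100 ERR1989115
```

A non-zero exit status means at least one input was not found, and the messages on stderr say which.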

4. Running rMAP

After confirming that you have a folder rMAP_datasets containing sequences ERR1989084, ERR1989100 and ERR1989115, a reference genome acinetobacter.gbk saved in the home directory, and the rMAP dependencies fully installed, run the following command:
rMAP -t 8 --reference acinetobacter.gbk --input rMAP_datasets --output Acinetobacter_output --quality --assembly shovill --amr --varcall --trim --phylogeny --pangenome --gen-ele
Where:
- -t 8 requests 8 threads (the default is 4). Specify more if available
- --reference specifies the path of reference genome
- --input contains our whole genome sequence datasets i.e rMAP_datasets
- --output specifies an output path called Acinetobacter_output for the intermediate files and results
- --quality, --assembly shovill, --amr, --varcall, --trim, --phylogeny, --pangenome, --gen-ele activate quality control, genome assembly using the shovill assembler, antimicrobial resistance gene profiling, variant calling, sequence trimming, phylogenetic analysis, pangenome analysis and insertion sequence characterization respectively in the rMAP run.
- A successful run should generate an HTML report similar to the one in this link. Submit any queries or bugs to the Issue Tracker and the developers will look into them.

🙏 Acknowledgements

rMAP was inspired by and adapted from the TORMES pipeline, developed by Quijada et al. (2019) and available at https://github.com/nmquijada/tormes. The reporting format for rMAP was mainly adapted and modified from the TORMES pipeline. Alternative tools similar to rMAP that you could consider, depending on the type of analysis to be performed:

AQUAMIS
Deneke C, Brendebach H, Uelze L, Borowiak M, Malorny B, Tausch SH. Species-Specific Quality Control, Assembly and Contamination Detection in Microbial Isolate Sequences with AQUAMIS. Genes. 2021;12. doi:10.3390/genes12050644

ASA³P
Schwengers O, Hoek A, Fritzenwanker M, Falgenhauer L, Hain T, Chakraborty T, Goesmann A. ASA³P: An automatic and scalable pipeline for the assembly, annotation and higher-level analysis of closely related bacterial isolates. PLoS Comput Biol 2020;16:e1007134. https://doi.org/10.1371/journal.pcbi.1007134.

MicroPIPE
Murigneux V, Roberts LW, Forde BM, Phan M-D, Nhu NTK, Irwin AD, Harris PNA, Paterson DL, Schembri MA, Whiley DM, Beatson SA MicroPIPE: validating an end-to-end workflow for high-quality complete bacterial genome construction. BMC Genomics, 22(1), 474. (2021) https://doi.org/10.1186/s12864-021-07767-z

Nullarbor
Seemann T, Goncalves da Silva A, Bulach DM, Schultz MB, Kwong JC, Howden BP. Nullarbor Github https://github.com/tseemann/nullarbor

ProkEvo
Pavlovikj N, Gomes-Neto JC, Deogun JS, Benson AK ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ, e11376 (2021) https://doi.org/10.7717/peerj.11376

Public Health Bacterial Genomics
Libuit K, Ambrosio F, Kapsak C Public Health Bacterial Genomics GitHub https://github.com/theiagen/public_health_bacterial_genomics

🔌 Third Party Plugins

This software is built on pre-existing tools. When using it, please don't forget to cite the following:

🐛 To report bugs, ask questions or seek help

The software developing team works round the clock to ensure the bugs within the program are captured and fixed. For support or any inquiry: You can submit your query using the Issue Tracker


rmap's Issues

No results generated by rMAP

Dear Ivan,

I have installed rMAP as per the instructions, but when I run:
rMAP -t 8 --config

following was the result:

rMAP will now configure the system: Please be patient...

rMAP will now install missing R-packages....
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)
Old packages: 'crosstalk', 'fansi', 'knitr', 'later', 'tinytex', 'xfun'
Warning message:
package(s) not installed when version(s) same as current; use `force = TRUE` to
  re-install: 'ggtree' 'rmarkdown' 
Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.1 (2021-08-10)
Old packages: 'crosstalk', 'fansi', 'knitr', 'later', 'tinytex', 'xfun'
Warning message:
package(s) not installed when version(s) same as current; use `force = TRUE` to
  re-install: 'data.table' 'formattable' 
R-packages installed successfully....

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Extra-conda packages installed successfully....

ALL REQUIRED PACKAGES HAVE BEEN INSTALLED, enjoy rMAP!!....
running: amrfinder --update
Software directory: '/home/isd/miniconda3/envs/rMAP-1.0/bin/'
Software version: 3.10.16
Running: /home/isd/miniconda3/envs/rMAP-1.0/bin/amrfinder_update -d /home/isd/miniconda3/envs/rMAP-1.0/share/amrfinderplus/data
Looking up databases at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/
WARNING: '/home/isd/miniconda3/envs/rMAP-1.0/share/amrfinderplus/data/2021-09-30.1/' contains the latest version: 2021-09-30.1
Skipping update, use amrfinder --force_update to overwrite the existing database

Database directory: '/home/isd/miniconda3/envs/rMAP-1.0/share/amrfinderplus/data/2021-09-30.1'
Database version: 2021-09-30.1
REMEMBER: '--config' is only run once....


rMAP Set-up Lasted Approximately: 339 seconds.

But when I run the command, it runs continuously and produces no results or errors. The command was:

rMAP -i /home/isd/Desktop/salmonellaTyphi/Fastq/ -o /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_salmonella/ -r /home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk -t 4 -a -q -vc -p -s -g -m

Can you please help me find out why it is not producing any screen logs or results?

Thank you in advance

Jia

setup.sh

I have trouble running setup.sh. I use WSL on a Windows PC.

rMAP error

Would greatly appreciate advice on next steps.

Requesting modified output of report

I think this can cause a problem when generating the report. I think it should be erased from the $HOME/ directory.

Trimming quality step dropping all reads

Dear Ivan,

I would like to know how I can skip the trimming step, as it is dropping all of my fastq reads, or how I can change the trailing and leading parameters of the trimming step. Many of my samples produce empty clean-read files after trimming.

Thanks
SAR

Multiple results file are empty

Dear rMAP Team,

I ran rMAP on Shigella flexneri raw sequencing data using the following command:

rMAP -i /mnt/e/Working/Shigella/Data/shigella_flexneri/africa/ -o rMAP_output_africa_flex -t 2 -m -p -s -g -q -a shovill -vc

I have the following folders generated:

Assembly summary statistics: ran perfectly
SNP-Variant Calling: no results generated
Phylogenetic inference: not done, as no SNPs were generated
Antimicrobial Resistance Profiling: ran perfectly
Plasmid Profiling: ran perfectly
Virulence Factor Determination: ran perfectly
Multi-Locus Sequence Typing (MLST): ran perfectly
Pangenome Analysis: ran perfectly
Insertion sequence characterization (IS): the summary file is not generated correctly. It only gives the names of the insertion sequences found, but does not say which sample has which insertion sequences or what the percent identity is. While this step was processing, the screen showed the following error:

cat: 'rMAP_output_africa_flex/insertion_sequences/ERR573382/ERR573382.clean/ISKpn23/*.txt': No such file or directory
The same error appears for all the samples.

The following error also appears for all the samples:

Processing sample: ERR126963
Traceback (most recent call last):
  File "/home/sar/miniconda3/envs/rMAP-1.0/bin/ismap", line 7, in <module>
    from ismap import main
  File "/home/sar/miniconda3/envs/rMAP-1.0/bin/ismap.py", line 12, in <module>
    from mapping_to_query import map_to_is_query
  File "/home/sar/miniconda3/envs/rMAP-1.0/bin/mapping_to_query.py", line 6, in <module>
    from Bio.Alphabet import generic_dna
  File "/home/sar/anaconda3/envs/rMAP-1.0/lib/python3.7/site-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.
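The ImportError above means the environment's Biopython is a release without `Bio.Alphabet`, which was removed in Biopython 1.78, while the `ismap` scripts still import it. The check below is a minimal sketch for confirming which side of that boundary an environment is on; `version_lt` is a hypothetical helper, and the pin shown in the final comment is an untested assumption rather than a fix confirmed by the rMAP authors.

```shell
# version_lt A B: succeed when version A sorts strictly before version B.
# Relies on `sort -V` (GNU coreutils and recent BSD sort).
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Ask the python on PATH which Biopython it would import.
installed=$(python -c 'import Bio; print(Bio.__version__)' 2>/dev/null) || installed=unknown

if [ "$installed" = unknown ]; then
    echo "Biopython is not importable from the current python"
elif version_lt "$installed" 1.78; then
    echo "Biopython $installed still ships Bio.Alphabet"
else
    echo "Biopython $installed no longer has Bio.Alphabet; the import in the traceback will fail"
fi

# Possible workaround (assumption, untested against rMAP):
# conda install -n rMAP-1.0 'biopython<1.78'
```

Run it inside the activated rMAP-1.0 environment so that `python` resolves to the interpreter the pipeline uses.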

The other error found was:

WARNING! This alignment consists of closely-related and very-long sequences.
WARNING! FastTree (or other standard maximum-likelihood tools)
may not be appropriate for aligments of very closely-related sequences
like this one, as FastTree does not account for recombination or gene conversion

And the HTML report was not generated because the rmarkdown package was not found:

Error in library(rmarkdown) : there is no package called 'rmarkdown'

Please help me resolve these errors.

Thank you in advance

SAR

Solving environment failed with conda version 23.3.1

I'm having trouble installing the dependencies with a fresh install of conda 23.3.1 with current (1 November 2023) bioconda, conda-forge etc. The log can be found here.

Is there perhaps a container with the environment pre-installed that can be used?

Consider using relative paths for setup.sh

The current setup.sh assumes that miniconda3 is installed in the default location, which can lead to a failed installation. Please consider using the $CONDA_PREFIX environment variable. It is also a little risky running any rm -rf commands - perhaps just leaving the directory there? The initial PATH export should also be unnecessary, as the install requires activating the conda environment (i.e., it is already in the $PATH). It is also unclear why read/write/execute permissions are given system wide.

Current setup.sh:

#! /usr/bin/env bash

export PATH=~/miniconda3/bin:$PATH
cp -rf config-files ../miniconda3/envs/rMAP-1.0/
chmod 777 ../miniconda3/envs/rMAP-1.0/config-files/*
cp -rf bin/rMAP ../miniconda3/envs/rMAP-1.0/bin/
chmod 777 ../miniconda3/envs/rMAP-1.0/bin/rMAP

echo -e "rMAP is all set up!!! Run 'rMAP -h' to confirm."
touch ../clean.sh &
echo "#! /usr/bin/env bash" >../clean.sh
echo "rm -rf rMAP" >>../clean.sh
chmod 777 ../clean.sh
cd
bash clean.sh

Proposed change:

#! /usr/bin/env bash

cp -rf config-files $CONDA_PREFIX
cp -rf bin/rMAP $CONDA_PREFIX/bin/
chmod +x $CONDA_PREFIX/bin/rMAP

echo -e "rMAP is all set up!!! Run 'rMAP -h' to confirm."

If this is acceptable, I'm happy to make a PR!
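A slightly more defensive variant of the proposed script could refuse to run when no environment is active, since an empty $CONDA_PREFIX would otherwise send the copies to `/bin`. This is only a sketch under that assumption: `install_rmap` is a hypothetical wrapper, and the optional second argument exists purely so the function can be exercised outside a conda environment.

```shell
#!/usr/bin/env bash
# install_rmap SRC [PREFIX]: copy rMAP's files into a conda environment.
# PREFIX defaults to $CONDA_PREFIX, which conda sets on `conda activate`.
install_rmap() {
    src="$1"
    prefix="${2:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ]; then
        echo "No conda environment is active; run 'conda activate rMAP-1.0' first." >&2
        return 1
    fi
    mkdir -p "$prefix/bin"
    cp -rf "$src/config-files" "$prefix/" || return 1
    cp -f "$src/bin/rMAP" "$prefix/bin/" || return 1
    # Executable bit only, instead of world-writable 777
    chmod +x "$prefix/bin/rMAP"
    echo "rMAP is all set up!!! Run 'rMAP -h' to confirm."
}

# From inside the cloned rMAP directory, with the environment activated:
# install_rmap .
```

Keeping the logic in a function also leaves the cloned directory untouched, so there is no need for a generated clean.sh.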

Error on step: rMAP is will now perform Variant Calling

Greetings,
Thank you to the author(s) for putting this package together! I have a few issues I'm hoping you can shed light on.

We have rMAP set up on a Linux HPC, using the conda approach in the docs.

We are executing the program with:

rMAP \
-t 8 \
--input /private/data/test/ \
--output /private/results/02-rmap-debugging \
--reference /private/data/Staph_CP0000461.gbk \
--quality --assembly shovill --amr --varcall --trim --phylogeny --pangenome --gen-ele

At the rMAP step, we are seeing:

rMAP is will now perform Variant Calling...
.gbk found !!! Annotation Mode Enabled...Preparing Annotation files...
Traceback (most recent call last):
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/bin/biopython.convert", line 10, in <module>
    sys.exit(main())
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__main__.py", line 7, in main
    convert(*get_args(sys.argv[1:]))
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__init__.py", line 194, in convert
    with input_path.open("r") as handle:
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/private/results/02-rmap-debugging/references/private/data/Staph_CP0000461.gbk'
Traceback (most recent call last):
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/bin/biopython.convert", line 10, in <module>
    sys.exit(main())
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__main__.py", line 7, in main
    convert(*get_args(sys.argv[1:]))
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__init__.py", line 194, in convert
    with input_path.open("r") as handle:
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/data/p_magnuson_lab/conda/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/private/results/02-rmap-debugging/references/private/data/Staph_CP0000461.gbk'
[bwa_idx_build] fail to open file '/private/results/02-rmap-debugging/references/*.fa' : No such file or directory
[E::fai_build3_core] Failed to open the file /private/results/02-rmap-debugging/references/*.fa
[faidx] Could not build fai index /private/results/02-rmap-debugging/references/*.fa.fai
Processing sample: 7285-NS-10
[E::bwa_idx_load_from_disk] fail to locate the index files
[samclip] ERROR: Can't see '/private/results/02-rmap-debugging/references/*.fai' index. Run 'samtools faidx /private/results/02-rmap-debugging/references/*.fai' ?
samtools sort: failed to read header from "-"
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
samtools index: "/private/results/02-rmap-debugging/variant_calling/7285-NS-10.mrkdup.bam" is in a format that cannot be usefully indexed

The first error is FileNotFoundError: [Errno 2] No such file or directory: '/private/results/02-rmap-debugging/references/private/data/Staph_CP0000461.gbk', which appears to be concatenating two paths together:

/private/results/02-rmap-debugging/references
/private/data/Staph_CP0000461.gbk

I suspect the downstream errors are due to this initial error, but I have no idea why the paths would get concatenated as such. We provided the paths as absolute.
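The failing path is what one would get by joining `<output>/references/` with the full `--reference` value instead of with its file name. rMAP's source is not shown in this thread, so the following is only a sketch of the usual remedy, with `stage_reference` as a hypothetical helper: reduce the reference to its base name before building the staged path.

```shell
# stage_reference REF OUTDIR: copy REF into OUTDIR/references under its
# file name only, then print the staged path.
# Hypothetical helper; it illustrates the basename step, not rMAP's own code.
stage_reference() {
    ref="$1"
    outdir="$2"
    ref_name=$(basename "$ref")   # /private/data/x.gbk -> x.gbk
    mkdir -p "$outdir/references"
    cp "$ref" "$outdir/references/$ref_name" || return 1
    printf '%s\n' "$outdir/references/$ref_name"
}

# Example with the paths from this report:
# stage_reference /private/data/Staph_CP0000461.gbk /private/results/02-rmap-debugging
```

If this diagnosis is right, a possible workaround without touching the pipeline is to change into the directory that holds the reference and pass the bare file name to `--reference`, as the tutorial's own example does. That is an assumption and has not been verified here.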

Also, is there a way to resume the job from the last point of failure, vs. repeating from the beginning?

Cheers!

Incorrect snpEff path and package in Linux install. Error on step: rMAP is will now perform Annotation of Variants

Greetings,

We found this issue in Linux install:

rMAP is will now perform Annotation of Variants...
chmod: cannot access '/home/user/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar': No such file or directory
chmod: cannot access '/home/user/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/*': No such file or directory

There are two things wrong with the above:

Our conda envs aren't installed in /home/$USER; they are installed elsewhere. I guess we can create symlinks for now, but it would be better for rMAP to use $CONDA_PREFIX - I'm not sure how feasible that is for you to do, but certainly easy for me to suggest :)
Our install doesn't have snpeff-4.5covid19-1, we have snpeff-5.0-1. What is the best way to work around this?
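Both points could be handled by locating the jar at run time rather than hard-coding the home directory and the snpEff version. The function below is a minimal sketch of that idea: `find_snpeff_jar` is a hypothetical name, and the `share/snpeff-<version>/snpEff.jar` layout is inferred only from the two directory names mentioned in this report.

```shell
# find_snpeff_jar [PREFIX]: print the path to snpEff.jar inside a conda
# environment, whatever snpEff version is installed there.
# PREFIX defaults to $CONDA_PREFIX (set by `conda activate`).
find_snpeff_jar() {
    prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ]; then
        echo "CONDA_PREFIX is not set; activate the environment first" >&2
        return 1
    fi
    # Assumed layout: share/snpeff-4.5covid19-1/, share/snpeff-5.0-1/, ...
    for jar in "$prefix"/share/snpeff-*/snpEff.jar; do
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "snpEff.jar not found under $prefix/share" >&2
    return 1
}

# Usage inside the activated environment:
# jar=$(find_snpeff_jar) && echo "snpEff jar: $jar"
```

If several snpEff versions are installed side by side, this returns the first match in glob order, which may not be the one intended.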

Thanks!

Solving environment failed

Hi Ivan,
I am trying to setup rMAP on my macbook M1 Big Sur 11.3.
I did the initial step but when I tried to install rMAP with the command "conda env create -n rMAP-1.0 --file rMAP-1.0-macOs-installer.yml"
I receive this :

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

ncbi-amrfinderplus=3.8.4
python=3.7.8
freebayes=1.3.2
fasttree=2.1.10
mafft=7.471
prodigal=2.6.3
parallel=20200722
iqtree=2.0.3
quast=5.0.2
samtools=1.9
sra-tools
megahit=1.2.9
snippy=4.3.6
assembly-stats=1.0.1
r-base=4.0.2
roary=3.13.0
bwa=0.7.17
lxml=4.5.2
unicycler
vt
Then the subsequent steps cannot work. For example, "conda activate rMAP-1.0" gives:
Could not find conda environment: rMAP-1.0

Could you advise what I am doing wrong and how to solve this issue?

unable to make assembly contig.fa

Dear Ivan

I'm unable to get an assembly, and because the assembly fails, all the other steps fail too.

Please help me with the error:

`rMAP is will now perform De-novo Genome Assembly using shovill...
Processing sample: 7
Unknown option: assembler
Synopsis:
  Faster de novo assembly pipeline based around Spades
Usage:
  shovill [options] --outdir DIR --R1 R1.fq.gz --R2 R2.fq.gz
Author:
  Torsten Seemann <[email protected]>
Options:
  --help          This help
  --version       Print version and exit
  --check         Check dependencies are installed
  --debug         Debug info (default: OFF)
  --cpus N        Number of CPUs to use (default: 16)
  --outdir XXX    Output folder (default: '')
  --namefmt XXX   Format of contig FASTA IDs in 'printf' style (default: 'contig%05d')
  --force         Force overwite of existing output folder (default: OFF)
  --R1 XXX        Read 1 FASTQ (default: '')
  --R2 XXX        Read 2 FASTQ (default: '')
  --depth N       Sub-sample --R1/--R2 to this depth. Disable with --depth 0 (default: 100)
  --gsize XXX     Estimated genome size <blank=AUTODETECT> (default: '')
  --kmers XXX     K-mers to use <blank=AUTO> (default: '')
  --opts XXX      Extra SPAdes options eg. --plasmid --sc ... (default: '')
  --nocorr        Disable post-assembly correction (default: OFF)
  --trim          Use Trimmomatic to remove common adaptors first (default: OFF)
  --trimopt XXX   Trimmomatic options (default: 'ILLUMINACLIP:/home/isd/miniconda3/envs/rMAP-1.0/bin/../db/trimmomatic.fa:1:30:11 LEADING:3 TRAILING:3 MINLEN:30 TOPHRED33')
  --minlen N      Minimum contig length <0=AUTO> (default: 1)
  --mincov n.nn   Minimum contig coverage <0=AUTO> (default: 2)
  --asm XXX       Spades result to correct: before_rr contigs scaffolds (default: 'contigs')
  --tmpdir XXX    Fast temporary directory (default: '/tmp')
  --ram n.nn      Try to keep RAM usage below this many GB (default: 8)
  --keepfiles     Keep intermediate files (default: OFF)
Documentation:
  https://github.com/tseemann/shovill
mv: cannot stat '/home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_s7/assembly/7/contigs.fa': No such file or directory
/home/isd/miniconda3/envs/rMAP-1.0/bin/rMAP: line 412: /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_s7/assembly/7/7-assembly-stats.tab: No such file or directory
/home/isd/miniconda3/envs/rMAP-1.0/bin/rMAP: line 413: /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_output_s7/assembly/7/7-assembly-stats.txt: No such file or directory
Your Assembly Run Took Approximately: 0 seconds.

My command is ::

rMAP -t 8 --reference /home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk --input /home/isd/Desktop/salmonellaTyphi/rMAP/rMAP_datasets --output rMa_output_sal --assembly shovill --amr --varcall --trim --phylogeny --pangenome --gen-ele`
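The "Unknown option: assembler" line, together with the help text that follows it, indicates that the installed shovill does not recognise an `--assembler` option: it is absent from the option list printed above. A quick way to confirm what a given install accepts is to search its help output. The helper below is a minimal sketch, and `supports_option` is a hypothetical name.

```shell
# supports_option CMD OPT: succeed when CMD's --help text mentions OPT.
# Plain fixed-string match on the help output, nothing tool-specific.
supports_option() {
    "$1" --help 2>&1 | grep -qF -e "$2"
}

# Only probe shovill when it is actually on PATH.
if command -v shovill >/dev/null 2>&1; then
    if supports_option shovill --assembler; then
        echo "this shovill accepts --assembler"
    else
        echo "this shovill has no --assembler option"
    fi
fi
```

If the option is missing, the version of shovill in the environment and the version rMAP was written against probably differ. Which shovill release the pipeline expects is not stated in this thread.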

When it reached the variant calling step, it gave the following error:

rMAP is will now perform Variant Calling...
.gbk found !!! Annotation Mode Enabled...Preparing Annotation files...
Traceback (most recent call last):
  File "/home/isd/miniconda3/envs/rMAP-1.0/bin/biopython.convert", line 10, in <module>
    sys.exit(main())
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__main__.py", line 7, in main
    convert(*get_args(sys.argv[1:]))
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__init__.py", line 194, in convert
    with input_path.open("r") as handle:
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'rMa_output_sal/references/home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk'
Traceback (most recent call last):
  File "/home/isd/miniconda3/envs/rMAP-1.0/bin/biopython.convert", line 10, in <module>
    sys.exit(main())
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__main__.py", line 7, in main
    convert(*get_args(sys.argv[1:]))
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/site-packages/biopython_convert/__init__.py", line 194, in convert
    with input_path.open("r") as handle:
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1208, in open
    opener=self._opener)
  File "/home/isd/miniconda3/envs/rMAP-1.0/lib/python3.7/pathlib.py", line 1063, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'rMa_output_sal/references/home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk'
[bwa_idx_build] fail to open file 'rMa_output_sal/references/*.fa' : No such file or directory
[E::fai_build3_core] Failed to open the file rMa_output_sal/references/*.fa
[faidx] Could not build fai index rMa_output_sal/references/*.fa.fai
Processing sample: 7
[E::bwa_idx_load_from_disk] fail to locate the index files
[samclip] ERROR: Can't see 'rMa_output_sal/references/*.fai' index. Run 'samtools faidx rMa_output_sal/references/*.fai' ?
samtools sort: failed to read header from "-"
[bam_mating_core] ERROR: Couldn't read header
samtools sort: failed to read header from "-"
[markdup] error reading header
samtools index: "rMa_output_sal/variant_calling/7.mrkdup.bam" is in a format that cannot be usefully indexed
Your Alignment Run Took Approximately: 1 seconds.

rMAP is will now perform Variant Calling ...
Processing sample: 7
could not open rMa_output_sal/references/*.fa
normalize v0.5

options:     input VCF file                                  -
         [o] output VCF file                                 -
         [w] sorting window size                             10000
         [n] no fail on reference inconsistency for non SNPs false
         [q] quiet                                           false
         [d] debug                                           false
         [r] reference FASTA file                            rMa_output_sal/references/*.fa

Failed to read from rMa_output_sal/variant_calling/7.raw.vcf: unknown file type
[bcf_ordered_reader.cpp:49 BCFOrderedReader] Not a VCF/BCF file: -
Failed to read from standard input: unknown file type
Your Variant Call Run Took Approximately: 0 seconds.


rMAP is will now perform Annotation of Variants...
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar': No such file or directory
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/*': No such file or directory
Processing sample: 7
cat: /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.config: No such file or directory
cp: cannot stat 'rMa_output_sal/references/.fa': No such file or directory
cp: cannot stat 'rMa_output_sal/references/.gff3': No such file or directory
gzip: rMa_output_sal/variant_calling/snps/7/references/ref/genes.gff: No such file or directory
00:00:00	SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00	Command: 'build'
00:00:00	Building database for 'ref'
00:00:00	Reading configuration file 'rMa_output_sal/variant_calling/snps/7/references/snpEff.config'. Genome: 'ref'
00:00:00	Reading config file: /home/isd/rMa_output_sal/variant_calling/snps/7/references/snpEff.config
java.lang.RuntimeException: Error parsing property 'ref..codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
	at org.snpeff.snpEffect.Config.createCodonTables(Config.java:173)
	at org.snpeff.snpEffect.Config.readConfig(Config.java:662)
	at org.snpeff.snpEffect.Config.init(Config.java:487)
	at org.snpeff.snpEffect.Config.<init>(Config.java:121)
	at org.snpeff.SnpEff.loadConfig(SnpEff.java:449)
	at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:365)
	at org.snpeff.SnpEff.run(SnpEff.java:1188)
	at org.snpeff.SnpEff.main(SnpEff.java:168)
00:00:00	Logging
00:00:02	Done.
Error: Unable to access jarfile /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar
Loading reference: rMa_output_sal/variant_calling/snps/7/references/genomes/ref.fa

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file 'rMa_output_sal/variant_calling/snps/7/references/genomes/ref.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/Root.pm:447
STACK: Bio::Root::IO::_initialize_io /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/IO.pm:268
STACK: Bio::SeqIO::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:513
STACK: Bio::SeqIO::fasta::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:389
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:435
STACK: /home/isd/miniconda3/envs/rMAP-1.0/bin/snippy-vcf_to_tab:39
-----------------------------------------------------------
Your Variant Annotation Run Took Approximately: 2 seconds.
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar': No such file or directory
chmod: cannot access '/home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/*': No such file or directory
cat: /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.config: No such file or directory
cp: cannot stat 'rMa_output_sal/references/.fa': No such file or directory
cp: cannot stat 'rMa_output_sal/references/.gff3': No such file or directory
gzip: rMa_output_sal/variant_calling/snps/combined-snps/references/ref/genes.gff: No such file or directory
00:00:00	SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00	Command: 'build'
00:00:00	Building database for 'ref'
00:00:00	Reading configuration file 'rMa_output_sal/variant_calling/snps/combined-snps/references/snpEff.config'. Genome: 'ref'
00:00:00	Reading config file: /home/isd/rMa_output_sal/variant_calling/snps/combined-snps/references/snpEff.config
java.lang.RuntimeException: Error parsing property 'ref..codonTable'. No such codon table 'Bacterial_and_Plant_Plastid'
	at org.snpeff.snpEffect.Config.createCodonTables(Config.java:173)
	at org.snpeff.snpEffect.Config.readConfig(Config.java:662)
	at org.snpeff.snpEffect.Config.init(Config.java:487)
	at org.snpeff.snpEffect.Config.<init>(Config.java:121)
	at org.snpeff.SnpEff.loadConfig(SnpEff.java:449)
	at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:365)
	at org.snpeff.SnpEff.run(SnpEff.java:1188)
	at org.snpeff.SnpEff.main(SnpEff.java:168)
00:00:00	Logging
00:00:01	Done.
Error: Unable to access jarfile /home/isd/miniconda3/envs/rMAP-1.0/share/snpeff-4.5covid19-1/snpEff.jar
Loading reference: rMa_output_sal/variant_calling/snps/combined-snps/references/genomes/ref.fa

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Could not read file 'rMa_output_sal/variant_calling/snps/combined-snps/references/genomes/ref.fa': No such file or directory
STACK: Error::throw
STACK: Bio::Root::Root::throw /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/Root.pm:447
STACK: Bio::Root::IO::_initialize_io /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/Root/IO.pm:268
STACK: Bio::SeqIO::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:513
STACK: Bio::SeqIO::fasta::_initialize /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO/fasta.pm:87
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:389
STACK: Bio::SeqIO::new /home/isd/miniconda3/envs/rMAP-1.0/lib/site_perl/5.26.2/Bio/SeqIO.pm:435
STACK: /home/isd/miniconda3/envs/rMAP-1.0/bin/snippy-vcf_to_tab:39
-----------------------------------------------------------
Your Run Took Approximately: 0 seconds.
VCF Annotation Successfuly Completed in: 0 seconds.
Your Variant Call Run Took Approximately: 1 seconds.

Can you please help me with these errors?

Thank you in advance!

Best wishes,

Jia
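
One observation on the log above: the path in the FileNotFoundError looks like the output references directory followed directly by the absolute --reference path. The snippet below is only a minimal sketch of how such a path could arise, assuming a plain string join. It is not taken from rMAP's source, and the variable names are hypothetical; the values are copied from the command and log above.

```shell
#!/usr/bin/env bash
# Hypothetical variable names; values copied from the command and log above.
output="rMa_output_sal"
reference="/home/isd/Desktop/salmonellaTyphi/rMAP/sequenceSalmonella.gbk"

# Assumed behaviour: the references directory is joined with the full
# absolute path, which reproduces the missing path reported in the traceback.
joined="${output}/references${reference}"
echo "$joined"

# Joining with only the file name keeps the path inside the references directory.
by_name="${output}/references/$(basename "$reference")"
echo "$by_name"
```

If this assumption holds, one thing that might be worth trying is running rMAP from the directory that contains the .gbk file and passing only the file name to --reference.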