mvirs's Introduction

mVIRs: Localisation of inducible prophages using NGS data

mVIRs is a command-line tool that localizes and extracts genome sequences of inducible prophages in bacterial host genomes using paired-end DNA sequencing data as input. The approach relies on identifying DNA segments that are predicted to exist in a circularized or concatenated form upon induction. To achieve this, mVIRs uses information on the orientation of short, paired-end sequencing (e.g., Illumina) reads that are aligned to the genome of a lysogenic host as a reference. The identified segments can be length-filtered and classified by prediction tools (e.g., VirSorter2, VirFinder, VIBRANT or Prophage Hunter), to identify putative prophage candidates.

The tool was developed by Hans-Joachim Ruscheweyh, Mirjam Zünd and Shinichi Sunagawa. It is distributed under the GPL v3 license.

If you use mVIRs, please cite:

Zünd M, Ruscheweyh HJ, Field CM, Meyer N, Cuenca M, Hoces D, Hardt WD, Sunagawa S. High throughput sequencing provides exact genomic locations of inducible prophages and accurate phage-to-host ratios in gut microbial strains. Microbiome, 9: 77, 2021.

Analyses in the publication were executed using version 1.0.0.

Questions/Comments? Open a GitHub issue.

Installation

The tool is written in Python and depends on bwa, samtools and pysam, all of which appear in the conda command below.

Installation using conda

The easiest way to install mVIRs is with the conda package manager and the bioconda channel: conda creates an environment with the correct versions of the dependencies, and pip then installs mVIRs into it.

# Install dependencies

$ conda create -n mvirs python==3.7 pip bwa samtools pysam -c bioconda 
$ conda activate mvirs
$ python -m pip install mvirs

# Test installation

$ mvirs -h

Program: mVIRs - Localisation of inducible prophages using NGS data
Version: 1.1.1
Reference: Zünd, Ruscheweyh, et al.
High throughput sequencing provides exact genomic locations of inducible
prophages and accurate phage-to-host ratios in gut microbial strains.
Microbiome (2021). doi:10.1186/s40168-021-01033-w

Usage: mvirs <command> [options]
Command:

    index   create index files for reference used in the
            mvirs oprs routine

    oprs    align reads against reference and used clipped
            alignment positions and OPRs to extract potential
            prophages

    test    run mVIRs for a public dataset
  

Manual installation

Although installation using conda is recommended, manual installation of dependencies is also possible. pip is then used to install mVIRs:

# Manually install dependencies
...

# Install mVIRs
$ python -m pip install mvirs

# Test installation
$ mvirs -h

...

Running mVIRs

The mVIRs toolkit includes three commands: index, oprs and test. The index command takes a reference sequence file as input and builds the reference database files that are needed to run the oprs command. The oprs command aligns paired-end reads against the reference database to detect so-called outward-oriented paired-end reads (OPRs) and uses soft-clipped alignments (clipped reads) to identify the location and extract the sequence of potential prophages. The test command runs mVIRs on a test dataset that it downloads itself.

mvirs index

This step takes a FASTA-formatted reference sequence file as input and builds an index using the bwa index command. This command needs to be executed before running the oprs command.

$ mvirs index

Program: mVIRs - Localisation of inducible prophages using NGS data
Version: 1.1.1
Reference: Zünd, Ruscheweyh, et al.
High throughput sequencing provides exact genomic locations of inducible
prophages and accurate phage-to-host ratios in gut microbial strains.
Microbiome, 9: 77, 2021. doi:10.1186/s40168-021-01033-w

Usage: mvirs index [options]

    Input:
        -f  FILE   Reference FASTA file. Can be gzipped. [Required]

Example

$ mvirs index -f reference.fasta

mvirs oprs

This step takes paired-end read files as input (one file each for forward and reverse reads) together with the name of the reference database produced by mvirs index. It aligns the reads against the reference database and uses coverage information from OPRs and clipped reads to identify potential prophage sequences within the reference genome.

$ mvirs oprs

Program: mVIRs - Localisation of inducible prophages using NGS data
Version: 1.1.1
Reference: Zünd, Ruscheweyh, et al.
High throughput sequencing provides exact genomic locations of inducible
prophages and accurate phage-to-host ratios in gut microbial strains.
Microbiome, 9: 77, 2021. doi:10.1186/s40168-021-01033-w

Usage: mvirs oprs [options]

    Input:
        -f  FILE   Forward reads file. FastA/Q. Can be gzipped. [Required]
        -r  FILE   Reverse reads file. FastA/Q. Can be gzipped. [Required]
        -db FILE   Reference database file (prefix) created by mvirs index. [Required]

    Output:
        -o  PATH   Prefix for output files. [Required]

    Options:
        -t  INT    Number of threads. [1] 
        -ml INT    Minimum sequence length for extraction. [4000]
        -ML INT    Maximum sequence length for extraction. [800000]
        -m         Allow full contigs/scaffolds/chromosomes to be reported 
	           (When OPRs and clipped reads are found at the start and end of contigs/scaffolds/chromosomes)

Example

Run mvirs oprs on the read files (reads.1.fq.gz and reads.2.fq.gz) using the same reference sequence file (reference.fasta) that was used as input for mvirs index.

$ mvirs oprs -f reads.1.fq.gz -r reads.2.fq.gz -db reference.fasta -o mvirs.output

# Will produce the following files (see below for explanation of the files)
# mvirs.output.bam --> Alignments
# mvirs.output.oprs --> The OPR positions in a tab separated file
# mvirs.output.clipped --> The clipped alignment positions in a tab separated file
# mvirs.output.fasta --> The potential prophage regions as a fasta file
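
When several read pairs need to be processed against the same reference, one option is to run mvirs oprs once per pair. The sketch below only assembles the command lines from the flags shown in the usage text above (-f, -r, -db, -o, -t); the sample names, file names and the results/ output prefix are hypothetical.

```python
import shlex


def build_oprs_command(forward, reverse, db, out_prefix, threads=1):
    """Return the argument list for one `mvirs oprs` run."""
    return [
        "mvirs", "oprs",
        "-f", str(forward),
        "-r", str(reverse),
        "-db", str(db),
        "-o", str(out_prefix),
        "-t", str(threads),
    ]


# Hypothetical sample table: sample name -> (forward reads, reverse reads)
samples = {
    "sampleA": ("sampleA.1.fq.gz", "sampleA.2.fq.gz"),
    "sampleB": ("sampleB.1.fq.gz", "sampleB.2.fq.gz"),
}

commands = [
    build_oprs_command(fwd, rev, "reference.fasta", f"results/{name}", threads=4)
    for name, (fwd, rev) in samples.items()
]

for cmd in commands:
    # Print a shell-safe version of each command for inspection.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To execute instead of print: subprocess.run(cmd, check=True)
```

Each run writes its own set of output files under the given prefix, so results from different read pairs stay separate.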

mvirs test

The mvirs test command downloads example read and reference files and launches the default mvirs oprs using the downloaded files as input.

$ mvirs test

2021-09-20 08:44:31,283 INFO: Starting mVIRs
Program: mVIRs - Localisation of inducible prophages using NGS data
Version: 1.1.1
Reference: Zünd, Ruscheweyh, et al.
High throughput sequencing provides exact genomic locations of inducible
prophages and accurate phage-to-host ratios in gut microbial strains.
Microbiome, 9: 77, 2021. doi:10.1186/s40168-021-01033-w
Usage: mvirs test [options]

    Input:
        -o  PATH   Output folder. [Required]

Example

$ mvirs test -o ~/mVIRs_test/
# Will produce the following files (see below for explanation of the files)
# mvirs.output.bam --> Alignments
# mvirs.output.oprs --> The OPR positions in a tab separated file
# mvirs.output.clipped --> The clipped alignment positions in a tab separated file
# mvirs.output.fasta --> The potential prophage regions as a fasta file

Output files

mvirs oprs produces four output files (mvirs.output.bam, mvirs.output.oprs, mvirs.output.clipped and mvirs.output.fasta), of which the latter three are explained below.

mvirs.output.fasta

The mvirs.output.fasta is a FASTA-formatted file with the potential prophage sequences that were extracted from the reference genome. The FASTA headers include information on the

  • source scaffold
  • start and end coordinates of the extracted sequence
  • number of supporting OPRs
  • number of supporting clipped alignments
  • fraction of the scaffold length that is covered by the extracted region (reported in percent)

Example

>SalmonellaLT2:1213986-1255756	ORPs=3868-HSs=1473-SF=0.852597
ATTCGTAATGCGAAGGTCGTAGGTTCGACTCCTATTATCGGCACCAGTTAAATCAAATACTTAC...

# SalmonellaLT2:1213986-1255756 --> Scaffold:START-STOP
# ORPs=3868 --> Number of OPRs = 3868
# HSs=1473 --> Number of hard- and soft-clipped alignments
# SF=0.852597 --> 0.85% of the scaffold length is covered by the extracted region
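
For downstream filtering it can be handy to turn these headers into structured records. The following is a minimal sketch that assumes the layout shown in the example (region and support fields separated by whitespace, support fields joined by hyphens); the function name parse_mvirs_header is hypothetical and not part of mVIRs.

```python
def parse_mvirs_header(header):
    """Split an mVIRs FASTA header into scaffold, coordinates and support values."""
    # ">Scaffold:START-STOP<whitespace>KEY=VALUE-KEY=VALUE-..."
    region, stats = header.lstrip(">").split(None, 1)
    # rsplit keeps scaffold names that themselves contain a colon intact.
    scaffold, coords = region.rsplit(":", 1)
    start, end = (int(x) for x in coords.split("-"))
    support = {}
    for item in stats.strip().split("-"):
        key, value = item.split("=", 1)
        support[key] = float(value)
    return {"scaffold": scaffold, "start": start, "end": end, "support": support}


example = ">SalmonellaLT2:1213986-1255756\tORPs=3868-HSs=1473-SF=0.852597"
record = parse_mvirs_header(example)
print(record["scaffold"], record["start"], record["end"], record["support"])
```

The support keys are taken verbatim from the header, so the sketch makes no assumption about their spelling.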

mvirs.output.oprs

The mvirs.output.oprs file lists the inserts of paired-end reads that either align with an unusual orientation (e.g., OPR or SAME) or have an unexpectedly large insert size (IPR) compared to the estimated insert size (see the Concepts section for the definitions of OPR, SAME and IPR).

The columns of the file are:

  • INSERTNAME: The name of the insert
  • REFERENCE: Name of the scaffold/genomic region of the reference sequence
  • INSERT_SIZE: The size of the insert
  • R1_ORIENTATION: The orientation of the R1 read on the reference
  • R2_ORIENTATION: The orientation of the R2 read on the reference
  • BWA_SCORE: The sum of scores reported by bwa for the insert
  • R1_START: The leftmost coordinate on the reference where the R1 read aligns
  • R2_START: The leftmost coordinate on the reference where the R2 read aligns
  • R1_ALNLENGTH: The length of the alignment of the R1 read on the reference
  • R2_ALNLENGTH: The length of the alignment of the R2 read on the reference
  • INSERT_ORIENTATION: The orientation of the two reads relative to each other. Can be IPR, OPR or SAME

An example output is below:

#MIN_REASONABLE_INSERTSIZE=0
#MAX_REASONABLE_INSERTSIZE=1628
#INSERTNAME REFERENCE INSERT_SIZE R1_ORIENTATION R2_ORIENTATION BWA_SCORE R1_START R2_START R1_ALNLENGTH R2_ALNLENGTH INSERT_ORIENTATION
K00206:180:H2CJWBBXY:8:1107:6644:49230 SalmonellaLT2 41477 forward reverse 297 1255437 1214111 151 147 OPR
K00206:180:H2CJWBBXY:8:1107:7182:12181 SalmonellaLT2 41392 forward reverse 288 1255606 1214365 151 143 OPR
K00206:180:H2CJWBBXY:8:1107:7436:46873 SalmonellaLT2 41449 reverse forward 302 1214126 1255424 151 151 OPR
K00206:180:H2CJWBBXY:8:1107:8582:43304 SalmonellaLT2 1351429 reverse reverse 225 4216570 2865291 150 80 SAME
K00206:180:H2CJWBBXY:8:1107:9404:2176 SalmonellaLT2 41222 forward reverse 291 1255124 1214053 151 145 OPR
K00206:180:H2CJWBBXY:8:1107:10470:14959 SalmonellaLT2 41453 reverse forward 302 1214268 1255570 151 151 OPR
K00206:180:H2CJWBBXY:8:1107:10724:29958 SalmonellaLT2 140 reverse forward 201 475072 475135 140 76 OPR
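
A file in this layout can be read with a few lines of Python. The sketch below assumes that comment lines of the form #KEY=VALUE carry the insert-size bounds and that the remaining comment line holds the column names, as in the example above; read_oprs is a hypothetical helper, not an mVIRs function.

```python
from collections import Counter

INT_COLUMNS = {"INSERT_SIZE", "BWA_SCORE", "R1_START", "R2_START",
               "R1_ALNLENGTH", "R2_ALNLENGTH"}


def read_oprs(lines):
    """Parse the lines of an .oprs file into (bounds, rows)."""
    bounds, columns, rows = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line[1:]
            if "=" in body:            # e.g. #MAX_REASONABLE_INSERTSIZE=1628
                key, value = body.split("=", 1)
                bounds[key] = int(value)
            else:                      # column header line
                columns = body.split()
            continue
        if columns is None:
            raise ValueError("column header line not found before data")
        row = dict(zip(columns, line.split()))
        for name in INT_COLUMNS & row.keys():
            row[name] = int(row[name])
        rows.append(row)
    return bounds, rows


example = """\
#MIN_REASONABLE_INSERTSIZE=0
#MAX_REASONABLE_INSERTSIZE=1628
#INSERTNAME REFERENCE INSERT_SIZE R1_ORIENTATION R2_ORIENTATION BWA_SCORE R1_START R2_START R1_ALNLENGTH R2_ALNLENGTH INSERT_ORIENTATION
K00206:180:H2CJWBBXY:8:1107:6644:49230 SalmonellaLT2 41477 forward reverse 297 1255437 1214111 151 147 OPR
K00206:180:H2CJWBBXY:8:1107:8582:43304 SalmonellaLT2 1351429 reverse reverse 225 4216570 2865291 150 80 SAME
"""

bounds, rows = read_oprs(example.splitlines())
print(bounds)
print(Counter(row["INSERT_ORIENTATION"] for row in rows))
```

Splitting on any whitespace handles both the tab-separated file and the space-separated rendering shown here.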

mvirs.output.clipped

The mvirs.output.clipped file contains the name, orientation and position of aligned reads that were clipped (i.e., the read could only be partially aligned).

The columns of the file are:

  • INSERTNAME: Name of the insert
  • READ ORIENTATION: R1 or R2
  • HARD/SOFTCLIP: Reported clip type by BWA (Soft --> longer part of the alignment. Hard --> shorter part of the alignment)
  • DIRECTION: Direction of the alignment with respect to the reference sequence
  • POSITION: Leftmost coordinate of the alignment on the reference sequence
  • REFERENCE: Name of the scaffold/genomic region of the reference sequence

An example output is below:

#INSERT READORIENTATION HARD/SOFTCLIP DIRECTION POSITION REFERENCE
K00206:180:H2CJWBBXY:8:1101:1336:1982 R2 S -> 252072 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:1418:9895 R2 S <- 1946641 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:1468:42231 R2 S -> 1492581 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:1864:3670 R2 S <- 2886283 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:1915:46346 R1 S <- 1255756 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:1915:46346 R1 H -> 1213986 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:1996:25738 R2 S <- 4147832 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:2037:2457 R2 S -> 465110 SalmonellaLT2
K00206:180:H2CJWBBXY:8:1101:2087:10422 R1 S <- 4769629 SalmonellaLT2

Concepts

IPRs, OPRs and SAME

A paired-end read can align in the following orientations:

# IPRs --> If the insert orientation matches the reference sequence

REFERENCE ---------------------------------------
R1              -------->
R2                        <--------

# OPRs --> If the insert orientation does not match the reference genome (e.g., paired-end reads from circularized phage genomes spanning across attP sites)

REFERENCE ---------------------------------------
R1              <--------
R2                        -------->


# SAME --> If both reads align in the same orientation (e.g., due to inversions)

REFERENCE ---------------------------------------
R1              -------->
R2                        -------->
 

This tool reports OPRs as well as IPRs with unreasonable insert sizes.
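
The three orientation classes can be expressed as a small decision rule. The sketch below is a simplification that uses only the orientation and leftmost coordinate of each read, as found in the .oprs columns; it illustrates the concept and is not the mVIRs implementation. The function name classify_insert is hypothetical.

```python
def classify_insert(r1_orientation, r1_start, r2_orientation, r2_start):
    """Classify a read pair as IPR, OPR or SAME.

    Orientations are the strings "forward" or "reverse"; starts are the
    leftmost alignment coordinates on the reference.
    """
    if r1_orientation == r2_orientation:
        return "SAME"
    if r1_orientation == "forward":
        forward_start, reverse_start = r1_start, r2_start
    else:
        forward_start, reverse_start = r2_start, r1_start
    # Inward: the forward read lies to the left and points at its mate.
    # Outward: the forward read lies to the right and points away from it.
    return "IPR" if forward_start <= reverse_start else "OPR"


print(classify_insert("forward", 1255437, "reverse", 1214111))  # forward read on the right
print(classify_insert("reverse", 4216570, "reverse", 2865291))  # both reads reverse
print(classify_insert("forward", 1000, "reverse", 1300))        # hypothetical regular pair
```

The first two calls use coordinates from the mvirs.output.oprs example; the third uses made-up coordinates for an ordinary inward-facing pair.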

Algorithm

The algorithm for potential prophage genome detection consists of three steps. The first two steps scan the alignment file (mvirs.output.bam) and report OPRs and clipped alignments. The last step uses the information generated in the first two steps to detect potential prophages and report their sequences as a FASTA-formatted output file.

1. Read Alignment Orientation

  1. Reads are conceptually paired as inserts according to their naming.
  2. Upper and lower bounds for reasonable insert sizes are estimated as the mean insert size +/- seven standard deviations of uniquely mapping inward-oriented paired-end reads (IPRs).
  3. For each insert:
  • Find the best-scoring alignment pairs within 3% of the best alignment score.
  • Report the insert as OPR if there is no IPR with a reasonable insert size within the 3% cutoff and if the OPR is the best scoring alignment.
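
The two rules above can be sketched as follows. Several details are assumptions made for illustration: the population standard deviation is used, the lower bound is clamped at 0 (the example .oprs file shows MIN_REASONABLE_INSERTSIZE=0), and "within 3%" is read as a score of at least 97% of the best score. The function names are hypothetical.

```python
from statistics import mean, pstdev


def reasonable_insert_bounds(insert_sizes, n_sd=7):
    """Return (lower, upper) as mean +/- n_sd standard deviations."""
    center = mean(insert_sizes)
    spread = pstdev(insert_sizes)
    lower = max(0, center - n_sd * spread)  # assumption: never below 0
    upper = center + n_sd * spread
    return lower, upper


def is_reported_opr(candidates, lower, upper, tolerance=0.03):
    """Decide whether an insert is reported as OPR.

    candidates: list of (score, orientation, insert_size) tuples, one per
    alternative alignment pair of the same insert.
    """
    best_score = max(score for score, _, _ in candidates)
    near_best = [c for c in candidates if c[0] >= best_score * (1 - tolerance)]
    has_reasonable_ipr = any(
        orientation == "IPR" and lower <= size <= upper
        for _, orientation, size in near_best
    )
    best_is_opr = any(
        score == best_score and orientation == "OPR"
        for score, orientation, _ in near_best
    )
    return best_is_opr and not has_reasonable_ipr


# Hypothetical insert sizes of uniquely mapping IPRs
lower, upper = reasonable_insert_bounds([300, 310, 290, 300])
print(round(lower, 1), round(upper, 1))
print(is_reported_opr([(300, "OPR", 41477), (295, "IPR", 310)], lower, upper))
print(is_reported_opr([(300, "OPR", 41477), (200, "IPR", 310)], lower, upper))
```

In the first decision the competing IPR scores within 3% of the best alignment and has a reasonable insert size, so the insert is not reported; in the second the IPR scores too low to count.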

2. Clipped Alignments

The alignment of a read is clipped if it cannot be fully aligned against the reference genome. Two reasons for clipped alignments are:

  1. The representation of a circular bacterial chromosome sequence in linear form. Reads that align at the beginning of the linearised chromosome (here named reference) will also align at the end. A single full-length alignment is not possible. In the example below, the first three bases of a given read align at the end of the reference, the last three bases at the beginning:

    REFERENCE ---------------------------------------
    READ      ---                                 ---
              456                                 123  
    

    Similarly, if a read originates from a circularized phage genome that is also encoded as a prophage in the reference, the reads will align at the end and the beginning of the integrated prophage genome:

    # Phage genome integrated in reference chromosome (denoted as P)
    REFERENCE ----------------PPPPPPPPPPP------------
    READ                      ---     ---
                              456     123
    
  2. The reference genome contains an element (e.g., an integrated phage), but the read originates from a genome of a naive host (i.e., without the prophage). In the example below, the read originates from a naive host that does not contain the prophage genome (denoted as P), and thus flanks the phage integration site.

    REFERENCE ----------------PPPPPPPP---------------
    READ                   ---        ---
                           123        456
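
In SAM/BAM files, clipping is recorded in the CIGAR string of an alignment: S marks soft-clipped and H hard-clipped bases at either end. The sketch below shows how clipped ends can be recognised from a CIGAR string alone; the CIGAR strings are made-up examples and clipped_ends is a hypothetical helper, not part of mVIRs.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_ends(cigar):
    """Return (left_clip, right_clip) for a CIGAR string.

    Each element is a (length, operation) tuple with operation "S" or "H",
    or None if that end of the alignment is not clipped.
    """
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    if not ops:
        return None, None
    left = ops[0] if ops[0][1] in "SH" else None
    right = ops[-1] if len(ops) > 1 and ops[-1][1] in "SH" else None
    return left, right


for cigar in ("151M", "100M51S", "51H100M"):
    print(cigar, clipped_ends(cigar))
```

A read that spans a junction typically yields two such records, one clipped on the right and one clipped on the left, whose reference positions mark the two sides of the junction.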
    

3. Identification of potential prophages

Regions where an accumulation of clipped alignments and OPRs is detected are reported as potential prophages in the output FASTA file.

The start and end positions of OPRs are distributed around the phage insertion sites. As such, they are indicative of potential prophages; however, they cannot be used to determine their exact positions. The locations of clipped alignments are precise; however, they often miss part of the alignment due to ambiguous alignments or because they do not pass the criteria for minimal alignment length. Therefore, a genomic region requires the support of at least 1 OPR and at least 1 clipped alignment to be considered a potential inducible prophage. Furthermore, the OPRs and clipped alignments supporting a potential site must sum to 5 or more.

Potential prophage regions are also filtered by length, with a minimum requirement of 4kb and a maximum of 800kb. These limits can be modified with the -ml (minimum) and -ML (maximum) parameters.

If the -m flag is set, potential prophage regions that cover entire contigs/scaffolds will be reported, otherwise they will be discarded.
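
Taken together, the support, length and full-scaffold criteria amount to a simple filter. The sketch below restates them in code; treating the length limits as inclusive is an assumption, and keep_region and its parameter names are hypothetical rather than taken from the mVIRs source.

```python
def keep_region(region_length, scaffold_length, n_oprs, n_clipped,
                min_length=4000, max_length=800000, allow_full_scaffold=False):
    """Apply the support and length criteria described above to one region."""
    # At least one OPR and at least one clipped alignment are required.
    if n_oprs < 1 or n_clipped < 1:
        return False
    # Combined support must be 5 or more.
    if n_oprs + n_clipped < 5:
        return False
    # Length filter (-ml / -ML); inclusive limits are an assumption.
    if not (min_length <= region_length <= max_length):
        return False
    # Regions covering a whole contig/scaffold are kept only with -m.
    if region_length >= scaffold_length and not allow_full_scaffold:
        return False
    return True


# Hypothetical scaffold length; support counts taken from the FASTA example
print(keep_region(41771, 4_900_000, n_oprs=3868, n_clipped=1473))
print(keep_region(41771, 4_900_000, n_oprs=2, n_clipped=2))
print(keep_region(50_000, 50_000, n_oprs=10, n_clipped=10, allow_full_scaffold=True))
```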

mvirs's People

Contributors

cmfield, hjruscheweyh, mirjamzuend, sushiatgit, valentynbez

mvirs's Issues

OPRS command: Is it possible to run with multiple forward/reverse reads?

Cool tool! I really like the idea.

I am just wondering if I can include multiple viral sequence runs simultaneously with OPRS command.
Was not clear from wording in the manual. I am guessing 'no' due to fact they are positional arguments?

If not, would it be possible to add in the future?

Thanks for making this tool!

Issue with 'read_seq_file' function.

I'm using unzipped fastq files for my read sequence files. I get this error:

File "mVIRs/mVIRs/mvirs.py", line 515, in read_seq_file
    if lines[0].startswith('@'):
IndexError: list index out of range

I think it is because the function only works on gzipped files due to function:

def read_seq_file(seq_file):
    lines = []
    # Lines are only read when the file name ends with 'gz';
    # for any other file, lines stays empty.
    if seq_file.endswith('gz'):
        with gzip.open(seq_file, 'rt') as handle:
            for line in handle:
                lines.append(line.strip())
                if len(lines) == 1000:
                    break
    modulo = 2
    if lines[0].startswith('@'):
        modulo = 4
    seq_headers = []
    for cnt, line in enumerate(lines):
        if cnt % modulo == 0:
            seq_headers.append(line.split()[0])
    return seq_headers

I gzipped the read and no longer had the issue.
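
A possible fix, sketched here as an assumption rather than the actual patch: choose the opener from the file extension so that plain-text files are read as well, and fail with a clear message when a file is empty. The name read_seq_headers is hypothetical; like the quoted function, it assumes four-line FASTQ or two-line FASTA records.

```python
import gzip


def read_seq_headers(seq_file, max_lines=1000):
    """Return the sequence headers found in the first max_lines lines."""
    # gzip.open for *.gz, the built-in open for everything else.
    opener = gzip.open if str(seq_file).endswith(".gz") else open
    lines = []
    with opener(seq_file, "rt") as handle:
        for line in handle:
            lines.append(line.strip())
            if len(lines) == max_lines:
                break
    if not lines:
        raise ValueError(f"No sequence data found in {seq_file}")
    # FASTQ records span 4 lines, single-line FASTA records span 2.
    modulo = 4 if lines[0].startswith("@") else 2
    return [line.split()[0] for i, line in enumerate(lines) if i % modulo == 0]
```

Until such a change is in place, gzipping the read files as described above works around the problem.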

Failure to install `mVIRs`

Python version: 3.8.0
bzip2 version : 1.0.8

After running command pip install mvirs within environment with installed dependencies I receive the following error:

Collecting mvirs
  Using cached mVIRs-1.1.1.tar.gz (37 kB)
  Preparing metadata (setup.py) ... done
Collecting pysam
  Using cached pysam-0.19.1.tar.gz (3.9 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [139 lines of output]
      # pysam: no cython available - using pre-compiled C
      # pysam: htslib mode is shared
      # pysam: HTSLIB_CONFIGURE_OPTIONS=None
      # pysam: (sysconfig) CC=gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/
      # pysam: (sysconfig) CFLAGS=-Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
      # pysam: (sysconfig) LDFLAGS=-L/nfs/home/vbezshapkin/miniconda3/envs/mvirs/lib -Wl,-rpath=/nfs/home/vbezshapkin/miniconda3/envs/mvirs/lib -Wl,--no-as-needed -Wl,--sysroot=/
      checking for gcc... gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/
      checking whether the C compiler works... yes
      checking for C compiler default output file name... a.out
      checking for suffix of executables...
      checking whether we are cross compiling... no
      checking for suffix of object files... o
      checking whether we are using the GNU C compiler... yes
      checking whether gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/ accepts -g... yes
      checking for gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/ option to accept ISO C89... none needed
      checking for ranlib... ranlib
      checking for grep that handles long lines and -e... /usr/bin/grep
      checking for C compiler warning flags... unknown
      checking for pkg-config... /usr/bin/pkg-config
      checking pkg-config is at least version 0.9.0... yes
      checking for special C compiler options needed for large files... no
      checking for _FILE_OFFSET_BITS value needed for large files... no
      checking shared library type for unknown-Linux... plain .so
      checking whether the compiler accepts -fvisibility=hidden... yes
      checking how to run the C preprocessor... gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/ -E
      checking for egrep... /usr/bin/grep -E
      checking for ANSI C header files... yes
      checking for sys/types.h... yes
      checking for sys/stat.h... yes
      checking for stdlib.h... yes
      checking for string.h... yes
      checking for memory.h... yes
      checking for strings.h... yes
      checking for inttypes.h... yes
      checking for stdint.h... yes
      checking for unistd.h... yes
      checking for stdlib.h... (cached) yes
      checking for unistd.h... (cached) yes
      checking for sys/param.h... yes
      checking for getpagesize... yes
      checking for working mmap... yes
      checking for gmtime_r... yes
      checking for fsync... yes
      checking for drand48... yes
      checking for srand48_deterministic... no
      checking whether fdatasync is declared... yes
      checking for fdatasync... yes
      checking for library containing log... -lm
      checking for zlib.h... yes
      checking for inflate in -lz... yes
      checking for library containing recv... none required
      checking for bzlib.h... no
      checking for BZ2_bzBuffToBuffCompress in -lbz2... yes
      configure: error: libbzip2 development files not found

      The CRAM format may use bzip2 compression, which is implemented in HTSlib
      by using compression routines from libbzip2 <http://www.bzip.org/>.

      Building HTSlib requires libbzip2 development files to be installed on the
      build machine; you may need to ensure a package such as libbz2-dev (on Debian
      or Ubuntu Linux) or bzip2-devel (on RPM-based Linux distributions or Cygwin)
      is installed.

      Either configure with --disable-bz2 (which will make some CRAM files
      produced elsewhere unreadable) or resolve this error to build HTSlib.
      checking for gcc... gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/
      checking whether the C compiler works... yes
      checking for C compiler default output file name... a.out
      checking for suffix of executables...
      checking whether we are cross compiling... no
      checking for suffix of object files... o
      checking whether we are using the GNU C compiler... yes
      checking whether gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/ accepts -g... yes
      checking for gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/ option to accept ISO C89... none needed
      checking for ranlib... ranlib
      checking for grep that handles long lines and -e... /usr/bin/grep
      checking for C compiler warning flags... unknown
      checking for pkg-config... /usr/bin/pkg-config
      checking pkg-config is at least version 0.9.0... yes
      checking for special C compiler options needed for large files... no
      checking for _FILE_OFFSET_BITS value needed for large files... no
      checking shared library type for unknown-Linux... plain .so
      checking whether the compiler accepts -fvisibility=hidden... yes
      checking how to run the C preprocessor... gcc -pthread -B /nfs/home/vbezshapkin/miniconda3/envs/mvirs/compiler_compat -Wl,--sysroot=/ -E
      checking for egrep... /usr/bin/grep -E
      checking for ANSI C header files... yes
      checking for sys/types.h... yes
      checking for sys/stat.h... yes
      checking for stdlib.h... yes
      checking for string.h... yes
      checking for memory.h... yes
      checking for strings.h... yes
      checking for inttypes.h... yes
      checking for stdint.h... yes
      checking for unistd.h... yes
      checking for stdlib.h... (cached) yes
      checking for unistd.h... (cached) yes
      checking for sys/param.h... yes
      checking for getpagesize... yes
      checking for working mmap... yes
      checking for gmtime_r... yes
      checking for fsync... yes
      checking for drand48... yes
      checking for srand48_deterministic... no
      checking whether fdatasync is declared... yes
      checking for fdatasync... yes
      checking for library containing log... -lm
      checking for zlib.h... yes
      checking for inflate in -lz... yes
      checking for library containing recv... none required
      checking for bzlib.h... no
      checking for BZ2_bzBuffToBuffCompress in -lbz2... yes
      configure: error: libbzip2 development files not found

      The CRAM format may use bzip2 compression, which is implemented in HTSlib
      by using compression routines from libbzip2 <http://www.bzip.org/>.

      Building HTSlib requires libbzip2 development files to be installed on the
      build machine; you may need to ensure a package such as libbz2-dev (on Debian
      or Ubuntu Linux) or bzip2-devel (on RPM-based Linux distributions or Cygwin)
      is installed.

      Either configure with --disable-bz2 (which will make some CRAM files
      produced elsewhere unreadable) or resolve this error to build HTSlib.
      Makefile:122: htscodecs.mk: No such file or directory
      config.mk:2: *** Resolve configure error first.  Stop.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-tg_t_11v/pysam_72162761c7fe4404a7b02fd709decbd2/setup.py", line 375, in <module>
          htslib_make_options = run_make_print_config()
        File "/tmp/pip-install-tg_t_11v/pysam_72162761c7fe4404a7b02fd709decbd2/setup.py", line 74, in run_make_print_config
          stdout = subprocess.check_output(["make", "-s", "print-config"])
        File "/nfs/home/vbezshapkin/miniconda3/envs/mvirs/lib/python3.8/subprocess.py", line 411, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/nfs/home/vbezshapkin/miniconda3/envs/mvirs/lib/python3.8/subprocess.py", line 512, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['make', '-s', 'print-config']' returned non-zero exit status 2.
      # pysam: htslib configure options: None
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

OPR unable to identify input read mates

Hello,

I can confirm this error #2 (comment) in non-compressed fastq files.

However, when using compressed files it is consistently failing with the following error:

2021-05-21 09:45:59,651 INFO: Starting mVIRs
2021-05-21 09:45:59,677 ERROR: Names of input reads do not match. 
(@WINDU:94:C823GANXX:5:1101:1183:1967/1 != @WINDU:94:C823GANXX:5:1101:1183:1967/2). Check if read files belong together. Quitting
2021-05-21 09:45:59,678 INFO: Finishing mVIRs

However, the read names are actually the same, except for the mate pair identifier at the end. This error happens with a variety of FASTQ input files, and is not specific for a single FASTQ file pair.

@WINDU:94:C823GANXX:5:1101:1183:1967/1 
@WINDU:94:C823GANXX:5:1101:1183:1967/2

Does mVIRs require a specific FASTQ header modification?
Cheers!
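
For illustration, a name comparison that tolerates /1 and /2 mate suffixes could look like the sketch below. This is not the check that mVIRs performs; strip_mate_suffix and mates_match are hypothetical helpers. The same idea can also be used to rename reads before running the tool.

```python
def strip_mate_suffix(read_name):
    """Drop any description after whitespace and a trailing /1 or /2."""
    name = read_name.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def mates_match(name1, name2):
    """True if two read names refer to the same insert."""
    return strip_mate_suffix(name1) == strip_mate_suffix(name2)


print(mates_match("@WINDU:94:C823GANXX:5:1101:1183:1967/1",
                  "@WINDU:94:C823GANXX:5:1101:1183:1967/2"))
```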
