cmayer / mitogeneextractor

The MitoGeneExtractor can be used to extract protein coding mitochondrial genes, such as COI and others from short and long read sequencing libraries.

License: GNU Affero General Public License v3.0

MitoGeneExtractor

MitoGeneExtractor can be used to conveniently extract mitochondrial protein-coding genes from next-generation sequencing libraries. Mitochondrial reads are often found as a byproduct in sequencing libraries obtained from whole-genome sequencing, RNA sequencing or various kinds of reduced representation libraries (e.g. hybrid enrichment libraries).

List of recommended use cases:

Extract mitochondrial protein-coding genes across a broad taxonomic range from sequencing libraries. Successfully tested for:
- Illumina short read libraries, namely whole genomic, transcriptomic and reduced representation libraries (e.g. hybrid enrichment libraries, RAD sequencing).
- PacBio long read libraries.
Mine plastome protein-coding genes (matK or rbcL) from sequencing libraries. Successfully tested for:
- Illumina short read libraries (genomic and transcriptomic).
Mine mitochondrial protein-coding genes from transcriptome assemblies. While this sometimes works, we recommend mining these genes from quality-trimmed sequencing reads rather than from the assembly of the reads. We saw examples in which a reconstruction from quality-trimmed reads was successful while it failed completely when using the assembly. Results might depend on the parameters passed to the assembler.
Mine/excise protein-coding genes from whole mitochondrial genomes, which is often simpler than referring to the annotation (if one is available at all).
Check for contamination in sequencing libraries. Contamination is a common problem. By mining COI sequences from an NGS library, one should obtain COI sequencing reads from the target species as well as from the contaminating species.
Off-label usage: Mine prokaryotic genes from assemblies. (Not tested, but in principle, all genes which can be directly translated into amino acid sequences can be reconstructed with MitoGeneExtractor.)

List of input sources for which MitoGeneExtractor did not work in our tests:

Long reads from MinION, Oxford Nanopore (Tested on a small number of libraries. See the supplementary materials of the publication for details.)

Arguments pro MitoGeneExtractor:

Several tools exist that are able to reconstruct whole or partial mitochondrial genomes from sequencing libraries. All of them extract sequences from assemblies. We found several examples in which assemblies contained strongly reduced amounts of mitochondrial sequence data compared to the raw reads, in particular in the presence of conflicting sequences, e.g. if NUMTs are present or if the library contains DNA from different specimens. If mitochondrial sequences cannot be assembled, assembly-based tools cannot find the genes.

For MGE this means that we recommend extracting protein-coding mitochondrial genes from (quality-trimmed) reads rather than assemblies if possible. We have seen examples where the extraction from assemblies worked equally well as the extraction from unassembled reads, but we have also seen cases where the extraction from unassembled reads was successful, but failed when using the assembly.

How MitoGeneExtractor works - the algorithm:

MitoGeneExtractor aligns all given input nucleotide sequences against a protein reference sequence to obtain a multiple sequence alignment. The intended use case is to extract mitochondrial protein coding genes from sequencing libraries. The individual alignments are computed by calling the Exonerate program.

Exonerate is a very efficient alignment program that can align protein sequences with nucleotide sequences. Nucleotide sequences which cannot be aligned to the protein reference will not be included in the output. Exonerate should be able to align several hundred thousand short reads in a few minutes using a single CPU core. Therefore, this approach can be used for projects of any size. Exonerate can also align amino acid sequences to long nucleotide sequences. For this reason, MitoGeneExtractor can also mine sequences from assemblies or from long read libraries. It can even be used to extract genes of interest from whole mitochondrial or nuclear genome/transcriptome assemblies.
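
As a rough illustration of the Exonerate step, the following Python sketch assembles a protein-to-DNA Exonerate call that writes vulgar output. The option set mirrors the Exonerate invocation visible in the log output quoted in the issue reports further down this page. The function name build_exonerate_command and the file names are hypothetical, and MGE may build its call differently in other versions.

```python
import shlex

# Translation table string for NCBI genetic code 2 (vertebrate mitochondrial),
# copied from the Exonerate call shown in the log output on this page.
VERTEBRATE_MITO_CODE = (
    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
)


def build_exonerate_command(protein_reference, dna_input,
                            frameshift_penalty=-9,
                            genetic_code=VERTEBRATE_MITO_CODE,
                            exonerate="exonerate"):
    """Return the argument list for a protein2dna run with vulgar output only."""
    return [
        exonerate,
        "--geneticcode", genetic_code,
        "--frameshift", str(frameshift_penalty),
        "--query", protein_reference, "-Q", "protein",
        "--target", dna_input, "-T", "dna",
        "--model", "protein2dna",
        "--showalignment", "0",   # suppress human-readable alignments
        "--showvulgar", "1",      # keep the one-line vulgar summaries
    ]


if __name__ == "__main__":
    # Hypothetical file names, matching the Quickstart naming on this page.
    cmd = build_exonerate_command("COI-reference.fas", "query-input.fas")
    print(shlex.join(cmd))
```

The command is returned as a list so that it can be passed to subprocess.run without shell quoting issues; redirecting standard output to a file would then produce the vulgar file.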

Input to MitoGeneExtractor

Required by MitoGeneExtractor

MitoGeneExtractor requires two input files:

The amino acid reference in fasta file format. For MitoGeneExtractor version 1.9.5 or newer this file can contain multiple protein-coding reference genes and/or their variants. This makes it possible to extract all protein-coding genes of interest in one program run.

Many example references are included in the Amino-Acid-references-for-taxonomic-groups folder of this project. For the COI gene, one can specify as a reference the amino acid sequence of the barcode region or, if intended, the full COI sequence. If the full COI sequence is to be extracted, we suggest creating a reference specific to your taxonomic group, since the first and last few amino acids of the COI gene can differ considerably between specific groups and references designed for larger groups. For the barcode region of COI this is normally not a problem.

The nucleotide reads/assemblies/genomes in the fasta or fastq format. Since version 1.9.5 any number of fasta or fastq files (e.g. files from paired-end sequencing or multiple replicates) can be specified as program parameters. They will automatically be concatenated and analysed in a single run. Since the paired-end information is not exploited, paired-end libraries can be combined with single-end data.

Recommendation: Since quality scores are not used in the analysis, we recommend passing quality-trimmed reads to MitoGeneExtractor.

Optional input

The user can specify a previously computed vulgar file, i.e. the output file produced by Exonerate. The vulgar file has to correspond to the input sequences! Specifying an existing file avoids aligning all reads against the reference(s) again, if only the MGE parameters are changed.

If you specify a vulgar file name:

If the file exists, it will be used.
If the file does not exist, MGE will run Exonerate to create a new vulgar file and save it using the specified filename.

If you do not specify a vulgar file name:

MGE will run Exonerate to create a new vulgar file and remove it after it has been used.

Caution: MGE can only find obvious inconsistencies between the sequence input files and the vulgar file. If the vulgar file contains only partial results (e.g. from a previous run with less data), this will not be noticed and will lead to incomplete results.
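
Before reusing a vulgar file, a quick sanity check can catch truncated files. The sketch below relies on the completion line "-- completed exonerate analysis" that MGE refers to in the warning quoted in the issue reports on this page. The function names are hypothetical. Note that this only detects an interrupted Exonerate run; it cannot detect a vulgar file that was computed from fewer input files.

```python
COMPLETION_MARKER = "-- completed exonerate analysis"


def vulgar_file_is_complete(path):
    """Return True if the last non-empty line is Exonerate's completion marker."""
    last_line = ""
    with open(path) as handle:
        for line in handle:          # stream the file, it can be large
            stripped = line.strip()
            if stripped:
                last_line = stripped
    return last_line == COMPLETION_MARKER


def count_vulgar_hits(path):
    """Count the alignment records, i.e. lines starting with 'vulgar:'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith("vulgar:"))
```

Comparing the hit count between runs gives a rough hint as to whether a vulgar file stems from a smaller data set, but it is no substitute for recomputing the file when the input changes.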

Supported Platforms:

MitoGeneExtractor: All platforms for which a C++ compiler is available are supported. The program can be compiled by users without root privileges. It has been tested on Linux, HPC Unix platforms and MacOS. In principle, it should be possible to compile it on Windows. However, the authors provide no support for this platform.

Snakemake workflow: Linux, HPC Unix platforms and MacOS. It should be possible to run this on Windows systems but the authors have no hardware to test this.

Installation:

MitoGeneExtractor requires either an Exonerate output file as input, or the Exonerate program has to be installed, so that such an Exonerate output can be generated by MitoGeneExtractor. As of writing this, the most recent version of the Exonerate program is 2.4, which is available e.g. here: https://github.com/nathanweeks/exonerate

On MacOS you will need to install the command line developer tools. For instructions on how to do this, see https://osxdaily.com/2014/02/12/install-command-line-tools-mac-os-x/.

To install MitoGeneExtractor, do one of the following:

Clone the MitoGeneExtractor project to your computer. The link can be found by clicking on the "Code" pulldown menu at the top of this page.
Download the zipped project folder and extract the folder. The link can be found by clicking on the "Code" pulldown menu at the top of this page.

Now enter the mitogeneextractor folder on the command line and run the make program by typing "make" and hitting return. The make program should be preinstalled on all Linux distributions. On MacOS it is included in the command line developer tools (see above).

cd mitogeneextractor
make

The make program will generate an executable called MitoGeneExtractor-vx.y.z, where x.y.z is the version number. Either copy this executable to a directory in your PATH or reference it by its full path on the command line.

Get help and a full list of command line options:

Type

MitoGeneExtractor-vx.y.z -h

if MitoGeneExtractor is in your PATH and otherwise

Path-to-MitoGeneExtractor/MitoGeneExtractor-vx.y.z -h

to display a full list of command line options of MitoGeneExtractor.

Quickstart:

Assume the input sequences (sequencing reads in fasta format, a transcriptome assembly or a genome assembly) are stored in the file query-input.fas. Furthermore, assume that the amino acid reference sequence is stored in the file COI-reference.fas. Then the following command could be used to attempt to reconstruct the COI sequence from the data in the query-input.fas file:

MitoGeneExtractor-vx.y.z  -d query-input.fas -p COI-reference.fas -V vulgar.txt -o out-alignment.fas -n 0 -c out-consensus.fas -t 0.5 -r 1 -C 2

Specifying the name of the vulgar file is optional, but recommended, as creating it is the most time-consuming step. If the file exists, it is used as input instead of calling Exonerate to create it. If it does not exist, the name is used to create the vulgar file. The -C 2 option specifies the genetic code (here: vertebrate mitochondrial), the -t 0.5 option specifies the consensus threshold and the -r 1 and -n 0 options are used for a stricter alignment quality (see options for details).

If your read data is in fastq format, you could run the same analysis via this command:

MitoGeneExtractor-vx.y.z  -q query-input.fq -p COI-reference.fas -V vulgar.txt -o out-alignment.fas -n 0 -c out-consensus.fas -t 0.5 -r 1 -C 2

If you have multiple input files (e.g. paired-end (PE) data and single-end (SE) data) you can specify them as follows:

MitoGeneExtractor-vx.y.z  -q PE_query-input_1.fq -q PE_query-input_2.fq -q SE_query-input.fq -p COI-reference.fas -V vulgar.txt -o out-alignment.fas -n 0 -c out-consensus.fas -t 0.5 -r 1 -C 2

Note that the order of the file names does not matter. Each input file needs its own -q or -d flag. It is also possible to simultaneously specify input data in fastq and fasta format.
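
When many input files are involved, it can be convenient to assemble the command programmatically. The following Python sketch repeats the -q or -d flag once per file and otherwise reuses the parameter values from the Quickstart commands above. The function name build_mge_command, the choice of flag by file extension and the executable name are assumptions made for illustration.

```python
import shlex


def build_mge_command(executable, inputs, protein_reference,
                      alignment_out, consensus_out, vulgar_file=None):
    """Return an MGE argument list with one -q/-d flag per input file."""
    cmd = [executable]
    for path in inputs:
        # Assumption: fastq files are recognised by their extension.
        is_fastq = path.lower().endswith((".fq", ".fastq"))
        cmd += ["-q" if is_fastq else "-d", path]
    cmd += ["-p", protein_reference]
    if vulgar_file is not None:
        cmd += ["-V", vulgar_file]
    cmd += ["-o", alignment_out, "-n", "0",
            "-c", consensus_out, "-t", "0.5", "-r", "1", "-C", "2"]
    return cmd


if __name__ == "__main__":
    command = build_mge_command(
        "MitoGeneExtractor-vx.y.z",  # placeholder, use your version number
        ["PE_query-input_1.fq", "PE_query-input_2.fq", "assembly.fas"],
        "COI-reference.fas", "out-alignment.fas", "out-consensus.fas",
        vulgar_file="vulgar.txt",
    )
    print(shlex.join(command))
```

Passing the resulting list to subprocess.run avoids the situation where a second file name follows a single -q flag and is rejected by the argument parser.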

Example analysis:

An example analysis for the MitoGeneExtractor program can be found in the example-analysis-for-MitoGeneExtractor folder. The Readme.md file in this folder provides the necessary information to run the example analysis as well as further details.

Prepared Snakemake workflows

You can find a description of how data preprocessing and MitoGeneExtractor analyses can be implemented in Snakemake here

An example snakemake analysis can be found here

Command line options:

A full list of the command line options is available when typing MitoGeneExtractor-vx.y.z -h

-d , --dna_fasta_file (accepted multiple times)
Specifies the input query nucleotide sequence files in the fasta format. Sequences are expected not to include gap characters. This option can be specified multiple times if multiple input files shall be analysed in one run. If sequence files contain reads, they should have been quality filtered before being used as input for this program. This option can be combined with multiple input files in the fastq format (see -q option).

-q , --dna_fastq_file (accepted multiple times)
Specifies the input query nucleotide sequence files in the fastq format. This option can be specified multiple times if multiple input files shall be analysed in one run. All input files will be converted to a fasta file without taking into account the quality scores. Sequence files should be quality filtered before being used as input for this program. This option can be combined with multiple input files in the fasta format (see -d option).

-p , --prot_reference_file
Specifies the fasta file containing the amino acid reference sequences. This file can contain one or multiple reference sequences. All input nucleotide sequences are aligned against all references. Hits with a score higher than the minimum are considered. If a sequence matches multiple reference genes/variants, the sequence will be assigned to the reference for which the alignment score is highest, or to all of them if the scores are equal.

-o , -- (required)
Specifies the base name of the alignment output file(s). Aligned input sequences are written to a file with the name: BaseName + sequenceNameOfReference + .fas for each reference sequence.

-V , --vulgar_file
Specifies the name of the Exonerate vulgar file. If the specified file exists, it will be used for the analysis. If it does not exist, MitoGeneExtractor will run Exonerate in order to create the file with this name. The created file will then be used to proceed. If no file is specified with this option, a temporary file called tmp-vulgar.txt will be created and removed after the program run. In this case a warning will be printed to the console, since the vulgar file cannot be used again. (Optional, but recommended parameter)

-e , --exonerate_program
Specifies the name of the Exonerate program in the system path OR the path to the Exonerate program including the program name. Default: exonerate. (Optional parameter)

-n , --numberOfBpBeyond
Specifies the number of base pairs that are shown beyond the Exonerate alignment. A value of 0 means that the sequence is clipped at the point where the Exonerate alignment ends. Values >0 can lead to the inclusion of sequence segments that do not align well with the amino acid sequence and have to be treated with caution. They might belong to chimeras, NUMTs, or other problematic sequences. Larger values might be used e.g. if problematic sequences with a well matching seed alignment are of interest. CAUTION: Bases included with this option might not be aligned well or could even belong to stop codons! They should be considered to be of lower quality compared to other bases. Bases that are added with this option are added as lower case characters to the output alignment file. A sequence coverage of bases not belonging to these extra bases can be requested with the --minSeqCoverageInAlignment_uppercase option. Default: 0.

-c , --consensus_file (required) Specifies the base name of the consensus sequence output file(s). A consensus sequence with the name baseName + reference-sequence-name + .fas is written for each reference sequence.

-t , --consensus_threshold
This option modifies the consensus threshold. Default: 0.5 which corresponds to 50%.

-D, --includeDoubleHits
Include input sequences with two alignment results against the same reference.

--noGaps
Do not include reads for which the alignment with the reference contains gaps.

-g, --onlyGap
Include only reads which aligned with a gap.

--report_gaps_mode
Gaps can be reported in different ways. With this option the reporting mode can be specified: 1: report leading and trailing gaps with '-' character. Report internal gaps (introduced with options -G or -g) with '~' character. 2: report leading and trailing gaps with '-' character. Report internal gaps (introduced with options -G or -g) with '-' characters. 3: Remove all gap characters in output. In this case sequences are extracted but are reported with respect to the reference. Default: 1.

-f , --frameshift_penalty
The frameshift penalty passed to Exonerate. Default: -9. Stronger penalties (more negative values) lead to lower scores and can thereby have the following effects: (i) hit regions are trimmed, since trimming can lead to a better final alignment score; (ii) a read is excluded as a whole if the final score is too low and trimming does not lead to a higher score. The default of the Exonerate program is -28. A value of -9 (or other penalties weaker than -28) leads to more reads in which the best alignment has a frameshift. In order to remove reads that do not align well, one can use a weaker frameshift penalty and then exclude hits with a frameshift (see -F option).

-C , --genetic_code
The number of the genetic code to use in Exonerate, if this step is required. See https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi for details. Default: 2, i.e. vertebrate mitochondrial code.

-s , --minExonerateScoreThreshold
The score threshold passed to Exonerate to decide whether to include or not include the hit in the output.

-r , --relative_score_threshold
Specifies the relative alignment score threshold for Exonerate hits to be considered. The relative score is the score reported by Exonerate divided by the alignment length. Default: 1. Reasonable thresholds are between 0.7 and 2.0.
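
To make the relative score concrete, the sketch below parses one line of Exonerate vulgar output and divides the raw score by an alignment length. Which length MGE uses is not stated here; the sketch assumes the aligned span on the nucleotide sequence, so the numbers it produces may differ from those MGE reports. The names VulgarHit, parse_vulgar_line, relative_score and passes_threshold are hypothetical, and treating a score equal to the threshold as accepted is also an assumption.

```python
from typing import NamedTuple


class VulgarHit(NamedTuple):
    query_id: str      # protein reference name
    target_id: str     # read or contig name
    target_start: int
    target_end: int
    score: int         # raw Exonerate score


def parse_vulgar_line(line):
    """Parse the fixed leading fields of an Exonerate 'vulgar:' line."""
    fields = line.split()
    if len(fields) < 10 or fields[0] != "vulgar:":
        raise ValueError("not a vulgar line: " + line.strip())
    # Layout: vulgar: q_id q_start q_end q_strand t_id t_start t_end t_strand score ...
    return VulgarHit(fields[1], fields[5],
                     int(fields[6]), int(fields[7]), int(fields[9]))


def relative_score(hit):
    """Raw score divided by the aligned nucleotide span (an assumption)."""
    span = abs(hit.target_end - hit.target_start)  # reverse hits have start > end
    if span == 0:
        raise ValueError("alignment has zero length")
    return hit.score / span


def passes_threshold(hit, threshold=1.0):
    return relative_score(hit) >= threshold


if __name__ == "__main__":
    example = "vulgar: COI 10 60 . read_1 0 150 + 180 M 50 150"  # made-up hit
    hit = parse_vulgar_line(example)
    print(hit.target_id, relative_score(hit), passes_threshold(hit))
```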

--minSeqCoverageInAlignment_total
Specifies the absolute value of the minimum alignment coverage for computing the consensus sequence. For the coverage, all nucleotides count, also lower case nucleotides that have been added beyond the Exonerate alignment region. Default: 1. Increasing this value increases the number of unknown nucleotides in the consensus sequence.

--minSeqCoverageInAlignment_uppercase
Specifies the absolute value of the minimum alignment coverage for computing the consensus sequence. As coverage, only upper case nucleotides are taken into account, i.e. no nucleotides are counted that have been added beyond the Exonerate alignment region. Bases beyond the Exonerate alignment are added with the -n or --numberOfBpBeyond option. If no bases are added beyond the Exonerate alignment (default), the effect of this option is identical to the minSeqCoverageInAlignment_total option. Default: 1. Increasing this value increases the number of unknown nucleotides in the consensus sequence.
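
The interplay of the consensus threshold and the minimum coverage can be illustrated with a simple column-wise consensus. This is a minimal sketch under stated assumptions, not MGE's actual implementation: it assumes that a column yields a base only if the most frequent base reaches the threshold fraction without a tie, that columns below the minimum coverage yield N, and that both '-' and '~' mark gaps. The function name consensus is hypothetical.

```python
from collections import Counter

GAP_CHARS = "-~"


def consensus(aligned_reads, threshold=0.5, min_coverage=1):
    """Column-wise consensus over equally aligned reads; 'N' marks unknown sites."""
    length = max((len(read) for read in aligned_reads), default=0)
    result = []
    for i in range(length):
        # Collect the bases covering this column, ignoring gap characters.
        column = [read[i].upper() for read in aligned_reads
                  if i < len(read) and read[i] not in GAP_CHARS]
        if not column or len(column) < min_coverage:
            result.append("N")
            continue
        ranked = Counter(column).most_common(2)
        base, count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == count
        if count / len(column) >= threshold and not tied:
            result.append(base)
        else:
            result.append("N")
    return "".join(result)


if __name__ == "__main__":
    reads = ["ACGT--", "ACGA--", "--GTTT"]
    print(consensus(reads))                    # default settings
    print(consensus(reads, min_coverage=2))    # low-coverage columns become N
    print(consensus(reads, threshold=0.7))     # contested column becomes N
```

In this toy alignment, raising either parameter replaces bases by N, which matches the statement above that higher coverage requirements increase the number of unknown nucleotides in the consensus sequence.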

--temporaryDirectory
MGE has to create potentially large temporary files, e.g. if multiple input files are specified, or if fastq files are specified. With this option these files will not be created in the directory from which the program was launched, but in the specified temporary directory.

--treat-references-as-individual
Input sequences which can be aligned with different reference sequences are by default assigned only to the references for which the alignment score is equal to the best score achieved by this input sequence. This score competition is switched off if this option is specified. This treats multiple references as if they are specified in independent program runs.

--keep-concat-input-file
If multiple input files are specified, MGE first creates a concatenated file. By default this file is removed. Use this option if you want to keep this file.

--verbosity
Specifies how much run time information is printed to the console. Values: 0: minimal output, 1: important notices, 2: more notices, 3: basic progress, 4: detailed progress, 50-100: debug output, 1000: all output.

--, --ignore_rest
Ignores the rest of the labeled arguments following this flag.

--version
Displays version information and exits.

-h, --help
Displays usage information and exits.

Project outlook:

Currently, we are exploring the utility of HMMs, namely nhmmer, as an alternative to Exonerate.

Authors of the publication:

Marie Brasseur, ZFMK/LIB, Bonn, Germany
Jonas Astrin, ZFMK/LIB, Bonn, Germany
Matthias Geiger, ZFMK/LIB, Bonn, Germany
Christoph Mayer, ZFMK/LIB, Bonn, Germany

Authors of the software project:

Christoph Mayer, ZFMK/LIB, Bonn, Germany: MitoGeneExtractor program.
Marie Brasseur, ZFMK/LIB, Bonn, Germany: Snakemake pipeline and analyses for publication.

Reference: When using MitoGeneExtractor please cite:

Brasseur, M.V., Astrin, J.J., Geiger, M.F., Mayer, C., 2023. MitoGeneExtractor: Efficient extraction of mitochondrial genes from next-generation sequencing libraries. Methods in Ecology and Evolution. https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.14075

Since MitoGeneExtractor uses the Exonerate program, please also cite:

Slater, G.S.C., Birney, E., 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31. https://doi.org/10.1186/1471-2105-6-31

mitogeneextractor's Issues

Problem running MitoGeneExtractor with 2 input reads

Good afternoon,

I am trying to analyze ultraconserved elements (UCEs) data with MitoGeneFinder. I have paired-end reads data (one R1 and R2 per sample) and I am trying to analyze only one sample as a test. I have followed the steps to generate the COI sequence for the taxon of interest. Subsequently, when running MitoGeneExtractor I get a problem. This is the code I used to run the analysis:

../../software/anaconda3/envs/mitogeneextractor/MitoGeneExtractor/MitoGeneExtractor/MitoGeneExtractor-v1.9.5 -q Ad_1_Gabiarra_PI-READ1.fastq Ad_1_Gabiarra_PI-READ2.fastq -p Alytes_COI_consensus.fasta -e ../.... /software/exonerate/exonerate-2.2.0-x86_64/bin/exonerate -V vulgar_Ad_1_Gabiarra_PI_Alytes_COI.txt -o Ad_1_Gabiarra_PI_align_Alytes_COI.fas -n 0 -c Ad_1_Gabiarra_PI_cons_Alytes_COI.fas -t 0.5 -r 1 -C 2

Subsequently, the terminal returns the following error:
Welcome to the MitoGeneExtractor program, version 1.9.5
PARSE ERROR: Argument: Ad_1_Gabiarra_PI-READ2.fastq
Couldn't find match for argument

I have also tried to concatenate both reads into one, and it seems that the analysis starts to run, but then returns another error:
ERROR: The sequence names in the DNA read file are not unique when trimming them at the first space, which is done by many programs, including exonerate. Please verify your input data. Often it helps to rename sequences by replacing spaces with e.g. underscores.

What could I do to try to fix them? Any help would be appreciated, thank you very much!

command not found: MitoGeneExtractor

Hi,

I am not able to run the program after installing it as you have instructed. My exonerate installation was successful. It's still showing:

zsh: command not found: MitoGeneExtractor

Could you please help me with this?

installation

Apologies! Can this be deleted? I accidentally submitted an issue here.

Best,
Justin

Exonerate crashes with Segmentation fault 11

This issue was posted originally by the user avvypaks in the thread of the issue "command not found: MitoGeneExtractor".

innovation_user@Innovation-08 MitoGeneExtractor-main % MitoGeneExtractor -q merged18.fastq -p F04translated.fasta -o out-alignment.fas -n 0 -c out-consensus.fas -t 0.5 -r 1 -C 5

Welcome to the MitoGeneExtractor program, version 1.9.5
WARNING: You did not specify a vulgar file, so a temporary vulgar file will be created for this run that will be removed at the end of this program run. Therefor the vulgar file cannot be reused in other runs.
Parameter settings:

DNA fastq input file names: merged18.fastq
Protein reference input file name: F04translated.fasta
Directory for temporary files: .
Base name for alignment output file: out-alignment.fas
Vulgar file name: tmp-vulgar.txt
Genetic code (NCBI genetic code number): 5
Print this number of bp beyond Exonerate alignment: 0
Write consensus sequence to file : yes
Filename for consensus sequence output: out-consensus.fas
Consensus sequence threshold value: 0.5
Frameshift penalty: -9
Relative score threshold: 1
Minimum coverage in Exonerate alignment: 1
Minimum coverage in Exonerate alignment (upper case): 1
Gappy reads used: yes
Frameshift reads used: no
Treat all references as independent: no
Report gaps mode: Report all (leading, trailing, internal) gaps with '-' character.
Verbosity: 1

Filename:./Concatenated_exonerate_input_XXXXXX
Filename:./Concatenated_exonerate_input_XXXXXX
Filename:./Concatenated_exonerate_input_ZO3Ugq
The specified vulgar file exists, so it does not have to be recomputed with exonerate.
NOTE: Exonerate hit skipped due to low relative alignment score: 0.792982 396580ee-63da-4195-b01f-1cf01fcfc00f
NOTE: Exonerate hit skipped due to low relative alignment score: 0.764368 386f9be3-37d4-40d4-bbb7-6dd7d3ab8eb9
NOTE: Exonerate hit skipped due to low relative alignment score: 0.984848 63ade953-ffdf-4140-8a95-99fd8d23257c
WARNING: Vulgar file is incomplete. It does not end with "-- completed exonerate analysis"
This is attempt number 1 to read the vulgar file. I will try to recompute the vulgar file.
sh: line 1: 15994 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 15999 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16001 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16003 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16009 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16013 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16015 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16017 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16021 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 16025 Segmentation fault: 11 exonerate --geneticcode FFLLSSSSYYCCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG --frameshift -9 --query F04translated.fasta -Q protein --target ./Concatenated_exonerate_input_ZO3Ugq -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
ERROR: Running exonerate failed. The generated vulgar file is incomplete and should be removed manually. Exiting.

MitoGeneExtractor says goodbye.

IT WOULD BE GREAT IF YOU CAN TELL ME WHAT THIS "Segmentation fault: 11" IS ALL ABOUT.

Originally posted by @avvypaks in #4 (comment)

conda?

It would be highly convenient if MitoGeneExtractor could be deployed via conda. Are there any plans in that direction?

ERROR: Running exonerate failed.

Hello,

Thank you so much for making this tool!

I am running MitoGene on Illumina PE data and trying to remove a list of protein .fasta. The programme runs for ~2 hours, utilizes 95% of the memory I give it (669GB of 700GB), with this code:

$mitogene -q $R1 -q $R2 -p $fasta_reference -o out-alignment.fas -n 0 -c out-consensus.fas -t 0.5 -r 1 -C 2

With dependencies

singularity/3.7.3
exonerate/2.2.0

And then crashes with this error:

WARNING: You did not specify a vulgar file, so a temporary vulgar file will be created for this run that will be removed at the end of this program run. Therefor the vulgar file cannot be reused in other runs.
Filename:./Concatenated_exonerate_input_XXXXXX
Filename:./Concatenated_exonerate_input_XXXXXX
Filename:./Concatenated_exonerate_input_jvB6Ri
sh: line 1: 3274460 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274503 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274606 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274710 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274809 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275376 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275671 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275785 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275887 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275983 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
ERROR: Running exonerate failed. The generated vulgar file is incomplete and should be removed manually. Exiting.

Please let me know what further information I can provide to help resolve this issue (and I'm sorry if it's just that I've misunderstood something!)
