phac-nml / ecoli_serotyping

In silico prediction of E. coli serotype

License: Apache License 2.0

Python 100.00%

ecoli_serotyping's Introduction

ECTyper (an easy typer)

ECTyper is a standalone, versatile serotyping module for Escherichia coli. It supports both fasta (assembled) and fastq (raw reads) file formats. The tool couples species identification with a quality control module to give a complete, transparent report on E.coli serotyping that is suitable for reference laboratories.

Dependencies:

python >= 3.5
bcftools >= 1.8
blast == 2.7.1
seqtk >= 1.2
samtools >= 1.8
bowtie2 >= 2.3.4.1
mash >= 2.0

Python packages:

biopython >= 1.70
pandas >= 0.23.1
requests >= 2.0

Installation

Option 1: As a conda package

If you do not have a conda environment, download and install Miniconda or Anaconda:

bash miniconda.sh -b -p $HOME/miniconda
echo ". $HOME/miniconda/etc/profile.d/conda.sh" >> ~/.bashrc
source ~/.bashrc

Install the conda package from the bioconda channel: conda install -c bioconda ectyper

Option 2: From the source directly

The second option is to install from source.

Install dependencies. On Ubuntu, run:

apt install samtools bowtie2 mash bcftools ncbi-blast+ seqtk

Install python dependencies via pip:

pip3 install pandas biopython

Clone the repository and, optionally, check out a particular release (e.g. v1.0.0):

git clone https://github.com/phac-nml/ecoli_serotyping.git
cd ecoli_serotyping
git checkout v1.0.0 # optional: check out a release version

Install ectyper: python3 setup.py install

Basic Usage

Put the fasta/fastq files for serotyping analysis in one folder (concatenate paired raw read files if you would like them to be treated as a single entity).
Run ectyper -i [file path] -o [output_dir]
View the results on the console or with cat [output_dir]/output.tsv

Example Usage

ectyper -i ecoliA.fasta for a single file
ectyper -i ecoliA.fasta -o output_dir for a single file, results stored in output_dir
ectyper -i ecoliA.fasta,ecoliB.fastq,ecoliC.fna for multiple files (comma-delimited)
ectyper -i ecoli_folder for a folder (all files in the folder will be checked by the tool)
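
For scripted runs, the same invocations can be assembled from Python. The sketch below only uses the -i, -o, --verify and -c options listed in this README; the helper name build_ectyper_command is hypothetical and not part of ECTyper.

```python
import shlex
from typing import List, Optional, Sequence


def build_ectyper_command(
    inputs: Sequence[str],
    output_dir: Optional[str] = None,
    verify: bool = False,
    cores: Optional[int] = None,
) -> List[str]:
    """Assemble an ectyper command line as an argument list (hypothetical helper)."""
    if not inputs:
        raise ValueError("at least one input file or folder is required")
    # Multiple inputs are passed as one comma-delimited value, as in the examples above.
    cmd = ["ectyper", "-i", ",".join(inputs)]
    if output_dir:
        cmd += ["-o", output_dir]
    if verify:
        cmd.append("--verify")
    if cores is not None:
        cmd += ["-c", str(cores)]
    return cmd


cmd = build_ectyper_command(["ecoliA.fasta", "ecoliB.fastq"], output_dir="output_dir")
print(shlex.join(cmd))  # shlex.join needs Python 3.8+
# To actually run it (requires ectyper on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when file names contain spaces.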

Advanced Usage

usage: ectyper [-h] [-V] -i INPUT [-c CORES] [-opid PERCENTIDENTITYOTYPE] [-hpid PERCENTIDENTITYHTYPE] [-opcov PERCENTCOVERAGEOTYPE] [-hpcov PERCENTCOVERAGEHTYPE] [--verify] [-o OUTPUT] [-r REFSEQ] [-s] [--debug] [--dbpath DBPATH]

ectyper v1.0.0 database v1.0 Prediction of Escherichia coli serotype from raw reads or assembled genome sequences. The default settings are recommended.

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -i INPUT, --input INPUT
                        Location of E. coli genome file(s). Can be a single file, a comma-separated list of files, or a directory
  -c CORES, --cores CORES
                        The number of cores to run ectyper with
  -opid PERCENTIDENTITYOTYPE, --percentIdentityOtype PERCENTIDENTITYOTYPE
                        Percent identity required for an O antigen allele match [default 90]
  -hpid PERCENTIDENTITYHTYPE, --percentIdentityHtype PERCENTIDENTITYHTYPE
                        Percent identity required for an H antigen allele match [default 95]
  -opcov PERCENTCOVERAGEOTYPE, --percentCoverageOtype PERCENTCOVERAGEOTYPE
                        Minumum percent coverage required for an O antigen allele match [default 90]
  -hpcov PERCENTCOVERAGEHTYPE, --percentCoverageHtype PERCENTCOVERAGEHTYPE
                        Minumum percent coverage required for an H antigen allele match [default 50]
  --verify              Enable E. coli species verification
  -o OUTPUT, --output OUTPUT
                        Directory location of output files
  -r REFSEQ, --refseq REFSEQ
                        Location of pre-computed MASH RefSeq sketch. If provided, genomes identified as non-E. coli will have their species identified using MASH. For best results the pre-sketched RefSeq archive https://gembox.cbcb.umd.edu/mash/refseq.genomes.k21s1000.msh is recommended
  -s, --sequence        Prints the allele sequences if enabled as the final columns of the output
  --debug               Print more detailed log including debug messages
  --dbpath DBPATH       Path to a custom database of O and H antigen alleles in JSON format. Check Data/ectyper_database.json for more information

Fine-tuning parameters

ECTyper requires minimal options to run (-i and -o) but allows extensive configuration to accommodate a wide variety of typing scenarios.

Parameter	Explanation	Usage scenario
`-opid`	Specify minimum `%identity` threshold just for O antigen match	Poor coverage of O antigen genes or for exploratory work (recommended value is 90)
`-opcov`	Minimum `%coverage` threshold for a valid match against reference O antigen alleles	Poor coverage of O antigen genes and a user wants to get an O antigen call regardless (recommended value is 95)
`-hpid`	Specify minimum `%identity` threshold just for H antigen match	Poor coverage of H antigen genes or for exploratory work (recommended value is 95)
`-hpcov`	Minimum `%coverage` threshold for a valid match against reference H antigen alleles	Poor coverage of H antigen genes and a user wants to get an H antigen call regardless (recommended value is 95)
`--verify`	Verify species of the input and run QC module providing information on the reliability of the result and any typing issues	User is not sure if the sample is E.coli and wants to know whether the serotype prediction is of sufficient quality for reporting purposes
`-r`	Specify custom MASH sketch of reference genomes that will be used for species inference	User has a newly assembled genome that is not available in the NCBI RefSeq database. Make sure to add metadata to `assembly_summary_refseq.txt` and provide a custom accession number that starts with the `GCF_` prefix
`--dbpath`	Provide custom appended database of O and H antigen reference alleles in JSON format following the structure and field names of the default database `ectyper_alleles_db.json`	User wants to add new alleles to the alleles database to improve typing performance
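
As an illustration of how the four threshold options interact, here is a minimal sketch. The default values come from the help text above (-opid 90, -opcov 90, -hpid 95, -hpcov 50); the function passes_thresholds and the dictionary layout are assumptions made for this example, not ECTyper's internal code.

```python
# Defaults copied from the ectyper help text; the structure itself is hypothetical.
DEFAULT_THRESHOLDS = {
    "O": {"pident": 90.0, "pcov": 90.0},  # -opid / -opcov
    "H": {"pident": 95.0, "pcov": 50.0},  # -hpid / -hpcov
}


def passes_thresholds(antigen, pident, pcov, thresholds=DEFAULT_THRESHOLDS):
    """Return True if a hit meets both the %identity and %coverage minimums."""
    limits = thresholds[antigen]
    return pident >= limits["pident"] and pcov >= limits["pcov"]


# An H antigen hit covering only 60% of the allele still passes the default -hpcov of 50,
# while the same coverage fails for an O antigen allele.
print(passes_thresholds("H", 99.0, 60.0))
print(passes_thresholds("O", 99.0, 60.0))
```

Lowering a threshold widens the set of accepted hits, which is why relaxed values are mainly suggested for exploratory work.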

Quality Control (QC) module

To make the results and typing metrics easier to interpret, the following QC codes were developed. These codes make it possible to quickly separate "reportable" from "non-reportable" samples. The QC module is tightly linked to the ECTyper allele database, specifically to the MinPident and MinPcov fields. For each reference allele, minimum %identity and %coverage values were determined as a function of potential "cross-talk" between antigens (i.e. multiple potential antigen calls at a given setting). The QC module covers the following serotyping scenarios. More scenarios might be added in future versions depending on user needs.

QC flag	Explanation
PASS (REPORTABLE)	Both O and H antigen alleles meet min `%identity` or `%coverage` thresholds (ensuring no antigen cross-talk) and single antigen predicted for O and H
FAIL (-:- TYPING)	Sample is E.coli and O and H antigens are not typed. Serotype: -:-
WARNING MIXED O-TYPE	A mixed O antigen call is predicted requiring wet-lab confirmation
WARNING (WRONG SPECIES)	A sample is non-E.coli (e.g. E.albertii, Shigella, etc.) based on RefSeq assemblies
WARNING (-:H TYPING)	A sample is E.coli and O antigen is not predicted (e.g. -:H18)
WARNING (O:- TYPING)	A sample is E.coli and H antigen is not predicted (e.g. O17:-)
WARNING (O NON-REPORT)	O antigen alleles do not meet min %identity or %coverage thresholds
WARNING (H NON-REPORT)	H antigen alleles do not meet min %identity or %coverage thresholds
WARNING (O and H NON-REPORT)	Both O and H antigen alleles do not meet min %identity or %coverage thresholds
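
The typing-related flags in the table can be read as a small decision procedure. The sketch below covers only the species and missing-antigen scenarios; the function name and its inputs are hypothetical, and the NON-REPORT and MIXED O-TYPE flags are left out because they depend on per-allele thresholds that are not modelled here.

```python
def typing_scenario_flag(is_ecoli, o_type, h_type):
    """Map a species call and O/H predictions to a QC flag (simplified sketch).

    A missing antigen is represented by "-", as in serotypes such as -:H18 or O17:-.
    Returns None when none of the scenarios handled here applies.
    """
    if not is_ecoli:
        return "WARNING (WRONG SPECIES)"
    o_missing = o_type == "-"
    h_missing = h_type == "-"
    if o_missing and h_missing:
        return "FAIL (-:- TYPING)"
    if o_missing:
        return "WARNING (-:H TYPING)"
    if h_missing:
        return "WARNING (O:- TYPING)"
    return None


print(typing_scenario_flag(True, "-", "H18"))
print(typing_scenario_flag(True, "O17", "-"))
```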

Report format

ECTyper aims for a concise, minimal output that is easy to interpret and report. ECTyper v1.0 serotyping results are available in a tab-delimited output.tsv file consisting of the 16 columns listed below:

Name: Sample name (usually a unique identifier)
Species: the species column provides valuable species identification information in case of inadvertent sample contamination or mislabelling events
O-type: O antigen
H-type: H antigen
Serotype: Predicted O and H antigen(s)
QC: The Quality Control value summarizing the overall quality of prediction
Evidence: Total number of alleles used to call both O and H antigens
GeneScores: ECTyper O and H antigen gene scores in 0 to 1 range
AlleleKeys: Best matching ECTyper database allele keys used to call the serotype
GeneIdentities(%): %identity values of the query alleles
GeneCoverages(%): %coverage values of the query alleles
GeneContigNames: the contig names where the query alleles were found
GeneRanges: genomic coordinates of the query alleles
GeneLengths: allele lengths of the query alleles
Database: database release version and date
Warnings: any additional warnings linked to the quality control status or any other error message(s).

Selected columns from the ECTyper typical report are shown below.

Name	Species	Serotype	Evidence	QC	GeneScores	AlleleKeys	GeneIdentities(%)	GeneCoverages(%)	GeneContigNames	GeneRanges	GeneLengths	Database	Warnings
15-520	Escherichia coli	O174:H21	Based on 3 allele(s)	PASS (REPORTABLE)	wzx:1; wzy:1; fliC:1;	O104-5-wzx-origin;O104-13-wzy;H7-6-fliC-origin;	100;100;100;	100;100;100;	contig00049;contig00001;contig00019;	22302-23492;178-1290;6507-8264;	1191;1113;1758;	v1.0 (2020-05-07)	-
EC20151709	Escherichia coli	O157:H43	Based on 3 allele(s)	PASS (REPORTABLE)	wzx:1;wzy:0.999;fliC:1	O157-5-wzx-origin;O157-9-wzy-origin;H43-1-fliC-origin;	100;99.916;99.934;	100;100;100;	contig00002;contig00002;contig00003;	62558-63949;64651-65835;59962-61467;	1392;1185;1506;	v1.0 (2020-05-07)	-
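
Because the report is plain tab-delimited text, it can be filtered with the standard library. The following sketch assumes only the column names shown above; the helper names are hypothetical, the embedded example uses a reduced set of columns, and the second row (sampleX) is made up for illustration.

```python
import csv
import io


def parse_gene_scores(field):
    """Turn a GeneScores value such as 'wzx:1;wzy:0.999;fliC:1' into a dict of floats."""
    scores = {}
    for item in field.strip().split(";"):
        gene, _, value = item.partition(":")
        if not value:
            continue  # skips empty items and placeholders such as "-"
        scores[gene.strip()] = float(value)
    return scores


def reportable_samples(tsv_text):
    """Return the rows whose QC column starts with PASS."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["QC"].startswith("PASS")]


example = (
    "Name\tSpecies\tSerotype\tQC\tGeneScores\n"
    "EC20151709\tEscherichia coli\tO157:H43\tPASS (REPORTABLE)\twzx:1;wzy:0.999;fliC:1\n"
    "sampleX\tEscherichia coli\t-:-\tFAIL (-:- TYPING)\t-\n"  # hypothetical row
)
for row in reportable_samples(example):
    print(row["Name"], row["Serotype"], parse_gene_scores(row["GeneScores"]))
```

For a real run, the text would come from reading output.tsv in the chosen output directory.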

Availability

Resource	Description	Type
PyPI	PyPI package that can be installed via the `pip` utility	Terminal
Conda	Conda package available from BioConda channel	Terminal
Docker	Images containing completely initialized ECTyper with all dependencies	Terminal
Singularity	Images containing completely initialized ECTyper with all dependencies	Terminal
GitHub	Install source code as any Python package	Terminal
Galaxy ToolShed	Galaxy wrapper available for installation on a private/public instance	Web-based
Galaxy Europe	Galaxy public server to execute your analysis from anywhere	Web-based
IRIDA plugin	IRIDA instances could easily install additional pipeline	Web-based

ecoli_serotyping's Issues

Standalone VF Tool

Would be a fun task for anyone who wanted to try out Dataclasses: https://www.youtube.com/watch?v=T-TwcmT6Rcw. Can use the backport in 3.6 https://github.com/ericvsmith/dataclasses and just swap it out when we move to 3.7.

Could do something like:

if sys.version_info[0] < 3:
    import subprocess32 as subprocess
else:
    import subprocess

except check 3.6 vs 3.7 and only import if 3.6.

Might be a useful approach for future stuff because the structs let us know what we're keeping.
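
A rough idea of what such a struct could look like is sketched below. The dataclasses backport for 3.6 installs under the same module name as the 3.7 stdlib version, so the import line is identical either way. The class name, fields and values are all hypothetical.

```python
from dataclasses import dataclass  # stdlib in 3.7+; same module name via the 3.6 backport


@dataclass(frozen=True)
class VirulenceFactorHit:
    """Hypothetical record for one virulence-factor hit; field names are illustrative."""

    gene: str
    contig: str
    start: int
    stop: int
    percent_identity: float

    @property
    def length(self) -> int:
        # Inclusive coordinates, so a hit from 1 to 948 is 948 bases long.
        return abs(self.stop - self.start) + 1


hit = VirulenceFactorHit(gene="stx1A", contig="contig_1", start=1, stop=948, percent_identity=99.8)
print(hit)
print(hit.length)
```

Freezing the dataclass makes the record read-only, which keeps it obvious what is stored and prevents accidental edits later in the pipeline.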

Failed lookup of assembly GCF_000092525

Looks like assembly GCF_000092525 is found in the refseq masher database but not in assembly_summary_refseq.txt

Version 0.9.0

Stacktrace below:

2019-12-20 13:53:46,398 ectyper.genomeFunctions INFO     Creating combined serotype and identification fasta file
2019-12-20 13:55:26,455 ectyper      INFO     Assembling final list of fasta files
2019-12-20 13:55:34,780 ectyper.speciesIdentification INFO     MASH species RefSeq top hit GCF_000092525.1_ASM9252v1_genomic.fna.gz with distance 0.000830728 and shared hashes ratio 966/1000
2019-12-20 13:55:34,781 ectyper.speciesIdentification INFO     GCF_000092525
2019-12-20 13:55:34,836 ectyper.subprocess_util ERROR    Error in subprocess. The following command failed: ['grep', 'GCF_000092525', '/Galaxy/_conda/envs/[email protected]/lib/python3.7/site-packages/ectyper/Data/assembly_summary_refseq.txt']
2019-12-20 13:55:34,837 ectyper.subprocess_util ERROR    Subprocess failed with error: b''
2019-12-20 13:55:34,837 ectyper.subprocess_util CRITICAL ectyper has stopped
subprocess failure
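
For context: grep exits with status 1 when no line matches, which would be consistent with the empty error output b'' in the log above. A pure-Python lookup can report a missing accession without treating it as a subprocess failure. This is a minimal sketch; find_assembly_line is a hypothetical helper, not a function in ectyper.

```python
from typing import Optional


def find_assembly_line(summary_path: str, accession: str) -> Optional[str]:
    """Return the first line of the summary file mentioning the accession, or None."""
    with open(summary_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if accession in line:
                return line.rstrip("\n")
    return None  # caller decides how to handle an accession absent from the metadata


# Usage (path is illustrative):
#   line = find_assembly_line("assembly_summary_refseq.txt", "GCF_000092525")
#   if line is None:
#       print("accession not found in assembly summary; species left unassigned")
```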

Refseq currency check ignores the -r (--refseq) argument

When using the -r (or --refseq) option, the database currency check nonetheless still uses the default location (__file__/Data/). Consequence: users get a write permission error and ectyper abends.

In def get_refseq_mash():
    ...
    targetpath = os.path.join(os.path.dirname(__file__), "Data/refseq.genomes.k21s1000.msh")

    if bool_downloadMashRefSketch(targetpath):
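
One possible shape of a fix, as a sketch only: resolve the sketch path from the -r value first and fall back to the packaged Data/ folder otherwise. The function resolve_sketch_path and its parameters are hypothetical and not taken from the ectyper code base.

```python
import os
from typing import Optional

DEFAULT_SKETCH_NAME = "refseq.genomes.k21s1000.msh"


def resolve_sketch_path(refseq_arg: Optional[str], package_dir: str) -> str:
    """Prefer the user-supplied -r path; otherwise use the packaged Data/ location."""
    if refseq_arg:
        return os.path.abspath(refseq_arg)
    return os.path.join(package_dir, "Data", DEFAULT_SKETCH_NAME)


# With -r given, the currency check would then look at the user's file
# instead of a possibly read-only site-packages directory.
print(resolve_sketch_path("custom/refseq.msh", "/opt/ectyper"))
print(resolve_sketch_path(None, "/opt/ectyper"))
```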

Can you please explain the QC?

00104c81-2985-4943-9502-9143a1eee412 Escherichia coli UMEA 3053-1 O75 H5 O75:H5 NA - Based on 3 allele(s) wzx:1.000;wzy:1.000;fliC:1.000; -
0013e2ce-394b-4fc5-a356-abadd635fa00 Escherichia coli 2-177-06_S4_C2 O132 H10 O132:H10 NA - Based on 3 allele(s) wzx:0.999;wzy:0.999;fliC:1.000; -

E.g. in the two records above, the QC column is NA and the confidence is "-". The output looks sensible when compared against e.g. MLST, though. Can you explain how ECTyper does this and what the above means?

Thanks

Improve species identification reliability on edge cases or poor quality inputs

The RefSeq sketch is stale and needs to be updated more frequently to match the NCBI RefSeq database.
Also, in the species prediction module, implement a species prediction threshold via a minimum MASH score check for cases when all top hits have a MASH p-value of 1 (no certainty) due to poor-quality WGS input (e.g. a truncated library). In this case, return species "-" instead of an erroneous prediction.
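
A sketch of what such a check might look like, assuming the standard five tab-separated columns of mash dist output (reference, query, distance, p-value, shared hashes such as 966/1000). The cut-off of 100 shared hashes and all function names are assumptions for illustration.

```python
MIN_SHARED_HASHES = 100  # assumed cut-off, for illustration only


def parse_mash_dist_line(line):
    """Parse one line of `mash dist` output into a dict."""
    ref, _query, dist, pvalue, shared = line.rstrip("\n").split("\t")
    matched, total = shared.split("/")
    return {
        "reference": ref,
        "distance": float(dist),
        "pvalue": float(pvalue),
        "shared": int(matched),
        "sketch_size": int(total),
    }


def confident_top_hit(lines, min_shared=MIN_SHARED_HASHES):
    """Return the closest reference, or None if the best hit carries no certainty."""
    hits = [parse_mash_dist_line(line) for line in lines if line.strip()]
    if not hits:
        return None
    best = min(hits, key=lambda hit: hit["distance"])
    if best["pvalue"] >= 1.0 or best["shared"] < min_shared:
        return None  # caller would report species "-" in this case
    return best["reference"]
```

Returning None rather than the nominal top hit keeps an uninformative match from being reported as a species.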

ectyper -h should report version information

Paired-end reads

Hi,

I think support for paired-end reads would be a useful feature for ectyper. If I understand correctly, there is currently no paired-end support (FASTQs need to be concatenated together). Using the paired-end information could improve the precision of the tool. Do you have any plans to implement that?

Thanks for this fast and useful tool!

Simone

TODO: Make tool compatible with metadata or mixed reads datasets

Currently, version 1.0.0 of the tool can only type single-culture (pure) E.coli inputs. Since biological inputs are often contaminated, mixed-culture or of metadata type, there is a growing need to add read binning capability to the tool.
This feature is non-trivial, especially for closely related E.coli species.
Need to:

explore existing solutions
test feasibility

Ectyper outputs different results depending on input format

Hello,

I have encountered an issue with the pipeline. It gives different results based on how the input is provided, one fasta file vs directory of fasta files vs comma-separated fasta files.

When I provide a directory of fasta files as an input, or more than one comma-separated fasta files, there is one sequence with no O-antigen detected.
When I provide only one single input fasta from that sample as an input, It produces the O-antigen result.

Here are the files when run using one input.
2419114246B1_blast_output_alleles.txt
2419114246B1_ectyper.log

Here are the files when run using multiple inputs
2inputs_blast_output_alleles.txt
2inputs_ectyper.log

We use the pipeline to report serotyping for E.coli in our Public Health lab. It would be great if you could look into this issue.
Thank you

Update superphy branch to new shortnames

Switch off E.coli serotype prediction for E.albertii strains

There is little value in predicting an E.coli serotype for other Escherichia species such as E.albertii. A serotype was observed to be predicted for E.albertii even when coverage of the closest reference allele was only 16%. Non-E.coli samples such as E.albertii follow a different nomenclature. I will switch off serotype prediction and reporting for non-E.coli genomes, as reporting an E.coli serotype for a non-E.coli sample is confusing and brings no additional value to the end user, even though there is some degree of relatedness, as reported here:

The O-polysaccharide structure and the O-antigen gene cluster of E. albertii HK18069 are related to those of Escherichia coli O55 and E. coli O128 reported earlier. Read more from here

Mismatch in parameters, help text and manuscript

Hi @kbessonov1984,

Thanks for the great tool!

I noticed that there appears to be a mismatch between ECTyper defaults, help message descriptions and the ECTyper manuscript.

Specifically these lines: https://github.com/phac-nml/ecoli_serotyping/blob/1.0.0/ectyper/commandLineOptions.py#L71-L101

In the manuscript, I see:

the hits are filtered using default thresholds of >=95 % identity and >=90 % coverage for O-antigen alleles and >=90 % identity and >=50 % coverage for H-antigen alleles

Therefore, I understand:

Parameter	Default	Help message	Manuscript
-opid	90	90	95
-hpid	95	95	90
-opcov	90	95	90
-hpcov	50	50	50

Am I misunderstanding the parameters, missing other logic or should this be corrected? I am happy to contribute via PR once clarified.

Thanks!
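
One way to keep defaults and help text from drifting apart is argparse's %(default)s placeholder, which fills in the default value when the help is rendered. The sketch below uses the option names from the ectyper help output; the parser is a stand-in written for this example, not the actual commandLineOptions.py code.

```python
import argparse


def build_parser():
    """Stand-in parser showing defaults injected into help via %(default)s."""
    parser = argparse.ArgumentParser(prog="ectyper-sketch")
    parser.add_argument(
        "-opid", "--percentIdentityOtype", type=int, default=90,
        help="Percent identity required for an O antigen allele match [default %(default)s]",
    )
    parser.add_argument(
        "-hpid", "--percentIdentityHtype", type=int, default=95,
        help="Percent identity required for an H antigen allele match [default %(default)s]",
    )
    parser.add_argument(
        "-opcov", "--percentCoverageOtype", type=int, default=90,
        help="Minimum percent coverage required for an O antigen allele match [default %(default)s]",
    )
    parser.add_argument(
        "-hpcov", "--percentCoverageHtype", type=int, default=50,
        help="Minimum percent coverage required for an H antigen allele match [default %(default)s]",
    )
    return parser


args = build_parser().parse_args([])
print(args.percentIdentityOtype, args.percentCoverageHtype)
```

With this pattern the number appears in exactly one place, so changing a default updates the help message automatically.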

Use importlib.resources for files in ectyper/Data/

https://us.pycon.org/2018/schedule/presentation/162/

Currently available for py36 as a standalone and will be in stdlib for py37. There should be a backport somewhere for py27.
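
A minimal sketch of the idea, assuming Python 3.9 or later where importlib.resources.files() is available (older versions would use the importlib_resources backport or the earlier read_text/path functions). The helper name is hypothetical, and the demonstration reads a file from the stdlib json package only so that the example runs anywhere.

```python
from importlib import resources


def read_package_text(package: str, resource: str) -> str:
    """Read a text file shipped inside an installed package."""
    return resources.files(package).joinpath(resource).read_text(encoding="utf-8")


# Demonstration against a stdlib package; for ectyper the call might look like
# read_package_text("ectyper", "Data/ectyper_alleles_db.json") (assumed layout).
text = read_package_text("json", "__init__.py")
print(len(text) > 0)
```

Compared with building paths from __file__, this keeps working when a package is installed in a zipped or otherwise non-standard layout.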

Genome name not extracted from filepath

The genome name is not being extracted from the filepath. I'm using version 0.9.0 via Conda.

Running $ ectyper -i /path/to/file/ECR.fasta has the following error:

Traceback (most recent call last):
  File ".../ectyper", line 13, in <module>
    ectyper.run_program()
  File ".../ectyper.py", line 101, in run_program
    args
  File ".../genomeFunctions.py", line 115, in get_genome_names_from_files
    files_dict[sample]["modheaderfile"] = r["newfile"]
KeyError: 'ECR'

The files_dict is
{'/path/to/file/ECR.fasta': {'species': 'Escherichia coli', 'filepath': '/path/to/file/ECR.fasta'}}

The expected files_dict is
{'ECR': {'species': 'Escherichia coli', 'filepath': '/path/to/file/ECR.fasta'}}
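
For reference, the expected keying can be reproduced with pathlib. This is a sketch of the intended behaviour only; the function names are hypothetical and the actual fix in genomeFunctions.py may differ.

```python
from pathlib import Path


def genome_name_from_path(filepath: str) -> str:
    """Strip the directory and the final extension: /path/to/file/ECR.fasta -> ECR."""
    return Path(filepath).stem


def build_files_dict(filepaths, species="Escherichia coli"):
    """Key each entry by genome name rather than by the full path."""
    return {
        genome_name_from_path(path): {"species": species, "filepath": path}
        for path in filepaths
    }


print(build_files_dict(["/path/to/file/ECR.fasta"]))
```

Only the last suffix is removed, so a name such as U00095.3.fasta keeps its version part and becomes U00095.3.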

-o argument is required

Hello,
I recommend adding to the instructions that the -o argument is required. On my system, without the output dir specified, I got the error
"TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'"
Adding the -o argument solved the issue.
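
The TypeError suggests that a None output value is concatenated with a string somewhere. An alternative to making -o mandatory would be to fall back to a generated directory name, roughly as sketched here; the function name and the ectyper_ timestamp naming scheme are assumptions for illustration.

```python
import os
from datetime import datetime
from typing import Optional


def resolve_output_dir(output_arg: Optional[str], now: Optional[datetime] = None) -> str:
    """Return an absolute output directory, generating a name when -o is omitted."""
    if output_arg:
        return os.path.abspath(output_arg)
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.abspath("ectyper_" + stamp)


print(resolve_output_dir("output_dir"))
print(resolve_output_dir(None))
```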

Sequence not shown. --sequence flag is deprecated

2023-10-28 20:24:52,090 ectyper INFO Database structure QC is OK at /usr/local/lib/python3.8/site-packages/ectyper/Data/ectyper_alleles_db.json
2023-10-28 20:24:52,091 ectyper INFO Starting ectyper v1.0.0 running on allele database v1.0 (11-03-2020)
2023-10-28 20:24:52,091 ectyper INFO Output_directory is /content/output1
2023-10-28 20:24:52,091 ectyper INFO Command-line arguments Namespace(cores=1, dbpath=None, debug=False, input='U00095.3.fasta', output='/content/output1', percentCoverageHtype=50, percentCoverageOtype=90, percentIdentityHtype=95, percentIdentityOtype=90, refseq=None, sequence=True, verify=False)
2023-10-28 20:24:52,091 ectyper.speciesIdentification INFO RefSeq sketch (refseq.genomes.k21s1000.msh) and assembly meta data (assembly_summary_refseq.txt) is in good health and does not need to be downloaded
2023-10-28 20:24:52,092 ectyper INFO Gathering genome files
2023-10-28 20:24:52,092 ectyper.genomeFunctions INFO Using genomes in file U00095.3.fasta
2023-10-28 20:24:52,092 ectyper INFO Identifying genome file types
2023-10-28 20:24:52,246 ectyper.genomeFunctions INFO Folowing files were not found in the input:
2023-10-28 20:24:52,273 ectyper.genomeFunctions INFO Creating combined serotype and identification fasta file
2023-10-28 20:24:52,292 ectyper INFO Assembling final list of fasta files
2023-10-28 20:24:52,305 ectyper INFO Standardizing the E.coli genome headers based on file names
2023-10-28 20:25:02,847 ectyper.predictionFunctions INFO Predicting serotype from blast output
2023-10-28 20:25:02,925 ectyper.predictionFunctions INFO Serotype prediction completed
2023-10-28 20:25:02,932 ectyper INFO BLAST output file against reference alleles is written at /content/output1/blast_output_alleles.txt
2023-10-28 20:25:02,943 ectyper INFO Reporting results:
2023-10-28 20:25:02,943 ectyper.predictionFunctions INFO Name Species O-type H-type Serotype QC Evidence GeneScores AlleleKeys GeneIdentities(%) GeneCoverages(%) GeneContigNames GeneRanges GeneLengths Database Warnings
2023-10-28 20:25:02,943 ectyper.predictionFunctions INFO U00095.3 - O16 H48 O16:H48 - Based on 3 allele(s) wzx:1;wzy:1;fliC:1; O16-1-wzx-origin;O16-2-wzy-origin;H48-1-fliC-origin; 100;100;100; 100;100;100; gi;gi;gi; 2108337-2109584;2106060-2107226;2002110-2003606; 1248;1167;1497; v1.0 (11-03-2020) -
2023-10-28 20:25:02,944 ectyper INFO
ECTyper has finished successfully.

Sequence is equal to True but it is not shown.

I am trying to use this tool to locate the O antigen and H antigen regions specifically. I would be grateful if you could help me see the sequence or get the gene range region as a fasta file. @kbessonov1984

Multiple copies detected at the same locus

eci_2792_genome_sequence.zip
For Stx1A gene (possibly others too, haven't checked), VF module reported two overlapping copies of this same gene at the same locus:

e.g.
Contig: https://www.github.com/superphy#00856f353e3710088ec7582e30ce8578bbeb43b1/contigs/lclECI-2792NODE_8_length_178059_cov_24.9142_ID_15
Copy1 start: 174082
Copy2 start: 174073

Only one copy should be reported for the Stx1A gene in this case (probably longest).

I have supplied a genome file to reproduce this, although I would guess this occurs with any Stx1A-harboring genome.
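
A possible way to collapse such duplicates, keeping the longest hit as suggested above, is sketched below. The dict layout and function name are hypothetical; the start coordinates are the two from this report, while the stop coordinates are made up for the example.

```python
def collapse_overlapping_hits(hits):
    """Keep the longest hit among overlapping hits of the same gene on the same contig.

    Each hit is a dict with gene, contig, start and stop (start <= stop assumed).
    This is a simple single pass over hits sorted by start position.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: (h["gene"], h["contig"], h["start"])):
        last = kept[-1] if kept else None
        same_locus = (
            last is not None
            and last["gene"] == hit["gene"]
            and last["contig"] == hit["contig"]
            and hit["start"] <= last["stop"]  # intervals overlap
        )
        if not same_locus:
            kept.append(hit)
        elif hit["stop"] - hit["start"] > last["stop"] - last["start"]:
            kept[-1] = hit  # replace with the longer of the two
    return kept


hits = [
    {"gene": "Stx1A", "contig": "NODE_8", "start": 174082, "stop": 175029},  # stop is illustrative
    {"gene": "Stx1A", "contig": "NODE_8", "start": 174073, "stop": 175029},  # stop is illustrative
]
print(collapse_overlapping_hits(hits))
```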

Run getting halted in between with error code b

Hi can you please help me resolve the following error?
2019-11-29 23:41:27,445 ectyper.speciesIdentification INFO GCF_001672015
2019-11-29 23:41:27,535 ectyper.subprocess_util ERROR Error in subprocess. The following command failed: ['grep', 'GCF_001672015', '/home/arya/miniconda3/lib/python3.7/site-packages/ectyper/Data/assembly_summary_refseq.txt']
2019-11-29 23:41:27,538 ectyper.subprocess_util ERROR Subprocess failed with error: b''
2019-11-29 23:41:27,539 ectyper.subprocess_util CRITICAL ectyper has stopped
subprocess failure
-Arya

TODO: Add O-antigen subtype information to the output

There has been discussion, along with requests from reference and diagnostic labs, about adding O-antigen subtype information because of the different phenotypic and clinical manifestations associated with these subtypes. The subtypes of interest are O18ab/ac, O28ac/ab, O112ab/ac, O125ab/ac, O128ab/ac, O141ab/ac, O174ab/ac.

Challenge: Some of the above pairs are very similar genetically and are difficult to resolve.

Possible solution: Incorporate additional metadata into the ECTYPER alleles database. Provide a command line option for user to get subtype information in the O-type field in the output (e.g. --with-O-subtypes). Add additional warning message to alert user to possible limit of resolution due to high degree of similarity between subgroups.

Unable to identify species, even in the representative genome of E.coli

First, I want to thank you for your work in this pipeline, but I have been trying to run ECTyper since yesterday without success and it seems to be a problem with it.

I created a new conda environment with conda create --name ectyper
I installed the module with conda install -c bioconda ectyper
I downloaded the representative genome of E.coli (Escherichia coli O157:H7 str. Sakai) Refseq: NC_002695.2
I moved the file to a new directory and renamed it O157H7.fasta
I executed ECTyper with ectyper -i O157H7.fasta --verify -o output_dir

And the results indicate that

2022-10-05 15:00:13,591 ectyper.predictionFunctions INFO     Name	Species	O-type	H-type	Serotype	QC	Evidence	GeneScores	AlleleKeys	GeneIdentities(%)	GeneCoverages(%)	GeneContigNames	GeneRanges	GeneLengths	Database	Warnings
2022-10-05 15:00:13,591 ectyper.predictionFunctions INFO     O157H7	-	-	-	-:-	WARNING (WRONG SPECIES)	-	-							v1.0 (11-03-2020)	Sample identified as -: serotyping results are only available for E.coli samples.If sure that sample is E.coli run without --verify parameter.Sample was not identified as valid E.coli sample but as -
2022-10-05 15:00:13,591 ectyper      INFO     
ECTyper has finished successfully.

It seems to identify correctly the serotype without the --verify argument but I need to assign the species as E.coli

Also, there seems to be something wrong with MASH and the database, because before that result I get this:

2022-10-05 15:00:13,588 ectyper.speciesIdentification INFO     Following top hits returned by MASH ['GCF_000002435.1_GL2_genomic.fna.gz', 'GCF_000003955.1_ASM395v1_genomic.fna.gz', 'GCF_000005845.2_ASM584v2_genomic.fna.gz', 'GCF_000006665.1_ASM666v1_genomic.fna.gz', 'GCF_000006825.1_ASM682v1_genomic.fna.gz', '']
2022-10-05 15:00:13,589 ectyper.speciesIdentification WARNING  
Top MASH sketch hit GCF_000002435.1_GL2_genomic.fna.gz with 1/1000 shared hashes.
Could not assign species based on MASH distance to reference sketch file.
Due to either:
1. MASH sketch meta data accessions do not start with the GCF_ prefix in assembly_summary_refseq.txt or
2. Number of shared hashes to reference is less than 100 (i.e. too distant).
3. Genome coverage is very limited causing species verification to fail.
If sample is E.coli, try running without --verify parameter

but GCF_000002435.1 is the ID of Giardia lamblia ATCC 50803

I also tried with the docker version (using docker pull kbessonov/ectyper:1.0.0 because docker pull kbessonov/ectyper doesn't work) but I get the same issues.

INFO     Name	Species	O-type	H-type	Serotype	QC	Evidence	GeneScores	AlleleKeys	GeneIdentities(%)	GeneCoverages(%)	GeneContigNames	GeneRanges	GeneLengths	Database	Warnings
2022-10-05 18:22:37,697 ectyper.predictionFunctions INFO     input	-	-	-	-:-	WARNING (WRONG SPECIES)	-	-							v1.0 (11-03-2020)	File /ectyper/input.fasta not found!Sample was not identified as valid E.coli sample but as -
2022-10-05 18:22:37,698 ectyper      INFO     
ECTyper has finished successfully.

The Galaxy version seems to be working fine at least, but I need this to work locally for APECtyper