xapple / crest4
The `crest4` python package can automatically assign taxonomic names to DNA sequences obtained from environmental sequencing.
License: GNU General Public License v3.0
Sorry for coming back here with another issue after several months. I am trying to classify some SSU rRNA short reads obtained from metatranscriptomic sequencing. Because the rRNA files are rather large (~6 GB), I split each file into 10 equal parts and ran them against the silvamod138 database with blastn, as suggested by @lanzen in issue #1.
I am using crest4 installed via pip in a virtual Python environment.
blastn \
-query $input/aligned.SSU.subsampled.part_001.fa \
-db ~/.crest4/silvamod138/silvamod138.fasta \
-num_alignments 10 \
-outfmt "7 qseqid sseqid bitscore length nident" \
-out $output/rRNA_silva138_part_001.hits \
-num_threads 32
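The splitting step described above (10 equal parts per input file) can be done in plain Python if no dedicated tool is at hand. This is a minimal sketch, assuming uncompressed FASTA with `>` header lines; `read_fasta`, `split_fasta` and the `.part_NNN.fa` naming are illustrative choices, not something crest4 provides.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header_line, sequence_lines) records from a plain FASTA file."""
    header, lines = None, []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if header is not None:
                    yield header, lines
                header, lines = line, []
            elif header is not None:
                lines.append(line)
    if header is not None:
        yield header, lines


def split_fasta(path, n_parts, out_dir):
    """Deal records round-robin into n_parts files and return their paths."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    # Illustrative naming: <stem>.part_001.fa, <stem>.part_002.fa, ...
    paths = [out_dir / f"{stem}.part_{i + 1:03d}.fa" for i in range(n_parts)]
    handles = [open(p, "w") for p in paths]
    try:
        for index, (header, lines) in enumerate(read_fasta(path)):
            out = handles[index % n_parts]
            out.write(header)
            out.writelines(lines)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Dealing records out round-robin gives parts whose record counts differ by at most one, without a first pass over the file to count sequences.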
The hits file looks something like this:
# BLASTN 2.13.0+
# Query: c_000000000001_hx1b_ssu_aligned_rna
# Database: /home/ruizhang/.crest4/silvamod138/silvamod138.fasta
# 0 hits found
# BLASTN 2.13.0+
# Query: c_000000000014_hx1b_ssu_aligned_rna
# Database: /home/ruizhang/.crest4/silvamod138/silvamod138.fasta
# 0 hits found
# BLASTN 2.13.0+
# Query: c_000000000032_hx1b_ssu_aligned_rna
# Database: /home/ruizhang/.crest4/silvamod138/silvamod138.fasta
# Fields: query id, subject id, bit score, alignment length, identical
# 5 hits found
c_000000000032_hx1b_ssu_aligned_rna AOUI01017441 182 138 125
c_000000000032_hx1b_ssu_aligned_rna AXCG01145984 176 138 124
c_000000000032_hx1b_ssu_aligned_rna APFE030523816 152 131 115
c_000000000032_hx1b_ssu_aligned_rna BDFN01002883 150 130 115
c_000000000032_hx1b_ssu_aligned_rna CP027782 86.1 52 50
When I provide this hits file to crest4, I receive the following error message:
crest4 \
--fasta $fasta/aligned.SSU.subsampled.fa \
--search_hits test.hits \
-o $output \
--otu_table $output
Running crest4 v.4.2.7
Traceback (most recent call last):
File "/project/6004066/ruizhang/software/anvio-7.1/bin/crest4", line 8, in <module>
sys.exit(main())
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/__main__.py", line 20, in main
return magic()
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/optmagic/__init__.py", line 250, in __call__
return instance(*extra_args, **extra_kwargs)
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/classify.py", line 350, in __call__
self.out_file.writelines(query.tax_string for query in self.queries)
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/plumbing/cache.py", line 86, in __get__
else: result = self.func(instance)
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/classify.py", line 305, in queries
result = [Query(self, query) for query in self.seqsearch.results]
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/classify.py", line 305, in <listcomp>
result = [Query(self, query) for query in self.seqsearch.results]
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/seqsearch/search/blast.py", line 153, in results
for entry in SearchIO.parse(handle, 'blast-tab', comments=True):
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/__init__.py", line 306, in parse
yield from generator
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 234, in __iter__
yield from iterfunc()
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 271, in _parse_commented_qresult
for qresult in qres_iter:
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 406, in _parse_qresult
cur = self._parse_result_row()
File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 332, in _parse_result_row
raise ValueError(
ValueError: Expected 5 columns, found: 1
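Since the parser complains about the column count, a quick check can list which non-comment lines do not have the five tab-separated fields requested with `-outfmt "7 qseqid sseqid bitscore length nident"`. This is a hedged diagnostic sketch, not part of crest4; the name `find_bad_rows` is hypothetical, and it assumes that fields are tab-separated and comment lines start with `#`, as in BLAST's tabular output.

```python
def find_bad_rows(path, expected_columns=5):
    """Return (line_number, field_count, text) for data rows with the wrong field count."""
    bad = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            if text.startswith("#"):
                continue  # BLAST comment line, e.g. "# Query: ..." or "# 0 hits found"
            fields = text.split("\t")
            if len(fields) != expected_columns:
                bad.append((number, len(fields), text))
    return bad
```

A row pasted with spaces instead of tabs, or an empty line, is reported with a field count of 1.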
I thought it might be a format issue, so I removed the lines starting with "#" and kept only this part:
c_000000000032_hx1b_ssu_aligned_rna AOUI01017441 182 138 125
c_000000000032_hx1b_ssu_aligned_rna AXCG01145984 176 138 124
c_000000000032_hx1b_ssu_aligned_rna APFE030523816 152 131 115
c_000000000032_hx1b_ssu_aligned_rna BDFN01002883 150 130 115
c_000000000032_hx1b_ssu_aligned_rna CP027782 86.1 52 50
I received the following message:
Running crest4 v.4.3.0
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
warnings.warn(
Classification ran successfully. Results are placed in '/scratch/ruizhang/crest4/test/assignments.txt'.
When I open assignments.txt, the file is empty.
Then I ran 'crest4 --pytest' to make sure the software was built successfully, and it actually failed one test. Could this be the reason why CREST is not working?
crest4 --pytest
===================================== test session starts =====================================
platform linux -- Python 3.11.2, pytest-7.4.0, pluggy-1.2.0
rootdir: /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests
configfile: pytest.ini
collected 15 items
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/actualize_database/run_test.py . [ 6%]
.
. [ 20%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/cmd_line_tool/run_test.py . [ 26%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/custom_database/run_test.py . [ 33%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py F [ 40%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/ncbi_two_sequences/run_test.py . [ 46%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/no_hits_sequence/run_test.py . [ 53%]
. [ 60%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/precomputed_hits/run_test.py . [ 66%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/readme_example_seq/run_test.py . [ 73%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/score_drop_change/run_test.py . [ 80%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/vsearch_test/run_test.py . [ 86%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/with_otu_table/run_test.py . [ 93%]
. [100%]
========================================== FAILURES ===========================================
_________________________________________ test_midori _________________________________________
def test_midori():
# The input fasta #
fasta = this_dir.find('*.fasta')
# The output directory #
output_dir = this_dir + 'results/'
output_dir.remove()
# Create object #
c = Classify(fasta = fasta,
output_dir = output_dir,
search_algo = 'vsearch',
search_db = 'midori253darn',
num_threads = True)
# Run it #
> c()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:351: in __call__
self.out_file.writelines(query.tax_string for query in self.queries)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/cache.py:86: in __get__
else: result = self.func(instance)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:304: in queries
if not self.search_hits: self.search()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:285: in search
return self.seqsearch.run()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/cache.py:86: in __get__
else: result = self.func(instance)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:273: in seqsearch
return SeqSearch(input_fasta = self.fasta,
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/seqsearch/search/__init__.py:81: in __init__
self.set_defaults()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/seqsearch/search/__init__.py:93: in set_defaults
if self.algorithm == 'vsearch' and hasattr(self.database, 'vsearch_db'):
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/cache.py:86: in __get__
else: result = self.func(instance)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/databases.py:213: in vsearch_db
if not self.downloaded: self.download()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/databases.py:171: in download
download_from_url(self.url,
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/scraping/__init__.py:82: in download_from_url
if stream: response = request(url, header, response=True, stream=True, **kwargs)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/decorator.py:232: in fun
return caller(func, *(extras + args), **kw)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/retry/api.py:73: in retry_decorator
return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/retry/api.py:33: in __retry_internal
return f()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/scraping/__init__.py:35: in request
resp.raise_for_status()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <Response [403]>
def raise_for_status(self):
"""Raises :class:`HTTPError`, if one occurred."""
http_error_msg = ""
if isinstance(self.reason, bytes):
# We attempt to decode utf-8 first because some servers
# choose to localize their reason strings. If the string
# isn't utf-8, we fall back to iso-8859-1 for all other
# encodings. (See PR #3538)
try:
reason = self.reason.decode("utf-8")
except UnicodeDecodeError:
reason = self.reason.decode("iso-8859-1")
else:
reason = self.reason
if 400 <= self.status_code < 500:
http_error_msg = (
f"{self.status_code} Client Error: {reason} for url: {self.url}"
)
elif 500 <= self.status_code < 600:
http_error_msg = (
f"{self.status_code} Server Error: {reason} for url: {self.url}"
)
if http_error_msg:
> raise HTTPError(http_error_msg, response=self)
E requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/requests/models.py:1021: HTTPError
------------------------------------ Captured stdout call -------------------------------------
Running crest4 v.4.3.0
╭───────────────────── Large Download ─────────────────────╮
│ │
│ │
│ The database 'midori253darn' has not been downloaded │
│ yet. This process will start now and might take some │
│ time depending on your internet connection. Please be │
│ patient. The result will be saved to │
│ '/home/ruizhang/.crest4/'. You can override this by │
│ setting the $CREST4_DIR environment variable. │
│ │
│ │
╰──────────────────────────────────────────────────────────╯
-------------------------------------- Captured log call --------------------------------------
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 1 seconds...
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 2 seconds...
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 4 seconds...
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 8 seconds...
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 16 seconds...
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 32 seconds...
WARNING retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 64 seconds...
====================================== warnings summary =======================================
cmd_line_tool/run_test.py::test_cmd_line_tool
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but cmd_line_tool/run_test.py::test_cmd_line_tool returned "Running crest4 v.4.3.0\nClassification ran successfully. Results are placed in '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/cmd_line_tool/results/assignments.txt'.\n", which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
custom_database/run_test.py::test_custom_database
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
warnings.warn(
custom_database/run_test.py::test_custom_database
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/ete3/webplugin/webapp.py:44: DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13
import cgi
custom_database/run_test.py::test_custom_database
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but custom_database/run_test.py::test_custom_database returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/custom_database/two_seqs.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
ncbi_two_sequences/run_test.py::test_ncbi_two_seqs
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but ncbi_two_sequences/run_test.py::test_ncbi_two_seqs returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/ncbi_two_sequences/two_seqs.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
no_hits_sequence/run_test.py::test_no_hits_blast
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but no_hits_sequence/run_test.py::test_no_hits_blast returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/no_hits_sequence/no_hits.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
no_hits_sequence/run_test.py::test_no_hits_vsearch
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but no_hits_sequence/run_test.py::test_no_hits_vsearch returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/no_hits_sequence/no_hits.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
precomputed_hits/run_test.py::test_precomputed_hits
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but precomputed_hits/run_test.py::test_precomputed_hits returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/precomputed_hits/two_seqs.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
readme_example_seq/run_test.py::test_readme_example_seq
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but readme_example_seq/run_test.py::test_readme_example_seq returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/readme_example_seq/readme_test.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
score_drop_change/run_test.py::test_score_drop_change
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but score_drop_change/run_test.py::test_score_drop_change returned "Running crest4 v.4.3.0\nClassification ran successfully. Results are placed in '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/score_drop_change/results/assignments.txt'.\n", which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
vsearch_test/run_test.py::test_vsearch
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but vsearch_test/run_test.py::test_vsearch returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/vsearch_test/two_seqs.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
with_otu_table/run_test.py::test_with_good_otu_table
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but with_otu_table/run_test.py::test_with_good_otu_table returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/with_otu_table/seqs.fasta'>, which will be an error in a future version of pytest. Did you mean to use `assert` instead of `return`?
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================== short test summary info ===================================
FAILED ../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py::test_midori - requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://crest4.s3-eu-w...
==================== 1 failed, 14 passed, 12 warnings in 498.09s (0:08:18) ====================
Sorry for this lengthy question.
Dear CREST developers,
Thank you for developing this tool! I am running a nucleotide file (~12 GB) against the silvamod128 database using vsearch with about 125 GB of memory.
However, I received an error message suggesting that there was not enough memory:
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=63476927.batch. Some of your processes may have been killed by the cgroup out-of-memory handle
Is it possible to estimate the run time and memory requirements given the size of a FASTA file?
Thank you for your help!
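Any such estimate will depend at least on how many sequences the input holds and how many bases they add up to. A minimal streaming sketch that reports both, plus the longest sequence, without loading the file into memory; `fasta_stats` is a hypothetical helper, and its output is an input to an estimate, not an estimate itself.

```python
def fasta_stats(path):
    """Stream a FASTA file and return (sequence_count, total_bases, longest_sequence)."""
    count = total = longest = current = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                longest = max(longest, current)  # close the previous record
                count += 1
                current = 0
            else:
                length = len(line.strip())
                current += length
                total += length
    longest = max(longest, current)  # close the final record
    return count, total, longest
```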
Hi!
I have installed crest4 v.4.3.6 via conda and am running it to assign COI sequences.
I am getting this warning message:
crest4 -t 32 --fasta swarm/COI_methods_comp_SWARM_parsed.fasta -d midori253darn --otu_table swarm/COI_methods_comp_SWARM_parsed.tsv -o crest4/COI/
Running crest4 v.4.3.6
Calling makeblastdb on '/scratch/ssd/fastwork/metabridge/common/crest4/midori253darn/midori253darn.fasta'...
/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
warnings.warn(
Could you fix this deprecation warning by updating the corresponding crest4 module, please?
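Until the parsing code changes, the message can be silenced on the user side when crest4 is driven from a Python script. This is a workaround sketch, not a fix inside crest4: it installs a standard-library `warnings` filter that matches the start of the message quoted above, and it has to run before the code that triggers the warning. It does not affect the `crest4` command line tool run in a separate process; `silence_legacy_searchio_warning` is a hypothetical name.

```python
import warnings

# Regular expression matched against the start of the warning text shown above.
LEGACY_SEARCHIO_MESSAGE = r"The 'Bio\.SearchIO\._legacy' module"


def silence_legacy_searchio_warning():
    """Ignore the Biopython deprecation warning about Bio.SearchIO._legacy."""
    warnings.filterwarnings("ignore", message=LEGACY_SEARCHIO_MESSAGE)


# Call silence_legacy_searchio_warning() first, then import and run crest4.
```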
Dear developers,
I'm new to CREST4 and would like to use it to classify COI sequences. I saw that the previous version of CREST could use BOLD as a reference database. However, I don't see this option in CREST4.
Are you planning to add the BOLD database in the already implemented reference databases?
If not, could you explain the easiest way to download the BOLD database myself and use it in CREST4?
I already downloaded the BOLD database for another project with CRABS, but the format does not seem to fit the requirements of the make_new_crest_db.py script because there are no accession numbers (see example below).
CRABS_1:Megascolecidae 6400 Eukaryota Annelida Clitellata Crassiclitellata Megascolecidae nan nan AACCTTATATTTTCTTTTAGGAATTTGAGCTGGAATAGTTGGTGCCGGAATAAGTTTACTTATCCGCATTGAATTAAGACAGCCGGGTGCATTTCTAGGCAGTGATCAACTATATAATACTTTAGTAACAGCTCACGCATTCGTAATAATTTTCTTTATGGTTATACCCGTATTTATTGGAGGATTTGGTAATTGACTCCTTCCACTAATATTAGGTGCCCCAGACATAGCATTCCCTCGCCTAAACAACCTAAGATTTTGATTATTACCGCCATCCTTAATTCTATTAGTCTCATCTGCAGCAGTAGAAAAAGGCGCCGGTACAGGATGAACAGTTTATCCTCCTTTAGCTAGAAATATAGCTCATGCAGGACCTTCTGTAGACCTAGCAATCTTCTCCCTCCATCTAGCAGGCGCATCCTCAATTTTAGGAGCATTAAACTTCATTACTACAGTTATTAACATACGTTGACAAGGTCTACGACTCGAGCGAATCCCACTATTTGTTTGAGCCGTTACTATTACAGTAGTATTACTACTTTTATCTTTACCAGTATTAGCTGGTGCAATTACCATATTATTAACCGACCGAAACTTAAACACATCTTTCTTTGATCCAGCGGGAGGGGGAGACCCTATTCTTTATCAACATTTATTC
CRABS_2:Nais 85919 Eukaryota Ascomycota Sordariomycetes Microascales Halosphaeriaceae Nais nan TACACTATATCTTATCTTAGGAGTATGAGCAGGAATAGTAGGAACTGGAACTAGATTACTTATTCGAATTGAATTATCACAACCAGGATCATTTCTTGGAAGAGATCAATTATATAATACTCTTGTAACAGCACATGCATTCTTAATAATTTTCTTCTTAGTAATACCAGTATTCATTGGGGGGTTCGGAAACTGACTTCTCCCATTAATACTAGGTGCTGCTGATATGGCATTTCCACGACTAAATAATCTTAGATTTTGACTACTACCACCATCATTAATTCTATTAATTTCTTCTGCTGCTGTAGAAAAAGGTGCGGGAACAGGATGAACTGTTTATCCTCCACTATCAAGAAACCTGGCACATGCAGGACCTTCAGTAGACATGGCTATTTTCTCACTCCATTTAGCAGGTGCTTCTTCTATTTTAGGGGCAGTTAACTTTATCACTACTGTAATAAACATGCGATGAAACGGAATACGATTAGAACGACTTCCACTATTTGTGTGAGCGGTATTTCTCACAGTAATTCTCCTTCTTCTATCACTTCCTGTTCTTGCTGGCGCAATTACAATACTACTAACAGATCGAAACCTTAATACCTCATTCTTCGATCCTGCTGGTGGTGGAGACCCAATTTTATATCAACATTTATTC
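Since the CRABS export already carries a unique identifier in its first column (`CRABS_1:Megascolecidae`, ...), one option is to treat that identifier as the accession. The sketch below converts such rows into a FASTA file plus a two-column identifier-to-lineage table. It is hedged: it assumes the CRABS columns are tab-separated in the order identifier, numeric taxon ID, seven ranks, sequence (as in the example), drops `nan` placeholders, and does not claim to match the exact input expected by `make_new_crest_db.py`. `crabs_to_fasta` is a hypothetical name.

```python
def crabs_to_fasta(crabs_path, fasta_path, taxonomy_path):
    """Convert tab-separated CRABS rows into FASTA + ID-to-lineage TSV; return the row count."""
    written = 0
    with open(crabs_path) as src, open(fasta_path, "w") as fasta, \
            open(taxonomy_path, "w") as tax:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            seq_id, sequence = fields[0], fields[-1]
            # Columns between the taxon ID and the sequence hold the ranks.
            ranks = [r for r in fields[2:-1] if r and r != "nan"]
            fasta.write(f">{seq_id}\n{sequence}\n")
            tax.write(f"{seq_id}\t{';'.join(ranks)}\n")
            written += 1
    return written
```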
Hi,
I have installed crest4 through conda and successfully changed CREST4_DIR to a common folder by adding its path to my .bashrc file with:
export CREST4_DIR=/path/to/common/folder
Now, when I run crest4 --pytest, I get the following error:
FAILED conda/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py::test_midori - requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz
Indeed, there is only silvamod138pr2 in the CREST4_DIR. How do I get access to MIDORI?
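The download message quoted earlier in this thread says databases are saved to `~/.crest4/` unless `$CREST4_DIR` is set. A small sketch that resolves the directory the same way and lists the sub-folders present can confirm what a given shell actually sees. It mirrors only the behaviour described in that message, assumes one sub-folder per downloaded database, and uses hypothetical helper names; it cannot work around a 403 returned by the download server.

```python
import os
from pathlib import Path


def crest4_data_dir():
    """Return $CREST4_DIR if set, otherwise ~/.crest4 (per crest4's download message)."""
    env = os.environ.get("CREST4_DIR")
    return Path(env).expanduser() if env else Path.home() / ".crest4"


def list_crest4_databases(data_dir=None):
    """Return sorted sub-folder names, assumed to be one per downloaded database."""
    data_dir = Path(data_dir) if data_dir else crest4_data_dir()
    if not data_dir.is_dir():
        return []
    return sorted(p.name for p in data_dir.iterdir() if p.is_dir())
```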
Hi,
I am running into an error when assigning COI sequences with midori253darn. Could you troubleshoot this? It seems to be a database format error.
Thanks!
`(crest4)mibr@cno-0004:/scratch/ssd/fastwork/metabridge/methods_comparisons/2_metabridge_methods_analysis$ crest4 -t 32 --fasta swarm/COI_methods_comp_SWARM_parsed.fasta -d midori253darn --otu_table swarm/COI_methods_comp_SWARM_parsed.tsv -o crest4/COI/
Running crest4 v.4.3.6
Calling "makeblastdb" on "/scratch/ssd/fastwork/metabridge/common/crest4/midori253darn/midori253darn.fasta"...
/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
warnings.warn(
Traceback (most recent call last):
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/bin/crest4", line 10, in <module>
sys.exit(main())
^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/__main__.py", line 19, in main
return magic()
^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/optmagic/__init__.py", line 250, in __call__
return instance(*extra_args, **extra_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/classify.py", line 356, in __call__
otus_by_rank = self.otu_info.otus_by_rank
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/plumbing/cache.py", line 86, in __get__
else: result = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 58, in otus_by_rank
return self(cumulative=False)
^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 112, in __call__
ranks = result.taxonomy.apply(tax_to_rank)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/series.py", line 4765, in apply
).apply()
^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/apply.py", line 1201, in apply
return self.apply_standard()
^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/apply.py", line 1281, in apply_standard
mapped = obj._map_values(
^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/base.py", line 921, in _map_values
return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1812, in map_array
return lib.map_infer(values, mapper, convert=convert)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 111, in <lambda>
tax_to_rank = lambda t: rank_names[len(t.split(';')) - 1]
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
IndexError: list index out of range`
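The failing line maps a taxonomy string to a rank name by counting its `;`-separated levels, so the `IndexError` is raised whenever a string has more levels than `rank_names` has entries. The hedged sketch below applies the same rule but returns `None` instead of raising, and lists the offending strings; it assumes the taxonomy strings and rank list are available as plain Python values, and the `ranks` list in any usage is a placeholder, not the list crest4 uses for midori253darn.

```python
def rank_for(taxonomy, rank_names):
    """Same depth-to-rank rule as the crest4 lambda, but None instead of IndexError."""
    depth = len(taxonomy.split(";"))
    return rank_names[depth - 1] if depth <= len(rank_names) else None


def too_deep(taxonomies, rank_names):
    """Return the taxonomy strings that have more levels than rank_names."""
    return [t for t in taxonomies if len(t.split(";")) > len(rank_names)]
```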
After updating my conda environment for crest4, I now get an error when classifying a 16S FASTA file with silvamod138pr2 that I did not get with the previous crest4 version.
(crest4)mibr@cno-0004:/scratch/ssd/fastwork/metabridge/methods_comparisons/2_metabridge_methods_analysis$ crest4 -t 32 --fasta swarm/16S_methods_comp_SWARM_parsed.fasta -d silvamod138pr2 --otu_table swarm/16S_methods_comp_SWARM_parsed.tsv -o crest4/16S_final/
Running crest4 v.4.3.7
Traceback (most recent call last):
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/bin/crest4", line 10, in <module>
sys.exit(main())
^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/__main__.py", line 19, in main
return magic()
^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/optmagic/__init__.py", line 250, in __call__
return instance(*extra_args, **extra_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/classify.py", line 356, in __call__
otus_by_rank = self.otu_info.otus_by_rank
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/plumbing/cache.py", line 86, in __get__
else: result = self.func(instance)
^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 58, in otus_by_rank
return self(cumulative=False)
^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 81, in __call__
self.check_id_match()
File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 77, in check_id_match
raise ValueError(msg)
ValueError: The sequence named 'SWARM_13377' in the table located at 'swarm/16S_methods_comp_SWARM_parsed.tsv' does not appear in the FASTA file provided at 'swarm/16S_methods_comp_SWARM_parsed.fasta'.
I have checked and SWARM_13377 does exist in the fasta file:
(crest4)mibr@cno-0004:/scratch/ssd/fastwork/metabridge/methods_comparisons/2_metabridge_methods_analysis$ grep "SWARM_13377" swarm/16S_methods_comp_SWARM_parsed.fasta > swarm13377.out
This does produce the expected output.
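Note that `grep "SWARM_13377"` also matches longer names such as `SWARM_133770`, and headers with extra text after the name, so a successful grep does not prove an exact match. A stricter, hedged check compares the first column of the table with the first word of each FASTA header; printing the results with `repr()` makes trailing spaces or `\r` characters visible. It assumes a tab-separated table with a header row and sequence names in the first column; the helper names are hypothetical, and how crest4 itself extracts names from headers is not shown here.

```python
def fasta_ids(path):
    """Return the set of FASTA names (text after '>' up to the first whitespace)."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                ids.add(fields[0] if fields else "")
    return ids


def missing_ids(table_path, fasta_path, skip_header=True):
    """Return first-column names from the table that are absent from the FASTA file."""
    known = fasta_ids(fasta_path)
    missing = []
    with open(table_path, newline="") as handle:
        for number, line in enumerate(handle):
            if skip_header and number == 0:
                continue
            name = line.rstrip("\n").split("\t")[0]
            if name and name not in known:
                missing.append(name)
    return missing


# Usage: for name in missing_ids("table.tsv", "seqs.fasta"): print(repr(name))
```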
Hi,
In the documentation, it says:
In addition to the lowest common ancestor classification, a minimum similarity filter is used, based on a set of taxon-specific requirements, by default depending on their taxonomic rank. By default, a sequence must be aligned with at least 99% nucleotide similarity to the best reference sequence in order to be classified to the species rank. For the genus, family, order, class and phylum ranks the respective default cut-offs are 97%, 95%, 90%, 85% and 80%.
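The default cut-offs in this passage can be written as a lookup from similarity to the deepest rank it permits. This is a hedged sketch of the documented numbers only, not crest4's implementation: it covers just the six named ranks (how other ranks, such as PR2's supergroup, are treated is the open question here), and it assumes similarity means identical positions divided by alignment length, i.e. the `nident` and `length` columns of a BLAST hits file.

```python
# Default minimum similarity (%) per rank, from the documentation passage above.
DEFAULT_CUTOFFS = [
    ("species", 99.0),
    ("genus", 97.0),
    ("family", 95.0),
    ("order", 90.0),
    ("class", 85.0),
    ("phylum", 80.0),
]


def percent_identity(nident, length):
    """Similarity in percent from BLAST's nident and length columns (assumed definition)."""
    return 100.0 * nident / length


def deepest_allowed_rank(similarity):
    """Return the deepest of the six documented ranks whose cut-off is met, else None."""
    for rank, cutoff in DEFAULT_CUTOFFS:
        if similarity >= cutoff:
            return rank
    return None
```

For the top hit in the hits file earlier in this thread (138 aligned positions, 125 identical), this gives about 90.6%, which would allow order but not family under these defaults; the LCA step can still make the final assignment shallower.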
I was wondering how this is handled for reference databases that do not follow these ranks. I have 18S and ITS metabarcoding data sets that were classified using PR2 v.4.14.0 and Unite v7.2, respectively.
For ITS, for example, Unite outputs a classification with 9 ranks:
"Domain","Supergroup","Division/Kingdom","Phylum","Class","Order","Family","Genus","Species"
Running Crest with the default bit scores and similarity thresholds, I get some OTUs assigned to Fungi at the kingdom level but unclassified at the phylum level. I suspect these are actually phylum-level assignments, because I get the following classifications for them:
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 1
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 2
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 3
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 4
So I think that in Unite, these reference sequences are deposited as "Unknown Fungi phylum", which is how Crest can make a phylum-level assignment. Is that correct?
For 18S, the taxonomy doesn't really follow the same ranks as for ITS. But I also have OTUs that are, for example, classified at the domain level but unclassified at the supergroup level, and OTUs that are classified at the supergroup level but unclassified at the division/kingdom level.
For example, I get some assignments like this:
Main genome;Eukaryota;Alveolata
So this is an assignment to supergroup level only. I am not sure whether this is because a sequence deposited in PR2 has exactly this taxonomy, or whether Crest only assigns it to supergroup level because some threshold is not met. Either way, what similarity threshold is applied for levels above the phylum rank? Which thresholds were met when an OTU is assigned to supergroup but not to kingdom level? Or is it that in this case the 80% phylum threshold was met, but a lower-rank classification couldn't be made because the reference sequence in PR2 simply didn't have a lower-rank taxonomy attributed to it?
Thanks a lot
Nauras