xapple / crest4

The `crest4` Python package can automatically assign taxonomic names to DNA sequences obtained from environmental sequencing.

License: GNU General Public License v3.0

Python 100.00%
16s-rrna bioinformatics dna-sequences taxonomic-classification

Contributors: lanzen, xapple

crest4's Issues

CREST4 produces empty assignments.txt file

Dear @xapple and @lanzen ,

Sorry for coming back here with another issue after several months. I am trying to classify some SSU rRNA short reads obtained from metatranscriptomic sequencing. Because the rRNA files are kind of large (~6GB), I split each file into 10 equal parts and ran them against the silvamod138 database with Blastn, as suggested by @lanzen in issue #1 .
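For readers who want to reproduce the splitting step, here is a minimal sketch of one way to do it in Python. The post does not say which tool was used, so this is an illustration rather than the actual method. The function name `split_fasta` and the output naming scheme (modelled on `part_001` in the command below) are assumptions. Records are handed out round-robin, so the parts differ by at most one record, and the file is streamed line by line, so memory use stays low even for multi-gigabyte inputs.

```python
from itertools import cycle
from pathlib import Path


def split_fasta(fasta_path, n_parts, out_dir):
    """Distribute the records of a FASTA file round-robin over n_parts files."""
    fasta_path, out_dir = Path(fasta_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme, modelled on 'part_001' in the post.
    paths = [out_dir / f"{fasta_path.stem}.part_{i:03d}.fa"
             for i in range(1, n_parts + 1)]
    handles = [path.open("w") for path in paths]
    try:
        targets = cycle(handles)
        current = None
        with fasta_path.open() as source:
            for line in source:
                # A '>' line starts a new record: move to the next output file.
                if line.startswith(">"):
                    current = next(targets)
                # Lines before the first header (if any) are dropped.
                if current is not None:
                    current.write(line)
    finally:
        for handle in handles:
            handle.close()
    return paths
```

Calling `split_fasta("aligned.SSU.subsampled.fa", 10, "parts/")` would write ten files that can each be searched with blastn independently.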

I am using crest4 installed via pip in a Python virtual environment.

blastn \
-query $input/aligned.SSU.subsampled.part_001.fa \
-db ~/.crest4/silvamod138/silvamod138.fasta \
-num_alignments 10 \
-outfmt "7 qseqid sseqid bitscore length nident" \
-out $output/rRNA_silva138_part_001.hits \
-num_threads 32

The hits file looks something like this

# BLASTN 2.13.0+
# Query: c_000000000001_hx1b_ssu_aligned_rna
# Database: /home/ruizhang/.crest4/silvamod138/silvamod138.fasta
# 0 hits found
# BLASTN 2.13.0+
# Query: c_000000000014_hx1b_ssu_aligned_rna
# Database: /home/ruizhang/.crest4/silvamod138/silvamod138.fasta
# 0 hits found
# BLASTN 2.13.0+
# Query: c_000000000032_hx1b_ssu_aligned_rna
# Database: /home/ruizhang/.crest4/silvamod138/silvamod138.fasta
# Fields: query id, subject id, bit score, alignment length, identical
# 5 hits found
c_000000000032_hx1b_ssu_aligned_rna     AOUI01017441    182     138     125
c_000000000032_hx1b_ssu_aligned_rna     AXCG01145984    176     138     124
c_000000000032_hx1b_ssu_aligned_rna     APFE030523816   152     131     115
c_000000000032_hx1b_ssu_aligned_rna     BDFN01002883    150     130     115
c_000000000032_hx1b_ssu_aligned_rna     CP027782        86.1    52      50

When I pass this hits file to crest4, I receive the following error message:

crest4 \
> --fasta $fasta/aligned.SSU.subsampled.fa \
> --search_hits test.hits \
> -o $output \
> --otu_table $output
Running crest4 v.4.2.7
Traceback (most recent call last):
  File "/project/6004066/ruizhang/software/anvio-7.1/bin/crest4", line 8, in <module>
    sys.exit(main())
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/__main__.py", line 20, in main
    return magic()
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/optmagic/__init__.py", line 250, in __call__
    return instance(*extra_args, **extra_kwargs)
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/classify.py", line 350, in __call__
    self.out_file.writelines(query.tax_string for query in self.queries)
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/plumbing/cache.py", line 86, in __get__
    else:                                      result = self.func(instance)
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/classify.py", line 305, in queries
    result = [Query(self, query) for query in self.seqsearch.results]
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/crest4/classify.py", line 305, in <listcomp>
    result = [Query(self, query) for query in self.seqsearch.results]
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/seqsearch/search/blast.py", line 153, in results
    for entry in SearchIO.parse(handle, 'blast-tab', comments=True):
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/__init__.py", line 306, in parse
    yield from generator
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 234, in __iter__
    yield from iterfunc()
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 271, in _parse_commented_qresult
    for qresult in qres_iter:
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 406, in _parse_qresult
    cur = self._parse_result_row()
  File "/project/6004066/ruizhang/software/anvio-7.1/lib/python3.8/site-packages/Bio/SearchIO/BlastIO/blast_tab.py", line 332, in _parse_result_row
    raise ValueError(
ValueError: Expected 5 columns, found: 1

I thought it might be a format issue, so I removed the lines starting with "#" and kept only this part:

c_000000000032_hx1b_ssu_aligned_rna     AOUI01017441    182     138     125
c_000000000032_hx1b_ssu_aligned_rna     AXCG01145984    176     138     124
c_000000000032_hx1b_ssu_aligned_rna     APFE030523816   152     131     115
c_000000000032_hx1b_ssu_aligned_rna     BDFN01002883    150     130     115
c_000000000032_hx1b_ssu_aligned_rna     CP027782        86.1    52      50
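The comment-stripping step described above can be sketched in Python as follows. This is a minimal illustration, not part of crest4. The helper names `strip_blast_comments` and `count_queries` are made up for this example, and the code assumes the file is the tab-separated `-outfmt 7` output shown earlier. Note that a query reported with "0 hits found" has no rows at all once the comment lines are gone, so the second helper counts how many queries are affected.

```python
def strip_blast_comments(lines):
    """Yield only the hit rows of a commented BLAST table ('-outfmt 7')."""
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and the '#' header lines.
        if not stripped or stripped.startswith("#"):
            continue
        yield stripped


def count_queries(lines):
    """Return (queries_seen, queries_with_hits) for '-outfmt 7' output."""
    seen, with_hits = set(), set()
    for line in lines:
        if line.startswith("# Query:"):
            seen.add(line.split(":", 1)[1].strip())
        elif line.strip() and not line.startswith("#"):
            # The first column of a hit row is the query ID.
            with_hits.add(line.split(None, 1)[0])
    return len(seen), len(with_hits)
```

Comparing the two numbers returned by `count_queries` shows how many input sequences will be absent from the stripped hits file.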

I then received the following message:

Running crest4 v.4.3.0
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
  warnings.warn(
Classification ran successfully. Results are placed in '/scratch/ruizhang/crest4/test/assignments.txt'.

When I open the assignments.txt file, it is empty.

Then I ran 'crest4 --pytest' to make sure the software was built successfully, but one test failed. Could this be the reason why CREST is not working?

crest4 --pytest
===================================== test session starts =====================================
platform linux -- Python 3.11.2, pytest-7.4.0, pluggy-1.2.0
rootdir: /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests
configfile: pytest.ini
collected 15 items                                                                            

../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/actualize_database/run_test.py . [  6%]
.

.                                                                                      [ 20%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/cmd_line_tool/run_test.py . [ 26%]
                                                                                        [ 26%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/custom_database/run_test.py . [ 33%]
                                                                                        [ 33%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py F [ 40%]
                                                                                        [ 40%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/ncbi_two_sequences/run_test.py . [ 46%]
                                                                                        [ 46%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/no_hits_sequence/run_test.py . [ 53%]
.                                                                                       [ 60%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/precomputed_hits/run_test.py . [ 66%]
                                                                                        [ 66%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/readme_example_seq/run_test.py . [ 73%]
                                                                                        [ 73%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/score_drop_change/run_test.py . [ 80%]
                                                                                        [ 80%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/vsearch_test/run_test.py . [ 86%]
                                                                                        [ 86%]
../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/with_otu_table/run_test.py . [ 93%]
.                                                                                       [100%]

========================================== FAILURES ===========================================
_________________________________________ test_midori _________________________________________

    def test_midori():
        # The input fasta #
        fasta = this_dir.find('*.fasta')
        # The output directory #
        output_dir = this_dir + 'results/'
        output_dir.remove()
        # Create object #
        c = Classify(fasta       = fasta,
                     output_dir  = output_dir,
                     search_algo = 'vsearch',
                     search_db   = 'midori253darn',
                     num_threads = True)
        # Run it #
>       c()

/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:351: in __call__
    self.out_file.writelines(query.tax_string for query in self.queries)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/cache.py:86: in __get__
    else:                                      result = self.func(instance)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:304: in queries
    if not self.search_hits: self.search()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:285: in search
    return self.seqsearch.run()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/cache.py:86: in __get__
    else:                                      result = self.func(instance)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/classify.py:273: in seqsearch
    return SeqSearch(input_fasta = self.fasta,
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/seqsearch/search/__init__.py:81: in __init__
    self.set_defaults()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/seqsearch/search/__init__.py:93: in set_defaults
    if self.algorithm == 'vsearch' and hasattr(self.database, 'vsearch_db'):
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/cache.py:86: in __get__
    else:                                      result = self.func(instance)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/databases.py:213: in vsearch_db
    if not self.downloaded: self.download()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/databases.py:171: in download
    download_from_url(self.url,
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/scraping/__init__.py:82: in download_from_url
    if stream: response = request(url, header, response=True, stream=True, **kwargs)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/decorator.py:232: in fun
    return caller(func, *(extras + args), **kw)
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/retry/api.py:73: in retry_decorator
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/retry/api.py:33: in __retry_internal
    return f()
/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/plumbing/scraping/__init__.py:35: in request
    resp.raise_for_status()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <Response [403]>

    def raise_for_status(self):
        """Raises :class:`HTTPError`, if one occurred."""
    
        http_error_msg = ""
        if isinstance(self.reason, bytes):
            # We attempt to decode utf-8 first because some servers
            # choose to localize their reason strings. If the string
            # isn't utf-8, we fall back to iso-8859-1 for all other
            # encodings. (See PR #3538)
            try:
                reason = self.reason.decode("utf-8")
            except UnicodeDecodeError:
                reason = self.reason.decode("iso-8859-1")
        else:
            reason = self.reason
    
        if 400 <= self.status_code < 500:
            http_error_msg = (
                f"{self.status_code} Client Error: {reason} for url: {self.url}"
            )
    
        elif 500 <= self.status_code < 600:
            http_error_msg = (
                f"{self.status_code} Server Error: {reason} for url: {self.url}"
            )
    
        if http_error_msg:
>           raise HTTPError(http_error_msg, response=self)
E           requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz

/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/requests/models.py:1021: HTTPError
------------------------------------ Captured stdout call -------------------------------------
Running crest4 v.4.3.0
                                                                                
                                                                                
          ╭───────────────────── Large Download ─────────────────────╮          
          │                                                          │          
          │                                                          │          
          │  The database 'midori253darn' has not been downloaded    │          
          │  yet. This process will start now and might take some    │          
          │  time depending on your internet connection. Please be   │          
          │  patient. The result will be saved to                    │          
          │  '/home/ruizhang/.crest4/'. You can override this by     │          
          │  setting the $CREST4_DIR environment variable.           │          
          │                                                          │          
          │                                                          │          
          ╰──────────────────────────────────────────────────────────╯          
                                                                                
                                                                                
-------------------------------------- Captured log call --------------------------------------
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 1 seconds...
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 2 seconds...
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 4 seconds...
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 8 seconds...
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 16 seconds...
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 32 seconds...
WARNING  retry.api:api.py:40 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz, retrying in 64 seconds...
====================================== warnings summary =======================================
cmd_line_tool/run_test.py::test_cmd_line_tool
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but cmd_line_tool/run_test.py::test_cmd_line_tool returned "Running crest4 v.4.3.0\nClassification ran successfully. Results are placed in '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/cmd_line_tool/results/assignments.txt'.\n", which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

custom_database/run_test.py::test_custom_database
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
    warnings.warn(

custom_database/run_test.py::test_custom_database
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/ete3/webplugin/webapp.py:44: DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13
    import cgi

custom_database/run_test.py::test_custom_database
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but custom_database/run_test.py::test_custom_database returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/custom_database/two_seqs.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

ncbi_two_sequences/run_test.py::test_ncbi_two_seqs
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but ncbi_two_sequences/run_test.py::test_ncbi_two_seqs returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/ncbi_two_sequences/two_seqs.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

no_hits_sequence/run_test.py::test_no_hits_blast
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but no_hits_sequence/run_test.py::test_no_hits_blast returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/no_hits_sequence/no_hits.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

no_hits_sequence/run_test.py::test_no_hits_vsearch
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but no_hits_sequence/run_test.py::test_no_hits_vsearch returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/no_hits_sequence/no_hits.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

precomputed_hits/run_test.py::test_precomputed_hits
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but precomputed_hits/run_test.py::test_precomputed_hits returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/precomputed_hits/two_seqs.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

readme_example_seq/run_test.py::test_readme_example_seq
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but readme_example_seq/run_test.py::test_readme_example_seq returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/readme_example_seq/readme_test.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

score_drop_change/run_test.py::test_score_drop_change
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but score_drop_change/run_test.py::test_score_drop_change returned "Running crest4 v.4.3.0\nClassification ran successfully. Results are placed in '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/score_drop_change/results/assignments.txt'.\n", which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

vsearch_test/run_test.py::test_vsearch
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but vsearch_test/run_test.py::test_vsearch returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/vsearch_test/two_seqs.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

with_otu_table/run_test.py::test_with_good_otu_table
  /project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/_pytest/python.py:198: PytestReturnNotNoneWarning: Expected None, but with_otu_table/run_test.py::test_with_good_otu_table returned <Classify object on '/project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/with_otu_table/seqs.fasta'>, which will be an error in a future version of pytest.  Did you mean to use `assert` instead of `return`?
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=================================== short test summary info ===================================
FAILED ../../../../project/6004066/ruizhang/software/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py::test_midori - requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://crest4.s3-eu-w...
==================== 1 failed, 14 passed, 12 warnings in 498.09s (0:08:18) ====================

Sorry for this lengthy question.

Crest4 runtime and memory requirement

Dear CREST developers,

Thank you for developing this tool! I am running a nucleotide file (~12 GB) against the silvamod128 database using vsearch with about 125 GB of memory.

However, I received an error message that seemed to indicate there isn’t sufficient memory.

slurmstepd: error: Detected 1 oom-kill event(s) in StepId=63476927.batch. Some of your processes may have been killed by the cgroup out-of-memory handle

Is it possible to estimate the run time and memory requirements given the size of a FASTA file?

Thank you for your help!

BiopythonDeprecationWarning

Hi!
I have installed crest4 v.4.3.6 via conda and am running it to assign COI sequences.
I am getting this warning message:
crest4 -t 32 --fasta swarm/COI_methods_comp_SWARM_parsed.fasta -d midori253darn --otu_table swarm/COI_methods_comp_SWARM_parsed.tsv -o crest4/COI/
Running crest4 v.4.3.6
Calling makeblastdb on '/scratch/ssd/fastwork/metabridge/common/crest4/midori253darn/midori253darn.fasta'...
/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
  warnings.warn(

Could you solve this deprecation warning by updating the corresponding crest4 module please?
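Until the module is updated, anyone who drives crest4 from their own Python script could hide this specific message with the standard-library `warnings` module. The sketch below is a workaround under that assumption, not a fix for the underlying deprecation. The function name is hypothetical, and the filter has to be installed before the code that triggers the warning is imported. It matches on the start of the message text quoted above, so other warnings remain visible.

```python
import warnings

# Start of the message quoted in the report; interpreted as a regular expression.
LEGACY_MESSAGE = r"The 'Bio\.SearchIO\._legacy' module"


def silence_legacy_searchio_warning():
    """Ignore warnings whose message starts with LEGACY_MESSAGE."""
    warnings.filterwarnings("ignore", message=LEGACY_MESSAGE)
```

This only changes what is printed; the results written by crest4 are unaffected either way.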

Using BOLD database with CREST4

Dear,

I'm new to the use of CREST4 and I would like to use it to classify COI sequences. I saw in the previous version of CREST that it was possible to use BOLD as a reference database. However, with CREST4, I don't see this possibility.

Are you planning to add the BOLD database to the already implemented reference databases?

If not, can you explain the easiest way to download the BOLD database myself and use it in CREST4?

I already downloaded the BOLD database for another project with CRABS, but the format does not seem to fit the requirements of make_new_crest_db.py because there are no accession numbers (see example below).

CRABS_1:Megascolecidae  6400    Eukaryota       Annelida        Clitellata      Crassiclitellata        Megascolecidae  nan     nan     AACCTTATATTTTCTTTTAGGAATTTGAGCTGGAATAGTTGGTGCCGGAATAAGTTTACTTATCCGCATTGAATTAAGACAGCCGGGTGCATTTCTAGGCAGTGATCAACTATATAATACTTTAGTAACAGCTCACGCATTCGTAATAATTTTCTTTATGGTTATACCCGTATTTATTGGAGGATTTGGTAATTGACTCCTTCCACTAATATTAGGTGCCCCAGACATAGCATTCCCTCGCCTAAACAACCTAAGATTTTGATTATTACCGCCATCCTTAATTCTATTAGTCTCATCTGCAGCAGTAGAAAAAGGCGCCGGTACAGGATGAACAGTTTATCCTCCTTTAGCTAGAAATATAGCTCATGCAGGACCTTCTGTAGACCTAGCAATCTTCTCCCTCCATCTAGCAGGCGCATCCTCAATTTTAGGAGCATTAAACTTCATTACTACAGTTATTAACATACGTTGACAAGGTCTACGACTCGAGCGAATCCCACTATTTGTTTGAGCCGTTACTATTACAGTAGTATTACTACTTTTATCTTTACCAGTATTAGCTGGTGCAATTACCATATTATTAACCGACCGAAACTTAAACACATCTTTCTTTGATCCAGCGGGAGGGGGAGACCCTATTCTTTATCAACATTTATTC
CRABS_2:Nais    85919   Eukaryota       Ascomycota      Sordariomycetes Microascales    Halosphaeriaceae        Nais    nan     TACACTATATCTTATCTTAGGAGTATGAGCAGGAATAGTAGGAACTGGAACTAGATTACTTATTCGAATTGAATTATCACAACCAGGATCATTTCTTGGAAGAGATCAATTATATAATACTCTTGTAACAGCACATGCATTCTTAATAATTTTCTTCTTAGTAATACCAGTATTCATTGGGGGGTTCGGAAACTGACTTCTCCCATTAATACTAGGTGCTGCTGATATGGCATTTCCACGACTAAATAATCTTAGATTTTGACTACTACCACCATCATTAATTCTATTAATTTCTTCTGCTGCTGTAGAAAAAGGTGCGGGAACAGGATGAACTGTTTATCCTCCACTATCAAGAAACCTGGCACATGCAGGACCTTCAGTAGACATGGCTATTTTCTCACTCCATTTAGCAGGTGCTTCTTCTATTTTAGGGGCAGTTAACTTTATCACTACTGTAATAAACATGCGATGAAACGGAATACGATTAGAACGACTTCCACTATTTGTGTGAGCGGTATTTCTCACAGTAATTCTCCTTCTTCTATCACTTCCTGTTCTTGCTGGCGCAATTACAATACTACTAACAGATCGAAACCTTAATACCTCATTCTTCGATCCTGCTGGTGGTGGAGACCCAATTTTATATCAACATTTATTC
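To make the layout of these rows concrete, here is a minimal sketch that turns one CRABS row into a FASTA record. It assumes the rows are tab-separated with ten columns (sequence ID, numeric taxon ID, seven ranks, sequence), which matches the two examples above. The function name and the FASTA header layout are hypothetical, and nothing here is known to be the input that make_new_crest_db.py expects.

```python
def crabs_line_to_fasta(line):
    """Turn one tab-separated CRABS row into a two-line FASTA record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"Expected 10 columns, found: {len(fields)}")
    seq_id, tax_id, *ranks, sequence = fields
    # 'nan' marks ranks that were left unassigned; drop them.
    lineage = ";".join(rank for rank in ranks if rank != "nan")
    # Hypothetical header layout: CRABS ID, taxon ID, then the lineage.
    return f">{seq_id} taxid={tax_id} {lineage}\n{sequence}\n"
```

The CRABS ID in the first column is unique per row in the examples, so it could stand in as an identifier where no accession number exists, although whether that is acceptable downstream is for the developers to confirm.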

error with midori database access

Hi,
I have installed crest4 through conda and successfully changed CREST4_DIR to a common folder by adding this line to my .bashrc file:

export CREST4_DIR=/path/to/common/folder

Now, when I run crest4 --pytest

I get the following error:

FAILED conda/crest4/lib/python3.11/site-packages/crest4/tests/midori_test/run_test.py::test_midori - requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz

Indeed, there is only silvamod138pr2 in the CREST4_DIR. How do I get access to MIDORI?
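Two quick checks can help narrow this kind of problem down: which databases are already present locally, and what status code the download URL returns. The sketch below uses only the standard library. The function names are made up for this example, the default directory `~/.crest4` is taken from the "Large Download" notice that crest4 prints, and using a HEAD request is an assumption (the server may treat it differently from the GET that crest4 issues).

```python
import os
import urllib.error
import urllib.request
from pathlib import Path


def installed_databases(crest4_dir=None):
    """List sub-directory names under $CREST4_DIR (default: ~/.crest4)."""
    root = Path(crest4_dir or os.environ.get("CREST4_DIR", "~/.crest4")).expanduser()
    if not root.is_dir():
        return []
    return sorted(path.name for path in root.iterdir() if path.is_dir())


def http_status(url, timeout=30):
    """Return the HTTP status code of a HEAD request to url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx answers arrive as exceptions; the code is what we want.
        return error.code
```

If `http_status("https://crest4.s3-eu-west-1.amazonaws.com/midori253darn.tar.gz")` also returns 403 from a different machine, the problem is on the hosting side rather than in the local installation.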

error for COI assignments

Hi,
I am running into an error assigning COI sequences with midori253darn. Could you troubleshoot this? It seems it is a database format error.
Thanks!
`(crest4)mibr@cno-0004:/scratch/ssd/fastwork/metabridge/methods_comparisons/2_metabridge_methods_analysis$ crest4 -t 32 --fasta swarm/COI_methods_comp_SWARM_parsed.fasta -d midori253darn --otu_table swarm/COI_methods_comp_SWARM_parsed.tsv -o crest4/COI/
Running crest4 v.4.3.6
Calling "makeblastdb" on "/scratch/ssd/fastwork/metabridge/common/crest4/midori253darn/midori253darn.fasta"...

/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/Bio/SearchIO/_legacy/__init__.py:12: BiopythonDeprecationWarning: The 'Bio.SearchIO._legacy' module for parsing BLAST plain text output is deprecated and will be removed in a future release of Biopython. Consider generating your BLAST output for parsing as XML or tabular format instead.
  warnings.warn(


Traceback (most recent call last):
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/bin/crest4", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/__main__.py", line 19, in main
    return magic()
           ^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/optmagic/__init__.py", line 250, in __call__
    return instance(*extra_args, **extra_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/classify.py", line 356, in __call__
    otus_by_rank    = self.otu_info.otus_by_rank
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/plumbing/cache.py", line 86, in __get__
    else:                                      result = self.func(instance)
                                                        ^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 58, in otus_by_rank
    return self(cumulative=False)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 112, in __call__
    ranks       = result.taxonomy.apply(tax_to_rank)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/series.py", line 4765, in apply
    ).apply()
      ^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/apply.py", line 1201, in apply
    return self.apply_standard()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/apply.py", line 1281, in apply_standard
    mapped = obj._map_values(
             ^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/pandas/core/algorithms.py", line 1812, in map_array
    return lib.map_infer(values, mapper, convert=convert)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 2917, in pandas._libs.lib.map_infer
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 111, in <lambda>
    tax_to_rank = lambda t: rank_names[len(t.split(';')) - 1]
                            ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
IndexError: list index out of range`
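The last frame of the traceback indexes `rank_names` by the number of ';'-separated levels in a taxonomy string, so the lookup fails whenever a string has more levels than there are rank names. The sketch below reproduces that mechanism with a guard that reports the offending string. The rank list is hypothetical, since the real one in crest4/otu_tables.py is not shown here, and this is an illustration of the failure mode rather than the maintainers' fix.

```python
# Hypothetical rank list; the real one in crest4/otu_tables.py is not shown.
rank_names = ["Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def tax_to_rank(taxonomy, names=rank_names):
    """Map a ';'-separated taxonomy string to the name of its deepest rank."""
    depth = len(taxonomy.split(";"))
    if depth > len(names):
        # The unguarded lookup would raise IndexError at this point.
        raise ValueError(
            f"Taxonomy has {depth} levels but only {len(names)} rank names "
            f"are defined: {taxonomy!r}")
    return names[depth - 1]
```

Running such a check over the taxonomy column would show which assignments are deeper than the rank list allows.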

error in 16S assignment after update to crest 4.3.7

After updating my conda environment for crest, I now have an error when assigning a 16S fasta file using silvamod138pr2, which I did not have with the previous crest version.

(crest4)mibr@cno-0004:/scratch/ssd/fastwork/metabridge/methods_comparisons/2_metabridge_methods_analysis$ crest4 -t 32 --fasta swarm/16S_methods_comp_SWARM_parsed.fasta -d silvamod138pr2 --otu_table swarm/16S_methods_comp_SWARM_parsed.tsv -o crest4/16S_final/
Running crest4 v.4.3.7
Traceback (most recent call last):
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/bin/crest4", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/__main__.py", line 19, in main
    return magic()
           ^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/optmagic/__init__.py", line 250, in __call__
    return instance(*extra_args, **extra_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/classify.py", line 356, in __call__
    otus_by_rank    = self.otu_info.otus_by_rank
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/plumbing/cache.py", line 86, in __get__
    else:                                      result = self.func(instance)
                                                        ^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 58, in otus_by_rank
    return self(cumulative=False)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 81, in __call__
    self.check_id_match()
  File "/scratch/ssd/fastwork/metabridge/common/conda/crest4/lib/python3.11/site-packages/crest4/otu_tables.py", line 77, in check_id_match
    raise ValueError(msg)
ValueError: The sequence named 'SWARM_13377' in the table located at 'swarm/16S_methods_comp_SWARM_parsed.tsv' does not appear in the FASTA file provided at 'swarm/16S_methods_comp_SWARM_parsed.fasta'.

I have checked and SWARM_13377 does exist in the fasta file:

(crest4)mibr@cno-0004:/scratch/ssd/fastwork/metabridge/methods_comparisons/2_metabridge_methods_analysis$ grep "SWARM_13377" swarm/16S_methods_comp_SWARM_parsed.fasta > swarm13377.out

This does produce the expected output.
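
Note that `grep` matches substrings, so a header such as `>SWARM_13377;size=12` would also be found. A stricter check is to compare exact IDs between the two files. The sketch below assumes that a FASTA ID is the header text up to the first whitespace and that the OTU table is tab-separated with IDs in the first column under a header row. Those are assumptions about the file layout, not a description of how crest4 parses its inputs, and the function names are made up for illustration.

```python
import csv


def fasta_ids(path):
    """Collect sequence IDs: header text after '>' up to the first whitespace."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.add(fields[0])
    return ids


def table_ids(path, delimiter="\t"):
    """Collect first-column values of the table, skipping the header row."""
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter=delimiter)
        next(rows, None)  # skip the header row
        return {row[0] for row in rows if row}


def missing_from_fasta(fasta_path, table_path):
    """Return table IDs that have no exactly matching FASTA ID, sorted."""
    return sorted(table_ids(table_path) - fasta_ids(fasta_path))
```

Running `missing_from_fasta("swarm/16S_methods_comp_SWARM_parsed.fasta", "swarm/16S_methods_comp_SWARM_parsed.tsv")` would list every table ID without an exact counterpart, which helps tell a true absence from a header-formatting difference.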

Classification thresholds for ranks above phylum level

Hi

The documentation says:

In addition to the lowest common ancestor classification, a minimum similarity filter is used, based on a set of taxon-specific requirements, by default depending on their taxonomic rank. By default, a sequence must be aligned with at least 99% nucleotide similarity to the best reference sequence in order to be classified to the species rank. For the genus, family, order, class and phylum ranks the respective default cut-offs are 97%, 95%, 90%, 85% and 80%.
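
To make the quoted defaults concrete, here is a minimal sketch that writes them as a lookup and returns the most specific rank a given similarity would permit. It only restates the numbers in the quote. `DEFAULT_CUTOFFS` and `lowest_rank_allowed` are hypothetical names, the `>=` comparison is assumed from the "at least 99%" wording, and this is not crest4's actual implementation. Ranks above phylum are deliberately left out because the quoted text gives no cut-offs for them.

```python
# Default cut-offs from the quoted documentation, most specific rank first.
DEFAULT_CUTOFFS = [
    ("species", 99.0),
    ("genus", 97.0),
    ("family", 95.0),
    ("order", 90.0),
    ("class", 85.0),
    ("phylum", 80.0),
]


def lowest_rank_allowed(similarity_percent):
    """Return the most specific rank whose cut-off the similarity meets.

    Returns None below the phylum cut-off, since the quoted text does not
    say which thresholds apply to higher ranks.
    """
    for rank, cutoff in DEFAULT_CUTOFFS:
        if similarity_percent >= cutoff:
            return rank
    return None
```

For example, a best hit at 96% similarity would be capped at the family rank under these defaults.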

I was wondering how this is handled for reference databases that do not follow these ranks. I have 18S and ITS metabarcoding data sets that were classified using PR2 v.4.14.0 and Unite v7.2, respectively.

For ITS, for example, Unite outputs a classification with 9 ranks:

"Domain","Supergroup","Division/Kingdom","Phylum","Class","Order","Family","Genus","Species"

Running Crest with the default bit score and similarity thresholds, I get some OTUs assigned to Fungi at the kingdom level but unclassified at the phylum level. I suspect these are actually phylum-level assignments, because I get the following classifications for them:

Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 1
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 2
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 3
Cellular organisms;Eukaryota;Opisthokonta;Fungi;Unknown Fungi phylum 4

So I think that in Unite the reference sequences are deposited as "Unknown Fungi phylum", and this is how Crest can make a phylum-level assignment. Correct?

For 18S, the taxonomy doesn't follow those ranks as closely as it does for ITS. But I also have OTUs that are, for example, classified at the domain level but unclassified at the supergroup level, and OTUs that are classified at the supergroup level but unclassified at the division/kingdom level.

For example, I get some assignments like this:

Main genome;Eukaryota;Alveolata

So this is an assignment to the supergroup level only. I am not sure whether this is because a sequence deposited in PR2 has exactly this taxonomy attributed to it, or because Crest only assigns to the supergroup level when some threshold is not met. Either way, what similarity threshold is applied for levels above the phylum rank? Which thresholds were met when an OTU gets assigned at the supergroup level but not at the kingdom level? Or is it that, in this case, the 80% phylum threshold was met but a lower-rank classification couldn't be made because the deposited sequence in PR2 simply didn't have a lower-rank taxonomy attributed to it?

Thanks a lot

Nauras
