This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.
qiime2 / q2-deblur
License: BSD 3-Clause "New" or "Revised" License
The core plugins' version numbers are all in sync, which would mean that this initial version should be 0.0.7.dev0. This isn't required for plugins, but is recommended as a best practice.
Main discussion here: biocore/deblur#157
It is set to 100, but the default in the deblur docs is 150.
Depends on #19
Looks like this functionality was recently merged in Deblur: biocore/deblur#156
Bug Description
The problem is that it is possible for a run to complete such that there are deblurred reads without any recruiting to the reference. This results in an empty DNAIterator being returned, which the plugin system interprets as a malformed FASTA file.
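One way to surface this case early is to check whether the sequence iterator is empty before handing it back to the framework. The sketch below is a generic illustration, not the plugin's actual fix; the helper name require_nonempty and its error message are hypothetical.

```python
import itertools


def require_nonempty(records, what="sequences"):
    """Return an iterator over `records`, raising early if there are none.

    Hypothetical helper: peeks at the first record so that an empty result
    produces a descriptive error instead of an empty (malformed) FASTA file.
    """
    iterator = iter(records)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError(
            "No %s were produced; nothing recruited to the reference." % what
        ) from None
    # Put the peeked record back in front of the remaining ones.
    return itertools.chain([first], iterator)
```

Because the check happens when the helper is called, not when the result is consumed, the error appears at the point where the empty result is created.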
References
We've also been trying to avoid putting any real documentation in the plugin readmes, in favor of just pointing to the QIIME 2 docs, though that might not make sense here since this isn't covered in the docs yet.
I am running q2-deblur by following the qiime2 moving pictures tutorial, except I have replaced the dada2 denoise command with deblur denoise:
qiime deblur denoise --i-demultiplexed-seqs demux.qza --o-representative-sequences rep-seqs --o-table table
This returns the following error:
subprocess.CalledProcessError: Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-cdd9bdgw/08c4e925-3699-4fde-808a-bfb9025d0c25/data', '--output-dir', '/tmp/tmp93ongdi2', '--mean-error', '0.005', '--error-dist', '1,0.06,0.02,0.02,0.01,0.005,0.005,0.005,0.001,0.001,0.001,0.0005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '0', '--min-size', '2', '-w']' returned non-zero exit status 1
I installed q2-deblur directly from the GitHub repository using pip. Any thoughts on what may be causing this error?
This came up on the forum, and it looks like we don't have enough package data in our setup.py, so the js directory is skipped. It also means we don't have any tests that execute visualize-stats (otherwise our CI system would have caught this).
Should use the new citation API in qiime2/qiime2#387
Proposed Behavior
This involves adding a transformer from DeblurStatsDirFmt to qiime2.Metadata. After this is added we can remove the deblur visualize-stats visualizer in favor of metadata tabulate. The moving pictures tutorial will also need to be updated at that time to reflect this change.
References
These PRs address the same thing for q2-quality-filter, so may be useful as a reference:
PackageNotFoundError: Package not found: '' Dependencies missing in current osx-64 channels:
Close matches found; did you mean one of these?
qiime2: qiime, r-qiimer
You can search for packages on anaconda.org with
anaconda search -t conda qiime2
(and similarly for the other packages)
Within the QIIME 2 environment I am running:
(qiime2-2.0.6) Bobby-Mac-Pro:~ twchicken$ conda install -c biocore q2-deblur
I fixed Deblur in a way that all denoised sequences are ensured to be upper case.
As far as I understand, q2 only enforces that all sequences in a biom table are upper case, so I figure this plugin is doing the conversion to upper case somewhere, maybe here?
q2-deblur/q2_deblur/_denoise.py
Line 184 in dcee4bd
Anyway, with the new release this is fixed in the underlying Deblur program itself. From looking at the conda recipe, the new release should automatically be used when the package is built. But you might want to document this change somewhere?
@gregcaporaso @wasade
Bug Description
A string formatting operation is missing the trim length, resulting in an incomplete ValueError message. The message right now contains a literal %d where that formatting placeholder should be replaced by the variable's value.
Steps to reproduce the behavior
Run a dataset in which none of the samples have any sequences which make it past the positive filter.
Expected behavior
The message should contain the trim length.
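The following is a minimal sketch of the kind of fix this implies: applying the % operator so the placeholder is filled in. The message text is taken from the traceback in this report; the function name no_sequences_message is hypothetical and is not the plugin's code.

```python
def no_sequences_message(trim_length):
    """Build the error message with the trim length interpolated.

    Hypothetical helper; the wording matches the message in the report.
    """
    template = (
        "No sequences passed the filter. It is possible the trim_length "
        "(%d) may exceed the longest sequence, that all of the sequences "
        "are artifacts like PhiX or adapter, or that the positive "
        "reference used is not representative of the data being denoised."
    )
    # Without this `%` application the user sees a literal "%d".
    return template % trim_length
```

A caller would then use raise ValueError(no_sequences_message(trim_length)), so the reported value matches what the user supplied.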
Computation Environment
Comments
An example of the traceback output being produced is below:
Traceback (most recent call last):
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
results = action(**arguments)
File "<decorator-gen-356>", line 2, in denoise_16S
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
output_types, provenance)
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
output_views = self._callable(**view_args)
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 96, in denoise_16S
hashed_feature_ids=hashed_feature_ids)
File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 170, in _denoise_helper
raise ValueError("No sequences passed the filter. It is possible "
ValueError: No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.
Plugin error from deblur:
No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.
See above for debug info.
Bug Description
Sequence identifiers that contain semicolons on input to q2-deblur will fail to parse correctly when aggregating per-sample statistics. The issue was first reported by AhHua on the QIIME 2 Forum. Specifically, --p-sample-stats obtains the size= value added to the sequence identifiers by vsearch during dereplication by splitting the sequence identifiers on a semicolon. In AhHua's case, the sequence records already had some information in the comment section of the sequence identifiers split by a semicolon, leading the stats collector to attempt to operate on a different value.
Steps to reproduce the behavior
Use input sequences whose identifiers contain a semicolon (e.g., >some_identifier some_comment;foo=0.123).
Run q2-deblur with --p-sample-stats.
Expected behavior
A TypeError will occur when collecting the sample stats when attempting to cast (in the above example) 0.123 to an integer.
Screenshots
See this post for an example of incompatible sequence identifiers, and this post for an example of the traceback.
Comment
A workaround for this bug is to not use --p-sample-stats.
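To illustrate why splitting on a semicolon is fragile, the sketch below contrasts a positional split with a pattern anchored on the trailing size= annotation. Both functions are hypothetical reconstructions for illustration, not the plugin's code, and the example label assumes the size= field is appended at the end of the identifier.

```python
import re

# Matches a trailing ";size=N" annotation, with an optional final semicolon.
_SIZE_RE = re.compile(r";size=(\d+);?$")


def size_field_by_position(label):
    """Fragile approach: take whatever follows the first semicolon."""
    return label.split(";")[1].split("=")[1]


def size_by_pattern(label):
    """More robust approach: look only for the trailing size= annotation."""
    match = _SIZE_RE.search(label)
    if match is None:
        raise ValueError("No size= annotation found in %r" % label)
    return int(match.group(1))


label = "some_identifier some_comment;foo=0.123;size=5"
print(size_field_by_position(label))  # picks up the comment field's value
print(size_by_pattern(label))         # picks up the dereplication count
```

With the positional split, the value from the pre-existing comment field is what ends up being converted to an integer, which is the failure described above.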
I have some code that addresses this here. While the code technically works (it doesn't error), the resulting table has very low counts relative to the same data unjoined. So I need to explore this a bit more before it's ready. The low sequence counts were due to an error on my part (set the trim length too long, which resulted in dropping a lot of the joined sequences).
q2-feature-table has examples of these, and we're filling them in for the other plugins now.
Bug Description
This file contains the reads which failed to recruit to the positive filter.
Expected Behavior
The documentation surrounding this file should be appropriate w.r.t. the likelihood of the file containing sequences that are, for instance, not actually 16S.
Improvement Description
Currently, launching q2-deblur will create a deblur.log file wherever the command was run, presumably as a step baked into deblur itself. It would be nice if this file were optional, or if its location were controllable, via a Q2 artifact or by piping the output somewhere (directly into provenance?), or if it were just catchable output, a la the q2-dada2 information.
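One possible approach, sketched here under the assumption that the log is written relative to the process's working directory, is to launch the subprocess with a scratch directory as its cwd and read the log back afterwards. The helper name run_in_scratch_dir is hypothetical; any file paths in the command would need to be absolute, since the working directory changes.

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_scratch_dir(cmd, log_name="deblur.log"):
    """Run `cmd` with a temporary working directory and capture its log.

    Hypothetical helper: returns (exit status, log text). The log text is
    empty if the command did not write `log_name` in its working directory.
    """
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(cmd, cwd=scratch)
        log_path = Path(scratch) / log_name
        log_text = log_path.read_text() if log_path.exists() else ""
    return result.returncode, log_text
```

The returned text could then be printed, stored, or attached wherever the interface prefers, and nothing is left behind in the user's directory.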
Bug Description
I just tried running qiime deblur visualize-stats on deblur-stats.qza from the Moving Pictures Tutorial, and I got an empty table.
Questions
Is this a peculiarity of the specific .qza in the Moving Pictures Tutorial, or a bug in visualize-stats?
Comments
In the first case, I think we should use a .qza that does produce a meaningful visualization in one of the tutorials somewhere, and in the second, I think we should fix the visualizer.
References
Bug Description
Underscores in sample IDs are not supported in deblur. This breaks in several ways: the reference database check is unable to find any hits when there are underscores present in IDs, and IDs with underscores appear to be truncated.
Steps to reproduce the behavior
Run denoise-16s with samples that have underscores in their IDs.
Expected behavior
Deblur should work as advertised.
Screenshots
A user reported that mock community samples were producing the following results:
Note that sample HMP_mock_2 has no reads hitting the reference. The user had previously used the same mock community with success, so this reference miss was surprising.
I reran the same samples through denoise-16s using underscore-less sample IDs:
The sample in question now has the expected amount of reads hitting the reference.
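As an illustration of how such truncation could arise, the sketch below assumes read labels follow a sampleid_readnumber convention and compares splitting on the first underscore with splitting on the last. Whether deblur actually splits labels this way is an assumption; the report does not confirm it, and both function names are hypothetical.

```python
def sample_id_first_underscore(read_label):
    """Split on the FIRST underscore: truncates IDs that contain one."""
    return read_label.split("_", 1)[0]


def sample_id_last_underscore(read_label):
    """Split on the LAST underscore: keeps underscores inside the ID."""
    return read_label.rsplit("_", 1)[0]


label = "HMP_mock_2_17"  # hypothetical read 17 of sample HMP_mock_2
print(sample_id_first_underscore(label))  # HMP
print(sample_id_last_underscore(label))   # HMP_mock_2
```

Under this assumption, a truncated ID such as HMP would no longer match the sample it came from, which is consistent with the symptoms described above.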
Computation Environment
Questions
References
They are appending --pos-ref-db or --neg-ref-db to the deblur command that is being built, but the actual options in deblur are --pos-ref-db-fp --pos-ref-fp or --neg-ref-db-fp --neg-ref-fp.
Have you had a chance to check out the docs?
https://docs.qiime2.org
There are many tutorials, walkthroughs, and guides available.
If you still need help, please visit:
https://forum.qiime2.org/c/user-support
Environment: qiime2 2023.05 (Docker image)
Data: PacBio Sequel II
My commands are as follows:
qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path ../input-path-list.tsv --output-path ccs-data-demux.qza --input-format SingleEndFastqManifestPhred33V2
qiime cutadapt trim-single --i-demultiplexed-sequences ../ccs-data-demux.qza --p-cores 4 --p-adapter RGYTACCTTGTTACGACTT --p-front AGRGTTYGATYMTGGCTCAG --o-trimmed-sequences ccs-data-cutadapt.qza --output-dir result
qiime deblur denoise-16S --i-demultiplexed-seqs ccs-data-cutadapt.qza --o-table deblur.table.qza --o-representative-sequences deblur.representative-sequences.qza --o-stats deblur.stat.qza --p-trim-length 10 --verbose
When I run these commands, I get the following error:
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/deblur/workflow.py", line 130, in trim_seqs
for label, seq in input_seqs:
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/deblur/workflow.py", line 99, in sequence_generator
for record in skbio.read(input_fp, format=format, **kw):
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 506, in <genexpr>
return (x for x in itertools.chain([next(gen)], gen))
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 531, in _read_gen
yield from reader(file, **kwargs)
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
yield from reader_function(fhs[-1], **kwargs)
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 351, in _fastq_to_generator
phred_scores, seq_header = _parse_quality_scores(fh, len(seq),
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 522, in _parse_quality_scores
_decode_qual_to_phred(chunk, variant=variant,
File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/_base.py", line 34, in _decode_qual_to_phred
raise ValueError("Decoded Phred score is out of range [%d, %d]."
The error is ValueError: Decoded Phred score is out of range [0, 62], but the data was accepted without complaint when I imported the FASTQ files.
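One thing worth checking, offered as a possibility and not as a diagnosis, is whether the reads carry quality characters that decode above 62, since PacBio CCS reads can have quality values higher than that. The sketch below scans FASTQ text and reports the lowest and highest Phred score, assuming four-line records and a +33 offset; the function name phred33_range is hypothetical.

```python
def phred33_range(fastq_lines):
    """Return (lowest, highest) Phred score found in the quality lines.

    Assumes four-line FASTQ records and a Phred+33 encoding. Returns
    (None, None) if no quality characters are seen.
    """
    lowest, highest = None, None
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:  # only the 4th line of each record is quality
            continue
        for char in line.rstrip("\r\n"):
            score = ord(char) - 33
            lowest = score if lowest is None else min(lowest, score)
            highest = score if highest is None else max(highest, score)
    return lowest, highest
```

For an uncompressed file this can be called as phred33_range(open(path)); a highest value above 62 would be consistent with the error message shown above.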
Bug Description
Currently pinned to an old scipy:
# scipy isn't directly used in this plugin, but setting a version pin here
# because deblur doesn't currently work with modern scipy versions.
- scipy <1.1.0
Questions
Comments
Bug Description
CI may fail to catch failing tests, likely because the test runner isn't collecting some or all tests.
Steps to reproduce the behavior
Expected behavior
Failing tests should fail CI
References
Affected PR
For the time being, this is going to require that we transition the one existing method into four methods because we don't have optional artifact inputs. This is something that we're addressing shortly, so will be able to transition this back to one method at that time. In the meantime, we'd rather have four methods than accept file paths as input, as the latter won't work well in non-command-line interfaces (e.g., in QIIME Studio, users would have to type a filepath into a text field).
I'll take this one.
For example, here, but maybe in other places too.
Depends on biocore/deblur#136
There are no unit tests for this plugin.
Current Behavior
As of qiime2-2017.9 there's a new semantic type SampleData[Sequences] and a new file format (QIIME1DemuxFormat) associated with that type. This new data type represents the QIIME 1 "demux" format, where sequences have already been demultiplexed and quality-filtered.
Proposed Behavior
By supporting this data type, q2-deblur will be able to support denoising existing QIIME 1 data, or data that's still being produced in this format (some sequencing centers do this for their clients). Currently it is only possible to dereplicate, cluster de novo, or cluster closed-reference with q2-vsearch.
References
Previously the cookiecutter template would use a block of code in setup.py to scrape the version from the __init__.py file, and that little block was attributed to Flask. This is no longer the case as of the versioneer upgrades.
A more descriptive name would be helpful here.
Questions
Hello,
Where exactly is the documentation? I didn't find anything on the website you have indicated.
Thank you