qiime2 / q2-deblur Goto Github PK


License: BSD 3-Clause "New" or "Revised" License

Python 98.70% HTML 0.62% Makefile 0.19% TeX 0.35% Shell 0.13%

hacktoberfest

q2-deblur's Introduction

q2-deblur

This is a QIIME 2 plugin. For details on QIIME 2, see https://qiime2.org.

q2-deblur's Issues

Track number of trimmed reads, and number dropped due to trimming in stats

Improvement Description
A user on the forum noted that we do not currently track these two stats, and right now the recovery of those values would be convoluted at best. Adding them to the stats output would be great.

References
forum

version should be 0.0.7.dev0

The core plugins' version numbers are all in sync, which would mean that this initial version should be 0.0.7.dev0. This isn't required for plugins, but is recommended as a best practice.

Underscore sample IDs

Main discussion here: biocore/deblur#157

should the default trim length be 150?

It is set to 100, but the default in the deblur docs is 150.

Release

Depends on #19

Expose `--left-trim-length` in plugin

Looks like this functionality was recently merged in Deblur: biocore/deblur#156

configure travis

Test if `reference-hit.seqs.fa` is empty instead of `all.seqs.fa`

Bug Description
The problem is that it is possible for a run to complete such that there are deblurred reads without any recruiting to the reference. This results in an empty DNAIterator being returned, which the plugin system interprets as a malformed FASTA file.

References

Context is here.
Specifically, these lines should be revised to test against reference-hit.seqs.fa instead of all.seqs.fa.
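
A minimal sketch of the proposed check, assuming the Deblur output directory contains a file named reference-hit.seqs.fa as described above. The helper name `has_reference_hits` is hypothetical and is not part of the plugin; it only shows how an empty reference-hit file could be detected before a DNAIterator is built from it.

```python
import os


def has_reference_hits(output_dir, filename="reference-hit.seqs.fa"):
    """Return True if the reference-hit FASTA exists and is non-empty.

    `output_dir` is assumed to be the directory Deblur wrote its results to.
    """
    path = os.path.join(output_dir, filename)
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

A caller could raise a descriptive error when this returns False instead of handing an empty file to the plugin system.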

readme lists command as "qiime q2-deblur" but should be "qiime deblur"

We've also been trying to avoid putting any real documentation in the plugin readmes, in favor of just pointing to the QIIME 2 docs, though that might not make sense here since this isn't covered in the docs yet.

deblur returned non-zero exit status 1

I am running q2-deblur by following the qiime2 moving pictures tutorial except I have replaced the dada2 denoise command with deblur denoise

qiime deblur denoise  --i-demultiplexed-seqs demux.qza --o-representative-sequences rep-seqs --o-table table

This returns the following error:

subprocess.CalledProcessError: Command '['deblur', 'workflow', '--seqs-fp', '/tmp/qiime2-archive-cdd9bdgw/08c4e925-3699-4fde-808a-bfb9025d0c25/data', '--output-dir', '/tmp/tmp93ongdi2', '--mean-error', '0.005', '--error-dist', '1,0.06,0.02,0.02,0.01,0.005,0.005,0.005,0.001,0.001,0.001,0.0005', '--indel-prob', '0.01', '--indel-max', '3', '--trim-length', '100', '--min-reads', '0', '--min-size', '2', '-w']' returned non-zero exit status 1

I installed q2-deblur directly from the GitHub repository using pip. Any thoughts on what may be causing this error?

add coveralls to travis

Missing package data: `/assets/js/*`

This came up on the forum and it looks like we don't have enough package data in our setup.py so the js directory is skipped. It also means we don't have any tests to execute visualize-stats (otherwise our CI system would have caught this).
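
A hedged sketch of what the package data fix might look like. The package name q2_deblur is taken from the paths quoted in other issues on this page; only the `assets/js/*` pattern is named in this issue, and the `assets/index.html` entry is an assumption for illustration. The `is_packaged` helper is hypothetical and uses `fnmatch` only to approximate the glob matching that setuptools applies to `package_data`.

```python
from fnmatch import fnmatch

# Assumed patterns; only 'assets/js/*' comes from the issue itself.
PACKAGE_DATA = {
    "q2_deblur": [
        "assets/index.html",
        "assets/js/*",
    ],
}


def is_packaged(relative_path, package="q2_deblur"):
    """Return True if a file path (relative to the package) matches a pattern."""
    return any(fnmatch(relative_path, pattern)
               for pattern in PACKAGE_DATA[package])


# In setup.py this dict would be passed as setup(..., package_data=PACKAGE_DATA).
```

A test that actually runs visualize-stats against an installed copy of the plugin would be the stronger safeguard, since it fails whenever a required asset is missing from the install.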

Add citations

Should use the new citation API in qiime2/qiime2#387

support viewing of `DeblurStats` type with `qiime metadata tabulate`

Proposed Behavior
This involved adding a transformer from DeblurStatsDirFmt to qiime2.Metadata. After this is added we can remove the deblur visualize-stats visualizer in favor of metadata tabulate. The moving pictures tutorial will also need to be updated at that time to reflect this change.

References
These PRs address the same thing for q2-quality-filter, so may be useful as a reference:

Can't install deblur in Qiime 2: PackageNotFoundError: Package not found: '' Dependencies missing in current osx-64 channels:

PackageNotFoundError: Package not found: '' Dependencies missing in current osx-64 channels:

q2-deblur -> q2-types ==2017.2.0
q2-deblur -> q2cli ==2017.2.0
q2-deblur -> qiime2 ==2017.2.0

Close matches found; did you mean one of these?

qiime2: qiime, r-qiimer

You can search for packages on anaconda.org with

anaconda search -t conda qiime2

(and similarly for the other packages)

Within the Qiime 2 environment I am running:

(qiime2-2.0.6) Bobby-Mac-Pro:~ twchicken$ conda install -c biocore q2-deblur

new release 1.0.4

I fixed Deblur in a way that all denoised sequences are ensured to be upper case.

As far as I understand, q2 only enforces that all sequences in a biom table are upper case, so I figure this plugin converts to upper case somewhere, maybe here?

q2-deblur/q2_deblur/_denoise.py

Line 184 in dcee4bd

rep_sequences = DNAIterator(

Anyway, with the new release this is fixed in the underlying Deblur program itself. From looking at the conda recipe, the new release should automatically be used when built. But you might want to document this change somewhere?
@gregcaporaso @wasade

A ValueError is missing string formatting

Bug Description
A string formatting is missing the trim length resulting in an incomplete ValueError message. The message right now contains a %d when that string formatting construct should be replaced by the variable value.

Steps to reproduce the behavior
Run a dataset in which none of the samples have any sequences which make it past the positive filter.

Expected behavior
The message should contain the trim length.

Computation Environment

Vanilla qiime2-2018.8 environment

Comments
An example of the traceback output being produced is below:

Traceback (most recent call last):
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-356>", line 2, in denoise_16S
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 96, in denoise_16S
    hashed_feature_ids=hashed_feature_ids)
  File "/Users/dtmcdonald/miniconda3/envs/qiime2-2018.8/lib/python3.5/site-packages/q2_deblur/_denoise.py", line 170, in _denoise_helper
    raise ValueError("No sequences passed the filter. It is possible "
ValueError: No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.

Plugin error from deblur:

  No sequences passed the filter. It is possible the trim_length (%d) may exceed the longest sequence, that all of the sequences are artifacts like PhiX or adapter, or that the positive reference used is not representative of the data being denoised.

See above for debug info.
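
The fix amounts to applying the `%` operator to the message template. A minimal sketch, reusing the message text from the traceback above; the function name `no_sequences_error` is hypothetical, and the real code in _denoise.py may be organized differently.

```python
def no_sequences_error(trim_length):
    """Build the ValueError with the trim length interpolated into the text."""
    return ValueError(
        "No sequences passed the filter. It is possible the trim_length "
        "(%d) may exceed the longest sequence, that all of the sequences "
        "are artifacts like PhiX or adapter, or that the positive "
        "reference used is not representative of the data being "
        "denoised." % trim_length)
```

Adjacent string literals are joined before `%` is applied, so a single `% trim_length` at the end formats the whole message.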

add citation text

here

--p-sample-stats fails to correctly parse sequence identifiers with a semicolon

Bug Description
Sequence identifiers that contain semicolons on input to q2-deblur will fail to parse correctly when aggregating per-sample statistics. The issue was first reported by AhHua on the QIIME 2 Forum. Specifically, --p-sample-stats obtains the size= value added to the sequence identifiers by vsearch during dereplication by splitting the sequence identifiers on a semicolon. In AhHua's case, the sequence records already had some semicolon-separated information in the comment section of the sequence identifiers, leading the stats collector to operate on a different value.

Steps to reproduce the behavior

Create an input artifact with sequence identifiers that include a semicolon, and a non-integer value (e.g., >some_identifier some_comment;foo=0.123).
Run q2-deblur with --p-sample-stats

Expected behavior
A TypeError will occur when collecting the sample stats when attempting to cast (in the above example) 0.123 to an integer.

Screenshots
See this post for an example of incompatible sequence identifiers, and this post for an example of the traceback.

Comment
A workaround for this bug is to not use --p-sample-stats.
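
One possible direction for a fix, sketched under assumptions: rather than splitting on the first semicolon, anchor the search on a trailing size= annotation. The function name `parse_size` and the regular expression are hypothetical and are not taken from the plugin or from Deblur; the sketch assumes vsearch appends the annotation at the end of the identifier, with or without a closing semicolon.

```python
import re

# Matches a trailing abundance annotation such as ";size=12" or ";size=12;".
_SIZE_PATTERN = re.compile(r";size=(\d+);?$")


def parse_size(seq_id):
    """Return the integer abundance from a trailing size= annotation."""
    match = _SIZE_PATTERN.search(seq_id)
    if match is None:
        raise ValueError("No trailing size= annotation in %r" % seq_id)
    return int(match.group(1))
```

With this approach an identifier like the one in the reproduction steps, once dereplicated, would yield its size= value regardless of earlier semicolon-separated fields such as foo=0.123.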

support `SampleData[JoinedSequencesWithQuality]` as input

I have some code that addresses this here. While the code technically works (it doesn't error), the resulting table has very low counts relative to the same data unjoined. So I need to explore this a bit more before it's ready. The low sequence counts were due to an error on my part (set the trim length too long, which resulted in dropping a lot of the joined sequences).

add `description` and `short_description` to `Plugin` instantiation

q2-feature-table has examples of these, and we're filling them in for the other plugins now.

Expose `reference-non-hit.biom`

Bug Description
This file contains the reads which failed to recruit to the positive filter.

Expected Behavior
The documentation surrounding this file should be appropriate w.r.t. the likelihood of the file containing sequences that are, for instance, not actually 16S.

add input/parameter/output descriptions to denoise method

Optionally suppress logfile?

Improvement Description
Currently, launching q2-deblur creates a deblur.log file wherever the command was run, presumably as a step baked into deblur itself. It would be nice if creating this file were optional, or if its location were controllable, via a Q2 artifact or by piping the output somewhere (directly into provenance?), or if it were just catchable output, a la q2-dada2 information.

Visualizer looks fairly empty

Bug Description
I just tried running qiime deblur visualize-stats on deblur-stats.qza from the Moving Pictures Tutorial, and I got an empty table.

Screenshots

Questions
Is this a peculiarity of the specific .qza in the Moving Pictures Tutorial, or a bug in visualize-stats?

Comments

In the first scenario, I think we should include a .qza that does produce a meaningful visualization in one of the tutorials somewhere, and in the second, I think we should fix the visualizer.
(Of course this may be simply user error on my end.)

References

Underscores in sample IDs breaks the pipeline

Bug Description
Underscores in sample IDs are not supported in deblur. This breaks in several ways: the reference database check is unable to find any hits when underscores are present in IDs, and IDs containing underscores appear to be truncated.

Steps to reproduce the behavior

Run denoise-16s with samples with underscores in the IDs.

Expected behavior
Deblur should work as advertised.

Screenshots
A user reported that mock community samples were producing the following results:

Note sample HMP_mock_2 has no reads hitting the reference. The user had previously used the same mock community and had had success, so this reference miss was surprising.

I reran the same samples through denoise-16s using underscore-less sample IDs:

The sample in question now has the expected amount of reads hitting the reference.

Computation Environment

OS: macOS High Sierra
QIIME 2 Release: 2018.6

Questions

Perhaps the way to solve this in this plugin is to test sample IDs for underscores and error out when observed. Thoughts?
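
The suggestion above could be sketched roughly as follows. The function name `validate_sample_ids` and the wording of the error are hypothetical; where such a check would be called inside the plugin is not specified in this issue.

```python
def validate_sample_ids(sample_ids):
    """Raise ValueError if any sample ID contains an underscore."""
    offending = sorted(sid for sid in sample_ids if "_" in sid)
    if offending:
        raise ValueError(
            "Deblur does not support underscores in sample IDs. "
            "Offending IDs: %s" % ", ".join(offending))
```

Failing early with the list of offending IDs would be clearer to users than silently reporting zero reference hits for a sample such as HMP_mock_2.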

References

Reference DB issue (forum)
Truncated ID issue (forum)

positive/negative filter parameters don't work

They are appending --pos-ref-db or --neg-ref-db to the deblur command that is being built, but the actual options in deblur are ~~--pos-ref-db-fp~~ --pos-ref-fp or ~~--neg-ref-db-fp~~ --neg-ref-fp.
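
A minimal sketch of building the command with the corrected option names. The `--seqs-fp` and `--output-dir` options are taken from the command shown in an earlier issue on this page, and `--pos-ref-fp` / `--neg-ref-fp` from the text above; the function name `build_deblur_command` and its signature are hypothetical.

```python
def build_deblur_command(seqs_fp, output_dir, pos_ref_fp=None, neg_ref_fp=None):
    """Return the deblur workflow command as a list of arguments."""
    cmd = ["deblur", "workflow",
           "--seqs-fp", seqs_fp,
           "--output-dir", output_dir]
    # Only append the reference options when a path was supplied.
    if pos_ref_fp is not None:
        cmd += ["--pos-ref-fp", pos_ref_fp]
    if neg_ref_fp is not None:
        cmd += ["--neg-ref-fp", neg_ref_fp]
    return cmd
```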

ValueError: Decoded Phred score is out of range [0, 62]


Environment: qiime2 2023.05 (Docker image)
Data: PacBio Sequel II
My commands are as follows:

qiime tools import --type 'SampleData[SequencesWithQuality]' --input-path ../input-path-list.tsv --output-path ccs-data-demux.qza --input-format SingleEndFastqManifestPhred33V2
qiime cutadapt trim-single --i-demultiplexed-sequences ../ccs-data-demux.qza --p-cores 4 --p-adapter RGYTACCTTGTTACGACTT --p-front AGRGTTYGATYMTGGCTCAG --o-trimmed-sequences ccs-data-cutadapt.qza --output-dir result
qiime deblur denoise-16S --i-demultiplexed-seqs ccs-data-cutadapt.qza --o-table deblur.table.qza --o-representative-sequences deblur.representative-sequences.qza --o-stats deblur.stat.qza --p-trim-length 10 --verbose

When I run these commands, I get the following error:

  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/deblur/workflow.py", line 130, in trim_seqs
    for label, seq in input_seqs:
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/deblur/workflow.py", line 99, in sequence_generator
    for record in skbio.read(input_fp, format=format, **kw):
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 506, in <genexpr>
    return (x for x in itertools.chain([next(gen)], gen))
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 531, in _read_gen
    yield from reader(file, **kwargs)
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/registry.py", line 1008, in wrapped_reader
    yield from reader_function(fhs[-1], **kwargs)
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 351, in _fastq_to_generator
    phred_scores, seq_header = _parse_quality_scores(fh, len(seq),
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/fastq.py", line 522, in _parse_quality_scores
    _decode_qual_to_phred(chunk, variant=variant,
  File "/opt/conda/envs/qiime2-2023.5/lib/python3.8/site-packages/skbio/io/format/_base.py", line 34, in _decode_qual_to_phred
    raise ValueError("Decoded Phred score is out of range [%d, %d]."

The error is ValueError: Decoded Phred score is out of range [0, 62], but the data appear correct when I import the FASTQ files.
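
One way to investigate is to look at the highest quality score actually present in the reads, since the traceback shows the parser rejecting scores above 62. A minimal sketch, assuming Phred+33 encoding (as declared at import) and plain four-line FASTQ records; the function name `max_phred_score` is illustrative and is not part of QIIME 2, Deblur, or scikit-bio.

```python
def max_phred_score(fastq_lines, offset=33):
    """Return the highest decoded quality score in four-line FASTQ records.

    Returns None if no quality lines are present.
    """
    highest = None
    for index, line in enumerate(fastq_lines):
        if index % 4 == 3:  # every fourth line holds the quality string
            for char in line.rstrip("\n"):
                score = ord(char) - offset
                if highest is None or score > highest:
                    highest = score
    return highest
```

Passing an open file handle for one of the demultiplexed FASTQ files works, because iterating over a text file yields lines. A result above 62 would be consistent with the error; long-read data can carry higher scores than typical short-read data, which may be what the parser is rejecting here.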

Update to latest scipy

Bug Description
Currently pinned to an old scipy:

    # scipy isn't directly used in this plugin, but setting a version pin here
    # because deblur doesn't currently work with modern scipy versions.
    - scipy <1.1.0

Questions

Why is this pin needed? Is it related to skbio?

Comments

This isn't super critical (yet!), but it'll matter more the older this issue gets.

CI Failing to collect tests?

Bug Description
CI may fail to catch failing tests, likely because the test runner isn't collecting some or all tests.

Steps to reproduce the behavior

Build a test that should fail (see example below for the case that raised this issue)
Open a CR
Check CI results for test failure

Expected behavior
Failing tests should fail CI

References
Affected PR

make two filepath parameters to denoise into artifacts

For the time being, this is going to require that we transition the one existing method into four methods because we don't have optional artifact inputs. This is something that we're addressing shortly, so will be able to transition this back to one method at that time. In the meantime, we'd rather have four methods than accept file paths as input, as the latter won't work well in non-command-line interfaces (e.g., in QIIME Studio, users would have to type a filepath into a text field).

I'll take this one.

numpy 1.14.3 and scipy 1.1.0 break denoise-*

Not sure yet if this is an issue with the plugin, an issue with deblur, or an issue with both packages.

Failing tests on busywork.

cc @wasade

remove comments leftover from cookie-cutter

For example, here, but maybe in other places too.

Update version dependencies

Depends on biocore/deblur#136

add unit tests

There are no unit tests for this plugin.

support SampleData[Sequences] as input

Current Behavior
As of qiime2-2017.9 there's a new semantic type SampleData[Sequences] and a new file format (QIIME1DemuxFormat) associated with that type. This new data type represents the QIIME 1 "demux" format, where sequences have already been demultiplexed and quality-filtered.

Proposed Behavior
By supporting this data type, q2-deblur will be able to support denoising existing QIIME 1 data, or data that's still being produced in this format (some sequencing centers do this for their clients). Currently it is only possible to dereplicate, cluster de novo, or cluster closed-reference with q2-vsearch.

References

QIIME 1 "demux" format
Came up on the forum here.
