
q2-cutadapt's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

q2-cutadapt's People

Contributors

cherman2, chriskeefe, david-rod, ebolyen, hagenjp, hyphaltip, igk-nz, jairideout, keegan-evans, lizgehret, oddant1, q2d2, thermokarst, vaamb


q2-cutadapt's Issues

Expose cutadapt's -m (minimum output read length) parameter

For now reads that don't pass this filter will have to be written to /dev/null, since we haven't squared away the nullable outputs situation brought up in #10. In the meantime, the methods can use a Range to enforce a minimum read length of 1, which will prevent FastqGzFormat validation issues.
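Until -m is exposed, the filtering it would perform can be approximated downstream. A minimal sketch (the helper name is hypothetical, not plugin code) that drops too-short records so FastqGzFormat validation doesn't choke on zero-length reads:

```python
import gzip

def filter_min_length(in_path, out_path, min_len=1):
    """Drop FASTQ records whose sequence is shorter than min_len.

    Zero-length reads can fail FastqGzFormat validation downstream,
    so this mimics cutadapt's -m/--minimum-length filter.
    """
    kept = dropped = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]  # one FASTQ record
            if not record[0]:
                break
            if len(record[1].rstrip("\n")) >= min_len:
                fout.write("".join(record))
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

With min_len=1 this is exactly the Range-based floor described above: empty reads are discarded, everything else passes through unchanged.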


This recently came up on the forum.

Too many open files

Just like with q2-demux, it looks like we need to investigate filehandle accounting here, too. Unfortunately, it looks like the issue is originating within cutadapt or one of its related tools, xopen.
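While the root cause is investigated, the per-process filehandle limit is easy to inspect from Python (Unix-only; the helper name is illustrative):

```python
import resource

def fd_headroom():
    """Report the process's open-file limits, to gauge how many
    per-sample filehandles cutadapt/xopen can open before EMFILE
    ("Too many open files") is raised."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard
```

Raising the soft limit for the session (e.g. `ulimit -n 4096` before running QIIME 2) is the usual workaround until the accounting is fixed.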

This recently came up on the forum.

Add Log Output

Hi
Is it possible to add an option to print/save the log of the software?
I just want to be sure that all the reads contained the primers that I want to remove.
This information was shown in the "original" cutadapt software.
Best
Greg
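Until the plugin exposes a log artifact, one workaround is to capture the plugin's --verbose output yourself. A minimal tee-style wrapper (the helper name is hypothetical, not part of q2-cutadapt):

```python
import subprocess
import sys

def run_and_log(cmd, log_path):
    """Run a command, stream its combined stdout/stderr to the
    console, and keep a copy in log_path (a poor man's `tee`)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True)
    sys.stdout.write(proc.stdout)
    with open(log_path, "w") as fh:
        fh.write(proc.stdout)
    return proc.returncode
```

From the shell, `qiime cutadapt trim-paired ... --verbose 2>&1 | tee cutadapt.log` achieves the same thing, including the per-adapter trimming counts that standalone cutadapt prints.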

Cutadapt behaving differently inside and outside of qiime2

Hi,

I've encountered a weird issue that I can't figure out - hopefully one of you can! Here's what I did:

  1. Imported some paired-end fastq data into qiime 2
  2. Visualized the quality
  3. Tried to use cutadapt:

qiime cutadapt trim-paired \
  --i-demultiplexed-sequences POTATOE-BACT.qza \
  --p-front-f CCTACGGGNGGCWGCAG \
  --p-front-r GACTACHVGGGTATCTAATCC \
  --o-trimmed-sequences POTATOE-BACT_primers_trimmed.qza \
  --verbose

I noticed the issue when I looked at the standard output from this command. There were 6 files out of 37 that said no adapters were detected (in this case primers). I checked manually, and sure enough they were there in the fastq files. A simple grep search showed that most of the reads had the fwd/rev primer in them.
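As an aside, a plain grep undercounts here because these primers contain degenerate IUPAC bases (N, W, H, V). A small sketch of a degeneracy-aware check (helper names are illustrative):

```python
import re

# IUPAC nucleotide codes -> regex character classes (only the
# subset appearing in these primers)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "W": "[AT]", "H": "[ACT]", "V": "[ACG]"}

def primer_regex(primer):
    """Compile an IUPAC primer (e.g. CCTACGGGNGGCWGCAG) into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer))

def count_primer_hits(primer, reads):
    """Count reads whose 5' end begins with the degenerate primer."""
    pat = primer_regex(primer)
    return sum(1 for seq in reads if pat.match(seq))
```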

So, then I tried importing two of the samples separately (1 that worked and 1 that didn't). This time, the issue didn't occur... the file that failed before had its primers detected successfully.

I've also confirmed that cutadapt standalone does not have this issue. I have included some of the log information below. I'm running the same cutadapt that qiime has access to (qiime2-2018.2, installed from conda, both were run from same environment).

As far as I can tell the options are more or less the same, so I'm not sure how I am getting this unpredictable behaviour.

Upon closer inspection of the logs, it appears that somehow the CLI arguments are getting garbled such that R1 and R2 are switched for these samples in qiime2 (see below), which would explain why it's not finding primers. I don't understand how this can be the case: the manifest file I used for the initial batch test looks fine to me (pasted at the very bottom) and it works for the majority of samples. I also ran another test where I pulled out one of the offending files from the qza archive and ran it with standalone cutadapt, and didn't get this issue. So it seems like qiime2 is importing the data correctly according to the manifest, but somehow the instructions to cutadapt are getting scrambled. I would love for this to be a simple mistake on my part, but I just can't see where it is...

Thanks in advance for your help,

Jesse

Log for failed file (batch import of 37 samples to qiime2 then cutadapt run on all together with qiime2):

This is cutadapt 1.15 with Python 3.5.5
Command line parameters: --cores 1 --error-rate 0.1 --times 1 --overlap 3 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_41_L001_R2_001.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_4_L001_R1_001.fastq.gz --front CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_41_L001_R2_001.fastq.gz /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_4_L001_R1_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 18.84 s (111 us/read; 0.54 M reads/minute).

=== Summary ===

Total read pairs processed:            170,296
  Read 1 with adapter:                       3 (0.0%)
  Read 2 with adapter:                       6 (0.0%)
Pairs written (passing filters):       170,296 (100.0%)

Total basepairs processed:    85,488,592 bp
  Read 1:    42,744,296 bp
  Read 2:    42,744,296 bp
Total written (filtered):     85,488,496 bp (100.0%)
  Read 1:    42,744,254 bp
  Read 2:    42,744,242 bp

=== First read: Adapter 1 ===

Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 3 times.

No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
8	1	2.6	0	1
17	2	0.0	1	2

=== Second read: Adapter 2 ===

Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 6 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
3	4	2660.9	0	4
21	2	0.0	2	0 2

Log for successful run where I imported the failed file in a smaller batch (only 2 samples) and it worked inexplicably:

This is cutadapt 1.15 with Python 3.5.5
Command line parameters: -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o stationP5outR1.fastq.gz -p stationP5outR2.fastq.gz stationP5_4_L001_R1_001.fastq.gz stationP5_41_L001_R2_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 14.72 s (86 us/read; 0.69 M reads/minute).

=== Summary ===

Total read pairs processed:            170,296
  Read 1 with adapter:                 169,781 (99.7%)
  Read 2 with adapter:                 169,584 (99.6%)
Pairs written (passing filters):       170,296 (100.0%)

Total basepairs processed:    85,488,592 bp
  Read 1:    42,744,296 bp
  Read 2:    42,744,296 bp
Total written (filtered):        312,231 bp (0.4%)
  Read 1:       131,759 bp
  Read 2:       180,472 bp

=== First read: Adapter 1 ===

Sequence: CCTACGGGNGGCWGCAG; Type: regular 3'; Length: 17; Trimmed: 169781 times.

No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1

Bases preceding removed adapters:
  A: 0.0%
  C: 0.0%
  G: 0.0%
  T: 0.0%
  none/other: 100.0%

Overview of removed sequences
length	count	expect	max.err	error counts
3	6	2660.9	0	6
4	4	665.2	0	4
250	18	0.0	1	15 3
251	169753	0.0	1	162055 7698

=== Second read: Adapter 2 ===

Sequence: GACTACHVGGGTATCTAATCC; Type: regular 3'; Length: 21; Trimmed: 169584 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Bases preceding removed adapters:
  A: 0.0%
  C: 0.0%
  G: 0.0%
  T: 0.0%
  none/other: 100.0%

Overview of removed sequences
length	count	expect	max.err	error counts
3	4	2660.9	0	4
5	1	166.3	0	1
9	2	0.6	0	0 2
244	1	0.0	2	1
248	4	0.0	2	0 0 4
250	19	0.0	2	10 9
251	169553	0.0	2	164116 5030 407

Log for successful run of same file with standalone cutadapt:

This is cutadapt 1.15 with Python 3.5.5
Command line parameters: --discard-untrimmed -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -o 3237-WHJ-0005_S5_L001_R1_primers-trimmed.fastq.gz -p 3237-WHJ-0005_S5_L001_R2_primers-trimmed.fastq.gz 3237-WHJ-0005_S5_L001_R1_001.fastq.gz 3237-WHJ-0005_S5_L001_R2_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 14.24 s (84 us/read; 0.72 M reads/minute).

=== Summary ===

Total read pairs processed:            170,296
  Read 1 with adapter:                 169,895 (99.8%)
  Read 2 with adapter:                 169,605 (99.6%)
Pairs written (passing filters):       169,282 (99.4%)

Total basepairs processed:    85,488,592 bp
  Read 1:    42,744,296 bp
  Read 2:    42,744,296 bp
Total written (filtered):     78,550,377 bp (91.9%)
  Read 1:    39,613,870 bp
  Read 2:    38,936,507 bp

=== First read: Adapter 1 ===

Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 169895 times.

No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1

Overview of removed sequences
length	count	expect	max.err	error counts
3	85	2660.9	0	85
11	1	0.0	1	1
14	4	0.0	1	2 2
15	24	0.0	1	9 15
16	943	0.0	1	203 740
17	168558	0.0	1	162055 6503
18	280	0.0	1	15 265

=== Second read: Adapter 2 ===

Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 169605 times.

No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2

Overview of removed sequences
length	count	expect	max.err	error counts
15	2	0.0	1	0 2
16	3	0.0	1	0 3
17	13	0.0	1	10 3
18	20	0.0	1	3 17
19	103	0.0	1	4 5 94
20	1748	0.0	2	86 1604 58
21	167289	0.0	2	164116 2972 201
22	420	0.0	2	10 379 31
23	2	0.0	2	0 0 2
25	4	0.0	2	0 0 4
28	1	0.0	2	1

My manifest file:

sample-id,absolute-filepath,direction
stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R1_001.fastq.gz,forward
stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R1_001.fastq.gz,forward
stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R1_001.fastq.gz,forward
stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R1_001.fastq.gz,forward
stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R1_001.fastq.gz,forward
stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R1_001.fastq.gz,forward
stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R1_001.fastq.gz,forward
stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R1_001.fastq.gz,forward
stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R1_001.fastq.gz,forward
stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R1_001.fastq.gz,forward
stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R1_001.fastq.gz,forward
stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R1_001.fastq.gz,forward
stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R1_001.fastq.gz,forward
stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R1_001.fastq.gz,forward
stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R1_001.fastq.gz,forward
stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R1_001.fastq.gz,forward
stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R1_001.fastq.gz,forward
stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R1_001.fastq.gz,forward
stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R1_001.fastq.gz,forward
stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R1_001.fastq.gz,forward
stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R1_001.fastq.gz,forward
stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R1_001.fastq.gz,forward
stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R1_001.fastq.gz,forward
stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R1_001.fastq.gz,forward
stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R1_001.fastq.gz,forward
stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R1_001.fastq.gz,forward
stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R1_001.fastq.gz,forward
stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R1_001.fastq.gz,forward
stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R1_001.fastq.gz,forward
stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R1_001.fastq.gz,forward
stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R1_001.fastq.gz,forward
stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R1_001.fastq.gz,forward
stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R1_001.fastq.gz,forward
stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R1_001.fastq.gz,forward
stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R1_001.fastq.gz,forward
stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R1_001.fastq.gz,forward
stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R1_001.fastq.gz,forward
stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R2_001.fastq.gz,reverse
stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R2_001.fastq.gz,reverse
stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R2_001.fastq.gz,reverse
stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R2_001.fastq.gz,reverse
stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R2_001.fastq.gz,reverse
stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R2_001.fastq.gz,reverse
stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R2_001.fastq.gz,reverse
stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R2_001.fastq.gz,reverse
stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R2_001.fastq.gz,reverse
stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R2_001.fastq.gz,reverse
stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R2_001.fastq.gz,reverse
stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R2_001.fastq.gz,reverse
stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R2_001.fastq.gz,reverse
stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R2_001.fastq.gz,reverse
stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R2_001.fastq.gz,reverse
stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R2_001.fastq.gz,reverse
stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R2_001.fastq.gz,reverse
stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R2_001.fastq.gz,reverse
stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R2_001.fastq.gz,reverse
stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R2_001.fastq.gz,reverse
stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R2_001.fastq.gz,reverse
stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R2_001.fastq.gz,reverse
stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R2_001.fastq.gz,reverse
stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R2_001.fastq.gz,reverse
stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R2_001.fastq.gz,reverse
stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R2_001.fastq.gz,reverse
stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R2_001.fastq.gz,reverse
stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R2_001.fastq.gz,reverse
stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R2_001.fastq.gz,reverse
stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R2_001.fastq.gz,reverse
stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R2_001.fastq.gz,reverse
stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R2_001.fastq.gz,reverse
stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R2_001.fastq.gz,reverse
stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R2_001.fastq.gz,reverse
stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R2_001.fastq.gz,reverse
stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R2_001.fastq.gz,reverse
stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R2_001.fastq.gz,reverse
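A manifest like the one above can be sanity-checked mechanically. A sketch (hypothetical helper; it only verifies that each declared direction matches the _R1_/_R2_ token in the filename and that every sample has both directions):

```python
import csv
import io

def check_manifest(manifest_text):
    """Flag manifest rows whose declared direction disagrees with the
    _R1_/_R2_ token in the filename, and samples missing a forward or
    reverse entry."""
    pairs = {}
    problems = []
    for row in csv.DictReader(io.StringIO(manifest_text)):
        sid = row["sample-id"]
        path = row["absolute-filepath"]
        direction = row["direction"]
        token = "_R1_" if direction == "forward" else "_R2_"
        if token not in path:
            problems.append(f"{sid}: {direction} entry points at {path}")
        pairs.setdefault(sid, set()).add(direction)
    for sid, dirs in pairs.items():
        if dirs != {"forward", "reverse"}:
            problems.append(f"{sid}: incomplete pair {sorted(dirs)}")
    return problems
```

A clean result here would support the poster's conclusion that the swap happens after import, in how the plugin assembles the cutadapt command line, not in the manifest itself.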

Output multiple artifacts per primer, similar to Cutadapt's demultiplexing method

Addition Description
It would be useful to bin reads by primer prior to primer removal. I'd like to separate a single FASTQ-based artifact (containing several different primers) into multiple output artifacts by primer; each output artifact would be characterized by a single primer. This would be helpful for meta-analyses in which sequences with multiple primers/variable regions may be found in a single QIIME artifact.

This is possible with native Cutadapt (as of v4.5) using steps to demultiplex, but not in the QIIME 2 plugin as its inputs are restricted to specific semantic types.

Current Behavior

  • The QIIME 2 plugin performs a similar function with qiime cutadapt demux (based on adapter sequence), but generates only a single output for demultiplexed sequences. It also requires an input artifact of type MultiplexedSingleEndBarcodeInSequence and does not accept SampleData[Single/PairedEndSequencesWithQuality].
  • qiime cutadapt trim could technically perform this by running the command once per primer (pair), but that is quite inefficient.

Proposed Behavior

  • q2-cutadapt would take as input 1) a FASTQ artifact of SampleData[Single/PairedEndSequencesWithQuality], which contains N different primer sequences among its many reads, and 2) a tab-separated metadata file containing the N primer names and corresponding primer sequences.
  • As output, it would generate N artifacts of SampleData[Single/PairedEndSequencesWithQuality]; each output artifact would contain reads sharing the same primer sequence. There would also be an output artifact (also SampleData[Single/PairedEndSequencesWithQuality]) of sequences that did not match any of the N primer sequences.
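The proposed binning logic, stripped of the QIIME 2 type system, can be sketched as follows (the function name, the IUPAC subset, and the "untrimmed" bin label are illustrative):

```python
import re

# IUPAC nucleotide codes -> regex character classes (subset used here)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "W": "[AT]", "H": "[ACT]", "V": "[ACG]"}

def bin_reads_by_primer(reads, primers):
    """Assign each read to the first primer whose degenerate sequence
    matches its 5' end; unmatched reads land in an 'untrimmed' bin.

    `primers` maps primer name -> IUPAC sequence, mirroring the
    proposed tab-separated metadata file.
    """
    compiled = {name: re.compile("".join(IUPAC[b] for b in seq))
                for name, seq in primers.items()}
    bins = {name: [] for name in primers}
    bins["untrimmed"] = []
    for read in reads:
        for name, pat in compiled.items():
            if pat.match(read):
                bins[name].append(read)
                break
        else:
            bins["untrimmed"].append(read)
    return bins
```

Each bin would then become one output artifact; the open question below (variable numbers of outputs) is about whether the framework can express that, not about the binning itself.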

Questions

  1. Does QIIME 2 allow for variable numbers of output artifacts? If not, I suppose that would be a blocker to implementation.

References

  1. Cutadapt manual, "Demultiplexing"
  2. QIIME 2 docs, qiime cutadapt trim-paired

When called from a shell script, qiime2 is calling the system version of cutadapt, not the qiime version.

Bug Description
When I run the q2-cutadapt plugin directly on the command line, it behaves fine, but when I call it from inside a script it calls the system version of cutadapt instead of the qiime2 version.

It is not a case of the qiime2 environment failing to activate inside the script. If that were the case, the qiime2 commands wouldn't work at all.

I have attached a screenshot, the script in question, and qiime2's log file pertaining to the error.

trim_test.sh.txt

Screenshot.png

In particular, I draw your attention to line 7 in the log file, where the traceback shows that it's using my system version of cutadapt:

qiime2-q2cli-err-eu6loi4f.log

reorient/demux mixed-orientation reads

Current Behavior

  • reads are in mixed orientation (both R1 and R2 reads contain forward/reverse reads)
  • reads also often contain barcodes
  • reads also probably contain primers

This appears to be the output produced by at least one sequencing company, so we may want to provide support for it.

Proposed Behavior

  1. add method to demux "mixed orientation" read format in one step
  2. add a method to reorient reads (I think this may make more sense to allow a modular pipeline of import as paired-end --> reorient reads --> extract barcodes --> trim primers)
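The reorientation step in 2 could look like the following sketch (a hypothetical helper; real support would also need to handle quality strings and pairs where neither primer is found):

```python
import re

# IUPAC nucleotide codes -> regex character classes (subset used here)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "W": "[AT]", "H": "[ACT]", "V": "[ACG]"}

def _pat(primer):
    return re.compile("".join(IUPAC[b] for b in primer))

def reorient_pair(r1, r2, fwd_primer, rev_primer):
    """Return (forward_read, reverse_read), swapping the pair when the
    reverse primer sits at the 5' end of R1 (a flipped pair)."""
    fwd, rev = _pat(fwd_primer), _pat(rev_primer)
    if fwd.match(r1):
        return r1, r2          # already in the expected orientation
    if rev.match(r1) and fwd.match(r2):
        return r2, r1          # pair is flipped: swap
    return r1, r2              # ambiguous: leave as-is
```

Applied record-by-record before barcode extraction, this would make the rest of the modular pipeline (extract barcodes, then trim primers) orientation-agnostic.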

References
raised on forum

Plugin error from cutadapt:

Hi all,

I have an error with the cutadapt plugin. I am trying to demultiplex a file which contains single-end sequences of both forward and reverse reads. I tried this demultiplexing script once before with a similar file and it worked, but now that I've used another file with the same characteristics it didn't. Could it be related to a problematic sample, one which has no forward reads?

Here I leave you the error message I get:

Command '['cutadapt', '--front', 'file:/scratch-local/mbloemen/tmpvu_nizlr', '--error-rate', '0.0', '-o', '/scratch-local/mbloemen/q2-CasavaOneEightSingleLanePerSampleDirFmt-s8ii_4q0/{name}.1.fastq.gz', '--untrimmed-output', '/scratch-local/mbloemen/q2-MultiplexedSingleEndBarcodeInSequenceDirFmt-d72v4q_v/forward.fastq.gz', '/scratch-local/mbloemen/qiime2-archive-23n9wvxj/2cc054c9-525f-4181-951e-9ea0e0c5b3c6/data/forward.fastq.gz']' returned non-zero exit status -9

Thank you very much in advance,

Serena

Convert --verbose stats to visualization

Improvement Description
Convert --verbose stats to visualization

Questions

  1. Is it possible to have metadata in-between like DADA2 or would this just become a visualization directly?
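Whichever form the visualization takes, the first step is parsing the --verbose text. A sketch (the field names and regexes are illustrative, written against cutadapt 1.x summary output like the logs quoted above):

```python
import re

# Headline counts in a cutadapt --verbose "=== Summary ===" block
SUMMARY_FIELDS = {
    "pairs_processed": r"Total read pairs processed:\s+([\d,]+)",
    "r1_with_adapter": r"Read 1 with adapter:\s+([\d,]+)",
    "r2_with_adapter": r"Read 2 with adapter:\s+([\d,]+)",
    "pairs_written": r"Pairs written \(passing filters\):\s+([\d,]+)",
}

def parse_summary(log_text):
    """Pull headline counts out of a cutadapt --verbose summary block,
    as a first step toward tabulating them for a visualization."""
    stats = {}
    for key, pattern in SUMMARY_FIELDS.items():
        m = re.search(pattern, log_text)
        if m:
            stats[key] = int(m.group(1).replace(",", ""))
    return stats
```

One such dict per sample could feed either an intermediate metadata-like table (the DADA2-stats model) or a visualizer directly.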

Include quality trimming in the q2-cutadapt plugin?

Improvement Description
I don't know if this has already been discussed, but I wonder if people would be open to including quality trimming in the cutadapt plugin.

Current Behavior
Right now the q2-cutadapt plugin only trims adapters, and quality trimming needs to be done using the q2-quality-filter plugin.

Proposed Behavior
It would be nice if the option was available to take advantage of cutadapt's quality-trimming functionality in this plugin. I think it would have a few advantages:

  1. cutadapt's quality trimming algorithm is more sophisticated than the one implemented in q2-quality-filter and, IME, produces cleaner sequences with less need for trying various parameter values
  2. Given that every time a qiime2 plugin is called, the user needs to wait for the objects to get unzipped and rezipped, reducing the number of plugins required to perform conceptually-grouped tasks seems like a good thing (within reason).
  3. Omitting this functionality feels like it hobbles the plugin compared to running cutadapt on its own, and is therefore a barrier to entry for more experienced users who are accustomed to using cutadapt or similar tools already.
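For reference, the 3' quality-trimming algorithm cutadapt documents for -q (inherited from BWA) fits in a few lines, which illustrates point 1: it tolerates isolated good bases inside a low-quality tail rather than cutting at the first base below a fixed threshold. A sketch:

```python
def quality_trim_index(qualities, cutoff):
    """BWA-style 3' quality trimming, as documented for cutadapt -q:
    walk in from the 3' end accumulating (cutoff - q), cut where that
    running sum peaks, and stop once the sum goes negative (i.e. a
    solidly high-quality region has been reached)."""
    s = 0
    max_s = 0
    cut = len(qualities)  # default: keep everything
    for i in reversed(range(len(qualities))):
        s += cutoff - qualities[i]
        if s < 0:
            break
        if s > max_s:
            max_s = s
            cut = i
    return cut  # keep qualities[:cut]
```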

I've never contributed to an open-source project before, but I could probably fork this repo and add the functionality if that's desirable (or whatever your process is).

add support for multicore?

Improvement Description
Is there a reason the -j option for cutadapt (v3.0), which supports multicore runs, isn't used?

Current Behavior
Single-threaded cutadapt (the default, -j 1) is what currently runs.

Proposed Behavior
Add a CLI option to specify the number of cores/threads to use.
