I've encountered a weird issue that I can't figure out - hopefully one of you can! Here's what I did:
I noticed the issue when I looked at the standard output from this command. There were 6 files out of 37 that said no adapters were detected (in this case primers). I checked manually, and sure enough they were there in the fastq files. A simple grep search showed that most of the reads had the fwd/rev primer in them.
So, then I tried importing two of the samples separately (1 that worked and 1 that didn't). This time, the issue didn't occur... the file that failed before had its primers detected successfully.
I've also confirmed that cutadapt standalone does not have this issue. I have included some of the log information below. I'm running the same cutadapt that qiime has access to (qiime2-2018.2, installed from conda, both were run from same environment).
As far as I can tell the options are more or less the same, so I'm not sure how I am getting this unpredictable behaviour.
Upon closer inspection of the logs, it appears that somehow the CLI arguments are getting messed up such that R1 and R2 are switched in these samples in qiime2 (see below) which would explain why it's not finding primers. I don't understand how this can be the case as the manifest file I used for the initial batch test looks fine to me (pasted at the very bottom) and it works for the majority of samples. And I also ran another test where I pulled out one of the offending files from the qza archive and ran it with standalone cutadapt and didn't get this issue. So to me, it seems like qiime2 is importing it correctly according to the manifest but somehow the instructions to cutadapt are getting garbled. But I would love this to be a simple mistake on my part but I just can't see where it is...
Log for failed file (batch import of 37 samples to qiime2 then cutadapt run on all together with qiime2):
This is cutadapt 1.15 with Python 3.5.5
Command line parameters: --cores 1 --error-rate 0.1 --times 1 --overlap 3 -o /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_41_L001_R2_001.fastq.gz -p /tmp/q2-CasavaOneEightSingleLanePerSampleDirFmt-r6zhwz2c/stationP5_4_L001_R1_001.fastq.gz --front CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_41_L001_R2_001.fastq.gz /tmp/qiime2-archive-gwcyl56w/08e4e960-f54e-4038-86e5-486610afe00b/data/stationP5_4_L001_R1_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 18.84 s (111 us/read; 0.54 M reads/minute).
=== Summary ===
Total read pairs processed: 170,296
Read 1 with adapter: 3 (0.0%)
Read 2 with adapter: 6 (0.0%)
Pairs written (passing filters): 170,296 (100.0%)
Total basepairs processed: 85,488,592 bp
Read 1: 42,744,296 bp
Read 2: 42,744,296 bp
Total written (filtered): 85,488,496 bp (100.0%)
Read 1: 42,744,254 bp
Read 2: 42,744,242 bp
=== First read: Adapter 1 ===
Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 3 times.
No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1
Overview of removed sequences
length count expect max.err error counts
8 1 2.6 0 1
17 2 0.0 1 2
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 6 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
3 4 2660.9 0 4
21 2 0.0 2 0 2
Log for successful run where I imported the failed file in a smaller batch (only 2 samples) and it worked inexplicably:
This is cutadapt 1.15 with Python 3.5.5
Command line parameters: -a CCTACGGGNGGCWGCAG -A GACTACHVGGGTATCTAATCC -o stationP5outR1.fastq.gz -p stationP5outR2.fastq.gz stationP5_4_L001_R1_001.fastq.gz stationP5_41_L001_R2_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 14.72 s (86 us/read; 0.69 M reads/minute).
=== Summary ===
Total read pairs processed: 170,296
Read 1 with adapter: 169,781 (99.7%)
Read 2 with adapter: 169,584 (99.6%)
Pairs written (passing filters): 170,296 (100.0%)
Total basepairs processed: 85,488,592 bp
Read 1: 42,744,296 bp
Read 2: 42,744,296 bp
Total written (filtered): 312,231 bp (0.4%)
Read 1: 131,759 bp
Read 2: 180,472 bp
=== First read: Adapter 1 ===
Sequence: CCTACGGGNGGCWGCAG; Type: regular 3'; Length: 17; Trimmed: 169781 times.
No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.0%
T: 0.0%
none/other: 100.0%
Overview of removed sequences
length count expect max.err error counts
3 6 2660.9 0 6
4 4 665.2 0 4
250 18 0.0 1 15 3
251 169753 0.0 1 162055 7698
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 3'; Length: 21; Trimmed: 169584 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Bases preceding removed adapters:
A: 0.0%
C: 0.0%
G: 0.0%
T: 0.0%
none/other: 100.0%
Overview of removed sequences
length count expect max.err error counts
3 4 2660.9 0 4
5 1 166.3 0 1
9 2 0.6 0 0 2
244 1 0.0 2 1
248 4 0.0 2 0 0 4
250 19 0.0 2 10 9
251 169553 0.0 2 164116 5030 407
This is cutadapt 1.15 with Python 3.5.5
Command line parameters: --discard-untrimmed -g CCTACGGGNGGCWGCAG -G GACTACHVGGGTATCTAATCC -o 3237-WHJ-0005_S5_L001_R1_primers-trimmed.fastq.gz -p 3237-WHJ-0005_S5_L001_R2_primers-trimmed.fastq.gz 3237-WHJ-0005_S5_L001_R1_001.fastq.gz 3237-WHJ-0005_S5_L001_R2_001.fastq.gz
Running on 1 core
Trimming 2 adapters with at most 10.0% errors in paired-end mode ...
Finished in 14.24 s (84 us/read; 0.72 M reads/minute).
=== Summary ===
Total read pairs processed: 170,296
Read 1 with adapter: 169,895 (99.8%)
Read 2 with adapter: 169,605 (99.6%)
Pairs written (passing filters): 169,282 (99.4%)
Total basepairs processed: 85,488,592 bp
Read 1: 42,744,296 bp
Read 2: 42,744,296 bp
Total written (filtered): 78,550,377 bp (91.9%)
Read 1: 39,613,870 bp
Read 2: 38,936,507 bp
=== First read: Adapter 1 ===
Sequence: CCTACGGGNGGCWGCAG; Type: regular 5'; Length: 17; Trimmed: 169895 times.
No. of allowed errors:
0-9 bp: 0; 10-17 bp: 1
Overview of removed sequences
length count expect max.err error counts
3 85 2660.9 0 85
11 1 0.0 1 1
14 4 0.0 1 2 2
15 24 0.0 1 9 15
16 943 0.0 1 203 740
17 168558 0.0 1 162055 6503
18 280 0.0 1 15 265
=== Second read: Adapter 2 ===
Sequence: GACTACHVGGGTATCTAATCC; Type: regular 5'; Length: 21; Trimmed: 169605 times.
No. of allowed errors:
0-9 bp: 0; 10-19 bp: 1; 20-21 bp: 2
Overview of removed sequences
length count expect max.err error counts
15 2 0.0 1 0 2
16 3 0.0 1 0 3
17 13 0.0 1 10 3
18 20 0.0 1 3 17
19 103 0.0 1 4 5 94
20 1748 0.0 2 86 1604 58
21 167289 0.0 2 164116 2972 201
22 420 0.0 2 10 379 31
23 2 0.0 2 0 0 2
25 4 0.0 2 0 0 4
28 1 0.0 2 1
sample-id,absolute-filepath,direction
stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R1_001.fastq.gz,forward
stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R1_001.fastq.gz,forward
stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R1_001.fastq.gz,forward
stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R1_001.fastq.gz,forward
stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R1_001.fastq.gz,forward
stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R1_001.fastq.gz,forward
stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R1_001.fastq.gz,forward
stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R1_001.fastq.gz,forward
stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R1_001.fastq.gz,forward
stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R1_001.fastq.gz,forward
stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R1_001.fastq.gz,forward
stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R1_001.fastq.gz,forward
stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R1_001.fastq.gz,forward
stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R1_001.fastq.gz,forward
stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R1_001.fastq.gz,forward
stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R1_001.fastq.gz,forward
stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R1_001.fastq.gz,forward
stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R1_001.fastq.gz,forward
stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R1_001.fastq.gz,forward
stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R1_001.fastq.gz,forward
stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R1_001.fastq.gz,forward
stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R1_001.fastq.gz,forward
stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R1_001.fastq.gz,forward
stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R1_001.fastq.gz,forward
stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R1_001.fastq.gz,forward
stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R1_001.fastq.gz,forward
stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R1_001.fastq.gz,forward
stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R1_001.fastq.gz,forward
stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R1_001.fastq.gz,forward
stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R1_001.fastq.gz,forward
stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R1_001.fastq.gz,forward
stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R1_001.fastq.gz,forward
stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R1_001.fastq.gz,forward
stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R1_001.fastq.gz,forward
stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R1_001.fastq.gz,forward
stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R1_001.fastq.gz,forward
stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R1_001.fastq.gz,forward
stationP1,$PWD/BACT-341F-805R/3237-WHJ-0001_S1_L001_R2_001.fastq.gz,reverse
stationP2,$PWD/BACT-341F-805R/3237-WHJ-0002_S2_L001_R2_001.fastq.gz,reverse
stationP3,$PWD/BACT-341F-805R/3237-WHJ-0003_S3_L001_R2_001.fastq.gz,reverse
stationP4,$PWD/BACT-341F-805R/3237-WHJ-0004_S4_L001_R2_001.fastq.gz,reverse
stationP5,$PWD/BACT-341F-805R/3237-WHJ-0005_S5_L001_R2_001.fastq.gz,reverse
stationP6,$PWD/BACT-341F-805R/3237-WHJ-0006_S6_L001_R2_001.fastq.gz,reverse
stationP7,$PWD/BACT-341F-805R/3237-WHJ-0007_S7_L001_R2_001.fastq.gz,reverse
stationP8,$PWD/BACT-341F-805R/3237-WHJ-0008_S8_L001_R2_001.fastq.gz,reverse
stationP9,$PWD/BACT-341F-805R/3237-WHJ-0009_S9_L001_R2_001.fastq.gz,reverse
stationP10,$PWD/BACT-341F-805R/3237-WHJ-0010_S10_L001_R2_001.fastq.gz,reverse
stationP11,$PWD/BACT-341F-805R/3237-WHJ-0011_S11_L001_R2_001.fastq.gz,reverse
stationP12,$PWD/BACT-341F-805R/3237-WHJ-0012_S12_L001_R2_001.fastq.gz,reverse
stationP13,$PWD/BACT-341F-805R/3237-WHJ-0013_S13_L001_R2_001.fastq.gz,reverse
stationP14,$PWD/BACT-341F-805R/3237-WHJ-0014_S14_L001_R2_001.fastq.gz,reverse
stationP15,$PWD/BACT-341F-805R/3237-WHJ-0015_S15_L001_R2_001.fastq.gz,reverse
stationP16,$PWD/BACT-341F-805R/3237-WHJ-0016_S16_L001_R2_001.fastq.gz,reverse
stationP17,$PWD/BACT-341F-805R/3237-WHJ-0017_S17_L001_R2_001.fastq.gz,reverse
stationP18,$PWD/BACT-341F-805R/3237-WHJ-0018_S18_L001_R2_001.fastq.gz,reverse
stationP20,$PWD/BACT-341F-805R/3237-WHJ-0019_S19_L001_R2_001.fastq.gz,reverse
stationP21,$PWD/BACT-341F-805R/3237-WHJ-0020_S20_L001_R2_001.fastq.gz,reverse
stationP22,$PWD/BACT-341F-805R/3237-WHJ-0021_S21_L001_R2_001.fastq.gz,reverse
stationP23,$PWD/BACT-341F-805R/3237-WHJ-0022_S22_L001_R2_001.fastq.gz,reverse
stationP24,$PWD/BACT-341F-805R/3237-WHJ-0023_S23_L001_R2_001.fastq.gz,reverse
stationP25,$PWD/BACT-341F-805R/3237-WHJ-0024_S24_L001_R2_001.fastq.gz,reverse
stationP26,$PWD/BACT-341F-805R/3237-WHJ-0025_S25_L001_R2_001.fastq.gz,reverse
stationP27,$PWD/BACT-341F-805R/3237-WHJ-0026_S26_L001_R2_001.fastq.gz,reverse
stationP28,$PWD/BACT-341F-805R/3237-WHJ-0027_S27_L001_R2_001.fastq.gz,reverse
stationP29,$PWD/BACT-341F-805R/3237-WHJ-0028_S28_L001_R2_001.fastq.gz,reverse
stationP30,$PWD/BACT-341F-805R/3237-WHJ-0029_S29_L001_R2_001.fastq.gz,reverse
stationP31,$PWD/BACT-341F-805R/3237-WHJ-0030_S30_L001_R2_001.fastq.gz,reverse
stationM1,$PWD/BACT-341F-805R/3237-WHJ-0031_S31_L001_R2_001.fastq.gz,reverse
stationM2,$PWD/BACT-341F-805R/3237-WHJ-0032_S32_L001_R2_001.fastq.gz,reverse
stationM3,$PWD/BACT-341F-805R/3237-WHJ-0033_S33_L001_R2_001.fastq.gz,reverse
stationM4,$PWD/BACT-341F-805R/3237-WHJ-0034_S34_L001_R2_001.fastq.gz,reverse
stationM5,$PWD/BACT-341F-805R/3237-WHJ-0035_S35_L001_R2_001.fastq.gz,reverse
stationM6,$PWD/BACT-341F-805R/3237-WHJ-0036_S36_L001_R2_001.fastq.gz,reverse
stationM7,$PWD/BACT-341F-805R/3237-WHJ-0037_S37_L001_R2_001.fastq.gz,reverse