usda-ars-gbru / itsxpress

Software to trim the ITS region of FASTQ sequences for amplicon sequencing analysis

License: Other

Python 90.79% Dockerfile 0.74% TeX 8.47%

itsxpress's People

Contributors

arivers, seina001

itsxpress's Issues

ITSxpress and DADA2 with NovaSeq

Hello, I am trying to incorporate ITSxpress into the processing of my NovaSeq data. I filtered my sequences using DADA2 and then ran them through ITSxpress. I want to learn my error rates, but I was wondering whether ITSxpress will interfere with this process.

Cannot properly install on qiime2

Hi, I just installed itsxpress into my qiime2 environment and got an error.
When I install with "conda install -c bioconda itsxpress", itsxpress was not associated with the conda command, and "qiime itsxpress" returns "Error: QIIME 2 has no plugin/command named 'itsxpress'."
Just running "itsxpress" returns a usage message:
~$ itsxpress
usage: itsxpress [-h] --fastq FASTQ [--single_end] [--fastq2 FASTQ2] --outfile…
So it seems something is wrong with the conda installation.
I tried qiime2-2021.11 and 2021.8; my conda version is 4.13.0.
With a pip installation, I can install itsxpress successfully.

ITSx on reverse-complement

Hi,
First, thanks for this useful tool and the qiime2 plugin as well!
It seems from my testing that the ITSx implementation here does not also search the reverse complement of the input, as ITSx originally did by default. Is this correct, and if so, would it be possible to implement a flag to search the other orientation?
It seems reasonable not to do both as the ITSx defaults did, for reasons of speed. However, in our case, we sequence ITS2, but read 1 is obtained from the downstream/28S side and read 2 is primed off the upstream/5.8S side. This was done to try to get the best-quality sequence from the slightly more information-rich side. I find that although itsxpress can pair our reads nicely, it fails to find any ITS sequences and extract them (though the primers are on there as well). However, if I pair outside of itsxpress, then reverse-complement and put the paired reads through as single-end reads, it finds the ITS2 subregion in the majority of the reads nicely. Also, running directly in ITSx works fine, so I am left believing there is no orientation check in itsxpress, but cannot tell for sure. Is this correct?

Thanks.
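
The workaround described above (reverse-complementing already-merged reads and passing them as single-end input) can be sketched in plain Python. This is only an illustration: the function names are hypothetical, nothing here is part of ITSxpress, and it assumes the common four-lines-per-record FASTQ layout. Only A, C, G, T and N are complemented; any other character is left unchanged.

```python
# Hypothetical helpers, not part of ITSxpress.
# Translation table for the unambiguous bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(header, seq, plus, qual):
    """Reverse-complement the sequence and reverse the quality string,
    so each quality score stays attached to its base."""
    return header, seq.translate(_COMPLEMENT)[::-1], plus, qual[::-1]


def reverse_complement_fastq(lines):
    """Yield reverse-complemented records from an iterable of FASTQ lines.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        yield reverse_complement_record(*stripped[i:i + 4])


if __name__ == "__main__":
    example = ["@read1\n", "AACGT\n", "+\n", "IIIJK\n"]
    for record in reverse_complement_fastq(example):
        print("\n".join(record))
```

The records it yields can be written back out as a FASTQ file and given to itsxpress with --single_end, as the reporter did.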

fasta file being passed to vsearch

Hi, I'm using ITSxpress for the first time and I'm running into an error that states "FASTQ input is only allowed with the fastx_uniques command". I'm supplying a FASTQ file, but the sequences are written to a temporary FASTA file by vsearch before dereplication. I'm supplying one .fastq file with reads that have already been merged.

itsxpress --fastq 01_merged_set_1.fastq --single_end --region ITS2 --taxa Fungi --log 02_itsxpress.txt --outfile 01_merged_its2_set_1.fastq
1.0
2022-11-17 13:59:13,847: INFO     Verifying the input sequences.
2022-11-17 14:01:31,655: INFO     Sequences are assumed to be single-end.
2022-11-17 14:01:31,657: INFO     Temporary directory is: /tmp/itsxpress_rc5x9i21
2022-11-17 14:01:31,657: INFO     Unique sequences are being written to a temporary FASTA file with Vsearch.
2022-11-17 14:01:31,675: INFO     vsearch v2.22.1_linux_x86_64, 94.2GB RAM, 12 cores
https://github.com/torognes/vsearch



Fatal error: FASTQ input is only allowed with the fastx_uniques command

2022-11-17 14:01:31,675: ERROR    Could not perform dereplication with Vsearch. Error from Vsearch was:
 vsearch v2.22.1_linux_x86_64, 94.2GB RAM, 12 cores
https://github.com/torognes/vsearch



Fatal error: FASTQ input is only allowed with the fastx_uniques command
Traceback (most recent call last):
  File "/home/austenapigo/miniconda3/lib/python3.9/site-packages/itsxpress/main.py", line 512, in deduplicate
    p2.check_returncode()
  File "/home/austenapigo/miniconda3/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['vsearch', '--derep_fulllength', '01_merged_set_1.fastq', '--output', '/tmp/itsxpress_rc5x9i21/rep.fa', '--uc', '/tmp/itsxpress_rc5x9i21/uc.txt', '--strand', 'both', '--threads', '1']' returned non-zero exit status 1.
2022-11-17 14:01:31,678: ERROR    ITSxpress terminated with errors. See the log file for details.
2022-11-17 14:01:31,679: ERROR    Command '['vsearch', '--derep_fulllength', '01_merged_set_1.fastq', '--output', '/tmp/itsxpress_rc5x9i21/rep.fa', '--uc', '/tmp/itsxpress_rc5x9i21/uc.txt', '--strand', 'both', '--threads', '1']' returned non-zero exit status 1.
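
The error text suggests that this vsearch build (v2.22.1) accepts FASTQ input for dereplication only through its fastx_uniques command, while the logged call uses --derep_fulllength on a .fastq file. The sketch below shows one way a caller could pick the subcommand from the input file type. It is an assumption-laden illustration, not the ITSxpress maintainers' fix: the function names are hypothetical, and the --fastx_uniques and --fastaout flags are as I understand them from the vsearch manual, so check `vsearch --help` for your version before relying on them.

```python
from pathlib import Path

# Extensions treated as FASTQ in this sketch (assumption).
FASTQ_SUFFIXES = {".fastq", ".fq"}


def is_fastq(path):
    """Guess from the file name whether the input is FASTQ (optionally gzipped)."""
    suffixes = Path(path).suffixes
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    return bool(suffixes) and suffixes[-1].lower() in FASTQ_SUFFIXES


def build_derep_command(infile, fasta_out, uc_out):
    """Build a vsearch dereplication command as an argument list.

    FASTQ input: --fastx_uniques with --fastaout (assumed flag names).
    FASTA input: --derep_fulllength with --output, as in the logged call.
    """
    if is_fastq(infile):
        subcommand, out_flag = "--fastx_uniques", "--fastaout"
    else:
        subcommand, out_flag = "--derep_fulllength", "--output"
    return ["vsearch", subcommand, str(infile), out_flag, str(fasta_out),
            "--uc", str(uc_out), "--strand", "both"]


if __name__ == "__main__":
    print(" ".join(build_derep_command("01_merged_set_1.fastq", "rep.fa", "uc.txt")))
```

The list can be passed to subprocess.run once the flags have been confirmed against the installed vsearch.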

Conda install not compatible with QIIME2 2019.10

Hi,
Right now it seems that ITSxpress (1.7.2 and 1.8.0) is not compatible with QIIME2 2019.10 when installing both through conda (https://data.qiime2.org/distro/core/qiime2-2019.10-py36-linux-conda.yml).
The incompatibility arises from the HMMER version specifications of ITSxpress and SEPP (a qiime2 2019.10 dependency):

UnsatisfiableError: The following specifications were found to be in conflict:
  - itsxpress=1.7.2 -> hmmer[version='>=3.1']
  - sepp=4.3.10 -> hmmer==3.1b2

In conda versioning, v3.1b is strictly inferior to v3.1, which leads to a conflict.
Would it be possible to lower the ITSxpress bioconda specification of the HMMER version to >=3.1b in order to make it compatible with the latest version of QIIME2?

Thanks in advance,
Pierre

ITSxpress detecting 3 ITS2 loci, same file ITSx detects 200000 ITS2 loci

combined_seq_412.fastq.gz
/home/ubuntu/miniconda3/bin/itsxpress --fastq /home/ubuntu/combined_seq_412.fastq.gz --single_end --outfile /home/ubuntu/ITSxpress/combined_seq_ITS2_T_412.fastq.gz --region ITS2 --taxa Tracheophyta --cluster_id 1 --threads 10
produces this result:
combined_seq_ITS2_T_412.fastq.gz

but ITSx produces this result:
/home/ubuntu/ITSxpress/ITSx_1.1.2/ITSx -i /home/ubuntu/combined_seq_412.fasta -o /home/ubuntu/ITSxpress/412 --save_regions ITS2 --minlen 60 --not_found F --graphical F --cpu 28 --complement F -t T --reset T
ITSx412.tar.gz

q2-ITSxpress generates empty sequences?

First things first, sorry for cross-posting: this same question was posted on the QIIME2 forum. I will keep both posts updated, of course.

So, I am using vsearch to perform dereplication on some SampleData[JoinedSequencesWithQuality] that are obtained through ITSxpress. Problem is that vsearch is returning an error stating that some sequences are empty (Found blank or whitespace-only line before '+' in FASTQ file). I actually looked into the sequences in the .qza from ITSxpress and I can find a lot of unexpected stuff, i.e. empty sequences and super short sequences. A small example here:

@M01168:5:000000000-A7C98:1:1101:10151:9983 1:N:0:2
AACCC
+
JJJJJ
@M01168:5:000000000-A7C98:1:1101:10155:15804 2:N:0:2
CACCAATCAAGCCTGGCTTGGTATTGGGCGACGGGGTGCACCCGCGCCTCAATTTCTCCGGCTGAACGACCACTATCTCAGCGCTGTGATAATCAATTCGCTGTCGAGACGGGTGCTCACGCCGTTAAAGATTTTATACA
+
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJGJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
@M01168:5:000000000-A7C98:1:2101:23914:6789 1:N:0:2

@M01168:5:000000000-A7C98:1:1101:10165:3002 1:N:0:2
TCAAC
+
JIJJJ
@M01168:5:000000000-A7C98:1:1101:10181:5305 1:N:0:2
TCAA
+
JJJJ

Now, I understand there was an issue with a previous version of ITSxpress back in 2018, but it looked like it was solved in newer releases of the plugin. However, here I am. I also found this other post reporting the same issue from vsearch, but I believe it is not a vsearch issue but an ITSxpress-related one.

I ran the same data with the DADA2 plugin and had no problem there (it was also suggested in the same post I already linked). However, I wanted to perform the analysis on the same samples using both the OTU and ASV approaches, but currently I can't get through vsearch due to this issue. I am also considering the possibility that trimming ITS sequences and then dereplicating and clustering with vsearch shouldn't be done, but then... why not?

It might be worth noting that I am using QIIME2 2021.8 with ITSxpress version 1.8.0 and q2-itsxpress version 1.8.0.
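
Until the cause of the empty records is known, one stopgap is to drop empty or very short reads before handing the file to vsearch. The sketch below is an assumption-based illustration, not an ITSxpress or QIIME 2 feature: the function name and the `min_len` threshold are hypothetical, and it assumes four lines per record, with an empty sequence stored as an empty line.

```python
def filter_short_reads(fastq_text, min_len=1):
    """Return (filtered_fastq_text, number_of_dropped_records).

    Records whose sequence is shorter than min_len (including empty ones)
    are removed. Assumes exactly four lines per record.
    """
    lines = fastq_text.splitlines()
    kept = []
    dropped = 0
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if len(seq.strip()) >= min_len:
            kept.extend([header, seq, plus, qual])
        else:
            dropped += 1
    text = "\n".join(kept) + ("\n" if kept else "")
    return text, dropped


if __name__ == "__main__":
    sample = "@r1\nAACCC\n+\nJJJJJ\n@r2\n\n+\n\n"
    filtered, n_dropped = filter_short_reads(sample, min_len=1)
    print(filtered, end="")
    print(f"dropped {n_dropped} record(s)")
```

A larger `min_len` would also remove the four- and five-base reads shown in the example above; which cutoff is appropriate depends on the amplicon and is left to the reader.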

ITSxpress not working with reverse-complemented reads

Hello!

I have some reverse-complemented single-end fungal ITS1 reads obtained via IonTorrent sequencing, and I am trying to use ITSxpress to extract the ITS1 region from them. When I run ITSxpress, it produces an empty FASTQ file. I tried adding --reversed_primers to account for this, but I get the same result: a FASTQ file with nothing in it.

Does anyone have a solution for this?

clustering behavior of itsxpress

Hello,

in itsxpress it is possible to cluster using the --cluster_id parameter, and if I'm not mistaken, it uses vsearch for this.

However, in the output files I cannot find the number of reads that were inside each cluster. Does this mean that the majority of reads are lost this way? Also, does the output contain the centroids or the consensus sequences?

Will clustering occur if --cluster_id is not stated? In the itsxpress --help output, there is no default value specified.

Perhaps I'm missing something very obvious.

Thanks!
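
On the question of reads per cluster: the vsearch calls logged elsewhere on this page pass a --uc file, and that format records cluster membership. The sketch below counts reads per cluster from such a file. It assumes you have a uc file to hand (for example from running vsearch yourself) and says nothing about what ITSxpress keeps or writes to its own output. The function name is hypothetical.

```python
from collections import Counter


def cluster_sizes_from_uc(uc_lines):
    """Count reads per cluster from vsearch/usearch .uc lines.

    In the uc format, column 1 is the record type and column 2 the cluster
    number. 'S' rows are centroids and 'H' rows are hits to a centroid, so
    each read appears once. 'C' summary rows are skipped to avoid counting
    reads twice.
    """
    sizes = Counter()
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] not in ("S", "H"):
            continue
        sizes[int(fields[1])] += 1
    return dict(sizes)


if __name__ == "__main__":
    example = [
        "S\t0\t250\t*\t*\t*\t*\t*\tread1\t*",
        "H\t0\t250\t100.0\t+\t0\t0\t=\tread2\tread1",
        "S\t1\t240\t*\t*\t*\t*\t*\tread3\t*",
        "C\t0\t2\t*\t*\t*\t*\t*\tread1\t*",
        "C\t1\t1\t*\t*\t*\t*\t*\tread3\t*",
    ]
    print(cluster_sizes_from_uc(example))
```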

Q2-ITSxpress: File too small?

Hi all,

I am using the Qiime2 plugin of ITSxpress to do some fungal seq analysis. I've used this plugin previously without any issue. I am running Qiime2-2021.8 and ITSxpress v1.8.0 (both Conda and pip installation methods). I first imported my demultiplexed data then proceeded directly to ITSxpress with the command:

qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences 1_0_input_seqs.qza \
  --p-region ITS1 \
  --p-taxa F \
  --p-threads 8 \
  --o-trimmed 1_1_pair_trimmed_itsxpress.qza \
  --verbose

It runs for a while, but gives me the error:

Plugin error from itsxpress:

  Command '['vsearch', '--cluster_size', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/seq.fq.gz', '--centroids', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/rep.fa', '--uc', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '8']' returned non-zero exit status 1.

See above for debug info.

For more debug info:


...
...
Reading file /var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_zk_pqxzy/seq.fq.gz 100%
9641893 nt in 35445 seqs, min 68, max 491, avg 272
Masking 100%
Sorting by abundance 100%
Counting k-mers 100%
Clustering 100%
Sorting clusters 100%
Writing clusters 100%
Clusters: 6883 Size min 1, max 14632, avg 5.1
Singletons: 4514, 12.7% of seqs, 65.6% of clusters

vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch



Fatal error: File too small

ERROR:root:Could not perform clustering with Vsearch. Error from Vsearch was:
 vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 8 cores
https://github.com/torognes/vsearch



Fatal error: File too small
Traceback (most recent call last):
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/itsxpress/main.py", line 540, in cluster
    p2.check_returncode()
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/subprocess.py", line 448, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/seq.fq.gz', '--centroids', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/rep.fa', '--uc', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '8']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2cli/commands.py", line 329, in __call__
    results = action(**arguments)
  File "<decorator-gen-287>", line 2, in trim_pair_output_unmerged
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/qiime2/sdk/action.py", line 245, in bound_callable
    outputs = self._callable_executor_(scope, callable_args,
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/qiime2/sdk/action.py", line 391, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2_itsxpress/_itsxpress.py", line 151, in trim_pair_output_unmerged
    results = main(per_sample_sequences=per_sample_sequences,
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/q2_itsxpress/_itsxpress.py", line 208, in main
    sobj.cluster(threads=threads, cluster_id=cluster_id)
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/itsxpress/main.py", line 543, in cluster
    raise e
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/site-packages/itsxpress/main.py", line 540, in cluster
    p2.check_returncode()
  File "/Users/stricba1/opt/anaconda3/envs/qiime2-2021.8/lib/python3.8/subprocess.py", line 448, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/seq.fq.gz', '--centroids', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/rep.fa', '--uc', '/var/folders/q4/1wyb8l014g5bzmrvwg77plz00000gp/T/itsxpress_s74av6rt/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '8']' returned non-zero exit status 1.

I've seen a couple of issues relating to this bug. I've tried pre-import sequence cutoffs of 200, 500, 1000, and 5000, and I still get the same error. I must have a bad file somewhere that I cannot find. The new version of Vsearch now flags this issue as a warning, not something that will kill the program (see torognes/vsearch#366).

Again, I've seen this issue come up a few times, and I cannot seem to find a solution. Any help would be appreciated!
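
One way to hunt for a suspect input file is to count the reads in every gzipped FASTQ file and list those below a threshold. The sketch below is only an illustration under stated assumptions: the function names, the `min_reads` threshold and the `*.fastq.gz` pattern are hypothetical, records are assumed to span four lines, and the error above was raised on an ITSxpress temporary file, so a small input file is a plausible cause rather than a confirmed one.

```python
import gzip
from pathlib import Path


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file (four lines per record assumed)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def find_small_files(directory, min_reads=1, pattern="*.fastq.gz"):
    """Return {file name: read count} for files with fewer than min_reads reads."""
    flagged = {}
    for path in sorted(Path(directory).glob(pattern)):
        n_reads = count_fastq_reads(path)
        if n_reads < min_reads:
            flagged[path.name] = n_reads
    return flagged


if __name__ == "__main__":
    # Hypothetical directory name; point this at the demultiplexed reads.
    for name, n_reads in find_small_files("demux_reads", min_reads=100).items():
        print(f"{name}\t{n_reads}")
```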

A question

Does ITSxpress trim paired-end reads that can't be merged because of lack of overlap?

Running ITSxpress in parallel

Hi

Is it possible to run ITSxpress in parallel? I can't figure this one out. The best I could do are the two commands below, but ITSxpress doesn't recognise the command.

parallel -j 4 'itsxpress \
	--fastq cutadapt/*.fq.gz \
	--single_end \
	--region ALL \
	--taxa Fungi \
	--log itsxpress/logfile/{/.}_ITS-ALL.txt \
	--outfile itsxpress/{/.}_ITS-ALL.fq.gz {}' \
	::: cutadapt/*.fq.gz

itsxpress: error: unrecognized arguments: cutadapt/S063.fq.gz cutadapt/S064.fq.gz cutadapt/S066.fq.gz ...and so on.

parallel -j 4 'itsxpress \
	--fastq .fq.gz \
	--single_end \
	--region ALL \
	--taxa Fungi \
	--log itsxpress/logfile/{/.}_ITS-ALL.txt \
	--outfile itsxpress/{/.}_ITS-ALL.fq.gz {}' \
	::: --fastq cutadapt/*.fq.gz

itsxpress: error: unrecognized arguments: cutadapt/S063.fq.gz...and so on.

Cheers
Luke
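
In both attempts the file names reach itsxpress as extra arguments. In the first, the glob inside the quoted command expands to every file after --fastq, and in both the trailing {} is appended with no flag in front of it, which matches the "unrecognized arguments" message. A minimal sketch of running one itsxpress process per file from Python follows. The function names, the default directories and the `jobs` parameter are hypothetical; the itsxpress flags are the ones used in the commands above. Note that this version strips the whole .fq.gz suffix from the sample name, whereas {/.} in GNU parallel strips only the last extension.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sample_stem(fastq):
    """Return the file name without directory and without a .fq.gz suffix."""
    name = Path(fastq).name
    return name[:-len(".fq.gz")] if name.endswith(".fq.gz") else Path(name).stem


def build_command(fastq, outdir="itsxpress", logdir="itsxpress/logfile"):
    """Build the itsxpress argument list for one single-end input file."""
    stem = sample_stem(fastq)
    return ["itsxpress",
            "--fastq", str(fastq),
            "--single_end",
            "--region", "ALL",
            "--taxa", "Fungi",
            "--log", f"{logdir}/{stem}_ITS-ALL.txt",
            "--outfile", f"{outdir}/{stem}_ITS-ALL.fq.gz"]


def run_all(fastq_files, jobs=4):
    """Run up to `jobs` itsxpress processes at once; return their exit codes."""
    def run_one(fastq):
        return subprocess.run(build_command(fastq), check=False).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, fastq_files))


if __name__ == "__main__":
    files = sorted(Path("cutadapt").glob("*.fq.gz"))
    print(run_all(files, jobs=4))
```

Threads are enough here because each worker only waits on an external process; the output and log directories are assumed to exist already.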

ITS produces empty files after merging - No ITS start or stop sites detected

Hey all, I am having trouble getting ITSxpress to work on my files. The sequences were amplified using ITS4Fun and 5.8S primers to capture the ITS2 region.

The merging step seems to be working fine, because I get data on the percentage of merged reads, read lengths and so on. However, my sequences then fail, and I get a message saying that no ITS start or stop sites were detected. This error line repeats for many, many lines.

Any insight would be greatly appreciated!

Here is the code I am using:

`conda activate trim_3p

INDIR=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_F
INDIR2=/mnt/home/ernakovich/srs1085/DATA/Rhizo_pilot/ITS_dada2/02_filter/preprocessed_R
OUTDIR=ITSxpress_f
OUTDIR2=ITSexpress_r
mkdir $OUTDIR
mkdir $OUTDIR2

for i in $INDIR/*R1*
do(
FILE=${i##*/}
BEFFILE=${FILE%R1*}
AFTFILE=${FILE##*R1}
R1=$FILE
R2=${BEFFILE}R2${AFTFILE}
echo $R1
if [ -f $OUTDIR2/$R2 ]
then
continue
fi

srun ~/.local/bin/itsxpress \
--fastq $INDIR/$R1 --fastq2 $INDIR2/$R2 \
--outfile $OUTDIR/$R1 --outfile2 $OUTDIR2/$R2 \
--region ITS2 --taxa 'Fungi' --cluster_id 1 \
--reversed_primers \
--threads 16 \
--log itsxpress.log

)
done
`

Trimming primers before running ITSxpress

Hi

I am working on some single-ended PacBio data. The following two things are unclear to me:

  1. Is it necessary to trim the primers from the circular consensus reads before running ITSxpress?
  2. Should I check whether reads classified as not having the full region contain either ITS1 or ITS2?

Thanks a ton for the help (in advance)!

Write SSU/LSU regions

The original ITSx has the option to also output the flanking regions. I use long amplicons that include ITS and also a big piece of LSU, which I use for different purposes, so this would be very useful to me.

itsxpress produces empty files after merging

Hey guys,

thanks for making this plugin. I'm currently trying to get it to work for the first time with my first fungal data set. I have demultiplexed my data set and tried itsxpress both in qiime2 and standalone, in both paired and single-read mode, but I always get empty read files even though sequences are reported while the program runs:

e.g. when working with single end reads
7964852 nt in 32803 seqs, min 65, max 246, avg 243
minseqlength 32: 3 sequences discarded.
Sorting 100%
16376 unique sequences, avg cluster 2.0, median 1, max 3949
Writing output file 100%
Writing uc file, first part 100%
Writing uc file, second part 100%

and with paired-end reads:
Writing mergable reads merged.
Started output threads.
Total time: 1.139 seconds.

Pairs: 42110
Joined: 27279 64.780%
Ambiguous: 4403 10.456%
No Solution: 10428 24.764%
Too Short: 0 0.000%

Avg Insert: 278.6
Standard Deviation: 31.9
Mode: 266

Insert range: 73 - 458
90th percentile: 321
75th percentile: 285
50th percentile: 267
25th percentile: 265
10th percentile: 258

I tried both the new version 1.8.0 and 1.7.2, and both produce the same empty read files.

My commands in the standalone version are as follows:
###paired reads
for i in L001_R.fastq.gz ;do
varID=$(echo $i | grep -o -a "^S[[:digit:]]*" )
echo "processing $varID \n"
itsxpress --fastq "$varID"_"$varID"L001_R1_001.fastq.gz --fastq2 "$varID""$varID"_L001_R2_001.fastq.gz --region ALL
--taxa Fungi --log logfile.txt --outfile itsxpress/"$varID"_trimmed_reads.fastq.gz --threads 10
done

###single reads
for i in L001_R.fastq.gz ;do
varID=$(echo $i | grep -o -a "^S[[:digit:]]*" )
echo "processing $varID \n"
itsxpress --fastq "$varID"_"$varID"_L001_R1_001.fastq.gz --single_end --region ALL
--taxa Fungi --log logfile.txt --outfile itsxpress/"$varID"_trimmed_reads.fastq.gz --threads 10
done

Any help? :) Thanks

Cheers, Max

BBMerge fails to merge reads

Hey,
I'm trimming each of my samples separately because of the issue with the empty sequences.
By doing so, I could see the output of BBMerge, which turns out to perform badly compared to PEAR:

Pairs:                  26220
Joined:                 11798           44.996%
Ambiguous:              11712           44.668%
No Solution:            2710            10.336%
Too Short:              0               0.000%

vs

Assembled reads ...................: 25,490 / 26,220 (97.216%)
Discarded reads ...................: 0 / 26,220 (0.000%)
Not assembled reads ...............: 730 / 26,220 (2.784%)

I'm not familiar with BBMerge or its parameters.

  1. Is it OK to pre-merge the reads with PEAR and then pass them to itsxpress > qiime2 (DADA2) as single-end?
  2. Does it indicate there's an issue with the samples? If not, maybe notify the user that reads were not paired properly (such report doesn't exist in qiime, as far as I know).

UPDATE:
After merging with PEAR, ~95% of the merged reads were successfully trimmed by itsxpress.

Thanks,
Omer
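
The suggestion to notify the user when reads pair poorly could be prototyped by parsing the BBMerge summary shown above and flagging a low joined fraction. This is a sketch under assumptions, not an existing ITSxpress or QIIME 2 feature: the function names and the 0.8 warning threshold are hypothetical, and the parser only expects lines in the "Label: count percent" shape printed in this issue.

```python
import re

# Labels as they appear in the BBMerge summary quoted above.
_STAT_LINE = re.compile(r"^\s*(Pairs|Joined|Ambiguous|No Solution|Too Short):\s+(\d+)")


def parse_bbmerge_stats(text):
    """Extract the read counts from a BBMerge summary into a dict."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def joined_fraction(stats):
    """Fraction of read pairs that BBMerge joined."""
    return stats["Joined"] / stats["Pairs"]


if __name__ == "__main__":
    report = """Pairs:                  26220
Joined:                 11798           44.996%
Ambiguous:              11712           44.668%
No Solution:            2710            10.336%
Too Short:              0               0.000%"""
    fraction = joined_fraction(parse_bbmerge_stats(report))
    if fraction < 0.8:  # hypothetical threshold
        print(f"warning: only {fraction:.1%} of pairs were joined")
```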
