opengene / afterqc

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data

License: MIT License

Python 93.33% Makefile 0.10% C++ 5.72% C 0.85%

quality-control fastq ngs sequencing bioinformatics overlap trimming filtering error qc adapter-trimming

afterqc's Introduction

AfterQC

Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data
AfterQC can simply go through all fastq files in a folder and then output three folders: good, bad and QC, which contain the good reads, the bad reads and the QC results of each fastq file/pair.
Currently it supports processing data from HiSeq 2000/2500/3000/4000, NextSeq 500/550, MiniSeq and other platforms that produce Illumina 1.8 or newer formats.

The author has reimplemented this tool in C++ with multithreading support to make it much faster. The new tool is called fastp and can be found at: https://github.com/OpenGene/fastp . If you prefer a C++ based tool, please use fastp instead.

An Example of Report

The AfterQC report is a single HTML page with the figures embedded. See an example: http://opengene.org/AfterQC/report.html

Features:

AfterQC does the following tasks automatically:

Filters reads with too low quality, too short length or too many N
Filters reads with abnormal PolyA/PolyT/PolyC/PolyG sequences
Does per-base quality control and plots the figures
Trims reads at front and tail, according to QC results
For pair-end sequencing data, AfterQC automatically corrects low quality wrong bases in overlapped area of read1/read2
Detects and eliminates bubble artifacts caused by fluid dynamics issues in the sequencer
Single molecule barcode sequencing support: if all reads have a single molecule barcode (see duplex sequencing), AfterQC shifts the barcodes from the reads to the fastq query names
Supports both single-end sequencing and pair-end sequencing data
Automatic adapter cutting for pair-end sequencing data
Sequencing error estimation, and error distribution profiling
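
As a rough illustration of the polyX filter listed above, the sketch below finds the longest stretch of a single base while tolerating a few mismatches. It is written for Python 3, the function names `longest_poly_run` and `is_poly_read` are hypothetical, and it is not AfterQC's actual implementation.

```python
def longest_poly_run(seq, allowed_mismatches=0):
    """Return (base, length) of the longest stretch that consists of one base
    apart from at most `allowed_mismatches` other characters."""
    best_base, best_len = "", 0
    for base in "ACGT":
        left = 0          # left edge of the sliding window
        mismatches = 0    # non-`base` characters inside the window
        for right, ch in enumerate(seq):
            if ch != base:
                mismatches += 1
            # Shrink the window until it fits the mismatch budget again.
            while mismatches > allowed_mismatches:
                if seq[left] != base:
                    mismatches -= 1
                left += 1
            length = right - left + 1
            if length > best_len:
                best_base, best_len = base, length
    return best_base, best_len


def is_poly_read(seq, poly_size_limit=35, allowed_mismatches=0):
    """Flag a read whose longest polyX stretch reaches the size limit."""
    return longest_poly_run(seq, allowed_mismatches)[1] >= poly_size_limit
```

The two keyword arguments mirror the meaning of --poly_size_limit and --allow_mismatch_in_poly in the options listed further down; how AfterQC counts mismatches at the edges of a stretch may differ from this sketch.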

Get AfterQC

with Bioconda: conda install afterqc
latest: git clone https://github.com/OpenGene/AfterQC.git or download https://github.com/OpenGene/AfterQC/archive/master.zip
stable: Releases

PyPy suggestion:

AfterQC is compatible with PyPy. Running AfterQC with PyPy is strongly suggested, since it can make AfterQC about 3X faster than native Python (CPython). To run with PyPy, just replace python with pypy in the commands.

Simple usage:

Prepare your fastq files in a folder
For single-end sequencing, the filenames in the folder should be *R1*, otherwise you should specify --read1_flag
For pair-end sequencing, the filenames in the folder should be *R1* and *R2*, otherwise you should specify --read1_flag and --read2_flag

cd /path/to/fastq/folder
python path/to/AfterQC/after.py

Three folders will be automatically generated: a folder good stores the good reads, a folder bad stores the bad reads and a folder QC stores the quality control report
AfterQC will print some statistical information after it is done, such as how many reads are good, how many are bad, and how many were corrected.
If you want to run AfterQC with only a single file/pair:

# with a single file
python after.py -1 R1.fq

# with a single pair
python after.py -1 R1.fq -2 R2.fq

Quality Control only

If you only want to get quality control statistics, run:

python after.py --qc_only

Gzip output

If the input FastQ files are gzipped, then the output will also be gzipped.
If the input FastQ files are not gzipped, you can enable the --gzip or -z option to force gzip compression.
Use --compression to change the compression level (0-9), default is 2. The better the compression, the lower the speed.
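
The trade-off between compression level and speed can be tried out with Python's standard gzip module. This is a minimal Python 3 sketch, assuming output names simply gain a .gz suffix; `output_name` and `open_fastq_for_write` are hypothetical helpers, not functions from AfterQC.

```python
import gzip


def output_name(path, force_gzip=False):
    """Append .gz when gzip output is forced and the name lacks the suffix."""
    if force_gzip and not path.endswith(".gz"):
        return path + ".gz"
    return path


def open_fastq_for_write(path, compression=2):
    """Open a FASTQ output file in text mode, gzipped if the name ends in .gz.

    `compression` is the gzip level: 0 stores without compressing and
    9 is the slowest with the smallest output.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "wt", compresslevel=compression)
    return open(path, "w")
```

Usage: `open_fastq_for_write(output_name("sample.good.fq", force_gzip=True))` returns a handle that writes sample.good.fq.gz at level 2.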

Full options:

Common options

  --version             show program's version number and exit
  -h, --help            show this help message and exit

File (name) options


  -1 READ1_FILE, --read1_file=READ1_FILE
                        file name of read1, required. If input_dir is
                        specified, then this arg is ignored.
  -2 READ2_FILE, --read2_file=READ2_FILE
                        file name of read2, if paired. If input_dir is
                        specified, then this arg is ignored.
  -7 INDEX1_FILE, --index1_file=INDEX1_FILE
                        file name of 7' index. If input_dir is specified, then
                        this arg is ignored.
  -5 INDEX2_FILE, --index2_file=INDEX2_FILE
                        file name of 5' index. If input_dir is specified, then
                        this arg is ignored.
  -d INPUT_DIR, --input_dir=INPUT_DIR
                        the input dir to process automatically. If read1_file
                        and input_dir are not specified, then the current dir
                        (.) is used as input_dir
  -g GOOD_OUTPUT_FOLDER, --good_output_folder=GOOD_OUTPUT_FOLDER
                        the folder to store good reads, by default it is the
                        same folder contains read1
  -b BAD_OUTPUT_FOLDER, --bad_output_folder=BAD_OUTPUT_FOLDER
                        the folder to store bad reads, by default it is same
                        as good_output_folder
  --read1_flag=READ1_FLAG
                        specify the name flag of read1, default is R1, which
                        means a file with name *R1* is read1 file
  --read2_flag=READ2_FLAG
                        specify the name flag of read2, default is R2, which
                        means a file with name *R2* is read2 file
  --index1_flag=INDEX1_FLAG
                        specify the name flag of index1, default is I1,
                        which means a file with name *I1* is index1 file
  --index2_flag=INDEX2_FLAG
                        specify the name flag of index2, default is I2,
                        which means a file with name *I2* is index2 file
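
The flag matching described by --read1_flag and --read2_flag can be sketched as below. This is an illustration only: it assumes the read2 filename is the read1 filename with the first occurrence of the flag swapped, which may not match how AfterQC pairs files internally, and `pair_fastq_files` is a hypothetical name.

```python
def pair_fastq_files(filenames, read1_flag="R1", read2_flag="R2"):
    """Return (read1, read2_or_None) tuples for every file carrying read1_flag."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        if read1_flag not in name:
            continue  # not a read1 file (read2 files are picked up as mates)
        # Assumed naming rule: swap the first read1 flag for the read2 flag.
        mate = name.replace(read1_flag, read2_flag, 1)
        pairs.append((name, mate if mate in names else None))
    return pairs
```

A read1 file without a matching read2 file is returned with None, which corresponds to single-end processing.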

Filter options

  -f TRIM_FRONT, --trim_front=TRIM_FRONT
                        number of bases to be trimmed in the head of read. -1
                        means auto detect
  -t TRIM_TAIL, --trim_tail=TRIM_TAIL
                        number of bases to be trimmed in the tail of read. -1
                        means auto detect
  --trim_pair_same=TRIM_PAIR_SAME
                        use same trimming configuration for read1 and read2 to
                        keep their sequence length identical, default is true;
                        lots of dedup algorithms require this feature
  -q QUALIFIED_QUALITY_PHRED, --qualified_quality_phred=QUALIFIED_QUALITY_PHRED
                        the quality value at which a base is qualified. Default
                        20 means base quality >=Q20 is qualified.
  -u UNQUALIFIED_BASE_LIMIT, --unqualified_base_limit=UNQUALIFIED_BASE_LIMIT
                        if exists more than unqualified_base_limit bases that
                        quality is lower than qualified quality, then this
                        read/pair is bad. Default 0 means do not filter reads
                        by low quality base count
  -p POLY_SIZE_LIMIT, --poly_size_limit=POLY_SIZE_LIMIT
                        if exists one polyX(polyG means GGGGGGGGG...), and its
                        length is >= poly_size_limit, then this read/pair is
                        bad. Default is 35
  -a ALLOW_MISMATCH_IN_POLY, --allow_mismatch_in_poly=ALLOW_MISMATCH_IN_POLY
                        the count of allowed mismatches when evaluating
                        poly_X. Default 5 means disallow any mismatches
  -n N_BASE_LIMIT, --n_base_limit=N_BASE_LIMIT
                        if exists more than maxn bases have N, then this
                        read/pair is bad. Default is 5
  -s SEQ_LEN_REQ, --seq_len_req=SEQ_LEN_REQ
                        if the trimmed read is shorter than seq_len_req, then
                        this read/pair is bad. Default is 35

Debubble options (not suggested for regular tasks)
If you want to eliminate bubble artifact, turn debubble option on (this is slow, usually you don't need to do this):

  --debubble            enable debubble algorithm to remove the
                        reads in the bubbles. Default is False
  --debubble_dir=DEBUBBLE_DIR
                        specify the folder to store output of debubble
                        algorithm, default is debubble
  --draw=DRAW           specify whether draw the pictures or not, when use
                        debubble or QC. Default is on

Barcoded sequencing options

  --barcode_length=BARCODE_LENGTH
                        specify the designed length of barcode
  --barcode_flag=BARCODE_FLAG
                        specify the name flag of a barcoded file, default is
                        barcode, which means a file with name *barcode* is a
                        barcoded file
  --barcode=BARCODE     specify whether deal with barcode sequencing files,
                        default is on, which means all files with barcode_flag
                        in filename will be treated as barcode sequencing
                        files

QC options

  --qc_only             enable this option, only QC result will be output, this
                        can be much faster
  --qc_sample=QC_SAMPLE
                        sample up to qc_sample when doing QC, default is 1,000,000
  --qc_kmer=QC_KMER     specify the kmer length for KMER statistics for QC,
                        default is 8

Understand the report

AfterQC will generate a QC folder, which contains lots of figures.
For pair-end sequencing data, both read1 and read2 figures will be in the same folder with the folder name of read1's filename. R1 means read1, R2 means read2.
For single-end sequencing data, it will still have R1.
prefilter means before filtering, postfilter means after filtering
For pair-end sequencing data, AfterQC will do an overlap analysis. read1 and read2 overlap when read1_length + read2_length > DNA_template_length.
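
The overlap condition can be turned into a small calculation. The sketch below is an idealized model that ignores trimming and sequencing errors; `expected_overlap` is a hypothetical helper, not part of AfterQC.

```python
def expected_overlap(read1_length, read2_length, template_length):
    """Number of template bases covered by both reads of a pair.

    The reads start at opposite ends of the template, so they share
    read1_length + read2_length - template_length bases once that sum is
    positive. The overlap can never exceed the template or either read.
    """
    shared = read1_length + read2_length - template_length
    return max(0, min(shared, read1_length, read2_length, template_length))
```

For 2 x 150 reads, a 250-base template gives an overlap of 50 bases, a 300-base template gives none, and a 100-base template is covered entirely by both reads.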

Cite AfterQC

Shifu Chen, Tanxiao Huang, Yanqing Zhou, Yue Han, Mingyan Xu and Jia Gu. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data. BMC Bioinformatics 2017 18(Suppl 3):80 https://doi.org/10.1186/s12859-017-1469-3

afterqc's Issues

question about AfterQC/preprocesser.py

Hi, in preprocesser.py line 498, your code is:
if lowQual1 > self.options.unqualified_base_limit or lowQual1 > self.options.unqualified_base_limit:
I suppose the code should be:
if lowQual1 > self.options.unqualified_base_limit or lowQual2 > self.options.unqualified_base_limit:
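
A minimal sketch of the check as the reporter suggests it should read, with each read's low-quality count compared against the limit. The function name is hypothetical and the surrounding AfterQC code is not reproduced here.

```python
def pair_exceeds_low_quality_limit(low_qual1, low_qual2, unqualified_base_limit):
    """True when either read of the pair has too many low-quality bases."""
    return (low_qual1 > unqualified_base_limit
            or low_qual2 > unqualified_base_limit)
```

With the duplicated lowQual1 test, a pair such as low_qual1=0, low_qual2=10 and a limit of 5 would pass the filter; with the version above it is rejected.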

Afterqc with pypy

Hello,
I have 12 paired-end files (~50 GB). Each file takes approximately 14 hours to run with native Python. I read that it runs 3 times faster with the pypy command, but when I edit my script to use pypy it returns an error saying there is no such command. Do I need to download something to use the pypy command?

Remove overrepresented sequences

It may be a good idea to implement an option to remove specific sequences such as overrepresented sequences, which are usually rRNA or PCR artefacts, either by passing these sequences in a FASTA file or by having AfterQC identify them itself.

Tool to keep reads where all bases are above a specific quality score.

Hi. First I just discovered and tested your tool a few days ago. If my request has no sense or linked to something I didn't understand well, feel free to remove my message.

I would like to filter reads of FASTQ files to keep only very high quality sequences: reads where ALL bases are above Q30, for example. Two filter options seem to be appropriate:
-q QUALIFIED_QUALITY_PHRED --> set to "-q 30"
-u UNQUALIFIED_BASE_LIMIT --> it would be logical to set "-u 0" to be sure to remove all reads where at least one base is under Q30. But sadly, "-u 0" is the default and means reads are not filtered by low quality base count at all (see the -u UNQUALIFIED_BASE_LIMIT info). So I guess it is impossible to keep only "perfect quality reads" (reads with no bases below a specific quality). Shouldn't the value that deactivates UNQUALIFIED_BASE_LIMIT be something other than 0?

Thanks a lot,
Max

Pack (gzip) output good and bad reads

I think it would be great to pack the output reads (good & bad)!

String index out of range.

Hi!

I managed to run AfterQC successfully the first time I used it with one of my libraries with the following command line:

pypy /home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/after.py

I tried to use the exact same command to run AfterQC on my second pair of libraries (from the different directory where these libraries are) and this error appeared:

specify current dir as input dir
l4i2-unpaired_R1.fastq
l4i2-trimmo_R1.fastq
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/pypy/lib-python/2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/lib/pypy/lib-python/2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/after.py", line 175, in processOptions
filter.run()
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/preprocesser.py", line 249, in run
self.r1qc_prefilter.statFile(self.options.read1_file)
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/qualitycontrol.py", line 350, in statFile
self.statRead(read)
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/qualitycontrol.py", line 107, in statRead
if seq[j] != seq[j+1]:
IndexError: string index out of range
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/pypy/lib-python/2.7/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/lib/pypy/lib-python/2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/after.py", line 175, in processOptions
filter.run()
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/preprocesser.py", line 249, in run
self.r1qc_prefilter.statFile(self.options.read1_file)
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/qualitycontrol.py", line 350, in statFile
self.statRead(read)
File "/home/pop_manuel/proyecto_transcriptoma_rhizophagus/software/AfterQC-master/qualitycontrol.py", line 107, in statRead
if seq[j] != seq[j+1]:
IndexError: string index out of range
Time used: 0.0947189331055

I tried executing the program with pypy and python2, but it returns the same error.

Thank you for your help!
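
The failing line compares seq[j] with seq[j+1], which raises IndexError whenever j+1 runs past the end of a read, for example on an empty or unexpectedly short read. Whether that is what happened with these files is an assumption. A bounds-safe way to count neighbouring base changes, with the hypothetical name `count_base_changes`, could look like this:

```python
def count_base_changes(seq):
    """Count positions j where seq[j] differs from seq[j + 1].

    zip() stops at the shorter argument, so empty and single-base reads
    yield 0 instead of raising IndexError.
    """
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)
```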

Issue with overlap analysis

Hi,

I'm working on SRA data (SRR4292097). I get the following error:
after.py
specify current dir as input dir
SRR4292097_R1.fastq.gz
./SRR4292097_R1.fastq.gz options:
{'read1_file': './SRR4292097_R1.fastq.gz', 'read2_file': './SRR4292097_R2.fastq.gz', 'index1_file': None, 'index2_file': None, 'input_dir': '.', 'good_output_folder': 'good', 'bad_output_folder': None, 'report_output_folder': None, 'read1_flag': 'R1', 'read2_flag': 'R2', 'index1_flag': 'I1', 'index2_flag': 'I2', 'trim_front': 8, 'trim_tail': 0, 'trim_pair_same': True, 'qualified_quality_phred': 15, 'unqualified_base_limit': 60, 'poly_size_limit': 35, 'allow_mismatch_in_poly': 2, 'n_base_limit': 5, 'seq_len_req': 35, 'debubble': False, 'debubble_dir': 'debubble', 'draw': True, 'barcode': False, 'barcode_length': 12, 'barcode_flag': 'barcode', 'barcode_verify': 'CAGTA', 'store_overlap': False, 'overlap_output_folder': None, 'qc_only': False, 'qc_sample': 200000, 'qc_kmer': 8, 'no_correction': False, 'mask_mismatch': False, 'no_overlap': False, 'version': '0.9.6', 'trim_front2': 8, 'trim_tail2': 0}
Process Process-1:
Traceback (most recent call last):
File "/home/XX/Softs/miniconda3/lib-python/2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/XX/Softs/miniconda3/lib-python/2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/XX/Softs/miniconda3/envs/py27/bin/after.py", line 171, in processOptions
filter.run()
File "/home/XX/Softs/miniconda3/envs/py27/share/afterqc-0.9.6-0/preprocesser.py", line 512, in run
overlap_histgram[overlap_len] += 1
IndexError: list index out of range
Time used: 16.7429320812

Everything is working fine with the option --no_overlap.

Thanks,
Maxime
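
The traceback points at a list being indexed by an overlap length that is larger than the list. One way to avoid that class of error is to grow the histogram on demand. This is a sketch with a hypothetical helper name, not the fix applied in AfterQC, and it says nothing about why the overlap length was larger than expected.

```python
def add_to_histogram(histogram, value):
    """Increment histogram[value], extending the list with zeros if needed."""
    if value < 0:
        raise ValueError("histogram bins must be non-negative")
    if value >= len(histogram):
        histogram.extend([0] * (value + 1 - len(histogram)))
    histogram[value] += 1
    return histogram
```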

Parallel mode for one pair of reads

What do you think about some kind of parallel mode for processing a single pair of PE reads (-1 and -2 options) when good reads are to be generated?

Report

Hi,

I have two suggestions.

Could you implement an option to put out the graphs separately in a folder? Would make it possible to include them in an automated report concerning my entire NGS pipeline.
The order is all graphs prior filtering, then all graphs after filtering. Could you make them side by side? Forward prior next to forward after? That way you could immediately compare prior and after filtering, instead of having to scroll all the time.

Thanks
Anselm

Packaging: make available on Pypi/conda

Hi,

Would it be possible to make this package available in Pypi and/or conda? This would greatly improve the accessibility of your tool.
We're interested in integrating AfterQC in our QC pipeline, but this depends on the package being available from Pypi or Conda.

Thanks!
M

Specify output folder name

Is it possible to add an option for specifying the name of the output folder that holds the report files, rather than using the input file names?

Yours faithfully,
Katerina

default good folder

Hello!
Due to default = "good" in line 27, the description in line 28 contains a mistake. If option -g is not set by user, then the good folder will be created in the current dir, but not in the dir of read1. And it's ok! Could you change the description in line 28, please?
Also lines 287 and 288 of the same file are useless.

TODO: change trimming function to make trimmed read1/read2 have identical length

This is to make some duplicate-marking tools work well.

Wrong (low quality) bases in overlapped regions failed to be corrected

AfterQC has helped me to improve the quality of my data. Thanks for creating it.

I did notice, though, that some wrong (low quality) bases in overlapped regions failed to be corrected and I am not sure why.

Could you perhaps have a look? Below is one read pair as an example.
---------------- test_R1.fastq
@NS500261:202:HKMTKAFXX:1:21112:5370:11983 1:N:0:ATCTAGCCGGCC
CTGCGCCTGGTTGGGCATCGCTCCGCTAGGTGTCAGCGGCTCCACCAGCTGGGGTGAGGGGGTGGTGGGTCAGTGCTGGGGGCCGGTGCAGACCCCACGCGGGCTGGGAGGACTTCACCCCGCCTCACCTCCGTTTCCTGCAGATCGGAAG
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEAEEEEEEEEEEEEEEEEEE/EEEE/EEEEEEEEEEEAEE/AA<AAEEAAAAAEE<EAAA<A<6AAAAE<<AEAAAAAEE
---------------- test_R2.fastq
@NS500261:202:HKMTKAFXX:1:21112:5370:11983 2:N:0:ATCTAGCCGGCC
GCAGGAAACGGAGGTGAGGCGGGGTGAAGTCCTCCCAGCCCGCGTGGGGTCTGCACCGGCCCCCAGCCCTGACCCACCACCCCCTCACCCCAGCTGGTGGAGCCGCGGCCCCCTAGCGGCGCGATGCCCAACCAGGCGCAGAGCTC
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEAEEEEEEE/AE/E/E/EEE/EE/EEEEE/EAE6EE/EE//E/E//EEE/A/A/E/AE/AAE/E/EE</<AAEE//AE//<E6E////6<<

ValueError: max() arg is an empty sequence

Hi, when I run:

python after.py --qc_only -1 ../fastq1/SRR1294494_1.fastq.gz -2 ../fastq1/SRR1294494_2.fastq.gz
../fastq1/SRR1294494_1.fastq.gz options:

{'qc_only': True, 'version': '0.9.6', 'seq_len_req': 35, 'index1_file': None, 'trim_tail': 1, 'report_output_folder': None, 'trim_pair_same': True, 'no_correction': False, 'debubble_dir': 'debubble', 'barcode_flag': 'barcode', 'read2_file': '../fastq1/SRR1294494_2.fastq.gz', 'barcode_length': 12, 'trim_tail2': 1, 'unqualified_base_limit': 60, 'allow_mismatch_in_poly': 2, 'read2_flag': 'R2', 'store_overlap': False, 'debubble': False, 'read1_flag': 'R1', 'index2_flag': 'I2', 'draw': True, 'index1_flag': 'I1', 'mask_mismatch': False, 'barcode': False, 'overlap_output_folder': None, 'barcode_verify': 'CAGTA', 'index2_file': None, 'qualified_quality_phred': 15, 'trim_front': 2, 'good_output_folder': 'good', 'poly_size_limit': 35, 'n_base_limit': 5, 'qc_sample': 200000, 'trim_front2': 2, 'no_overlap': False, 'input_dir': None, 'read1_file': '../fastq1/SRR1294494_1.fastq.gz', 'qc_kmer': 8, 'bad_output_folder': None}

it fails with this error:

Traceback (most recent call last):
File "after.py", line 224, in <module>
main()
File "after.py", line 218, in main
processOptions(options)
File "after.py", line 171, in processOptions
filter.run()
File "/disk/zhw/cross_talk/GSE57872/fastq/AfterQC-master/preprocesser.py", line 768, in run
self.addFiguresToReport(reporter)
File "/disk/zhw/cross_talk/GSE57872/fastq/AfterQC-master/preprocesser.py", line 783, in addFiguresToReport
reporter.addFigure('Read1 per base discontinuity after filtering', self.r1qc_postfilter.discontinuityPlotly("r1_post_discontinuity", 'Read1 discontinuity curve after filtering'), 'r1_post_discontinuity', "")
File "/disk/zhw/cross_talk/GSE57872/fastq/AfterQC-master/qualitycontrol.py", line 234, in discontinuityPlotly
json_str += "var layout={title:'" + title + "', xaxis:{title:'cycles'}, yaxis:{title:'discontinuity', range:" + makeRange(0.0, max(self.meanDiscontinuity)*1.5) + "}};\n"

ValueError: max() arg is an empty sequence
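
max() raises ValueError when it is given an empty sequence, which is what the last frame shows for self.meanDiscontinuity. A guarded version of the axis-range calculation might look like the sketch below; the helper name and the fallback value are assumptions, and why the list is empty for this data set is not addressed.

```python
def safe_axis_max(values, scale=1.5, fallback=1.0):
    """Upper bound for a plot axis: scale * max(values), or a fallback
    when there is nothing to plot."""
    if len(values) == 0:
        return fallback
    return max(values) * scale
```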

Float division by zero in circledetector.py

Hi, Shifu

My colleague (@yodeng) and I encountered this issue recently.

This issue may be caused by the reassignment of an empty list to self.records at line 19 in circledetector.py. The initial assignment is at line 14.

When we commented out line 19, the division by zero error disappeared and we got the circles data.

Best,
Richard

output files are truncated

Hello,

I'm processing some files that were output by afterQC, but I'm double checking with FastQC, and it looks like the output from AfterQC is truncated:

Failed to process file SRR5335803_1.bz2.good.fq.bz2
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry.  Your file is probably truncated

perhaps afterQC isn't compatible with reading bz2 format?

-Dave

Python 2 is a requirement?

I installed this in a venv with Python3 and got:

  Traceback (most recent call last):
  File "/config/binaries/afterqc/0.9.2/afterqc/after.py", line 7, in <module>
    import preprocesser
  File "/config/binaries/afterqc/0.9.2/afterqc/preprocesser.py", line 8, in <module>
    import util
  File "/config/binaries/afterqc/0.9.2/afterqc/util.py", line 167
    print overlap(r1, r2)

which I presume is a python3 thing.

You should note that python2 is a requirement in the install notes.

AfterQC total bases calculation

AfterQC consistently miscounts the total bases in paired-end MiSeq fastq reads. For example, for two paired-end files with 611,060,153 total bases (as calculated by various programs), AfterQC reports only 582,146,000 total bases.
I was wondering why this difference exists, and if it affects the downstream filtering process.

AfterQC

Could somebody please guide me through installing AfterQC? I have downloaded the zip folder and extracted the contents but cannot find any executable file. I have Python 3.7.

AfterQC in FASTQ joined

I would like to ask a question. Can I parse paired-end files after the process joined by AfterQC?

filter only for poly-X but nothing else

What would be the command-line to run AfterQC in order to filter only for poly-X reads but nothing else?

Too slow with python implementation

Should try rust or julia

Read length distribution after processing

Hello, I have been using afterqc with default settings to remove adapter sequences from RNA-Seq read files. My original files have reads of uniform length. After processing with afterqc the sequence lengths are no longer uniform. Therefore, I have two questions: 1) Can I somehow make all the reads the same length? 2) What are the implications for downstream analysis if the length is not uniform? (The original read length for all reads was 151 bases; after afterqc the read length is between 35 and 142.)

support bzip2 format

Need to support bzip2 format since it has a better compression ratio

output gzipped data

If the input is gzipped data, the output should be automatically gzipped. Otherwise users would encounter large files

My bioconda install Python version

Hi my bioconda install is reporting this needs Py <3.0. I can set up a Py 2.7 environment for just this of course, but wondered if there would be a Py3.6+ version or if I am being silly in some way?
Kind Regards, sounds a fabulous tool

Default multiprocessing behavior

Do you create as many jobs with python multiprocessing as there are input files? I ran it with default parameters and it seemed to occupy all 32 cores. Is it possible to limit the number of jobs created? It is not always possible to use all the resources when computing on a shared server.

Float division by zero in circledetector.py

I've encountered the following error:

finished polyX stat for all files
write records to poly_X.csv
process records by tile
Traceback (most recent call last):
  File "AfterQC/after.py", line 221, in <module>
    main()
  File "AfterQC/after.py", line 205, in main
    runDebubble(options)
  File "AfterQC/after.py", line 180, in runDebubble
    debubble.debubbleDir(options.input_dir, 20, options.debubble_dir, options.draw)
  File "AfterQC/debubble.py", line 47, in debubbleDir
    circles = bp.run()
  File "AfterQC/bubbleprocesser.py", line 74, in run
    self.processByTile()
  File "AfterQC/bubbleprocesser.py", line 161, in processByTile
    self.detectBubbleForTile(tileRecords, lastTileWithBothSurface, laneOfLastTile)
  File "AfterQC/bubbleprocesser.py", line 142, in detectBubbleForTile
    c = bd.detect()
  File "AfterQC/bubbledetector.py", line 60, in detect
    self.detectCircles()
  File "AfterQC/bubbledetector.py", line 284, in detectCircles
    labelCircles = cd.detect()
  File "AfterQC/circledetector.py", line 25, in detect
    if self.isInCorner():
  File "AfterQC/circledetector.py", line 102, in isInCorner
    if float(cornerCount) / len(self.records) > 0.1:
ZeroDivisionError: float division by zero
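
The traceback ends in a division by len(self.records), so an empty record list is enough to trigger it. A defensive version of that ratio could look like the sketch below. The function name and the choice to return 0.0 for an empty list are assumptions, not the project's fix.

```python
def corner_fraction(corner_count, total_records):
    """Fraction of records that lie in a corner; 0.0 when there are none."""
    if total_records == 0:
        return 0.0
    return float(corner_count) / total_records
```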

multithreading

Is there a way to limit the number of threads that the program uses? It looks like it is running one thread per fastq file, but I would like it to only run a limited number of threads on our HPC node. I suppose that I could run afterqc on only a limited number of fastq files at a time, but I thought that there might be a more elegant solution.
Thanks,
Ken
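
One workaround that needs no change to AfterQC is to start it once per file pair, using the -1 and -2 options from the README, and cap how many runs are active at a time. The sketch below (Python 3) does this with a bounded thread pool that launches subprocesses. `build_command` and `run_limited` are hypothetical helpers, and the script path is an assumption to adapt to your installation.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(read1, read2=None, script="after.py"):
    """Build the command line for one file or one pair."""
    cmd = ["python", script, "-1", read1]
    if read2:
        cmd += ["-2", read2]
    return cmd


def run_limited(pairs, max_jobs=4, runner=subprocess.call):
    """Run one AfterQC process per (read1, read2) pair, at most
    max_jobs at a time, and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda pair: runner(build_command(*pair)), pairs))
```

Each worker thread only waits for its child process, so the number of busy cores is bounded by max_jobs times whatever a single AfterQC run uses.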

bubble

Hi, what principle does AfterQC use to detect bubbles? How are bubbles reflected in the data?

Adapters trimming

As far as I understood you don't have predefined sequences for adapters:

"By searching the best overlapping of each pair, AfterQC automatically detects and cuts adapters for pair-end data, with no need of adapter sequence input"

I used AfterQC on metagenomics data and it seemed to reduce the numbers of adapters, but not entirely. What could be the cause? Also, what is important, I couldn't see it from the AfterQC report, I checked it with fastqc/multiqc, so from the user's point of view I would want to still see the validation for typical adapters in QC report.

Please, check the last post here - Sudden quality drop in the middle of HiSeq R1 reads but not in R2

If it is somehow beneficial for you I can send you full reports.

Error despite creating env with 2.7 in conda

Hi there
I am getting the following error
Traceback (most recent call last):
File "/Users/apple/miniconda3/envs/py27/bin/after.py", line 228, in <module>
main()
File "/Users/apple/miniconda3/envs/py27/bin/after.py", line 222, in main
processOptions(options)
File "/Users/apple/miniconda3/envs/py27/bin/after.py", line 175, in processOptions
filter.run()
File "/Users/apple/miniconda3/envs/py27/share/afterqc-0.9.7-3/preprocesser.py", line 249, in run
self.r1qc_prefilter.statFile(self.options.read1_file)
File "/Users/apple/miniconda3/envs/py27/share/afterqc-0.9.7-3/qualitycontrol.py", line 350, in statFile
self.statRead(read)
File "/Users/apple/miniconda3/envs/py27/share/afterqc-0.9.7-3/qualitycontrol.py", line 80, in statRead
self.totalNum[i] += 1
IndexError: list index out of range

Regards
Dinesh

TODO: integrate pair-end sequencing based error correction

To reduce SV false alarm rate.

Removal of PCR/RNA primers

I am currently using afterqc to do QC trimming of the reads. Generally it performs well, removes adapters etc. However I noticed it doesn't seem to screen out primers. Would it be possible to have this as an add on?

Cheers
Amali

Plans for reading .gz ?

Hi, nice tool.

Are there any plans for reading gzipped files in the future ? This can be quite helpful, especially when reanalyzing quality of older compressed projects.

Also, is there an internal adapter library or does AfterQC find adapters by itself in the reads ? I couldn't understand the command line help about barcodes.

Thanks.
Colin

extra whitespace before shebang line in after.py

Hi,

please remove the first space before the first-line # in after.py, it is causing this error (the file interpreter is not recognized):

$ ./after.py
from: can't read /var/mail/optparse
from: can't read /var/mail/multiprocessing
from: can't read /var/mail/util
./after.py: line 12: syntax error near unexpected token `('
./after.py: line 12: `def parseCommand():

Thanks :)

File exists: './QC'

Even when removing previous runs' QC folders, I still get an error when running after.py

  File "/home/sdt5z/anaconda/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: './QC'
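
A possible cause, stated as an assumption rather than a diagnosis: if several worker processes try to create the same QC folder at almost the same time, one of them finds it already present and os.makedirs raises EEXIST. A tolerant helper, with the hypothetical name `ensure_dir`, that works on both Python 2.7 and Python 3:

```python
import errno
import os


def ensure_dir(path):
    """Create path (and parents) unless it already exists as a directory."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise anything that is not "already exists as a directory".
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```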

Overlap merge can be optimized

Fast forward mode can be applied

1) AfterQC is slow 2) Aggregate results from many samples into a single report

Hi,
I have been running 20 paired-end RNA-seq samples since yesterday (more than 24 hours so far) on a computer with 16 GB RAM, and only 7 samples have completed (the others are still running).
Any way to make it faster?

Secondly, I am wondering if there is any possibility to aggregate results from many samples into a single report? According to MultiQC [https://github.com/ewels/MultiQC], AfterQC is not in their list of supported tools.

Any suggestions please!
Thanks!

AttributeError: 'tuple' object has no attribute 'major'

Hi,
Firstly, I am new to Linux, so the solution might be obvious, but I can't get AfterQC to run. I keep getting this error:

Traceback (most recent call last):
File "after.py", line 208, in <module>
main()
File "after.py", line 175, in main
if sys.version_info.major >2:
AttributeError: 'tuple' object has no attribute 'major'

I tried the -h option and adding a path to my data but get the same error. I installed the editdistance with the make command in the AfterQC directory. I am using python version 2.6.6.
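
A likely explanation, offered as a note rather than a confirmed diagnosis: the named attributes of sys.version_info (such as .major) were added in Python 2.7, so on Python 2.6 it is a plain tuple. Indexing works on every version. The helper below is a hypothetical sketch, not the code in after.py.

```python
import sys


def python_major(version_info=None):
    """Return the major Python version using tuple indexing, which also
    works where sys.version_info has no .major attribute (Python 2.6)."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0]
```

A check written as `python_major() > 2` behaves the same as `sys.version_info.major > 2` on newer interpreters and does not fail on 2.6.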