nunofonseca / irap

integrated RNA-seq Analysis Pipeline

License: GNU General Public License v3.0

R 34.66% TeX 0.79% CSS 0.39% HTML 9.45% Perl 5.59% Makefile 22.70% Shell 16.11% C 4.00% Prolog 0.73% Python 5.44% Dockerfile 0.14%

irap pipeline rna-seq-analysis rna-seq gene-expression transcript-quantification bulk-rna-seq single-cell-rna-seq exon-quantification differential-expression

irap's Introduction

IRAP - Integrated RNA-Seq Analysis Pipeline

iRAP is a flexible RNA-seq analysis pipeline that allows the user to select and apply their preferred combination of existing tools for mapping reads, quantifying expression and testing for differential expression. Depending upon the application, iRAP can be used to quantify expression at the gene, exon or transcript level.

Please consult the wiki (https://github.com/nunofonseca/irap/wiki) for further information.

Problems? Errors?

Errors may occur while running one of the many different pieces of software included in IRAP (e.g., a mapper, a quantification method, etc). If this happens then please report the error directly to the developers of the program.

The authors of IRAP always appreciate receiving suggestions for improvements, and reports of bugs in the pipeline or in the documentation. Please submit them through https://github.com/nunofonseca/irap/issues

irap's Issues

Latest docker image is not tagged

Since the latest docker image is not tagged in the repository it appears to be necessary to download all of them to get the most recent one.

E.g.

docker pull docker.io/nunofonseca/irap_ubuntu
Using default tag: latest
Trying to pull repository docker.io/nunofonseca/irap_ubuntu ...
Pulling repository docker.io/nunofonseca/irap_ubuntu
Tag latest not found in repository docker.io/nunofonseca/irap_ubuntu

Suggestion

Here are further suggestions for multicore improvements in IRAP. I was working on revision a856017baf79c4caf37cdc20a774026f3e276a1a.

Several other places could be parallelized with the parallel package and mclapply.

Original issue reported on code.google.com by [email protected] on 1 Sep 2014 at 12:48

compress FastQ?

My inputs are BAMs. With "base quality filtering" turned on, iRAP generates 10 FastQ files per BAM (2 are written alongside the input BAM, 8 go to the output/data folder). They are not compressed and take up a lot of disk space (the input BAM is 4GB, the FastQs total about 70GB). If they were compressed while being written, we could run more samples simultaneously.
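As an illustration of the request above, here is a minimal sketch of writing FASTQ records straight into a gzip stream instead of a plain file. It only uses Python's standard gzip module; the function names and the record layout are hypothetical and are not taken from iRAP.

```python
import gzip


def write_fastq_gz(records, path):
    """Write (name, sequence, quality) tuples to a gzip-compressed FASTQ file."""
    with gzip.open(path, "wt") as handle:
        for name, seq, qual in records:
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")


def read_fastq_gz(path):
    """Read the records back, showing that the compressed file can be streamed."""
    records = []
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                break
            seq = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip("\n")
            records.append((header[1:], seq, qual))
    return records
```

Whether every downstream tool in the pipeline accepts gzip-compressed input is a separate question that this sketch does not answer.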

Set to 85% of the read by default

Dear Mr. Fonseca,

In irap_fastq_qc there is this part (line 266, see below) which cuts the read length by 15%. I was wondering why this is done. Also, there does not seem to be a way to change this setting other than altering the code. Am I correct?
(Shouldn't reads only be cut based on their quality? Some reads show very high Phred scores even after 140 bases.)

#Minimum length of a read after filtering
#Set to 85% of the read by default
ifndef min_len
min_len=$(shell perl -e "print int($(read_size)*0.85)")
endif
bases2trim=$(shell perl -e "print $(read_size)-$(min_len)")
$(info * min_len=$(min_len))
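The arithmetic in the snippet above can be restated as a short sketch. It mirrors the Makefile logic (a default of 85% of the read size, truncated to an integer, unless min_len is already defined); the Python function name and its fraction parameter are hypothetical.

```python
def filtering_lengths(read_size, min_len=None, fraction=0.85):
    """Return (min_len, bases2trim) following the Makefile snippet's logic."""
    if min_len is None:
        # Same as perl's int(): truncate towards zero
        min_len = int(read_size * fraction)
    bases2trim = read_size - min_len
    return min_len, bases2trim


print(filtering_lengths(50))                # (42, 8)
print(filtering_lengths(150, min_len=140))  # (140, 10)
```

Because the Makefile wraps the assignment in ifndef, the snippet itself suggests that a min_len value defined before this point would take precedence over the 85% default. Whether irap passes such a value through from the configuration is not shown here.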

Contamination check errors

Hi Nuno,

While using iRAP to filter out possible contamination from a mouse genome, we ran into an issue (mind you, we are still using 0.8.5p2, but I didn't see any indication of a fix in the meantime).

In the script irap_fastq_qc (in the scripts folder), on line 680, bowtie is used with the option --un, which generates a SAM file with all reads mapped to the contaminating genome and a new FASTQ file with all those reads removed. The SAM file is then stored as a BAM file (with samtools view -b), but this step fails with an error which I don't have handy right now.

What we did to fix it was to remove the SAM-to-BAM command, after which the pipeline progresses without problems (the BAM file with reads from the contaminating organism does not seem to be used afterwards). We strongly suspect that the error is due to a space that Bowtie puts in the read name. We found online that other people also seem to have the same problem with Bowtie specifically.

If you want, we can supply more details, but as I said I don't have them handy and perhaps you already know what could be wrong. Please let me know if you want me to be more specific.

We have solved this problem for ourselves but perhaps you may also want to solve it in iRAP.

Highest regards,

Freek.

Edit: A colleague hunted down the error, this was it (following samtools view -b contamination_reads.sam):

[W::sam_read1] parse error at line 1
[main_samview] truncated file.
Error while flushing and closing output
terminate called after throwing an instance of 'int'
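If the reporters' suspicion is right and the parse error comes from a space in the read name, one possible workaround is to truncate the read name at the first space before converting to BAM. The sketch below is only an illustration of that idea: the function name is hypothetical, and the cause itself is an unconfirmed assumption taken from the report.

```python
def strip_spaces_from_qname(sam_line):
    """Truncate the read name (first SAM column) at the first space."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    qname, sep, rest = sam_line.partition("\t")
    return qname.split(" ", 1)[0] + sep + rest


def sanitize_sam(lines):
    """Apply the read-name fix to every line of a SAM stream."""
    return [strip_spaces_from_qname(line) for line in lines]
```

Only the first column is touched, so all other fields, including the sequence and quality strings, are left exactly as the mapper wrote them.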

out_prefix error when analysis name is same as the raw read file name without suffix

With the following in the config file, an "out_prefix" error is raised, but this only happens when raw read quality control of the sample starts. It would be great if this can be checked at the starting point of iRAP.

Part of the config file:

    name=SRR933983
    mapper=tophat2
    quant_method=htseq1
    qual_filtering=on
    trim_reads=n
    cont_index=no
    quant_method=htseq1
    se=SRR933983
    SRR933983=SRR933983.fastq.gz
    SRR933983_rs=50
    SRR933983_qual=33

The error:

12:39:34 19/03/2018 *  Filtering SRR933983

irap_fastq_qc tmp_dir=irap_data/tmp  threads=8  qual_filtering=on  min_qual=10 trim=n cont_index=no mapper=bowtie data_dir=irap_data/raw_data/ecoli_k12_test// read_size=50   qual=33 f="SRR933983.fastq.gz" out_prefix=SRR933983 is_pe= out_dir=SRR933983/data// report_dir=SRR933983/report/riq//  || (rm -f SRR933983/data//SRR933983.f.fastq && exit 1)
make[1]: Entering directory '/var/spool/cwl'
SRR933983.fastq.gz 
f1=SRR933983.fastq.gz
f2=
*************************************************
* 
* Required Parameters:
/opt/irap/scripts/irap_fastq_qc:209: *** * out_prefix should be different than fastq file prefix.  Stop.
* FILES=irap_data/raw_data/ecoli_k12_test///SRR933983.fastq.gz pe=OFF
make[1]: Leaving directory '/var/spool/cwl'
/opt/irap/scripts/irap:1707: recipe for target 'SRR933983/data//SRR933983.f.fastq' failed
make: *** [SRR933983/data//SRR933983.f.fastq] Error 1
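The early check requested above could look roughly like the following sketch. It compares each library's out_prefix with the prefix of its FASTQ files before any processing starts. The function names and the list of recognised suffixes are assumptions made for illustration; how irap_fastq_qc actually derives the file prefix is not shown in the log.

```python
import os

# Assumed set of FASTQ suffixes; the real pipeline may recognise others.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def fastq_prefix(filename):
    """Strip the directory and a known FASTQ suffix from a file name."""
    base = os.path.basename(filename)
    for suffix in FASTQ_SUFFIXES:
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base


def clashing_prefixes(libraries):
    """Return the out_prefix values that equal the prefix of one of their files.

    `libraries` maps an out_prefix (library name) to its list of FASTQ files.
    """
    return [
        prefix
        for prefix, files in libraries.items()
        if any(fastq_prefix(f) == prefix for f in files)
    ]


print(clashing_prefixes({"SRR933983": ["SRR933983.fastq.gz"]}))  # ['SRR933983']
```

Running such a check while the configuration is parsed would surface the problem before quality control begins.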

Problem with Report generation

Hey there,

I have problems with the report generation. The analysis seems to be fine, but the report step stops with an error. You can find the logs of the report and of the analysis (which seems to be missing some lines at the beginning?!) attached.
AnalysisLog.txt
ReportLog.txt

I also often get the following warning. Is this harmful?

In fread(input = f, sep = "\t", header = "auto", check.names = FALSE,  :
  Bumped column 19 to type character on data row 124, field contains '04110;04114;04115;04914;05200;05215;05222'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Please note that column type detection uses a sample of 1,000 rows (100 rows at 10 points) so hopefully this message should be very rare. If reporting to datatable-help, please rerun and include the output from verbose=TRUE.

Best regards,
Niklas

Step 3 (scripts/irap_install.sh -s . -x core) crashes

What steps will reproduce the problem?
1. git clone http://code.google.com/p/irap/ irap_clone
2. export IRAP_DIR=path_to_directory_where_irap_will_be_installed
3. scripts/irap_install.sh -s . -x deps (completed without error)
4. scripts/irap_install.sh -s . -x core

This last step crashes. It seems to have found all dependencies though.
I'm on Scientific Linux 6.2 (Carbon)

Please provide any additional information below.

[[email protected] irap_clone]$ scripts/irap_install.sh -s . -x core
[INFO] iRAP 0.4.2
[INFO] env found in /usr/bin/env
Linux frontend1 2.6.32-431.1.2.el6.x86_64 #1 SMP Thu Dec 12 13:59:19 CST 2013 
x86_64 x86_64 x86_64 GNU/Linux
[INFO] If installation fails then please check if the following libraries are 
installed:
[INFO] zlib-devel python-devel bzip2-devel python readline-devel libgfortran 
gcc-gfortran libX11-devel libXt-devel numpy gd-devel libxml2-devel libxml2 
libpng libcurl-devel expat-devel [db-devel|db4-devel|libdb-devel] libpangocairo
[INFO] Checking dependencies...
[INFO]  java found: /usr/bin/java
[INFO]  python found: /usr/bin/python
[INFO]  gcc found: /usr/bin/gcc
[INFO]  g++ found: /usr/bin/g++
[INFO]  gfortran found: /usr/bin/gfortran
[INFO]  curl-config found: /usr/bin/curl-config
[INFO]  git found: /usr/bin/git
[INFO] Checking dependencies...done.
[INFO] Checking paths...
[INFO] IRAP_DIR=/data/home/hhx037/irap
[INFO] SRC_DIR=/data/home/hhx037/temp_install/irap_clone
[INFO] 
PATH=/usr/bin/:/bin/:/data/home/hhx037/gperf-3.0.4/bin/:/data/home/hhx037/cuffli
nks-2.2.1/:/data/home/hhx037/new_fugue/:/data/home/hhx037/bedtools-2.20.1-13/bin
/:/data/home/hhx037/vcftools-0.1.12a/bin/:/data/home/hhx037/samtools-0.1.19/usr/
local/bin/:/data/home/hhx037/pixman-0.32.6/:/data/home/hhx037/R-3.1.1/bin/:/data
/home/hhx037/harfbuzz-0.9.27/:/data/home/hhx037/glib-2.40.0/:/data/home/hhx037/g
cta/:/data/home/hhx037/eigen-eigen-6b38706d90a9/:/data/home/hhx037/NGSQCToolkit-
2.3.3/:/data/home/hhx037/shapeit/:/data/home/hhx037/metal/:/data/home/hhx037/imp
ute-2.3.1/:/data/home/hhx037/IGV_2.3.20/:/data/home/hhx037/tophat-2.0.12/:/data/
home/hhx037/picard-tools-1.117/:/data/home/hhx037/qctool-1.4/:/data/home/hhx037/
globusconnect-1.6/:/data/home/hhx037/Hapmix-1.2/bin/:/data/home/hhx037/fastx_too
lkit-0.0.13/bin/:/data/home/hhx037/gtool-0.7.5/:/data/home/hhx037/locuszoom/:/da
ta/home/hhx037/mantra/:/data/home/hhx037/gwama/:/data/home/hhx037/FastQC/:/data/
home/hhx037/bowtie2-2.2.3/:/data/home/hhx037/metasoft/:/data/home/hhx037/boost-1
.55.0/:/data/home/hhx037/plink-1.90b2a/:/data/home/hhx037/jre1.7.0_07/:/data/hom
e/hhx037/GATK-3.2-0/:/data/home/hhx037/snptest-2.5/:/data/home/hhx037/htslib/usr
/local/:/data/home/hhx037/bwa/:/opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/lib64/qt-
3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/lpp/mm
fs/bin
[INFO] Checking paths...done.
[INFO] Cleaning up /data/home/hhx037/irap/tmp...
[INFO] Cleaning up /data/home/hhx037/irap/tmp...done.
[INFO] *******************************************************
[INFO] IRAP_INSTALL_TOPLEVEL_DIRECTORY=/data/home/hhx037/irap
[INFO] IRAP_SRC_TOPLEVEL_DIR=/data/home/hhx037/temp_install/irap_clone
[INFO] *******************************************************
[INFO] Created setup file /data/home/hhx037/irap/irap_setup.sh
[INFO] Loaded /data/home/hhx037/irap/irap_setup.sh
[INFO] Installing irap files...
[INFO] Installing irap files...done.
[INFO] Compiling and installing fastq/bam processing programs...
~/temp_install/irap_clone/src/fastq_utils ~/irap/tmp
gcc  -O3 -o fastq_truncate fastq_truncate.c
gcc -O3 -c hash.c
gcc -O3 -c fastq_filterpair.c
gcc -O3 -c fastq_randsample.c
gcc  -O3 -o fastq_trim fastq_trim.c
gcc -O3 -c fastq_filter_n.c
gcc -O3 -c fastq_validator.c
gcc  -O3 -o fastq_filter_n fastq_filter_n.o
gcc  hash.o fastq_filterpair.o -o fastq_filterpair
gcc  -O3 -o fastq_randsample fastq_randsample.o
gcc  -O3 hash.o fastq_validator.o -o fastq_validator
~/irap/tmp
~/temp_install/irap_clone/src/bamutils ~/irap/tmp
gcc -O1 -c  bam_fix_NH.c   -I /data/home/hhx037/irap/include/bam/
bam_fix_NH.c:26:17: error: bam.h: No such file or directory
bam_fix_NH.c:27:17: error: sam.h: No such file or directory
bam_fix_NH.c:28:27: error: kstring.h: No such file or directory
bam_fix_NH.c:38: error: expected specifier-qualifier-list before 'uint8_t'
bam_fix_NH.c: In function 'get_read_aln':
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:55: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c: In function 'new_read_aln':
bam_fix_NH.c:92: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:93: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:94: error: 'READ_ALN' has no member named 'name'
bam_fix_NH.c:95: error: 'READ_ALN' has no member named 'ctr'
bam_fix_NH.c:101: error: 'READ_ALN' has no member named 'ctr'
bam_fix_NH.c: In function 'main':
bam_fix_NH.c:136: error: 'bamFile' undeclared (first use in this function)
bam_fix_NH.c:136: error: (Each undeclared identifier is reported only once
bam_fix_NH.c:136: error: for each function it appears in.)
bam_fix_NH.c:136: error: expected ';' before 'in'
bam_fix_NH.c:137: error: expected ';' before 'out'
bam_fix_NH.c:146: error: 'in' undeclared (first use in this function)
bam_fix_NH.c:148: error: 'out' undeclared (first use in this function)
bam_fix_NH.c:163: error: 'bam_header_t' undeclared (first use in this function)
bam_fix_NH.c:163: error: 'header' undeclared (first use in this function)
bam_fix_NH.c:169: error: 'bam1_t' undeclared (first use in this function)
bam_fix_NH.c:169: error: 'aln' undeclared (first use in this function)
bam_fix_NH.c:170: error: 'prev' undeclared (first use in this function)
bam_fix_NH.c:180: error: 'BAM_FUNMAP' undeclared (first use in this function)
bam_fix_NH.c:181: error: 'BAM_FREAD2' undeclared (first use in this function)
bam_fix_NH.c:196: error: 'in2' undeclared (first use in this function)
bam_fix_NH.c:214: error: 'uint8_t' undeclared (first use in this function)
bam_fix_NH.c:214: error: 'old_nh' undeclared (first use in this function)
bam_fix_NH.c:215: error: 'READ_ALN' has no member named 'ctr'
bam_fix_NH.c:221: error: expected expression before ')' token
bam_fix_NH.c:227: error: expected expression before ')' token
make: *** [bam_fix_NH.o] Error 1

Original issue reported on code.google.com by [email protected] on 24 Jul 2014 at 2:54

contamination genomes built with bowtie2 won't be recognised in contamination filtering

By default iRAP uses bowtie to check raw reads for contamination from other genomes, so contamination genomes built with bowtie2 by following the example below are not recognised by iRAP.
https://github.com/nunofonseca/irap/blob/master/examples/ex_add2contaminationDB2.sh#L5

I am using iRAP to do a differential expression analysis between two groups made by two biological replicates each. Sequencing was performed with 454, so reads are on average longer than 200nt.

What steps will reproduce the problem?
1.
2.
3.

What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?


Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 18 Aug 2014 at 10:37

Rerunning report generator after failure does nothing

What steps will reproduce the problem?
1. Run iRAP fully to get quantification data for the libraries
2. Ensure that the next step will fail, e.g. by removing one of the required R 
packages (e.g. `agricolae`)
3. Run the report generator, e.g. `irap conf=some.conf quant_report`
4. Rerun the report generator after the previous run (step 3) failed due to the 
missing package

What is the expected output? What do you see instead?

Expected result: Report is fully generated, either by picking up where it left 
off before, or by regenerating the whole report from scratch.

Actual result: irap reports “Nothing to be done for `quant_report`”, report 
is not fully generated.

What version of the product are you using? On what operating system?

iRAP 0.2.4.

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 17 Sep 2013 at 2:17

Error at piano step of irap

Hi,

I'm trying to run the example E. coli data to test the irap pipeline, but unfortunately I get an error at the piano step. Also, when generating the report of the analysis, the de folder in the report is empty.

Here is the error msg:

[INFO] Data in ecoli_test/bowtie2/cufflinks2/cuffdiff2/GAvsGB.genes_de.tsv loaded: 2893 entries
[INFO] Loading annotation file ecoli_test/data/empty.gene.annot.tsv...
[INFO] Load annot...
[INFO] Loading annotation file ecoli_test/data/empty.gene.annot.tsv...done.
[ERROR] Empty annotation file ecoli_test/data/empty.gene.annot.tsv
/apps/irap_install2/scripts/../aux/mk/irap_gse.mk:140: recipe for target 'ecoli_test/bowtie2/cufflinks2/cuffdiff2/GAvsGB.genes.gse.piano.fisher.go.tsv' failed
make: *** [ecoli_test/bowtie2/cufflinks2/cuffdiff2/GAvsGB.genes.gse.piano.fisher.go.tsv] Error 1

bamToFastq picks up the wrong BAM file name

When the input is a BAM file (e.g. the_sample.bam), bamToFastq uses the_sample.bam.sorted.bam as its input, while samtools sort produces the_sample.bam.sorted. As a result, bamToFastq produces empty FastQ files.

bamToFastq should also raise an error at this point, rather than leaving it to FastQC to complain later.
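A fail-fast guard of the kind asked for here can be sketched as follows. It checks that a file exists and is not empty, and could be applied both to the sorted BAM before conversion and to the FastQ files afterwards. The function name is hypothetical and nothing here is taken from iRAP's scripts.

```python
import os


def require_nonempty_file(path):
    """Return `path` if it is an existing, non-empty file; raise otherwise."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"expected file not found: {path}")
    if os.path.getsize(path) == 0:
        raise ValueError(f"file is empty: {path}")
    return path
```

With such a guard in place, the mismatch between the_sample.bam.sorted and the_sample.bam.sorted.bam would stop the run at the conversion step.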

Add to Dockstore with CWL mapping

Is there any chance you would consider adding the tool to Dockstore.org?

gnuplot

Hi,

I noticed that the gnuplot packaged with irap does not support png.

The command set term shows this:

Available terminal types:
  canvas      HTML Canvas object
  cgm         Computer Graphics Metafile
  context     ConTeXt with MetaFun (for PDF documents)
  corel       EPS format for CorelDRAW
  dumb        ascii art for anything that prints text
  dxf         dxf-file for AutoCad (default size 120x80)
  eepic       EEPIC -- extended LaTeX picture environment
  emf         Enhanced Metafile format
  emtex       LaTeX picture environment with emTeX specials
  epslatex    LaTeX picture environment using graphicx package
  fig         FIG graphics language for XFIG graphics editor
  gpic        GPIC -- Produce graphs in groff using the gpic preprocessor
  hp2623A     HP2623A and maybe others
  hp2648      HP2648 and HP2647
  hpgl        HP7475 and relatives [number of pens] [eject]
  imagen      Imagen laser printer
  latex       LaTeX picture environment
  mf          Metafont plotting standard
  mif         Frame maker MIF 3.00 format
  mp          MetaPost plotting standard
  pcl5        HP Designjet 750C, HP Laserjet III/IV, etc. (many options)
  postscript  PostScript graphics, including EPSF embedded files (*.eps)
  pslatex     LaTeX picture environment with PostScript \specials
  pstex       plain TeX with PostScript \specials
  pstricks    LaTeX picture environment with PSTricks macros
  qms         QMS/QUIC Laser printer (also Talaris 1200 and others)
  regis       REGIS graphics language
  svg         W3C Scalable Vector Graphics driver
  tek40xx     Tektronix 4010 and others; most TEK emulators
  tek410x     Tektronix 4106, 4107, 4109 and 420X terminals
  texdraw     LaTeX texdraw environment
  tgif        TGIF X11 [mode] [x,y] [dashed] ["font" [fontsize]]
  tkcanvas    Tk/Tcl canvas widget [perltk] [interactive]
  tpic        TPIC -- LaTeX picture environment with tpic \specials
  unknown     Unknown terminal type - not a plotting device
  vttek       VT-like tek40xx terminal emulator
  xterm       Xterm Tektronix 4014 Mode

This makes it throw an error for some parts of the irap pipeline:
set term png size 2048,768

I'm running Ubuntu 64-bit 14.04 LTS

Thanks,
Jody

Error in dockerfile, should use WORKDIR <dir> instead of RUN cd <dir>

What steps will reproduce the problem?
1. docker build <pathWhereDockerfileIs>
2.
3.

What is the expected output? What do you see instead?

The image fails to build because it cannot find ./scripts/irap_install.sh

What version of the product are you using? On what operating system?


Please provide any additional information below.

Replacing

RUN cd irap_clone

with

WORKDIR /opt/irap_clone

fixes this.

Pablo

Original issue reported on code.google.com by [email protected] on 20 Nov 2014 at 4:05

file existence check

https://github.com/nunofonseca/irap/blob/master/scripts/irap_install.sh#L83-L86
In some cases, $SRC_DIR/download/$FILE2 does not exist. The file existence needs to be checked before making the link.
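The requested guard amounts to testing for the source file before creating the link. A minimal sketch of that pattern follows; it is written in Python only for illustration (the installer itself is a shell script) and the function name is hypothetical.

```python
import os


def link_if_present(source, link_name):
    """Create a symlink to `source` only when `source` exists.

    Returns True if the link was created, False if the source is missing.
    """
    if not os.path.exists(source):
        return False
    if os.path.lexists(link_name):
        os.remove(link_name)  # replace a stale link from an earlier run
    os.symlink(source, link_name)
    return True
```

Returning False instead of raising lets the caller decide whether a missing download is fatal or can be skipped.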

Update instructions

Hi Nuno,

I noticed a small issue with the update instructions. You may want to include the command "source irap_install/irap_setup.sh" in the instructions for how to update. Someone who hasn't run iRAP for a while may forget to set up their environment before the update, which throws an error.

Thanks,

Derek

I am using iRAP to do a differential expression analysis between two groups made by two biological replicates each. Sequencing was performed with 454, so reads are on average longer than 200nt. I ran iRAP two days ago, but now it seems that the process has been slowed in a step requiring the use of R, but I don't understand which one. I attach to this message the log file generated during the process.

What steps will reproduce the problem?
1. mapping (I think)
2.
3.

What is the expected output? What do you see instead?
I see a warning message: "WARNING: multicore has been superseded and will be removed shortly"

What version of the product are you using? On what operating system?
iRAP 0.4.2 on a Linux server

Please provide any additional information below.

I attach here the log file generated during the process.

Thank you very much for your help.
Claudia Calabrese

Original issue reported on code.google.com by [email protected] on 18 Aug 2014 at 10:53

Attachments:

logfile.txt

-R is ignored when R is not preinstalled

https://github.com/nunofonseca/irap/blob/master/scripts/irap_install.sh#L134-L139
These lines mean that if R is not preinstalled, the iRAP installation stops even when -R is given.

Merging Matrices in libTSVAggrTransByGene fails?

Hey there,

I am trying to use kallisto to quantify the reads, but it fails during execution of the script mentioned in the title.

new.mat<-merge(tsv.mat,g2t,by.x=colnames(tsv.mat)[1],by.y="transcript_id",all.x=TRUE,sort=F)
new.mat <- new.mat[new.mat$transcript_id %in% as.character(g2t$transcript_id),,drop=FALSE]

After the first line, new.mat is full of information, but after the second line (filtering, I think?) new.mat is completely empty. Could this be caused by wrong files in the reference folder?

No multithread for bowtie2-build

Hey there,
I noticed that bowtie2-build is not using multiple threads, even if I set the max_threads option in the configuration file. Is there any workaround or option I can pass to iRAP to enable multithreading for bowtie2-build?

Best regards,
Niklas

Why the "samtools view -F 4" and the split of tsv files

Hi Nuno,

As always I am thoroughly enjoying iRAP. I do wonder about two things though, these two things are hardcoded and I would prefer not to change "Nuno approved" settings because inevitably they lead to surprises down the road.

The first is: As an input for htseq-count, "samtools view -F 4 somebam.bam" is used. This removes the unmapped reads as input for htseq-count and sets them to 0. This means they are also disregarded in our QC reporting (using MultiQC), is there a reason for doing this?

The second: You/iRAP split the output of htseq-count into the quantified genes and the "stats" (__no_feature, __ambiguous, etc). Again this confuses our QC reporting. This is not really a big deal because we can easily cat them together, but still, I was wondering if there was a good reason for this (maybe because of subsequent normalization?)...

Highest regards,

Freek.
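For readers who want to recombine the two files for QC tools, the split described in this issue can be sketched as below. The sketch relies only on the convention visible in the issue, namely that htseq-count's special counters start with a double underscore (__no_feature, __ambiguous). The function names are hypothetical and this is not iRAP's own code.

```python
def split_htseq_counts(lines):
    """Separate gene counts from htseq-count's special '__' counters."""
    genes, stats = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        feature, count = line.split("\t")
        target = stats if feature.startswith("__") else genes
        target[feature] = int(count)
    return genes, stats


def merge_htseq_counts(genes, stats):
    """Recombine both parts: gene rows first, special counters last."""
    rows = list(genes.items()) + list(stats.items())
    return [f"{feature}\t{count}" for feature, count in rows]
```

Keeping the special counters out of the gene table avoids treating them as genes in later steps, which may be the motivation the reporter guesses at, although the source does not confirm it.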

Not handling 'qual_filtering=none' as expected

When 'qual_filtering=none' is specified and the input is not an uncompressed FastQ (e.g. a BAM or fastq.gz), iRAP only creates a symbolic link to the input, without checking whether it is in the right format for the downstream mapper.
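A format check of the kind that is missing here can be done by looking at the first bytes of the input rather than its file name. The sketch below assumes three cases: gzip data starts with the bytes 1f 8b, a BAM file is gzip-compatible (BGZF) and decompresses to data starting with "BAM\1", and a FASTQ record starts with "@". The function name and the returned labels are hypothetical.

```python
import gzip


def sniff_read_format(path):
    """Guess the input format from magic bytes instead of the file name."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip (and therefore also BGZF/BAM)
        with gzip.open(path, "rb") as handle:
            head = handle.read(4)
        if head == b"BAM\x01":
            return "bam"
        return "fastq.gz" if head[:1] == b"@" else "unknown"
    if magic[:1] == b"@":
        return "fastq"
    return "unknown"
```

The pipeline could then convert or decompress the input when the detected format is not one the chosen mapper accepts, instead of linking it unchanged.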

Problem dry-running the ecoli_ex.conf file

Hi,

After installing IRAP from sources, I tried to dry-run the ecoli_ex.conf file (while in irap_install directory) as follows

ikramu:irap_install$ ./scripts/irap conf=ecoli_ex.conf mapper=tophat1 de_method=deseq max_threads=8 -n

but got the following errors.
scripts/irap:1302: /../aux/mk/irap_map.mk: No such file or directory
scripts/irap:1304: /../aux/mk/irap_jbrowse.mk: No such file or directory
scripts/irap:1306: /../aux/mk/irap_report.mk: No such file or directory
scripts/irap:1308: /../aux/mk/irap_citations.mk: No such file or directory
scripts/irap:1310: /../aux/mk/irap_quant.mk: No such file or directory
scripts/irap:1312: /../aux/mk/irap_norm.mk: No such file or directory
scripts/irap:1314: /../aux/mk/irap_de.mk: No such file or directory
scripts/irap:1316: /../aux/mk/irap_gse.mk: No such file or directory
scripts/irap:1319: /../aux/mk/irap_atlas.mk: No such file or directory
scripts/irap:1322: /../aux/mk/irap_fusion.mk: No such file or directory
scripts/irap:1334: warning: undefined variable 'gse_tool'
bash: irap_paths: command not found

Could you please tell me whether there is a path I forgot to set?

Thanks!
Ikram
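The errors above have the shape one would expect when the iRAP environment has not been loaded: the Makefile include paths start with "/../aux/mk", as if a directory variable were empty, and irap_paths is not found. A quick pre-flight check along those lines is sketched here. It assumes that irap_setup.sh sets IRAP_DIR to the installation directory, that aux/mk lives directly under it, and that the scripts directory is on PATH. These are assumptions drawn from the logs on this page, not confirmed behaviour.

```python
import os
import shutil


def check_irap_env(env):
    """Return a list of problems found in an environment mapping."""
    problems = []
    irap_dir = env.get("IRAP_DIR", "")
    if not irap_dir:
        problems.append("IRAP_DIR is not set; source irap_setup.sh first")
    elif not os.path.isdir(os.path.join(irap_dir, "aux", "mk")):
        problems.append("no aux/mk directory under " + irap_dir)
    # irap_paths is the helper the shell could not find in the report
    if shutil.which("irap_paths", path=env.get("PATH", "")) is None:
        problems.append("irap_paths is not on PATH")
    return problems
```

Calling check_irap_env(dict(os.environ)) before a dry run would list the missing pieces in one go.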

How to normalize the TPM values of multiple biological replicates into one tissue/condition value?

Hi.

How can the TPM values of multiple biological replicates be normalized into a single value per tissue/condition?

irap_single_lib works very well for processing one sample (replicate).

Is there a tool to normalize multiple results into one value?

irap_deseq_norm seems to normalize them into multiple values.

Best Regards

IGV_VERSION=2.1.24

Dear Mr. Fonseca,

The IGV version included in iRAP is from 2012. This version fails to load some things from the internet. The new version also has some nice improvements.

Best regards,

Freek.

At least one intron is needed

It appears that IRAP needs at least one intron to run correctly. This problem might be related to bedtools. File RESULT_DIR/data/introns.bed is empty during the report phase and "bedtools intersect" fails.

As a dirty fix, I modify my gtf files in order to create a fake intron.

I'm working on revision 07dbaeb7f5f137db15e2567e7320fc013386162b .

Original issue reported on code.google.com by [email protected] on 8 Aug 2014 at 12:40

missing file on R component

Hi,
I apologize if I'm missing something or being particularly naive. I wonder if someone could point me in the right direction with the following failure. The attached output was captured with script and with debug set.

Thank you in advance
irap.txt

Pete

irap_raw2metric no html file, not gzipped

Hi Nuno,

First of all, I'm on 0.8.5p2, so I'm sorry if you already solved this.

I noticed (renormalizing data on protein coding genes only), that the help of irap_raw2metric specifies:

-o OUT, --out=OUT
    Output file name prefix. An HTML and TSV.gz file will be created.

But I don't see any html being generated and the resulting file is also not gzipped.

Highest regards,

Freek.

More guidelines for the Atlas SOP

Hi @nunofonseca

We need more guidelines for the Atlas SOP in order to get the same results as Expression Atlas.

Result sample:
https://www.ebi.ac.uk/gxa/experiments/E-MTAB-2690/Downloads

Question1:
How to get both TPM and FPKM Expression values across all genes?
"sop=atlas" does not work as expected.

Question2:
Can we run iRAP without DE when we get these TPM and FPKM Expression values ?

I used version irap-v0.8.5p4 but failed to find the right way to do this.

Best Regards

Error running irap_install.sh

Hi there,
I tried to install iRAP on my Ubuntu 14.04 system, but it fails and I cannot solve the problem. Can you help me? I attach the .log file. If you need more information, please ask.
Niklas

log.txt

Unknown regexp modifier "/t" at (eval 1) line 1, at end of line

Hi,

I am getting the following error while running irap:

Time taken: 0.02338505 s
[INFO] Saved length of genes, transcripts and exons to /mnt/storage/jody/RNAseq/analysis/data/reference/Ovis_aries/Ovis_aries.Oar_v3.1.88.chr.gtf.lengths.Rdata.tmp.Rdata
[INFO] -->/mnt/storage/jody/RNAseq/analysis/data/reference/Ovis_aries/Ovis_aries.Oar_v3.1.88.chr.gtf.lengths.Rdata.tmp.gene_length.tsv

[INFO] -->/mnt/storage/jody/RNAseq/analysis/data/reference/Ovis_aries/Ovis_aries.Oar_v3.1.88.chr.gtf.lengths.Rdata.tmp.exon_length.tsv

[INFO] -->/mnt/storage/jody/RNAseq/analysis/data/reference/Ovis_aries/Ovis_aries.Oar_v3.1.88.chr.gtf.lengths.Rdata.tmp.trans_length.tsv

[INFO] #Genes: 26247
[INFO] #Transcripts: 28294
[INFO] #Exons: 236018
Unknown regexp modifier "/t" at (eval 1) line 1, at end of line
Unknown regexp modifier "/r" at (eval 1) line 1, at end of line
Unknown regexp modifier "/e" at (eval 1) line 1, at end of line
Bareword "jody" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "RNAseq" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "analysis" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "data" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "reference" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "Ovis_aries" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "Ovis_aries" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "Oar_v3" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "gtf" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "lengths" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "Rdata" not allowed while "strict subs" in use at (eval 1) line 1.
Bareword "tmp" not allowed while "strict subs" in use at (eval 1) line 1.
/mnt/storage/jody/software/irap_install/scripts/irap:1616: recipe for target '/mnt/storage/jody/RNAseq/analysis/data/reference/Ovis_aries/Ovis_aries.Oar_v3.1.88.chr.gtf.lengths.Rdata' failed
make: *** [/mnt/storage/jody/RNAseq/analysis/data/reference/Ovis_aries/Ovis_aries.Oar_v3.1.88.chr.gtf.lengths.Rdata] Error 255

Not sure if it is anything to do with the input files or the scripts themselves. Any help appreciated!

Jody

make: *** No rule to make target '

I am getting an error:

make: *** No rule to make target 'sheep/data/Ovis_aries.Oar_v3.1.88.chr.gtf.mapTrans2Gene.tsv', needed by 'sheep/star/cufflinks2//pt0a.se.transcripts.riu.cufflinks2.irap.tsv'. Stop.

Is there some additional file I have to include?

Thanks,
Jody

Mapping report bug

If there are fewer than 40 libraries, line 36 of aux/html/mapping_report.html causes an error. I attached a proposed fix.

Still working on revision 07dbaeb7f5f137db15e2567e7320fc013386162b .

Original issue reported on code.google.com by [email protected] on 8 Aug 2014 at 12:24

Attachments:

mapping_report.html.diff

bamToFastq produces overwhelming warnings due to secondary alignments

warnings like below:

*****WARNING: Query HF2_23587:5:2218:9293:30855 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
*****WARNING: Query HF2_23587:5:2218:9577:35567 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
*****WARNING: Query HF2_23587:5:2218:9871:3196 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping. 
*****WARNING: Query HF2_23587:5:2218:9881:40420 is marked as paired, but its mate does not occur next to it in your BAM file.  Skipping.

I checked a few; all have secondary alignments in the input BAM.

This won't be an issue if the input BAM is unaligned, but ours happens to be aligned and has secondary alignments.
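One way to avoid these warnings is to drop non-primary records before the BAM is converted back to FastQ. In the SAM format, bit 0x100 of the FLAG field marks a secondary alignment and bit 0x800 a supplementary one. The sketch below filters SAM text lines on those bits. The function names are hypothetical, and whether iRAP should filter at this point is a design question the issue leaves open.

```python
SECONDARY = 0x100      # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM FLAG bit: supplementary alignment


def is_primary(flag):
    """True when neither the secondary nor the supplementary bit is set."""
    return (flag & (SECONDARY | SUPPLEMENTARY)) == 0


def primary_records(sam_lines):
    """Yield header lines and primary alignment records only."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t", 2)[1])
        if is_primary(flag):
            yield line
```

After such filtering each read name should occur at most once per mate, which is the condition the warning complains about.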

Bug?

Hey Nuno,

I am having problems with the new version. If I run .conf files that worked in the last version, I now get an error:
[INFO] #Genes: 58051
[INFO] #Transcripts: 198002
[INFO] #Exons: 1182163
Having no space between pattern and following word is deprecated at (eval 1) line 1.
Bareword found where operator expected at (eval 1) line 1, near "/home/niklas"
(Missing operator before niklas?)
Bareword found where operator expected at (eval 1) line 1, near "87.gtf"
(Missing operator before gtf?)
syntax error at (eval 1) line 1, near "/home/niklas"
/home/niklas/irap_install/scripts/irap:1616: recipe for target '/home/niklas/irap_install/data/reference/homo_sapiens/Homo_sapiens.GRCh38.87.gtf.lengths.Rdata' failed
make: *** [/home/niklas/irap_install/data/reference/homo_sapiens/Homo_sapiens.GRCh38.87.gtf.lengths.Rdata] Error 255

Line 1616 in scripts/irap is
irap_gtf2featlength --gtf $< -o [email protected] --cores $(max_threads) && mv [email protected] $@ && rename "$(gtf_file_abspath).lengths.Rdata.tmp" "$(gtf_file_abspath)" [email protected].*

Is this a bug or did I do something wrong?
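
A possible explanation, inferred from the messages and not confirmed by the authors: the "(eval 1) line 1" warnings look like the output of the Perl-based `rename` found on Debian/Ubuntu, which evaluates its first argument as a Perl expression, so a path such as /home/niklas/... is parsed as a pattern followed by a bareword. The recipe on line 1616 uses the util-linux form `rename FROM TO FILES`. The sketch below shows the same substring rename in Python, which does not depend on which `rename` is installed; `rename_substring` and the example file name are hypothetical.

```python
import os
import tempfile


def rename_substring(old, new, paths):
    """Replace the first occurrence of `old` with `new` in each path, like util-linux rename."""
    renamed = []
    for path in paths:
        if old not in path:
            continue  # nothing to replace, leave the file alone
        target = path.replace(old, new, 1)
        os.rename(path, target)
        renamed.append(target)
    return renamed


# Hypothetical temporary file; the real names produced by the recipe may differ.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "genes.gtf.lengths.Rdata.tmp.extra")
    open(src, "w").close()
    result = rename_substring(".lengths.Rdata.tmp", "", [src])
    result_names = [os.path.basename(p) for p in result]
```

Running `rename --version` should show which implementation is on the PATH.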

No rule to make target

Hi ,

I am getting an error:
make: *** No rule to make target 'test/data/midge.gtf.mapTrans2Gene.tsv', needed by 'test/star/htseq2//KC0_A_AGTCAA_L005.se.transcripts.riu.htseq2.irap.tsv'. Stop.

Not sure what exactly went wrong... doesn't seem like there are any other errors before this.

Any thoughts on what the problem might be?

Thanks,
Jody

Strandedness setting in iRAP conf

Hi Nuno,

I was wondering, this section refers directly to htseq-count, right?

# strand specific protocol?
#mylib1_strand=first 
#mylib1_strand=second
# Default value is both (strands)
#mylib1_strand=both

But htseq-count only accepts --stranded=<yes/no/reverse>, and indeed I see that iRAP actually defaults to --stranded=no. Can I simply enter "reverse" as the setting (mylib1_strand=reverse) and will htseq-count then use that? Or should it be "second" for "reverse" and "first" for "stranded" and "both" for "no"?

Highest regards,

Freek.

mapper options in config file is ignored

I tried to use star_map_options= in the config file to override STAR's mapping parameters, but the line appeared to be ignored every time.

One more parallel call in report

I replaced the R 'lapply' call with 'mclapply' in file 
scripts/irap_report_mapping. To pass the number of cores specified in the irap 
config file to mclapply, I also modified aux/mk/irap_report.mk.

Original issue reported on code.google.com by [email protected] on 8 Aug 2014 at 12:18

Attachments:

normalization of counts returns an error

Hi,

I am trying to mimic the steps of the Expression Atlas pipeline for experiment E-MTAB-2836. To try it out I came up with the following configuration file:

name=mytest
species=homo_sapiens
reference=Homo_sapiens.GRCh38.dna.primary_assembly.fa
gtf_file=Homo_sapiens.GRCh38.85.gtf
mapper=tophat2
quant_method=htseq2
quant_norm_tool=irap
quant_norm_method=rpkm
qual_filtering=on
max_threads=16
data_dir=uhlen_data
N1=ERR315326_1.fastq.gz ERR315326_2.fastq.gz
N1_rs=50
N1_qual=33
N1_ins=350
N1_sd=60
se=
pe=N1

When I try a dry run I get the following output:

IRAP 0.8.1p7
Developed by Nuno Fonseca (authorname (at) acm.org)
This pipeline is distributed under the terms of the GNU General Public License 3
*
Initializing...
Trying to load configuration file try_quant_norm.conf...
Configuration loaded.
Required Parameters:
name=mytest
data_dir=uhlen_data
species=homo_sapiens
reference=Homo_sapiens.GRCh38.dna.primary_assembly.fa
  gtf_file  = Homo_sapiens.GRCh38.85.gtf
  gff3_file  = mytest/data/Homo_sapiens.GRCh38.85.gff3
  Transcripts = /data/stkachev/rnaseq/uhlen_data/reference/homo_sapiens/Homo_sapiens.GRCh38.dna.primary_assembly.cdna.all.fa
pe=N1
N1_sd=60
N1_ins=350
N1_rs=50
N1_qual=33
groups= parameter not defined, this should be defined if you intend to generate an HTML report for the inferred gene/level/transcript quantification
technical.replicates=NONE
Optional Parameters:
max_threads=16
tmp_dir=uhlen_data/tmp (temporary directory)
mapper=tophat2
quant_method=htseq2
exon_quant=n
transcript_quant=n
quant_norm_tool=irap
quant_norm_method=rpkm
mapper_splicing=yes
de_method=none
de_pvalue_cutoff=0.05
/usr/local/irap/scripts/../aux/mk/irap_jbrowse.mk:221: *** mixed implicit and normal rules.  Stop.

Now if I comment out the quant_norm_tool and quant_norm_method lines, the pipeline starts doing something. It would be nice, however, to use the count normalization component. How can I enable it?

Thanks,
Sasha.
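
The dry run above lists N1_rs, N1_qual, N1_ins and N1_sd under "Required Parameters" for the paired-end library N1. As a side note, a small pre-flight check along the following lines can catch a missing per-library key before the pipeline starts. It is only a sketch: the parser handles plain key=value lines and '#' comments and is not iRAP's own loader, the function names are hypothetical, and it does not address the irap_jbrowse.mk error.

```python
PE_SUFFIXES = ("", "_rs", "_qual", "_ins", "_sd")  # keys seen for library N1 above


def parse_conf(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def missing_pe_keys(conf):
    """Return the per-library keys that are absent for libraries named in pe=."""
    missing = []
    for lib in conf.get("pe", "").split():
        for suffix in PE_SUFFIXES:
            if lib + suffix not in conf:
                missing.append(lib + suffix)
    return missing


# Same library as in the post, with N1_sd left out on purpose.
example = """\
pe=N1
se=
N1=ERR315326_1.fastq.gz ERR315326_2.fastq.gz
N1_rs=50
N1_qual=33
N1_ins=350
"""
missing = missing_pe_keys(parse_conf(example))
```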

irap install within docker image fails on Perl GD-2.56 module installation

What steps will reproduce the problem?
1. docker build <pathToDockerFile>
2. Wait for many packages to be installed

What is the expected output? What do you see instead?

No package installation should return a non-zero code. Instead:

Running make for L/LD/LDS/GD-2.56.tar.gz
Warning: Prerequisite 'Module::Build => 0.42' for 'LDS/GD-2.56.tar.gz' failed 
when processing 'LEONT/Module-Build-0.4210.tar.gz' with 'make_test => FAILED 
but failure ignored because 'force' in effect'. Continuing, but chances to 
succeed are limited.

  CPAN.pm: Building L/LD/LDS/GD-2.56.tar.gz

Configuring for libgd version 2.1.0.
Checking for stray libgd header files...none found.

Unknown option: install_base
Usage: perl Build.PL [options]

Configure GD module.

 Options:
     -options       "JPEG,FT,PNG,GIF,XPM,ANIMGIF"   feature options, separated by commas
     -lib_gd_path   path            path to libgd
     -lib_ft_path   path            path to Freetype library
     -lib_png_path  path            path to libpng
     -lib_jpeg_path path            path to libjpeg
     -lib_xpm_path  path            path to libxpm
     -lib_zlib_path path            path to libpng
     -ignore_missing_gd             Ignore missing or old libgd installations and try to compile anyway

If no options are passed on the command line.  The program will
attempt to autoconfigure itself with the gdlib-config program (present
in GD versions 2.0.27 or later).  Otherwise it will prompt for these
values interactively.
Warning: No success on command[/irap_install/bin/perl Build.PL --install_base 
/irap_install/perl]
  LDS/GD-2.56.tar.gz
  /irap_install/bin/perl Build.PL --install_base /irap_install/perl -- NOT OK
Running Build test
  Make had some problems, won't test
Running Build install
  Make had some problems, won't install
Could not read metadata file. Falling back to other methods to determine 
prerequisites
2014/11/25 19:55:49 The command [/bin/sh -c ./scripts/irap_install.sh -a 
/irap_install -s .] returned a non-zero code: 1


What version of the product are you using? On what operating system?

The Dockerfile from the Google Code repo. The Docker image uses Fedora.

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 26 Nov 2014 at 10:18

Error in mapping report if there is only one source in the gtf file

If only one source is found in the gtf file, line 256 of file 
irap_report_mapping causes an error. To prevent that error, I add fake lines to 
my gtf files:

0       falsepos        exon    1       6       .       +       .       gene_id "id_fp_2"; transcript_id "trans_fp_2"; exon_number "1"; oId "gtfgen"; tss_id "TSSfp"; gene_name "fake_gene_2";
0       falsepos        CDS     1       6       .       +       .       gene_id "id_fp_2"; transcript_id "trans_fp_2"; exon_number "1"; oId "gtfgen"; tss_id "TSSfp"; gene_name "fake_gene_2";
0       falsepos        start_codon     1       3       .       +       .       gene_id "id_fp_2"; transcript_id "trans_fp_2"; exon_number "1"; oId "gtfgen"; tss_id "TSSfp"; gene_name "fake_gene_2";
0       falsepos        stop_codon      7       9       .       +       .       gene_id "id_fp_2"; transcript_id "trans_fp_2"; exon_number "1"; oId "gtfgen"; tss_id "TSSfp"; gene_name "fake_gene_2";


I'm working on revision 07dbaeb7f5f137db15e2567e7320fc013386162b .

Original issue reported on code.google.com by [email protected] on 8 Aug 2014 at 12:45

Attachments:

irap_report_mapping.diff
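
To see whether a given GTF file would hit this case, the distinct values in its source column (column 2) can be counted beforehand. The following is a minimal sketch, assuming a tab-separated GTF; `gtf_sources` and the example records are hypothetical and not part of iRAP.

```python
def gtf_sources(lines):
    """Return the set of values found in the source column (column 2) of GTF lines."""
    sources = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) >= 2:
            sources.add(fields[1])
    return sources


# Hypothetical records: one real annotation line plus a fake line like the ones above.
example = [
    '1\tensembl\texon\t1\t6\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    '0\tfalsepos\texon\t1\t6\t.\t+\t.\tgene_id "id_fp_2"; transcript_id "trans_fp_2";',
]
sources = gtf_sources(example)
single_source = len(gtf_sources(example[:1])) < 2  # True when only one source is present
```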

Quantify the amount of rRNA in a library/sequencing experiment

Dear Nuno,

I was investigating the effects of rRNA on final sequencing results. One of the first questions I had: does rRNA abundance influence the final quantification in iRAP? Meaning: are there still rRNA reads present when the final (TPM) normalization is performed?

I do not find anything about rRNA in the original publication of iRAP but I do find it referenced in this iRAP script: https://github.com/nunofonseca/irap/blob/master/aux/R/irap_utils.R (lines 2015, 2024 and 2037). Is there a way in iRAP to quantify rRNA abundance?

Or would you recommend external tools like SortMeRNA?

Highest regards,

Freek.
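
Independently of what iRAP does internally, a rough estimate can be made from a gene-level count table and a GTF that carries a gene_biotype attribute, as Ensembl GTF files do. The sketch below is an assumption-laden illustration, not a description of irap_utils.R: the function names, the example genes and the choice of biotypes ("rRNA", "Mt_rRNA") are all hypothetical choices.

```python
import re

# Matches gene_id and gene_biotype inside the attribute column of a GTF "gene" record.
BIOTYPE_RE = re.compile(r'gene_id "([^"]+)";.*?gene_biotype "([^"]+)";')


def gene_biotypes(gtf_lines):
    """Map gene_id to gene_biotype using the 'gene' records of a GTF."""
    biotypes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = BIOTYPE_RE.search(fields[8])
        if match:
            biotypes[match.group(1)] = match.group(2)
    return biotypes


def rrna_fraction(counts, biotypes, rrna_types=("rRNA", "Mt_rRNA")):
    """Fraction of all assigned reads that fall on genes with an rRNA biotype."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    rrna = sum(n for gene, n in counts.items() if biotypes.get(gene) in rrna_types)
    return rrna / total


# Hypothetical annotation and counts.
gtf = [
    '1\thavana\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "A"; gene_biotype "protein_coding";',
    '1\tensembl\tgene\t200\t300\t.\t+\t.\tgene_id "G2"; gene_biotype "rRNA";',
]
counts = {"G1": 75, "G2": 25}
fraction = rrna_fraction(counts, gene_biotypes(gtf))
```

This only sees reads assigned to annotated rRNA genes; reads discarded earlier (for example as multi-mapping) are not counted, so a dedicated tool may still give a different picture.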

VM errors

In your VM I do the following steps as per the suggestion when I open a terminal:
(A small example is already provided. To process the experiment just type irap conf=ecoli_ex.conf)

[irap@localhost ~]$ irap conf=ecoli_ex.conf
*****************************************************
* IRAP 0.8.0p1
* Developed by Nuno Fonseca (authorname (at) acm.org)
* This pipeline is distributed  under the terms of the GNU General Public License 3
*
* Initializing...
13:04:08 19/01/2017 * ERROR: ecoli_ex.conf not found
/home/irap/irap_install/scripts/irap:309: *** Fatal error.  Stop.
[irap@localhost ~]$

Then I get ecoli_ex.conf from here: https://github.com/nunofonseca/irap/blob/master/examples/ecoli_example.sh
and I get the following error:

[irap@localhost ~]$ irap conf=ecoli_ex.conf
*****************************************************
* IRAP 0.8.0p1
* Developed by Nuno Fonseca (authorname (at) acm.org)
* This pipeline is distributed  under the terms of the GNU General Public License 3
*
* Initializing...
13:04:08 19/01/2017 * ERROR: ecoli_ex.conf not found
/home/irap/irap_install/scripts/irap:309: *** Fatal error.  Stop.
[irap@localhost ~]$ mv ecoli_ex.conf.bak ecoli_ex.conf
[irap@localhost ~]$ irap conf=ecoli_ex.conf
*****************************************************
* IRAP 0.8.0p1
* Developed by Nuno Fonseca (authorname (at) acm.org)
* This pipeline is distributed  under the terms of the GNU General Public License 3
*
* Initializing...
* Trying to load configuration file ecoli_ex.conf...
* Configuration loaded.
* 
* Required Parameters:
*	name=ecoli_ex
*	data_dir=\/home/irap/irap_install/data
13:08:59 19/01/2017 * ERROR: \/home/irap/irap_install/data not found
/home/irap/irap_install/scripts/irap:362: *** Fatal error.  Stop.
[irap@localhost ~]$

Any suggestions? Is the config file I got wrong? Where in the VM can I find it?

Thanx,

Freek.

install script error

During the step-by-step install, R v2 is installed instead of R v3 because of line 
230 of file scripts/irap_install.sh.

I'm under Debian testing, using revision 
07dbaeb7f5f137db15e2567e7320fc013386162b, version 0.4.2 of irap.

Original issue reported on code.google.com by [email protected] on 11 Jul 2014 at 1:24

Password Protected Issue While Extracting/Decompressing iRAP Files

To whom it may concern,

May I know the password required for all these password-protected files? I have also tried the .tar version in a Linux environment and got errors such as “tar: Skipping to next header” and “tar: Error exit delayed from previous errors”. All of this occurs when I try to extract/decompress the downloaded .zip and .tar files.

Your prompt reply would be highly appreciated.
Thank you.

Best regards,
Wee Shean

failed to load GTF file

Because of this issue: Rdatatable/data.table#2272

https://github.com/nunofonseca/irap/blob/master/aux/R/irap_utils.R#L1461 will throw an error for large GTF files while running within a Docker container, which has only 64M for /dev/shm by default for "fread" to use to load GTF.

iRAP should probably use the latest version of data.table, which has the fix.
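
Until the data.table version is updated, a quick check can tell whether the container's shared memory is likely to be too small for a given GTF. The sketch below assumes, as a simplification, that roughly the file's size is needed in /dev/shm; the function names are hypothetical. Docker's `--shm-size` option to `docker run` can raise the limit.

```python
import os
import shutil

DOCKER_DEFAULT_SHM = 64 * 1024 * 1024  # 64M, the default /dev/shm size mentioned above


def fits_in_shm(file_size, shm_free):
    """True if a file of `file_size` bytes fits in `shm_free` bytes of shared memory."""
    return file_size <= shm_free


def gtf_fits(path, shm_dir="/dev/shm"):
    """Compare a GTF's size on disk with the free space currently in `shm_dir`."""
    free = shutil.disk_usage(shm_dir).free
    return fits_in_shm(os.path.getsize(path), free)


# Hypothetical sizes: a 1.4 GB GTF against the default limit, then a 10 MB one.
large_ok = fits_in_shm(1_400_000_000, DOCKER_DEFAULT_SHM)
small_ok = fits_in_shm(10 * 1024 * 1024, DOCKER_DEFAULT_SHM)
```

Inside the container, `gtf_fits("/path/to/annotation.gtf")` would run the same comparison against the real free space.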

Could pre-compiled binaries be considered as part of an installation method?

The current installation runs fine and to completion on CentOS 7, although Ubuntu Xenial has been problematic. I wonder if it would help if the installation could take advantage of system-provided or pre-compiled binaries.

This would greatly reduce the installation time (potentially > 3 hours) and potentially provide a known outcome on a given platform.

Thank you for irap
Pete
