jy-zhou / freepsi

An alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome

License: GNU General Public License v3.0

Python 30.43% Shell 21.41% C++ 47.47% Makefile 0.69%

freepsi's People

Contributors

jy-zhou

Forkers

mdshw5

freepsi's Issues

"freePSI build -g" is not suitable for genomes with fragmented assembly

Hi,

I am working on some species with fragmented genome assemblies. The directory passed to the "-g" parameter would contain 10,000 to 1 million separate files. Could you please modify the code to accept a single genome file containing all the scaffolds?

This would broaden the applicability of your software.

Thank you!
Pengcheng Yang
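For illustration only: supporting this request means reading every scaffold from one multi-FASTA file. The sketch below is a hypothetical helper (`loadMultiFasta`, not part of freePSI, whose own genome loader is not shown here) that loads all records from a stream into a map keyed by sequence name. It assumes standard FASTA with `>` header lines, takes the first whitespace-delimited token of the header as the name, and concatenates sequences that share a name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper (not freePSI code): load every record of a multi-FASTA
// stream into a map from sequence name to sequence.
std::unordered_map<std::string, std::string> loadMultiFasta(std::istream& in) {
    std::unordered_map<std::string, std::string> seqs;
    std::string line;
    std::string* current = nullptr;  // sequence currently being appended to
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            // Name = first whitespace-delimited token after '>'.
            std::istringstream header(line.substr(1));
            std::string name;
            header >> name;
            // References to unordered_map elements stay valid across rehashing.
            current = &seqs[name];
        } else if (current != nullptr) {
            *current += line;  // sequence lines may be wrapped over several lines
        }
    }
    return seqs;
}
```

Holding all scaffolds in one map avoids one file per scaffold, at the cost of keeping the whole assembly in memory; a loader that streams one record at a time would be the lower-memory alternative.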

Segmentation fault on example data

I've been trying to run freePSI on my own data and get a segmentation fault. Here is the output when it is run on the example data:

bash-4.1$ sh run_FreePSI.sh 
+ /home/shirlma2/.local/bin/jellyfish count -m 27 -s 100M -t 1 -Q 5 RNA-seq/reads_final.1.fastq -o RNA-seq/reads.1.jf
+ /home/shirlma2/.local/bin/jellyfish dump RNA-seq/reads.1.jf -o RNA-seq/reads.1.fa
+ /home/shirlma2/.local/bin/jellyfish count -m 27 -s 100M -t 1 -Q 5 RNA-seq/reads_final.2.fastq -o RNA-seq/reads.2.jf
+ /home/shirlma2/.local/bin/jellyfish dump RNA-seq/reads.2.jf -o RNA-seq/reads.2.fa
+ /home/shirlma2/.local/bin/freePSI build -k 27 -p 1 -g ./genome -a ./annotation/hg38_refGene_exonBoundary_chr21.bed -1 RNA-seq/reads.1.fa -2 RNA-seq/reads.2.fa -o ./hashtable.json



### Start to build theoretical and real kmer profile ...

*** Start to load exon boundary annotation ...
=== There are 316 genes from 1 chromosomes
    (If the number of chromosomes is not right, please check whether the annotation is sorted.)
*** Finish loading exon boundary annotation !
Elasped time 0s. 

*** Start to build theoretical kmer profile ...
+++ Build theoretical kmer profile from chr21 ...
*** Finish building theoretical kmer profile!
Elasped time 3s. 

*** Start to load reads from Jellyfish...
=== Totally 13338528 kmers loaded.
=== Reserve 13338528 high quality Kmers. (Reserve 100%)
=== 219914 kmers failed to match with theoretical kmer profile. (Reserve 98.3513%)
=== Finally collect 1321343 different possible kmers on Genome.
=== Finally collect 326534 different kmers occur in read.
=== About 40 occurrences of kmer on average.
*** Finish loading kmers from reads!
Elasped time 1s. 

*** Start to merge kmers which share same profile ...
=== 994809 kmers not occur in reads. (75.2877%).
=== 0 kmers occur less than 0 times in reads. (0%).
=== 319387 kmers share same profile. (24.1714%).
+++ Rehash kmer profile ...
    KmerCount load factor: Now 0.950778, before 0.00534304
    KmerTable load factor: Now 0.950778, before 0.00333197
=== 7147 kmers reserve after merging! 
    KmerCountSize: 7147
    KmerTableSize: 7147
    (Shrinkage rate is 0.540889%).
=== 2324 kmers are originated from unique exon. (32.5171%).
*** Finish merging kmer profile !
Elasped time 2s. 
### Finish building theoretical and real kmer profile!
### Finished!
Elasped time 6s. 
CPU Time elapsed: 5s. 
Natural Time elapsed: 6s. 
+ /home/shirlma2/.local/bin/freePSI quant -k 27 -p 1 -i ./hashtable.json -o .

*** Eigen mode: 
--- Not use intrinsic parallel in Eigen
--- Use Intel MKL in Eigen
Elasped time 0s. 



### Start to initialize parameters ...

*** Start to initialize variable indices ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
*** Finish initializing variable indices!
Elasped time 0s. 

*** Start to initialize coefficients ...
+++ Start to initialize effective length ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish initializing effective length!
+++ Start to initialize contribution matrix and kmer count vector ...
> OK!
+++ Finish initializing contribution matrix and kmer count vector! 
*** Finish initializing coefficients!
Elasped time 1s. 

*** Start to set initial values for variables (Gamma, Theta and Mu)...
+++ Start to allocate kmer count as initial value ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish allocating kmer count as initial value ...
=== On average, 5.1416 exons or juctions share one kmer.
=== 2.01483% kmers are unique to exons in large exon number gene.
=== 38.6076% elements of Z (316 in total) are zero.
=== 63.3393% elements of X (26437 in total) are zero.
=== 1171 elements of X (2398 in total) are zero.
=== 2.21519% genes (7 in total) contain more than 40 exons
+++ Start to do normalization ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish doing normalization!
*** Finish setting initial values for variables (Gamma, Alpha and Mu)!
Elasped time 0s. 

*** Start to initialize constraint matrix ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>OK! 
*** Finish initializing constraint matrix!
Elasped time 0s. 

*** Start to make Alpha feasible...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>OK! 
+++ Finish making Alpha feasible!
=== Prune 21.6179% infeaible variables (infeasible junctions) ...
=== 0 new empty genes are found
=== Now there are 7926 variables. (Before 10112)
*** Finish initializing CGPA optimizer and making Alpha feasible...
Elasped time 0s. 

### Finish initializing parameters !
Elasped time 1s. 



### Start to estimate parameters ... 

*** Start to run EM algorithm ...
*** M-step mode:
--- Run M-step concurrently
 OK!
=== Initial log-likelihood is -6.56896e+07

+++ 1 iteration processed...
+++ E-step ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
Elasped time 0s. 
Thread number = 1
+++ M-step ...
Eigen threads = 1
Thread num = 1
>*** glibc detected *** /home/shirlma2/.local/bin/freePSI: free(): invalid next size (normal): 0x000000000155d7b0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3b16275e66]
/lib64/libc.so.6[0x3b162789ba]
/home/shirlma2/.local/bin/freePSI[0x42b08e]
/home/shirlma2/.local/bin/freePSI[0x43013d]
/home/shirlma2/.local/bin/freePSI[0x4114b1]
/usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1(GOMP_parallel+0x3f)[0x7fd6347976ef]
/home/shirlma2/.local/bin/freePSI[0x40d892]
/home/shirlma2/.local/bin/freePSI[0x412502]
/home/shirlma2/.local/bin/freePSI[0x4091d2]
/home/shirlma2/.local/bin/freePSI[0x409d03]
/home/shirlma2/.local/bin/freePSI[0x40396a]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b1621ed5d]
/home/shirlma2/.local/bin/freePSI[0x403bc1]
======= Memory map: ========
00400000-00448000 r-xp 00000000 00:1e 1826571594                         /home/shirlma2/.local/bin/freePSI
00648000-00649000 r--p 00048000 00:1e 1826571594                         /home/shirlma2/.local/bin/freePSI
00649000-0064a000 rw-p 00049000 00:1e 1826571594                         /home/shirlma2/.local/bin/freePSI
00d61000-03472000 rw-p 00000000 00:00 0                                  [heap]
3b15a00000-3b15a20000 r-xp 00000000 09:01 1202416                        /lib64/ld-2.12.so
3b15c1f000-3b15c20000 r--p 0001f000 09:01 1202416                        /lib64/ld-2.12.so
3b15c20000-3b15c21000 rw-p 00020000 09:01 1202416                        /lib64/ld-2.12.so
3b15c21000-3b15c22000 rw-p 00000000 00:00 0 
3b15e00000-3b15e83000 r-xp 00000000 09:01 1202422                        /lib64/libm-2.12.so
3b15e83000-3b16082000 ---p 00083000 09:01 1202422                        /lib64/libm-2.12.so
3b16082000-3b16083000 r--p 00082000 09:01 1202422                        /lib64/libm-2.12.so
3b16083000-3b16084000 rw-p 00083000 09:01 1202422                        /lib64/libm-2.12.so
3b16200000-3b1638a000 r-xp 00000000 09:01 1202417                        /lib64/libc-2.12.so
3b1638a000-3b1658a000 ---p 0018a000 09:01 1202417                        /lib64/libc-2.12.so
3b1658a000-3b1658e000 r--p 0018a000 09:01 1202417                        /lib64/libc-2.12.so
3b1658e000-3b1658f000 rw-p 0018e000 09:01 1202417                        /lib64/libc-2.12.so
3b1658f000-3b16594000 rw-p 00000000 00:00 0 
3b16600000-3b16602000 r-xp 00000000 09:01 1202424                        /lib64/libdl-2.12.so
3b16602000-3b16802000 ---p 00002000 09:01 1202424                        /lib64/libdl-2.12.so
3b16802000-3b16803000 r--p 00002000 09:01 1202424                        /lib64/libdl-2.12.so
3b16803000-3b16804000 rw-p 00003000 09:01 1202424                        /lib64/libdl-2.12.so
3b16a00000-3b16a17000 r-xp 00000000 09:01 1202420                        /lib64/libpthread-2.12.so
3b16a17000-3b16c17000 ---p 00017000 09:01 1202420                        /lib64/libpthread-2.12.so
3b16c17000-3b16c18000 r--p 00017000 09:01 1202420                        /lib64/libpthread-2.12.so
3b16c18000-3b16c19000 rw-p 00018000 09:01 1202420                        /lib64/libpthread-2.12.so
3b16c19000-3b16c1d000 rw-p 00000000 00:00 0 
3b16e00000-3b16e07000 r-xp 00000000 09:01 1202421                        /lib64/librt-2.12.so
3b16e07000-3b17006000 ---p 00007000 09:01 1202421                        /lib64/librt-2.12.so
3b17006000-3b17007000 r--p 00006000 09:01 1202421                        /lib64/librt-2.12.so
3b17007000-3b17008000 rw-p 00007000 09:01 1202421                        /lib64/librt-2.12.so
7fd632b67000-7fd632b68000 rw-p 00000000 00:00 0 
7fd632b68000-7fd634555000 r-xp 00000000 00:1d 2589461281                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_avx2.so
7fd634555000-7fd634754000 ---p 019ed000 00:1d 2589461281                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_avx2.so
7fd634754000-7fd634764000 r--p 019ec000 00:1d 2589461281                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_avx2.so
7fd634764000-7fd63476c000 rw-p 019fc000 00:1d 2589461281                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_avx2.so
7fd63476c000-7fd634771000 rw-p 00000000 00:00 0 
7fd634771000-7fd634787000 r-xp 00000000 00:1d 3658082613                 /usr/prog/GCCcore/6.4.0/lib64/libgcc_s.so.1
7fd634787000-7fd634788000 r--p 00015000 00:1d 3658082613                 /usr/prog/GCCcore/6.4.0/lib64/libgcc_s.so.1
7fd634788000-7fd634789000 rw-p 00016000 00:1d 3658082613                 /usr/prog/GCCcore/6.4.0/lib64/libgcc_s.so.1
7fd634789000-7fd63478a000 rw-p 00000000 00:00 0 
7fd63478a000-7fd6347b6000 r-xp 00000000 00:1d 3678701844                 /usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1.0.0
7fd6347b6000-7fd6347b7000 r--p 0002b000 00:1d 3678701844                 /usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1.0.0
7fd6347b7000-7fd6347b8000 rw-p 0002c000 00:1d 3678701844                 /usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1.0.0
7fd6347b8000-7fd634933000 r-xp 00000000 00:1d 3678976074                 /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
7fd634933000-7fd63493d000 r--p 0017a000 00:1d 3678976074                 /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
7fd63493d000-7fd634941000 rw-p 00184000 00:1d 3678976074                 /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
7fd634941000-7fd634945000 rw-p 00000000 00:00 0 
7fd63496b000-7fd635242000 r-xp 00000000 00:1d 2587239646                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_gnu_thread.so
7fd635242000-7fd635442000 ---p 008d7000 00:1d 2587239646                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_gnu_thread.so
7fd635442000-7fd635443000 r--p 008d7000 00:1d 2587239646                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_gnu_thread.so
7fd635443000-7fd635450000 rw-p 008d8000 00:1d 2587239646                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_gnu_thread.so
7fd635450000-7fd635451000 rw-p 00000000 00:00 0 
7fd635451000-7fd6368be000 r-xp 00000000 00:1d 2580792962                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_core.so
7fd6368be000-7fd636abe000 ---p 0146d000 00:1d 2580792962                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_core.so
7fd636abe000-7fd636b05000 r--p 0146d000 00:1d 2580792962                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_core.so
7fd636b05000-7fd636b11000 rw-p 014b4000 00:1d 2580792962                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_core.so
7fd636b11000-7fd636b2f000 rw-p 00000000 00:00 0 
7fd636b2f000-7fd637064000 r-xp 00000000 00:1d 2584444567                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.so
7fd637064000-7fd637263000 ---p 00535000 00:1d 2584444567                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.so
7fd637263000-7fd637265000 r--p 00534000 00:1d 2584444567                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.so
7fd637265000-7fd637271000 rw-p 00536000 00:1d 2584444567                 /cm/shared/apps/intel/composer_xe/2013_sp1.2.144/mkl/lib/intel64/libmkl_intel_lp64.so
7fd637271000-7fd637278000 rw-p 00000000 00:00 0 
7ffc3a91c000-7ffc3a941000 rw-p 00000000 00:00 0                          [stack]
7ffc3a9e3000-7ffc3a9e4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
run_FreePSI.sh: line 53: 75765 Aborted                 ${FreePSI}/freePSI quant -k $K -p ${THREAD} -i ./hashtable.json -o .

lib64/libstdc++.so.6 error when testing example data

Hi,

I got an error when running the script run_FreePSI.sh and would like some help. The error messages were:

../bin/freePSI: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ../bin/freePSI)
../bin/freePSI: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ../bin/freePSI)
../bin/freePSI: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ../bin/freePSI)

This seems to be a gcc issue. The default gcc version was 4.8.5, so I installed gcc-6.3 using rpm and pointed gcc and g++ to it with the following commands:

update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc63 20
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++63 20
update-alternatives --config gcc
update-alternatives --config g++

However, I got the same error. Could you tell me how to update the version of libstdc++.so.6 that freePSI uses?

Software information:
OS: CentOS 7
The versions of all packages and libraries have been verified to meet FreePSI's requirements as stated in the installation instructions.

Many thanks.

interpretation of PSI values

Hi, thanks for the useful FreePSI tool for calculating PSI values! I have some questions. They may already be covered in the publication or in a previous issue (exon bed file #7), but it would be great if you could answer them here:

  • What is the format of the 'exon boundary' file? I read that it must be in BED12 format, but I have a few specific questions:
  1. Can it have multiple rows per gene, one for each transcript ID, or must there be exactly one row per gene?
  2. Can it contain overlapping exons from multiple transcripts, or do we need to merge them?
  3. Does it make sense to use FreePSI to calculate the PSI value of part of an exon instead of the full exon? For example, instead of including the coordinates of a full exon from the transcriptome annotation, can we include regions within the exon to compare the PSI values of the start and the end of the exon?
  • Can FreePSI be used to compare the PSI values of the same exon in two different samples? Is any normalisation needed to account for different sequencing depths between samples? Here is an example using SUPPA

  • Similarly, can we directly compare the PSI values of two different exons in the same sample? Does FreePSI take exon length into account?

I appreciate your time!

Best,
Sarang
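As background for the format questions above: in a BED12 record, columns 11 and 12 (`blockSizes`, `blockStarts`) give each exon's length and its offset from `chromStart`. The freePSI excerpt quoted in the "Discard BED12 records" issue appears to use the same arithmetic (`exonst = genest + subst[i]`). The sketch below (C++17) converts one BED12 line into absolute exon intervals, drops exons shorter than a minimum length `k`, and returns an empty optional when no exon survives. `parseBed12` and `Bed12Gene` are hypothetical names, and the sketch does not answer how freePSI itself treats multiple rows per gene or overlapping exons.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type (not freePSI's): one BED12 line as absolute exons.
struct Bed12Gene {
    std::string chrom;
    std::string name;
    std::vector<std::pair<long, long>> exons;  // 0-based, half-open [start, end)
};

// Split a comma-separated BED12 list such as "100,50," (trailing comma allowed).
static std::vector<long> splitCommaList(const std::string& field) {
    std::vector<long> values;
    std::stringstream ss(field);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) values.push_back(std::stol(item));
    }
    return values;
}

// Parse one tab-separated BED12 line, keeping only exons at least k bases long.
// Returns std::nullopt if the line is malformed or no exon survives the filter.
std::optional<Bed12Gene> parseBed12(const std::string& line, long k) {
    std::istringstream in(line);
    std::vector<std::string> cols;
    std::string col;
    while (std::getline(in, col, '\t')) cols.push_back(col);
    if (cols.size() < 12) return std::nullopt;

    Bed12Gene gene{cols[0], cols[3], {}};
    const long chromStart = std::stol(cols[1]);
    const long blockCount = std::stol(cols[9]);
    const std::vector<long> sizes = splitCommaList(cols[10]);
    const std::vector<long> starts = splitCommaList(cols[11]);
    if (static_cast<long>(sizes.size()) != blockCount ||
        static_cast<long>(starts.size()) != blockCount) {
        return std::nullopt;
    }

    for (long i = 0; i < blockCount; ++i) {
        const long exonStart = chromStart + starts[i];  // blockStarts are relative
        const long exonEnd = exonStart + sizes[i];
        if (exonEnd - exonStart >= k) gene.exons.emplace_back(exonStart, exonEnd);
    }
    if (gene.exons.empty()) return std::nullopt;  // discard the whole record
    return gene;
}
```

Returning "no record" instead of an empty exon list lets a caller skip such genes before any downstream step has to handle a gene with zero exons.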

~95% of kmers from RNA-seq data fail to match the theoretical kmer profile

Hi.

In my analysis of single-end RNA-seq data, ~95% of kmers are discarded because they do not match the theoretical kmer profile. Could you suggest possible reasons for such a low match rate?
Second, how reliable would the resulting PSI values be when the match rate is this low? Here is a snapshot for reference:
[screenshot not included]
Please see the sixth-to-last line of the snapshot.

Thank you
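To make the reported percentage concrete: in the `freePSI build` log from the segmentation-fault issue above, 219914 of 13338528 loaded kmers fail to match, and "Reserve 98.3513%" equals 1 − 219914/13338528, i.e. the share of loaded read kmers found in the theoretical profile. The sketch below computes that kind of fraction on a toy example. It is an illustration under stated assumptions, not freePSI's code: the "theoretical" set is taken to be every k-mer of the supplied exon sequences, and matching can optionally use canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement). Comparing the two modes is one way to check whether reads and annotation are on opposite strands; that is a possibility to rule out, not a diagnosis of this report.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Reverse complement of an uppercase DNA string (non-ACGT bases become N).
std::string reverseComplement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    return std::min(kmer, reverseComplement(kmer));
}

// Hypothetical stand-in for a theoretical profile: every k-mer of each sequence.
std::unordered_set<std::string> kmerSet(const std::vector<std::string>& seqs,
                                        std::size_t k, bool useCanonical) {
    assert(k > 0);
    std::unordered_set<std::string> set;
    for (const std::string& s : seqs) {
        for (std::size_t i = 0; i + k <= s.size(); ++i) {
            std::string kmer = s.substr(i, k);
            set.insert(useCanonical ? canonical(kmer) : kmer);
        }
    }
    return set;
}

// Fraction of read k-mers that are present in the theoretical set.
double matchFraction(const std::vector<std::string>& readKmers,
                     const std::unordered_set<std::string>& theoretical,
                     bool useCanonical) {
    if (readKmers.empty()) return 0.0;
    std::size_t matched = 0;
    for (const std::string& kmer : readKmers) {
        if (theoretical.count(useCanonical ? canonical(kmer) : kmer)) ++matched;
    }
    return static_cast<double>(matched) / readKmers.size();
}
```

For example, with the exon "AAACCC", k = 3, and read k-mers taken from its reverse complement ("GGG", "GGT", "GTT", "TTT"), the plain comparison matches none of them while the canonical comparison matches all four.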

Release statically compiled binary

Thanks for making your code available and for providing a binary release. Unfortunately, the binary is linked against libraries an average user may not have: Intel MKL and a recent glibc. A statically linked binary would avoid this problem.

Discard BED12 records whose exons are all shorter than the kmer length

I've encountered an issue with the freePSI quant command that produces malformed JSON output when every exon in a BED12 entry is shorter than the kmer (read) length.

Your code checks the length of each exon before pushing it onto a list, but it does not handle the case in which this routine leaves the list empty.

for (int i = 0; i < exonnum; i++) {
    int exonst = genest + boost::lexical_cast<int>(subst[i]);
    int exoned = exonst + boost::lexical_cast<int>(sublen[i]);
    if (exoned - exonst >= readLength) {
        exons.push_back(make_pair(exonst, exoned));
        predictTemp.push_back(i);
    } else {
        cout << "Caution: The length of exon " << i << " from gene " << geneBoundary.size() << " is shorter than reads." << endl;
    }
}

As a result, serializing the results to a JSON list produces a malformed structure:

...
 [
  0,
  0
 ],
 [ <-
,
 [
  0,
  0,
  0
]

In EMAlgorithm.cpp, the JSON list is written by iterating over the exons of each gene. There is a check on the gene index (line 815), but nothing at the point where a gene's list is opened (line 807) handles a gene with no exons.

psiOutput << " [" << endl;
for (int e = 0; e < NE[g]; e++) {
    if (e == NE[g] - 1) {
        psiOutput << " " << PSI[g][e] << endl << " ]";
    } else {
        psiOutput << " " << PSI[g][e] << "," << endl;
    }
}
if (g == NG - 1) {
    psiOutput << endl << "]" << endl;
}

You might consider fixing the IO portion of EMAlgorithm::computePSI to create empty lists when necessary.
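A minimal sketch of the suggested I/O fix, assuming the PSI values are held as one vector per gene (`std::vector<std::vector<double>>`, a hypothetical stand-in for the `PSI`, `NE` and `NG` arrays in EMAlgorithm.cpp). Writing each separator before every element except the first, instead of deciding per element whether it is the last, means a gene with no exons is emitted as an empty list `[]` rather than a dangling bracket.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer (not freePSI's): emit per-gene PSI values as a JSON
// array of arrays. Separators are written before every element except the
// first, so genes without exons become "[]" and the output stays well-formed.
void writePsiJson(std::ostream& out, const std::vector<std::vector<double>>& psi) {
    out << "[";
    for (std::size_t g = 0; g < psi.size(); ++g) {
        out << (g == 0 ? "\n [" : ",\n [");
        for (std::size_t e = 0; e < psi[g].size(); ++e) {
            out << (e == 0 ? "\n  " : ",\n  ") << psi[g][e];
        }
        out << (psi[g].empty() ? "]" : "\n ]");  // close this gene's list
    }
    out << (psi.empty() ? "]" : "\n]") << "\n";
}
```

For input `{{0.0, 0.0}, {}, {0.5}}` this writes the middle gene as `[]`, so the result parses as JSON. An alternative is to drop such genes when the BED12 file is loaded, but then the output no longer has one entry per input record.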

Segmentation fault (core dumped)

Hi,
Thanks for the great work! When I ran freePSI build following your run_FreePSI.sh script, I encountered this problem:
[Screenshot 2023-08-07 19 49 17]
Do you have any suggestions for solving this problem? Thanks a lot!

exon bed file

Hi,
I have two questions about the exon bed file. I would appreciate your time and help.

  1. Should I provide non-overlapping, disjoint exons for each gene, or would all the exons from the GENCODE GTF file work (they often overlap at their boundaries)?

  2. If I provide only the non-overlapping intervals for a gene, will FreePSI infer other possible isoforms whose exons have different ends from those in the provided BED file?

Best.
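Whether FreePSI expects disjoint exons is the open question here. The sketch below only shows the mechanical step of collapsing overlapping exon intervals (for example, GENCODE exons from several transcripts that overlap at their boundaries) into disjoint intervals, should that turn out to be needed. `mergeExons` is a hypothetical helper; intervals are 0-based, half-open `[start, end)` as in BED, and intervals that merely touch are kept separate so that a shared boundary is not lost.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Interval = std::pair<long, long>;  // [start, end), 0-based, as in BED

// Hypothetical helper: collapse overlapping exon intervals into disjoint ones.
// Intervals that only touch (previous end == next start) are not merged.
std::vector<Interval> mergeExons(std::vector<Interval> exons) {
    std::sort(exons.begin(), exons.end());  // by start, then by end
    std::vector<Interval> merged;
    for (const Interval& iv : exons) {
        assert(iv.first <= iv.second);
        if (!merged.empty() && iv.first < merged.back().second) {
            // Overlaps the last merged interval: extend it if needed.
            merged.back().second = std::max(merged.back().second, iv.second);
        } else {
            merged.push_back(iv);
        }
    }
    return merged;
}
```

Merging this way discards the distinct exon ends that alternative 5' or 3' splice sites produce, which is exactly the information the second question is about, so it may not be the right preprocessing for PSI estimation.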
