soedinglab / bammmotif2

Bayesian Markov Model motif discovery tool version 2 - An expectation maximization algorithm for the de novo discovery of enriched motifs as modelled by higher-order Markov models.

Home Page: https://bammmotif.mpibpc.mpg.de/

License: GNU General Public License v3.0

CMake 1.11% R 28.86% C++ 63.51% Shell 0.10% Python 6.22% C 0.19%

motif-discovery motif-analysis chip-seq bioinformatics ngs-analysis

bammmotif2's Issues

bug in joint probability calculation

There are two issues with the homogeneous Markov models stored in the hbp and hbpc files:

The precision is only four digits, so the probabilities do not sum to 1.
The joint probabilities in the hbp file sum to a value greater than one. It seems that the first entry of orders >1 is far too large.
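
To illustrate the first point only: a minimal sketch, assuming the writer rounds each probability to a fixed number of decimals. The function names are hypothetical and not taken from the BaMMmotif code base. Each entry rounded to four digits can be off by up to 5e-5, so the sum drifts with the number of entries. This does not explain the much larger excess described in the second point.

```cpp
#include <cmath>
#include <numeric>
#include <vector>

// Round a probability to a fixed number of decimal digits, mimicking a
// writer that prints with limited precision (hypothetical helper).
double roundToDigits(double p, int digits) {
    const double scale = std::pow(10.0, digits);
    return std::round(p * scale) / scale;
}

// Sum of the probabilities after rounding each one to `digits` decimals.
double roundedSum(const std::vector<double>& probs, int digits) {
    double sum = 0.0;
    for (double p : probs) {
        sum += roundToDigits(p, digits);
    }
    return sum;
}

// True if the values sum to 1 within an absolute tolerance.
bool sumsToOne(const std::vector<double>& probs, double tol) {
    const double sum = std::accumulate(probs.begin(), probs.end(), 0.0);
    return std::fabs(sum - 1.0) <= tol;
}
```

For example, three probabilities of 1/3 sum to 1 at full precision, but each is written as 0.3333, so the values read back from the file sum to 0.9999. Writing more digits, or using a tolerance that scales with the number of entries when reading, would avoid false alarms from rounding alone.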

Server does not stop updating when job crashes

In the case of motif-motif comparison, the server does not stop updating when the job crashes.

tiny typo in CMakeLists.txt: Boost_INCLUDEDIR -> Boost_INCLUDE_DIR

BaMMmotif2/CMakeLists.txt

Line 10 in b13d415

include_directories (${Boost_INCLUDEDIR})

Additional info about how to use the software

Hi, I have been using the server version for some time for motif discovery and I wanted to add the shell version to my pipeline. However, when I run:
BaMMmotif DIRPATH FILEPATH

I get the following message:

Error: No initial model is provided.

Am I missing some necessary step/required file?

boost in cmake

We should check for availability of boost in the cmake file so that we can output a proper error message during build phase.

Bug: Failure if -n <num> is larger than the total number of motifs in the pwm file

BaMMmotif fails if -n is larger than the total number of motifs in the PWM file.
It would be better to treat -n as the maximum number of PWMs to process.
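
The suggested behaviour amounts to clamping the requested count to what the file actually contains. A minimal sketch, assuming the number of motifs in the PWM file is known after parsing; the function name is hypothetical and not part of BaMMmotif:

```cpp
#include <algorithm>
#include <cstddef>

// Treat the value given with -n as an upper bound: never try to process
// more motifs than the PWM file actually contains.
std::size_t motifsToProcess(std::size_t requested, std::size_t available) {
    return std::min(requested, available);
}
```

With three motifs in the file, a request for ten would then process three instead of failing.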

File format specifications

If I am not mistaken, we currently do not have specifications for some of our own file formats, such as occurrence files.

Similar to MMseqs, I think we should start using the wiki for expanding the documentation step by step.

plotbamm: weird information calculation?

Is there a rationale behind not defining the information content as max_entropy - entropy here?

https://github.com/soedinglab/bamm-private/blob/master/R/plotBaMM.R#L259-L261

informationContent <- function( x, base=2 ){
    ifelse( all( x > 0 ), 2 + sum( x * log( x, base ) ), 0 )
}

This sets the information to zero if at least one nucleotide occurs with a relative frequency of zero, leading to very odd behavior:

> informationContent(c(0.25, 0.25, 0.25, 0.25))
0
> informationContent( c(0.01, 0.01, 0.01, 0.97))
1.75805926714679
> informationContent( c(0.0001, 0.0001, 0.0001, 0.9997))
1.99558094270164
> informationContent(c(0, 0, 0, 1))
0
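
For comparison, here is a sketch of the max_entropy - entropy definition asked about above, written in C++ rather than R. It assumes the usual convention that a term with zero frequency contributes nothing to the entropy (0 * log 0 = 0). It is not the code in plotBaMM.R.

```cpp
#include <cmath>
#include <vector>

// Information content in bits: maximum entropy of the alphabet minus the
// Shannon entropy of the observed frequencies.
double informationContent(const std::vector<double>& freqs) {
    if (freqs.empty()) {
        return 0.0;
    }
    const double maxEntropy = std::log2(static_cast<double>(freqs.size()));
    double entropy = 0.0;
    for (double p : freqs) {
        if (p > 0.0) {
            entropy -= p * std::log2(p);  // zero frequencies are skipped
        }
    }
    return maxEntropy - entropy;
}
```

Under this definition a uniform column still gives 0 bits, while a fully conserved column (0, 0, 0, 1) gives 2 bits instead of 0. Columns with no zero entries give the same values as the R function above.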

conditional probabilities can exceed 1

The conditional probabilities in BaMM model/background files are sometimes larger than 1, by a margin that is higher than numerical instability would suggest: e.g. 1.12 in the attached model file, and 3.09 in the attached background model.

varlen_seqs.hbcp.txt
varlen_seqs_motif_1.ihbcp.txt
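
A sanity check run after loading a model would flag such files early. The sketch below assumes a flat layout in which each consecutive group of alphabetSize values holds the conditional probabilities for one context. That layout is an assumption for illustration, not the documented file format, and the function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Check that every group of `alphabetSize` values lies in [0, 1] and sums
// to 1, both within an absolute tolerance `tol`.
bool isValidConditionalTable(const std::vector<double>& probs,
                             std::size_t alphabetSize,
                             double tol) {
    if (alphabetSize == 0 || probs.size() % alphabetSize != 0) {
        return false;
    }
    for (std::size_t start = 0; start < probs.size(); start += alphabetSize) {
        double sum = 0.0;
        for (std::size_t a = 0; a < alphabetSize; ++a) {
            const double p = probs[start + a];
            // Written so that NaN fails the range check as well.
            if (!(p >= -tol && p <= 1.0 + tol)) {
                return false;
            }
            sum += p;
        }
        if (std::fabs(sum - 1.0) > tol) {
            return false;
        }
    }
    return true;
}
```

A value such as 1.12 or 3.09 fails the range check regardless of the tolerance chosen for rounding error.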

bug: vector allocation fails because of short sequences

command call:
BaMMmotif ./mcf7_GATA_narrow /home/mmeier/git/PEnG-motif/scripts/mcf7_GATA_narrow/mcf7_GATA_narrow.fasta --PWMFile /home/mmeier/git/PEnG-motif/scripts/mcf7_GATA_narrow/mcf7_GATA_narrow.tmp.out --FDR --savePvalues -K 2 --zoops

Result:
terminate called after throwing an instance of 'std::length_error'
what(): vector::_M_default_append

Backtrace with gdb:
#10 Motif::initFromPWM (this=0x51546f0, PWM=PWM@entry=0x66b680, asize=4, count=) at /home/mmeier/git/bamm-private/src/bamm/Motif.cpp:255

when printing LW1:
LW1 = -7

There seems to be a very short sequence in the dataset:

chr8:93080412-93080414
AC

Can you catch this?
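
One way to catch this, sketched under the assumption that LW1 is the number of valid motif start positions (sequence length minus motif width plus one) and that a negative value later ends up as a vector size. The names below are hypothetical and not taken from Motif.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of positions at which a motif of the given width fits into a
// sequence; zero when the sequence is shorter than the motif.
std::size_t motifStartPositions(std::size_t seqLength, std::size_t motifWidth) {
    if (motifWidth == 0 || seqLength < motifWidth) {
        return 0;
    }
    return seqLength - motifWidth + 1;
}

// Keep only the sequences that can hold at least one motif occurrence.
std::vector<std::string> dropShortSequences(const std::vector<std::string>& seqs,
                                            std::size_t motifWidth) {
    std::vector<std::string> kept;
    for (const std::string& s : seqs) {
        if (motifStartPositions(s.size(), motifWidth) > 0) {
            kept.push_back(s);
        }
    }
    return kept;
}
```

Comparing the lengths before subtracting avoids the negative intermediate value altogether. Whether short sequences should be dropped silently, skipped with a warning, or rejected with an error message is a design choice left open here.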

build failure

Current build failure on my Mac:

/Users/ch/repo/bamm-private/src/shared/SequenceSet.cpp:357:50: error: cannot take the address of an rvalue of type 'std::__1::basic_ostringstream<char, std::__1::char_traits<char>, std::__1::allocator<char> >'
    header = static_cast<std::ostringstream*>( &( std::ostringstream() << ( N+1 ) ) )->str();
                                               ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

http://pastebin.com/AfXfdZwH
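
The error arises because, on this toolchain, the expression std::ostringstream() << ( N+1 ) is a temporary whose address cannot be taken. Two portable ways to build the same string are sketched below. The function names are hypothetical, N is assumed to be an integer index, and how the result is assigned to header in SequenceSet.cpp is left to the maintainers.

```cpp
#include <sstream>
#include <string>

// Variant 1: std::to_string (available since C++11) needs no stream at all.
std::string headerFromIndex(int N) {
    return std::to_string(N + 1);
}

// Variant 2: use a named stream object instead of a temporary, which also
// works with pre-C++11 compilers.
std::string headerFromIndexStream(int N) {
    std::ostringstream oss;
    oss << (N + 1);
    return oss.str();
}
```

Both variants return the decimal representation of N + 1 and avoid the static_cast on a temporary stream.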
