costalab / reg-gen

Regulatory Genomics Toolbox: Python library and set of tools for the integrative analysis of high throughput regulatory genomics data.

Home Page: https://reg-gen.readthedocs.io/

License: Other

bioinformatics genomics differential-peak-calling footprinting ngs dnase-seq chip-seq atac-seq motif motif-analysis

reg-gen's Introduction

RGT - Regulatory Genomics Toolbox

RGT is an open source Python 3.6+ library for the analysis of regulatory genomics data. RGT is written in an object-oriented fashion, and its core classes provide functionality for handling regulatory genomics data.

The toolbox is made of a core library and several tools:

  • HINT: ATAC-seq/DNase-seq footprinting method
  • THOR: ChIP-Seq differential peak caller
  • Motif Analysis: TFBS matching and enrichment
  • RGT-Viz: Visualization tool
  • TDF: DNA/RNA triplex domain finder

See https://reg-gen.readthedocs.io for documentation and tutorials.

Installation with conda

We recommend using conda to manage the Python environment to avoid issues.

You can install conda by following its official installation instructions.

Once you have successfully installed conda, first create a dedicated environment:

conda create -n rgt python=3.9

Then activate your environment and install the full RGT suite with all other dependencies:

conda activate rgt
pip install RGT

Detailed installation instructions and basic problem solving can be found on our website.

Please also consider citing our main paper if you used any sub-tools from RGT:

@article{li2023rgt,
  title={RGT: a toolbox for the integrative analysis of high throughput regulatory genomics data},
  author={Li, Zhijian and Kuo, Chao-Chung and Ticconi, Fabio and Shaigan, Mina and Gehrmann, Julia and Gusmao, Eduardo Gade and Allhoff, Manuel and Manolov, Martin and Zenke, Martin and Costa, Ivan G},
  journal={BMC bioinformatics},
  volume={24},
  number={1},
  pages={1--12},
  year={2023},
  publisher={BioMed Central}
}

reg-gen's People

Contributors

chaochungkuo, computerscienceiscool, dependabot[bot], eggduzao, fabio-t, igcf, juliageh, kfding, lzj1769, manuelallh, minashaigan, ronghui1992

reg-gen's Issues

MD5 Checksum support

We should validate the downloaded file by MD5 checksum, to make sure the files are downloaded completely.
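
A minimal sketch of such a check, using only the standard library. The function names `md5_of_file` and `is_download_complete` are hypothetical, and where the expected digest comes from (for example a checksum file published next to each archive) is an assumption, not something RGT currently provides.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large genome archives never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_download_complete(path, expected_md5):
    """Compare the file's digest with the published one (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A truncated download yields a different digest, so the caller can delete the file and fetch it again.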

THOR needs new testing data

The current test does not use inputs, so it tests less than it could. We need to replace it (or, better, just add another one).

rgt-hint differential error

Hi,

I am running RGT v0.11.1 on Python 2.7.14 on Ubuntu 16.04. I was able to run the "rgt-motifanalysis matching" command, but when I ran "rgt-hint differential" I received the following error: "TypeError: coercing to Unicode: need string or buffer, NoneType found". Do you have any suggestions as to why this is occurring?

Thanks,
Michael

Improve motif matching usability

SetupLogoData.py: it shouldn't be needed at all, but we can keep it and document it in the website

Then, motif logos should be produced on the fly when doing motif matching (configurable with an option), and then stored in rgtdata. In this way, it won't happen that a result file is produced with "logo missing" artifacts - and after the first time it ran, the logo will be re-used.

Fix MacOS support

It's almost done, but there's still some weirdness. We must try it on a few more Macs.

Refactor GenomicRegionSet IO handling

IO read/write functions should be separated from the actual GRS. This will mean extracting all read_bed, write_bed etc functions and putting them into Format classes that will take a GRS as input and populate it or write it to file as needed.

The following basic classes should be developed, at least:

  • BedFormat: it's the current "default" for GRS. They are strongly coupled, which makes it harder to export to different formats. This refactoring will solve this problem.

  • BigBedFormat: it's currently only supported in some of the tools, in a "handcrafted" way. We need a more rational approach for this, especially to support further improvements like having a disk-backed GRS, without loading everything in memory. This would greatly reduce the memory footprint of certain tools (e.g., motif analysis).

  • Bed12Format: a more complicated "bed-like" format relevant, I believe, only for RGT-Viz.

To leave for later: improve memory footprint of GenomicRegion so that GRS can be much bigger. Also possibly substitute the internal list for a proper array, to make removal O(1).
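
To make the proposed separation concrete, here is a rough sketch. `RegionSet` is a deliberately tiny stand-in for GenomicRegionSet, and the `read`/`write` signatures on `BedFormat` are assumptions for illustration only, not the existing RGT API.

```python
import io


class RegionSet:
    """Stand-in for GenomicRegionSet: only holds (chrom, start, end) tuples."""

    def __init__(self, name):
        self.name = name
        self.regions = []


class BedFormat:
    """Reads and writes 3-column BED; the region set knows nothing about files."""

    @staticmethod
    def read(handle, region_set):
        for line in handle:
            line = line.strip()
            # Skip blank lines and the usual BED header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split("\t")[:3]
            region_set.regions.append((chrom, int(start), int(end)))
        return region_set

    @staticmethod
    def write(region_set, handle):
        for chrom, start, end in region_set.regions:
            handle.write(f"{chrom}\t{start}\t{end}\n")


grs = BedFormat.read(io.StringIO("chr1\t10\t20\n\nchr2\t5\t8\n"), RegionSet("demo"))
out = io.StringIO()
BedFormat.write(grs, out)
```

With this layout, a BigBedFormat or Bed12Format class would only need to provide the same two methods, and the region set itself would not change.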

rgt-hint footprinting error

Hi, when I use rgt-hint footprinting, I get an error.

[root@bogon yue]# rgt-hint footprinting --atac-seq --organism=hg38 ./Galaxy396.bam ./Galaxy572.bed --output-prefix=cell
[W::hts_idx_load2] The index file is older than the data file: ./Galaxy396.bai
[W::hts_idx_load2] The index file is older than the data file: ./Galaxy396.bai
divide by zero encountered in log
divide by zero encountered in log
Mean of empty slice.
invalid value encountered in double_scalars
Traceback (most recent call last):
  File "/root/miniconda2/bin/rgt-hint", line 11, in <module>
    load_entry_point('RGT==0.11.2', 'console_scripts', 'rgt-hint')()
  File "build/bdist.linux-x86_64/egg/rgt/HINT/Main.py", line 91, in main
  File "build/bdist.linux-x86_64/egg/rgt/HINT/Footprinting.py", line 105, in footprinting_run
  File "build/bdist.linux-x86_64/egg/rgt/HINT/Footprinting.py", line 332, in atac_seq
  File "build/bdist.linux-x86_64/egg/rgt/HINT/signalProcessing.py", line 168, in get_signal_atac
  File "build/bdist.linux-x86_64/egg/rgt/HINT/signalProcessing.py", line 345, in bias_correction_atac
  File "pysam/libcalignmentfile.pyx", line 793, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 650, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-25)

Any suggestion will be appreciated
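
The final error shows a negative start coordinate (-25) reaching the BAM fetch call. One possible guard, sketched here without pysam, is to clip the window to the chromosome before fetching. `clamp_window` is a hypothetical helper; whether HINT should clip, skip, or reject such regions is not settled by the report.

```python
def clamp_window(start, end, chrom_length):
    """Clip a half-open [start, end) window to [0, chrom_length)."""
    start = max(0, start)
    end = min(chrom_length, end)
    if start >= end:
        return None  # nothing left to fetch after clipping
    return start, end
```

A region that begins close to the start of a chromosome and is extended upstream would then be fetched from position 0 instead of raising.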

Specify motif database from command line

Add option to specify motif databases (not by name, but by path - useful for experiment-specific databases). When used, the default DBs are completely ignored.

The common use case is for motifs generated via MEME from ChIP-seq peaks. We usually get a handful of those and must test them on some big BED input file.

NB: FDR thresholds must be recalculated on the fly and this must be presented to the user

HINT deprecation warning

@lzj1769 there's an annoying deprecation warning in HINT that produces a LOT of output on the test data:

Function log_multivariate_normal_density is deprecated; The function log_multivariate_normal_density is deprecated in 0.18 and will be removed in 0.20.

Can you look into fixing this for the next release? There's no urgency, but it would make sense to find a proper alternative well before version 0.20.

HINT import issue

Installed RGT with pip in a virtual environment running python 3.6.1

$ rgt-hint footprinting
Traceback (most recent call last):
  File "/home/x_andtj/.virtualenvs/atac-seq/bin/rgt-hint", line 7, in <module>
    from rgt.HINT.Main import main
  File "/home/x_andtj/.virtualenvs/atac-seq/lib/python3.6/site-packages/rgt/HINT/Main.py", line 5, in <module>
    from Training import training_args, training_run
ModuleNotFoundError: No module named 'Training'

Viz and TDF have no version option

@jovesus could you add a --version option that uses the global __version__.py file to grab the current RGT version?

See motif analysis or HINT for how to do it in the same way we do.
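
For reference, a generic sketch of a --version option with argparse. The placeholder `__version__` value and the program name "rgt-demo" are made up for illustration; in RGT the value would come from the global __version__.py file, and this is not necessarily how motif analysis or HINT implement it.

```python
import argparse

__version__ = "0.0.0"  # placeholder; the real value would be imported


def build_parser(prog):
    parser = argparse.ArgumentParser(prog=prog)
    # action="version" prints the string and exits with status 0.
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + __version__)
    return parser
```

Calling `build_parser("rgt-demo").parse_args(["--version"])` prints "rgt-demo 0.0.0" and exits.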

more python3 syntax issues

Apparently, Python 3 doesn't accept implicit imports from the current package directory, so internal import statements like
from GenomicRegion import GenomicRegion
should be
from .GenomicRegion import GenomicRegion

This has shown up for me in
GenomicRegionSet.py, GeneSet.py, helper.py, HINT/Main.py, and HINT/signalProcessing.py

Python 3 Conversion

I've encountered a few issues with setup.py when running with Python 3.5. When I tried this:

python setup.py install --prefix=/local/tools/HINT-1.1.1 --rgt-data-path=/local/tools/HINT-1.1.1 --rgt-tool=hint

I got the following error:

  File "setup.py", line 132
    except (BadOptionError,AmbiguousOptionError), e:

which I corrected by changing line 132 to:

except (BadOptionError,AmbiguousOptionError) as e:

then I got the following error:

  File "setup.py", line 398
    default_file_permission = 0644
                                 ^
SyntaxError: invalid token

which I corrected by changing line 398 and 299 to:

default_file_permission = 0o644
default_path_permission = 0o755

(Apparently Python 3 doesn't accept integer literals with a leading 0; octal literals start with 0o.)
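
A quick check of what the corrected literals mean. This is only an illustration of the 0o prefix, using the two variable names quoted above.

```python
import stat

default_file_permission = 0o644  # rw-r--r--
default_path_permission = 0o755  # rwxr-xr-x

# The same values obtained by parsing the digits as base 8.
assert default_file_permission == int("644", 8) == 420
assert default_path_permission == int("755", 8) == 493

# stat.filemode renders the permission bits the way `ls -l` does.
assert stat.filemode(stat.S_IFREG | default_file_permission) == "-rw-r--r--"
```

The 0o form is also accepted by Python 2.6 and 2.7, so the change does not break older interpreters.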

then, the setup works. But then I get some errors when I attempt to run rgt-hint... I'll get to those in another issue.

Improve README and misc

The README and all .txt files with description should be cleaned up, converted to md files and formatted.

All references to the "automatic export from google site" should be removed.

@ManuelAllh are you happy for me to proceed with this?

Test Pysam 0.12.0 with RGT

Looks like pysam 0.12.0 is out, and they also seem to have fixed MacOS support (good news for #17).

However, there seem to be some API changes. We are going to have to test all RGT tools before we modify the dependency requirement in setup.py.

THOR- Issue in regenerating report

Below is the error message I got when I set --report in the command.

I will also check it later.

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute GC-content
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.767
Normalize input of Signal 0, Rep 1 with factor 0.767
Normalize input of Signal 1, Rep 0 with factor 0.769
Normalize input of Signal 1, Rep 1 with factor 0.769
Use global TMM approach 
Traceback (most recent call last):
  File "/work/ck687463/bin/rgt-THOR", line 11, in <module>
    load_entry_point('RGT==0.10.0', 'console_scripts', 'rgt-THOR')()
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/RGT-0.10.0-py2.7.egg/rgt/THOR/THOR.py", line 157, in main
    chrom_sizes, dims, inputs, tracker)
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/RGT-0.10.0-py2.7.egg/rgt/THOR/THOR.py", line 81, in train_HMM
    report=options.report, poisson=options.poisson)
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/RGT-0.10.0-py2.7.egg/rgt/THOR/dpc_help.py", line 197, in _fit_mean_var_distr
    _plot_func(plot_data, outputdir)
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/RGT-0.10.0-py2.7.egg/rgt/THOR/dpc_help.py", line 108, in _plot_func
    maxs.append(max(tmp[tmp < np.percentile(tmp, 90)]))
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4116, in percentile
    interpolation=interpolation)
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3858, in _ureduce
    r = func(a, **kwargs)
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4233, in _percentile
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 134, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/rwthfs/rz/cluster/work/ck687463/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

THOR DeprecationWarning using example data

rgt-THOR failed to create output files when running its example data as described at:
http://www.regulatory-genomics.org/thor-2/basic-intrstruction/

See details below:

Installed Python 2.7.13 then:

$ pip install cython numpy scipy
$ pip install RGT
$ pip list
biopython (1.70)
cycler (0.10.0)
Cython (0.26.1)
fisher (0.1.4)
functools32 (3.2.3.post2)
hmmlearn (0.2.0)
HTSeq (0.9.1)
matplotlib (2.0.2)
matplotlib-venn (0.11.5)
mpmath (0.19)
natsort (5.1.0)
ngslib (1.1.20)
numpy (1.13.1)
pip (9.0.1)
pyBigWig (0.3.4)
pyparsing (2.2.0)
pysam (0.11.1)
python-dateutil (2.6.1)
pytz (2017.2)
PyVCF (0.6.8)
PyX (0.12.1)
RGT (0.10.0)
scikit-learn (0.19.0)
scipy (0.19.1)
setuptools (36.3.0)
six (1.10.0)
subprocess32 (3.2.7)
wheel (0.29.0)

$ wget http://www.regulatory-genomics.org/wp-content/uploads/2015/07/THOR_example_data.tar.gz
$ tar xzf THOR_example_data.tar.gz 
$ cd THOR_example_data

$ cat THOR.config
#rep1
FL5_H3K27ac.100k.bam
FL8_H3K27ac.100k.bam
#rep2
CC4_H3K27ac.100k.bam
CC5_H3K27ac.100k.bam
#chrom_sizes
hg19.chrom.sizes

$ rgt-THOR ./THOR.config
Warning: Do not compute GC-content, as there is no input file
Warning: Do not compute GC-content, as there is no genome file
Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Use global TMM approach
Compute HMM's training set
Train HMM
Traceback (most recent call last):
  File "/group/bioinfo/apps/apps/RGT-0.10.0/bin/rgt-THOR", line 11, in <module>
    sys.exit(main())
  File "/group/bioinfo/apps/apps/RGT-0.10.0/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main
    chrom_sizes, dims, inputs, tracker)
  File "/group/bioinfo/apps/apps/RGT-0.10.0/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 92, in train_HMM
    m.fit([training_set_obs], options.hmm_free_para)
  File "/group/bioinfo/apps/apps/RGT-0.10.0/lib/python2.7/site-packages/rgt/THOR/neg_bin_rep_hmm.py", line 146, in fit
    posteriors = np.exp(gamma.T - logsumexp(gamma, axis=1)).T
  File "/group/bioinfo/apps/apps/RGT-0.10.0/lib/python2.7/site-packages/sklearn/utils/deprecation.py", line 75, in wrapped
    warnings.warn(msg, category=DeprecationWarning)
DeprecationWarning: Function logsumexp is deprecated; sklearn.utils.extmath.logsumexp was deprecated in version 0.19 and will be removed in 0.21. Use scipy.misc.logsumexp instead.

Error while downloading hg38

python setupGenomicData.py --hg38

Traceback (most recent call last):
  File "setupGenomicData.py", line 182, in <module>
    if(options.hg38_gtf_path):
AttributeError: Values instance has no attribute 'hg38_gtf_path'
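
The error means the script reads an attribute that was never registered with the option parser. A sketch of the situation with optparse follows; the flag name `--hg38-gtf-path` is a guess derived from the attribute name, not the actual option in setupGenomicData.py.

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("--hg38", dest="hg38", action="store_true", default=False)
# Without the next line, reading options.hg38_gtf_path raises AttributeError.
parser.add_option("--hg38-gtf-path", dest="hg38_gtf_path", default=None)

options, _ = parser.parse_args(["--hg38"])
gtf_path = options.hg38_gtf_path  # None unless the user supplied a path
```

If the option is meant to stay unregistered, `getattr(options, "hg38_gtf_path", None)` avoids the crash as well.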

Use a standard config file format

rgt-THOR failed with an uninformative error message when an empty line was left at the bottom of the config file.
To avoid similar issues, it might be a good idea to use a standardized config format, such as YAML, in place of the plain-text file currently in use.

Error message was:
... File ~/.local/lib/python2.7/site-packages/rgt/Util.py", line 20, in npath
    return os.path.abspath(os.path.expanduser(filename))
    if not path.startswith('~'):
AttributeError: 'list' object has no attribute 'startswith'
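
Short of switching formats, the existing layout could also be parsed more defensively. The sketch below is not the YAML approach suggested above; it assumes the "#section" plus one-path-per-line layout seen in the THOR example config, and `parse_sectioned_config` is a hypothetical name.

```python
def parse_sectioned_config(text):
    """Parse '#section' headers, each followed by one value per line."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines, including a trailing one, are ignored
        if line.startswith("#"):
            current = line[1:].strip()
            sections.setdefault(current, [])
        elif current is None:
            raise ValueError("value before any #section header: %r" % line)
        else:
            sections[current].append(line)
    return sections


example = "#rep1\nFL5.bam\nFL8.bam\n#rep2\nCC4.bam\n#chrom_sizes\nhg19.chrom.sizes\n\n"
config = parse_sectioned_config(example)
```

A malformed file then fails with a message that names the offending line instead of an AttributeError deep inside path handling.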

python3 issues with lib/python3.5/site-packages/RGT-1.1.1-py3.5.egg/rgt/Util.py

After a fresh install, attempting to run rgt-hint, I get this traceback:

Traceback (most recent call last):
  File "./bin/rgt-hint", line 11, in <module>
    load_entry_point('RGT==1.1.1', 'console_scripts', 'rgt-hint')()
  File "/local/Ben/anaconda3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 565, in load_entry_point
  File "/local/Ben/anaconda3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2598, in load_entry_point
  File "/local/Ben/anaconda3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2258, in load
  File "/local/Ben/anaconda3/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg/pkg_resources/__init__.py", line 2264, in resolve
  File "/local/tools/HINT-1.1.1/lib/python3.5/site-packages/RGT-1.1.1-py3.5.egg/rgt/HINT/Main.py", line 15, in <module>
    from .. Util import PassThroughOptionParser, ErrorHandler, HmmData, GenomeData, OverlapType
  File "/local/tools/HINT-1.1.1/lib/python3.5/site-packages/RGT-1.1.1-py3.5.egg/rgt/Util.py", line 287
    except (BadOptionError,AmbiguousOptionError), e:
                                                ^
SyntaxError: invalid syntax

changing line 287 of site-packages/RGT-1.1.1-py3.5.egg/rgt/Util.py to this fixes it:

except (BadOptionError, AmbiguousOptionError) as e:

Then I get a different error -

  File "/local/tools/HINT-1.1.1/lib/python3.5/site-packages/RGT-1.1.1-py3.5.egg/rgt/Util.py", line 381
    except KeyError, IndexError:
                   ^
SyntaxError: invalid syntax

so lines 381 and 409 should be:

except (KeyError, IndexError):

With those fixes, I now get an ImportError:

site-packages/RGT-1.1.1-py3.5.egg/rgt/Util.py", line 12, in <module>
    import ConfigParser
ImportError: No module named 'ConfigParser'

instead, line 12 should be

import configparser

"rgt-motifanalysis matching" Illegal Instruction Error

Hi,

I am trying to run the "rgt-motifanalysis matching" command and I keep getting an Illegal instruction error followed by a core dump. I tried this on the example data provided and I still am receiving the same error. Do you have any guidance on how to resolve this?

Thanks,
Michael

[mdurante@pegasus RGT_MotifAnalysis_FullSiteTest]$ rgt-motifanalysis matching input/regions_K562.bed input/background.bed
>> genome: hg19
>> motif repositories: [u'hocomoco']
Illegal instruction (core dumped)

p-value in BED output files

Hi,
I just checked some of the output BED files (THOR 0.9.8) and realized that the p-value column repeatedly contains the following number:

15483:12140:16917;25102:20718:22467;9223372036854775807
14791:11131:14713;24211:20379:20588;9223372036854775807
26633:25513:31360;51481:39765:42614;9223372036854775807

9223372036854775807

The number is so popular that it has its own Wikipedia page...

2^63 - 1

I presume there is a numerical/floating point issue/sys.maxsize default or the like at the core of this.
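
The arithmetic is easy to confirm. The snippet only shows that the value is the largest signed 64-bit integer; it says nothing about where THOR produces it.

```python
import sys

SENTINEL = 9223372036854775807

# Largest value representable as a signed 64-bit integer.
assert SENTINEL == 2**63 - 1

# On 64-bit CPython builds, sys.maxsize has this same value.
print(sys.maxsize == SENTINEL)
```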

+Peter

gcc compile version of libtriplexator.so

Hello, I downloaded the new version of reg-gen.
But my gcc is 4.4.6.
Then I got the following error:

libtriplexator.so: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found

And `GLIBCXX_3.4.15' corresponds to gcc 4.6.0.
So, could you compile libtriplexator.so with a lower gcc version?

Thanks!

rgt-hint fails without arguments

@lzj1769, if you run rgt-hint it crashes on arguments[0]. It should instead check the number of arguments, and if none is passed it should print the help string, like the other tools.
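
A minimal sketch of the suggested behaviour with argparse. The program name "rgt-demo" and the structure of `main` are placeholders and do not reflect the actual rgt-hint entry point.

```python
import argparse


def main(argv):
    parser = argparse.ArgumentParser(prog="rgt-demo")
    parser.add_argument("command", choices=["footprinting", "differential"])
    if not argv:
        # No arguments at all: show the help text instead of indexing argv.
        parser.print_help()
        return 1
    args = parser.parse_args(argv)
    return 0
```

The entry point would then call `sys.exit(main(sys.argv[1:]))`.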

rgt-hint differential "GPL Ghostscript 8.70: Can't find initialization file gs_init.ps."

Hi,

After running rgt-hint differential I noticed that no output was produced even though the job looks as if it had completed successfully. When looking at the error output file I see the following error:

GPL Ghostscript 8.70: Can't find initialization file gs_init.ps.
GPL Ghostscript 8.70: Can't find initialization file gs_init.ps.
GPL Ghostscript 8.70: Can't find initialization file gs_init.ps.
GPL Ghostscript 8.70: Can't find initialization file gs_init.ps.

The error file contained multiple lines with this error. Any suggestions on how to resolve this?

Thanks,
Michael

Get rgtdata folder path from environment variable

Right now, we store a file inside the actual installation directory of rgt. This file contains the rgtdata folder path, defaulting to the home directory.

This is very brittle, and also requires a custom setup.py which causes further problems with pip (when using global options).

A further, big problem of this approach is the following: if RGT is installed on a shared computer (eg, cluster/server), then only one shared rgtdata folder is allowed for all users.

A better approach is to check an environment variable RGTDATA. If not set, default to home directory. If set, use that one.
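
Roughly, the lookup could look like the sketch below. The function name `get_rgtdata_path` and the `~/rgtdata` default folder name are assumptions; only the RGTDATA variable name comes from the proposal.

```python
import os


def get_rgtdata_path(environ=os.environ):
    """Return the data folder: $RGTDATA if set and non-empty, else ~/rgtdata."""
    path = environ.get("RGTDATA")
    if not path:
        path = os.path.join(os.path.expanduser("~"), "rgtdata")
    return os.path.abspath(os.path.expanduser(path))
```

On a shared cluster, each user could then export RGTDATA in their own shell profile, and nothing needs to be written into the installation directory.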

rgt-filterVCF

What steps will reproduce the problem?
1. rgt-filterVCF path/to/sample.conf --t-mq 30 --t-dp 40 --list-WT path/to/file_wt.config

What is the expected output? What do you see instead?
- running the program

mbegemann@UK2011001653:~$ rgt-filterVCF 
/home/mbegemann/am_pipeline/reg-gen-read-only/test/sample.conf --t-mq 30 --t-dp 
40 --list-WT /home/mbegemann/am_pipeline/reg-gen-read-only/test/file_wt.config  
Traceback (most recent call last):
  File "/usr/local/bin/rgt-filterVCF", line 9, in <module>
    load_entry_point('RGT==0.0.1', 'console_scripts', 'rgt-filterVCF')()
  File "/usr/local/lib/python2.7/dist-packages/RGT-0.0.1-py2.7.egg/rgt/filterVCF/filterVCF.py", line 151, in main
    sample_data = load_data(vcf_list)
  File "/usr/local/lib/python2.7/dist-packages/RGT-0.0.1-py2.7.egg/rgt/filterVCF/filterVCF.py", line 36, in load_data
    name, path = tmp[0], tmp[1]
IndexError: list index out of range

What version of the product are you using? On what operating system?


Hello Manuel,
this is what I get when I try to start the program. I am not yet sure whether I am making a syntax error.
Which options are mandatory and which are optional? Do I need a motifs.bed, or does it load that automatically?

Regards,
Matthias

Original issue reported on code.google.com by [email protected] on 1 Dec 2014 at 1:51

THOR fails: numpy float to integer

Hi,
I am running THOR in an environment with Python 2.7.13 and numpy 1.12.1. THOR version 0.1 as downloaded from the website. The run fails due to the following:

  File "[...]/conda/envs/thor/lib/python2.7/site-packages/RGT-0.0.1-py2.7.egg/rgt/THOR/dpc_help.py", line 198, in _fit_mean_var_distr
    _plot_func(plot_data, outputdir)
  File "[...]/conda/envs/thor/lib/python2.7/site-packages/RGT-0.0.1-py2.7.egg/rgt/THOR/dpc_help.py", line 107, in _plot_func
    x = linspace(0, max(plot_data[i][0]), max(plot_data[i][0])+1)
  File "[...]/conda/envs/thor/lib/python2.7/site-packages/numpy/core/function_base.py", line 101, in linspace
    num = _index_deprecate(num)
  File "[...]/conda/envs/thor/lib/python2.7/site-packages/numpy/core/function_base.py", line 21, in _index_deprecate
    warnings.warn(msg, DeprecationWarning, stacklevel=stacklevel)
DeprecationWarning: object of type <type 'numpy.float64'> cannot be safely interpreted as an integer.

Best,
Peter

Release concept: to discuss and improve

The "releases" here are sub-project specific. However, we then use a link to Google Drive for the global "rgt" release.

We should instead provide a single RGT release here on GitHub, including fixes for all subprojects. Users should stay up-to-date even if they use only, say, "Odin", because we don't know when they might want to try another of the tools.

Also they might benefit from bug fixes at the RGT top level: otherwise, any time we make a change at the top level we have to release a fix for ALL the tools. It's not scalable.

rgt_THOR failing

THOR is run on 6 histone datasets with two conditions, two replicates and Input. Of the 6 histone datasets, 5 complete successfully and one fails with the following stack trace:

Call DPs on whole genome.
Computing read extension sizes for ChIP-seq profiles
Compute factors
Normalize input of Signal 0, Rep 0 with factor 0.103
Normalize input of Signal 0, Rep 1 with factor 0.249
Normalize input of Signal 1, Rep 0 with factor 0.666
Normalize input of Signal 1, Rep 1 with factor 0.782
Use global TMM approach 
Compute HMM's training set
Traceback (most recent call last):
  File "..../.local/bin/rgt-THOR", line 11, in <module>
    load_entry_point('RGT==0.9.9', 'console_scripts', 'rgt-THOR')()
  File ".../.local/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 157, in main
    chrom_sizes, dims, inputs, tracker)
  File "..../.local/lib/python2.7/site-packages/rgt/THOR/THOR.py", line 87, in train_HMM
    init_alpha, init_mu = get_init_parameters(s0, s1, s2)
  File "..../.local/lib/python2.7/site-packages/rgt/THOR/neg_bin_rep_hmm.py", line 64, in get_init_parameters
    alpha = (var - mu) / np.square(mu)
RuntimeWarning: invalid value encountered in divide
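
The failing line computes alpha from the relation var = mu + alpha * mu^2. One way the "invalid value" warning can arise is a 0/0 division when both mu and var are zero. A pure-Python sketch of a guard follows; the fallback value is an arbitrary assumption, and the right behaviour for THOR may well differ.

```python
def init_alpha(mu, var, fallback=0.0):
    """Method-of-moments dispersion from var = mu + alpha * mu**2."""
    if mu == 0:
        return fallback  # no signal: the ratio is undefined
    return (var - mu) / (mu * mu)
```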

Fix deprecation warning

/home/fabio/.local/lib/python2.7/site-packages/rgt/Util.py:44: DeprecationWarning: You passed a bytestring as `filenames`. This will not work on Python 3. Use `cp.read_file()` or switch to using Unicode strings across the board.
  self.config.read(data_config_file_name)
/home/fabio/.local/lib/python2.7/site-packages/rgt/Util.py:47: DeprecationWarning: You passed a bytestring as `filenames`. This will not work on Python 3. Use `cp.read_file()` or switch to using Unicode strings across the board.
  self.config.read(data_config_file_name + ".user")

Motif Analysis: add metadata and make new subtool

  • write script to get all available metadata from jaspar, hocomoco and uniprobe DBs

  • merge the additional fields into the MTF files (must modify createMtf and the annotation class)

  • write new subtool, mapper, to get all motifs from a set of gene names or vice versa (must allow wildcards). Also allow more complex filtering, e.g. by family

  • (Maybe) MTF and FPR files should be joined - or we should remove fpr completely, to keep it clean

rgt-ODIN: Only 2 files are generated: *-setup.info and *-diffpeaks.bed

What steps will reproduce the problem?


$ rgt-ODIN \
>     --merge \
>     --input-1=../../bam/input_ambion_H3K27ac.minMQ4.sorted.bam \
>     --input-2=../../bam/input_scrambled_H3K27ac.minMQ4.sorted.bam \
>     ../../bam/ChIP_ambion_H3K27ac.minMQ4.sorted.bam \
>     ../../bam/ChIP_scrambled_H3K27ac.minMQ4.sorted.bam \
>     /resources/human/hg19/Homo_sapiens_assembly19_sorted.fa \
>     /resources/human/hg19/hg19.chrom.sizes
Computing read extension sizes...
Read extension for first file: 65
Read extension for second file: 65
Read extension for first input file: 0
Read extension for second input file: 0
Loading reads...
Computing GC content
line 9025093 of /tmp/tmpXkOZ5R: chromosome chr17 has 81195210 bases, but item 
ends at 81195224
line 11590445 of /tmp/tmpUyqdPx: chromosome chr17 has 81195210 bases, but item 
ends at 81195224
Normalizing...
Warning: chrM not found, do not consider
Warning: chr22 not found, do not consider
Warning: chr20 not found, do not consider
Warning: chr21 not found, do not consider
Warning: chrM not found, do not consider
Warning: chr22 not found, do not consider
Warning: chr20 not found, do not consider
Warning: chr21 not found, do not consider
Warning: chrM not found, do not consider
Warning: chr22 not found, do not consider
Warning: chr20 not found, do not consider
Warning: chr21 not found, do not consider
Warning: chrM not found, do not consider
Warning: chr22 not found, do not consider
Warning: chr20 not found, do not consider
Warning: chr21 not found, do not consider
Normalize input with factor 0.362454886804 and 0.38504811665
Normalize file 2 by signal with estimated factor 1.46909181481: 
line 7502798 of /tmp/tmpORJtpH: chromosome chr17 has 81195210 bases, but item 
ends at 81195224
line 9032212 of /tmp/tmp89TAbY: chromosome chr17 has 81195210 bases, but item 
ends at 81195224
done
Number of regions to be considered by the HMM: 0
Number of regions with putative differential peaks: 16828474
Compute training set...
Training HMM...
Use binomial HMM
...done
Computing HMM's posterior probabilities and Viterbi path
...done
Number of Peaks where p-value is not calculated:  3103


What is the expected output? What do you see instead?

  I only get 2 output files (no bigWig files) instead of 4:
    - exp-ChIP_ambion_H3K27ac.minMQ4.sorted-ChIP_scrambled_H3K27ac.minMQ4.sorted-diffpeaks.bed
    - exp-ChIP_ambion_H3K27ac.minMQ4.sorted-ChIP_scrambled_H3K27ac.minMQ4.sorted-setup.info

What version of the product are you using? On what operating system?

  ODIN-0.1alpha, Ubuntu 12.04 LTS 64-bit



Please provide any additional information below.

  Any idea why I have this error:

    Computing GC content
    line 9025093 of /tmp/tmpXkOZ5R: chromosome chr17 has 81195210 bases, but item ends at 81195224
    line 11590445 of /tmp/tmpUyqdPx: chromosome chr17 has 81195210 bases, but item ends at 81195224

  Is this because of the read extension in the previous step?


  Any idea why the chromosome not found warnings happen?

    Normalizing...
    Warning: chrM not found, do not consider
    Warning: chr22 not found, do not consider
    Warning: chr20 not found, do not consider
    Warning: chr21 not found, do not consider

  I checked the BAM files and those chromosomes have mapped reads.


Original issue reported on code.google.com by [email protected] on 14 Nov 2014 at 10:08

Motif Analysis must be more verbose

Right now, Motif Analysis doesn't give you any information on progress. Very simply, we should always print at least the following information:

  • which motif databases are being used
  • stats from the input files (how many regions in each BED file?)
  • which file currently being tested for matching (also with something like (5/10) to show some sort of progress)
  • how long it took to do each file (that gives the user an idea of how long the next files are going to take, by looking at their size)
  • must finally replace the old option parser with the great argparse
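
The progress and timing items could be covered by a small wrapper like this sketch. `run_with_progress` and its `process` callback are hypothetical names, and the message wording is only an example.

```python
import time


def run_with_progress(paths, process, log=print):
    """Call process(path) for each file, logging position and elapsed time."""
    total = len(paths)
    for index, path in enumerate(paths, start=1):
        log("(%d/%d) matching %s" % (index, total, path))
        started = time.perf_counter()
        process(path)
        log("    done in %.1f s" % (time.perf_counter() - started))
```

Passing `log` as a parameter keeps the wrapper usable with the logging module or a quiet mode later on.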

SetupGenomicData can fail if download interrupted

To be better characterised, but in short: if one downloads a genome, eg --mm10 and it gets interrupted before the end, the next downloads should overwrite the partially-downloaded gz archive instead of using the ".2", ".3" suffixes.
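
One common pattern for this, sketched with the standard library only: write to a fixed temporary name and rename it once the transfer has finished. The ".part" suffix and the function name are assumptions, and the stream would come from whatever download mechanism the setup script uses.

```python
import os
import shutil


def save_stream_atomically(stream, destination):
    """Write to '<destination>.part', then rename over the final name."""
    partial = destination + ".part"
    # Mode "wb" truncates any stale partial file left by an interrupted run.
    with open(partial, "wb") as handle:
        shutil.copyfileobj(stream, handle)
    # os.replace overwrites an existing destination in a single step.
    os.replace(partial, destination)
    return destination
```

The final file name then only ever refers to a complete archive, and an interrupted run leaves behind a single ".part" file that the next run overwrites.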

Too many figures are opened.

The below message is presented when running rgt-hint differential:
More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).

Hint tutorial commands

I get the following errors when attempting the commands in the tutorial:

Program: rgt-hint.
Report: The experimental matrix could not be loaded. Check if it is correctly formatted and that your python version is >= 2.7.
Behaviour: The program will quit with exit status 0.
--------------------------------------------------
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/rgt/HINT/Main.py", line 322, in main
    exp_matrix.read(input_matrix)
  File "build/bdist.linux-x86_64/egg/rgt/ExperimentalMatrix.py", line 66, in read
    f = open(file_path,'rU')
IOError: [Errno 2] No such file or directory: 'estimation'

I believe I have version 10.2 installed correctly, but I do not see any documentation for commands like "rgt-hint estimate".  Can you please advise?

thanks 

Random seed is not fixed for almost all tools

By not setting the seed, every execution of an RGT tool will yield slightly different results. This is obviously bad for reproducibility.

The simplest solution is to put the following code in the "main" module of each tool (except Motif Analysis, that already has it):

from random import seed
seed(42)

A further improvement is to provide an option for users to set their own seed, but I don't think we need that right now.
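
If the option is added later, it could look like this sketch. The `--seed` flag name is an assumption; the default of 42 matches the snippet above. Note that this seeds only Python's `random` module; NumPy keeps its own generator, which would need seeding separately.

```python
import argparse
import random


def parse_seed(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=42,
                        help="seed for the random number generator")
    return parser.parse_args(argv).seed


random.seed(parse_seed([]))                # default: same as seed(42)
first = random.random()
random.seed(parse_seed(["--seed", "42"]))
assert random.random() == first            # same seed, same sequence
```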

rgt-ODIN: Check if wigToBigWig exists at the beginning of the analysis

What steps will reproduce the problem?

$ rgt-ODIN \
>     --merge \
>     --input-1=../bam/input_ambion_H3K27ac.minMQ4.sorted.bam \
>     --input-2=../bam/input_scrambled_H3K27ac.minMQ4.sorted.bam \
>     ../bam/ChIP_ambion_H3K27ac.minMQ4.sorted.bam \
>     ../bam/ChIP_scrambled_H3K27ac.minMQ4.sorted.bam \
>     /resources/human/hg19/Homo_sapiens_assembly19_sorted.fa \
>     /resources/human/hg19/hg19.chrom.sizes
Computing read extension sizes...
Read extension for first file: 65
Read extension for second file: 65
Read extension for first input file: 0
Read extension for second input file: 0
Loading reads...
Computing GC content
sh: 1: wigToBigWig: not found
sh: 1: wigToBigWig: not found
Normalizing...
Warning: chrM not found, do not consider
Warning: chr22 not found, do not consider
Warning: chr20 not found, do not consider
Warning: chr21 not found, do not consider


What is the expected output? What do you see instead?

  It would be nice if rgt-ODIN checked at the beginning of the
  script whether wigToBigWig is in the $PATH, instead of failing
  to find it only when it is needed, after rgt-ODIN has already
  been running for quite some time.

What version of the product are you using? On what operating system?

  ODIN-0.1alpha, Ubuntu 12.04 LTS 64-bit


Original issue reported on code.google.com by [email protected] on 14 Nov 2014 at 10:16

Update Motif databases

We can download a single file with all PWMs from at least HOCOMOCO and JASPAR, while UniPROBE only provides an archive.

@lzj1769 for reference

THOR 0.9.7 fail QXcbConnection

Hi,
I am running THOR 0.9.7 in batch mode on a compute cluster (=headless servers) and my runs fail with the following message:

QXcbConnection: Could not connect to display

Is that coming from the parameter --report, i.e., does it try to open the generated HTML report?
+Peter
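This Qt message commonly appears on headless machines when a plotting library picks a GUI backend; whether that is what happens inside THOR is an assumption here, not something confirmed in this thread. If matplotlib is the source, a possible workaround is to request the non-interactive Agg backend through the `MPLBACKEND` environment variable before matplotlib is imported. The helper name below is hypothetical.

```python
import os


def use_headless_backend(environ=os.environ):
    """Request matplotlib's Agg backend when no display is available.

    Must run before matplotlib is first imported. Returns True if no
    display was detected.
    """
    if environ.get("DISPLAY"):
        return False  # a display exists, leave the backend alone
    # matplotlib reads MPLBACKEND at import time; setdefault keeps any
    # backend the user has already chosen.
    environ.setdefault("MPLBACKEND", "Agg")
    return True
```

Setting `MPLBACKEND=Agg` in the batch script before the command has the same effect without touching any code.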

Use rel. paths in HTML report

Hi,
I have a suggestion for improvement: apparently, the HTML report uses full paths to link to the image files, which makes sharing the report a bit complicated (i.e., in our setup, the cluster file system is isolated from the rest, so the report files have to be copied somewhere else).
Best,
Peter
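A minimal sketch of the suggestion, assuming the report generator knows both the image path and the report path when it writes each link. `relative_link` is a hypothetical helper, not an existing function in the report code, and the paths in the usage note are made up for illustration.

```python
import os


def relative_link(image_path, report_path):
    """Return image_path relative to the directory holding report_path."""
    report_dir = os.path.dirname(os.path.abspath(report_path))
    rel = os.path.relpath(os.path.abspath(image_path), start=report_dir)
    # HTML links use forward slashes regardless of platform
    return rel.replace(os.sep, "/")
```

For a report at /data/run1/report/index.html and an image at /data/run1/report/fig/peak.png, the link becomes fig/peak.png, so the whole report directory can be copied elsewhere without breaking it.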

Test case does not work

What steps will reproduce the problem?
1. Download the source code
2. Install
3. Run the test code

What is the expected output? What do you see instead?
I get a Python file read error.

What version of the product are you using? On what operating system?
Python 2.7

Please provide any additional information below.
Your test case is not working. It reads the InputMatrix.txt file as if the 
first line contains a file. However, the first line is only a header line. I 
had to fix this problem by deleting the header line in the InputMatrix.txt file.
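For illustration, a reader that treats the first line as a header rather than as data could look like the sketch below. It assumes a whitespace-separated file whose header names the columns, including a `file` column; the real InputMatrix.txt layout may differ, and `read_matrix` is a hypothetical function, not the library's parser.

```python
import io


def read_matrix(handle):
    """Parse a whitespace-separated matrix whose first line is a header."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # skip blank lines and comments
        rows.append(dict(zip(header, fields)))
    return rows


# Made-up example content: one header line, one data line
example = io.StringIO("name\ttype\tfile\nsample1\treads\tsample1.bam\n")
rows = read_matrix(example)
```

Because the header is consumed first, no row ever has the literal column name in its `file` field, so the test data would not need editing by hand.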

Original issue reported on code.google.com by [email protected] on 5 Aug 2014 at 6:51

lack of mm10 annotation files

Hello, I want to predict lncRNA binding sites with the mm10 genome.
However, this package lacks mm10 annotation files.
Could you include those files in a new version?
Thanks!
