jpiper / pydnase

Python module for the easy handling and analysis of DNase-seq data

Home Page: http://jpiper.github.io/pyDNase

License: MIT License

Python 86.75% C 13.25%

pydnase's Introduction

pyDNase - a library for analyzing DNase-seq data

Introduction

pyDNase is a suite of tools for analysing DNase-seq data. It comes with analysis scripts covering several common use cases of DNase-seq analysis, as well as implementations of the Wellington, Wellington 1D, and Wellington-bootstrap footprinting algorithms.

An easy-to-understand DNase-seq footprinting tutorial can be found here and full documentation can be accessed here

API

Many people currently analyzing DNase-seq data use tools designed for ChIP-seq work, which may be inappropriate for DNase-seq data, where one is less interested in the overlaps of sequenced fragments than in the site at which the cut occurs (the 5'-most end of the aligned sequence fragment).

pyDNase has an underlying API to interface with a sorted and indexed BAM file from a DNase-seq experiment, allowing efficient and easy random access of DNase-seq cut data from any genomic location, e.g.

>>> import pyDNase
>>> reads = pyDNase.BAMHandler(pyDNase.example_reads())
>>> reads["chr6,170863500,170863532,+"]
{'+': [0,0,0,1,0,0,1,1,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1],
 '-': [0,10,1,0,1,0,4,9,0,1,0,2,1,0,0,0,0,0,3,0,6,3,0,0,0,1,1,1,3,0,3,6]}

Querying the BAMHandler object returns a dictionary containing lists with DNase cut counts on the positive reference strand (+), and cuts on the negative reference strand (-). pyDNase efficiently caches the cut data queried, so that multiple requests from the same genomic locations do not require repeated lookups from the BAM file (this can be disabled). See the full documentation for full details.
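As a small illustration of working with the returned dictionary, the sketch below sums the two strands into a single per-base cut profile. It uses a plain dict copied from the example output above rather than a live BAMHandler, so it runs without pyDNase installed. The helper name `combined_cuts` is made up for this example and is not part of the pyDNase API.

```python
def combined_cuts(cuts):
    """Add the '+' and '-' strand counts position by position."""
    return [plus + minus for plus, minus in zip(cuts["+"], cuts["-"])]


# Dictionary copied from the example query above.
cuts = {
    "+": [0, 0, 0, 1, 0, 0, 1, 1, 2, 0, 0, 0, 0, 1, 0, 0,
          0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1],
    "-": [0, 10, 1, 0, 1, 0, 4, 9, 0, 1, 0, 2, 1, 0, 0, 0,
          0, 0, 3, 0, 6, 3, 0, 0, 0, 1, 1, 1, 3, 0, 3, 6],
}

profile = combined_cuts(cuts)
print(len(profile))  # one value per base in the queried window
print(profile)
```

The same helper should work unchanged if the values are array-like rather than plain lists, since it only iterates over them.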

Installation

To install pyDNase, run:

$ pip install pyDNase

For full documentation, go to: http://pythonhosted.org/pyDNase/

Support

If you're having any trouble, please send an email to [email protected] and I'll do my best to help you out. If you notice any bugs, please raise an issue over at the GitHub repo. If you require more formal training on the analysis of DNase-seq or ATAC-seq data, I am available for consultancy. Likewise, if you are a commercial entity looking for a support contract, please get in touch.

Contributions

I highly encourage contributions! This is my first software development project - send any pull requests this way. I'm particularly interested in cool analysis scripts that anyone has written.

Reference

Note

If you use pyDNase or the Wellington algorithm in your work, please cite the following papers.

Piper et al. 2013. Wellington: A novel method for the accurate identification of digital genomic footprints from DNase-seq data, Nucleic Acids Research 2013; doi: 10.1093/nar/gkt850

Piper et al. 2015. Wellington-bootstrap: differential DNase-seq footprinting identifies cell-type determining transcription factors, BMC Genomics 2015; doi:10.1186/s12864-015-2081-4

License

Copyright (C) 2015 Jason Piper. This work is licensed under the MIT license, see LICENCE.TXT for details. If you require the use of this software under a different license, please email me at [email protected].

pydnase's People

Contributors

ajank, brisk022, dpryan79, jpiper, jshoyer

pydnase's Issues

Fix pvalue cutoffs in wellington_footprints.py

$ wellington_footprints.py -fdr 0.01 -pv -5,-10,-20,-30,-50,-100 -o footprints_DHS_macs_"$chr" DHS_macs_"$chr".bed dnaseI_UT.sorted."$chr".bam results_FDR_1e-2_"$chr"

wellington_footprints.py: error: argument -pv/--pv_cutoffs: expected one argument

I tried with single and double quotes, saving -5,-10,-20,-30,-50,-100 to a variable, but the error persists.
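The error is consistent with how Python's argparse classifies command-line tokens: a token such as -5,-10,-20 starts with a dash but is not a plain negative number, so argparse takes it for another option flag, and quoting it in the shell does not change that. The sketch below reproduces the behaviour with a hypothetical stand-in parser (only the -pv/--pv_cutoffs option name is taken from the error message; everything else is assumed) and shows that attaching the value with "=" avoids the problem. Whether the real script then parses the list as intended has not been checked here.

```python
import argparse

# Hypothetical stand-in for the script's parser; only the option name
# -pv/--pv_cutoffs comes from the error message above.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-pv", "--pv_cutoffs")

# Attached with "=", the value is taken as-is.
args = parser.parse_args(["-pv=-5,-10,-20"])
cutoffs = [int(x) for x in args.pv_cutoffs.split(",")]
print(cutoffs)

# Passed as a separate token, "-5,-10,-20" is mistaken for an option flag.
try:
    parser.parse_args(["-pv", "-5,-10,-20"])
    separate_token_ok = True
except SystemExit:
    separate_token_ok = False
print("separate token accepted:", separate_token_ok)
```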

example_footprint_scores.py

Hi.
I can't seem to get the example_footprint_scores.py script running properly. I'm running Linux Mint and have passed the gcc --version check. The error message I get is:
from: can't read /var/mail/pyDNase.footprinting
/home/saideep/anaconda3/bin/example_footprint_scores.py: line 6: syntax error near unexpected token `('
/home/saideep/anaconda3/bin/example_footprint_scores.py: line 6: `reads = pyDNase.BAMHandler(pyDNase.example_reads())'

Upon running I get a "crosshairs" cursor and when I click the error message pops up. Hopefully this is resolvable. Thanks!

dnase_wig_tracks.py crashes - python3

Hi, apparently the Python 3 version of dnase_wig_tracks.py crashes. Python says the reason is that GenomicInterval objects cannot be compared to each other. Indeed, I can't find any method for such a comparison in the pyDNase package. It works under Python 2, however, but I can't figure out where the comparison is defined. Could you give me a hint on that?
Thx!

>>> import pyDNase
>>> regions = pyDNase.GenomicIntervalSet("some.bed")
Reading BED File...
[################################] 39169/39169 - 00:00:01
>>> for each in [item for sublist in sorted(regions.intervals.values()) for item in sorted(sublist, key=lambda peak: peak.startbp)]:
...     continue
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: GenomicInterval() < GenomicInterval()
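On the question of where the comparison is defined under Python 2: it does not need to be defined anywhere, because Python 2 falls back to an arbitrary but consistent default ordering for objects without comparison methods, and Python 3 removed that fallback. One way to get a chromosome-then-position ordering without comparing interval objects at all is to sort the dictionary keys instead of the values. The sketch below uses a hypothetical stand-in class and a hand-built dictionary in place of regions.intervals, assuming (as the traceback suggests) that it maps chromosome names to lists of intervals.

```python
class GenomicInterval(object):
    """Hypothetical stand-in with no ordering methods, like the class in the traceback."""

    def __init__(self, chromosome, startbp, endbp):
        self.chromosome = chromosome
        self.startbp = startbp
        self.endbp = endbp


# Stand-in for regions.intervals: chromosome name -> list of intervals.
intervals = {
    "chr2": [GenomicInterval("chr2", 500, 600), GenomicInterval("chr2", 100, 200)],
    "chr1": [GenomicInterval("chr1", 300, 400)],
}

# Sort the chromosome names (plain strings), then each list by start position,
# so no two GenomicInterval objects are ever compared directly.
ordered = [
    item
    for chrom in sorted(intervals)
    for item in sorted(intervals[chrom], key=lambda peak: peak.startbp)
]
print([(i.chromosome, i.startbp) for i in ordered])
```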

Please tag 0.2.6 on github

Could you tag the 0.2.6 release on github? It's a bit easier to track down issues when the versions are tagged.

wellington_footprints syntax error

I apologize if this has been asked, but I haven't seen this same question.

When running wellington_footprints.py I get an error stating:
line 83
print("track type=wiggle=_0", file=wigout)
With a caret pointing to the = between file and wigout
SyntaxError: invalid syntax

I'm not quite sure how to fix this issue and wanted to see if you have any suggestions.
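A SyntaxError at the = in file=wigout is what Python 2 typically reports when print is still a statement, which suggests the script was run by a Python 2 interpreter without `from __future__ import print_function` at the top of the file. Another way to sidestep the difference is to call the file object's write method, which behaves the same on both major versions. The sketch below is an illustration only: the function name is hypothetical, and an in-memory buffer stands in for the real output file.

```python
import io


def write_wig_header(handle):
    # handle.write works identically on Python 2 and 3,
    # unlike print(..., file=handle), which needs the print function.
    handle.write(u"track type=wiggle_0\n")


wigout = io.StringIO()  # in-memory stand-in for the real wig output file
write_wig_header(wigout)
print(repr(wigout.getvalue()))
```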

dnase_bias_estimator.py argument names

Hi Jason,

Thanks for putting together these excellent tools and making them so accessible. I've just downloaded pyDNase. When attempting to run dnase_bias_estimator.py I get a "namespace has no attribute genomesize" error. I believe the argument variable name is stored as "genomesize" (line 89) and then referred to as "genome_size" (line 96).

Best,

Drew

Python version issues(?) running wellington_footprints.py

Running python 3.6.1
running pip install pyDNase
After fixing the print statements to work in python3

I'm running the following code

import pyDNase
regions = pyDNase.GenomicIntervalSet("merged_peaks.bed")
len(regions.intervals.values())
sorted(regions.intervals.values())

The last line is taken from wellington_footprints.py, and it raises an error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-7d9455173698> in <module>()
----> 1 sorted(regions.intervals.values())

TypeError: '<' not supported between instances of 'GenomicInterval' and 'GenomicInterval'

I'm trying to figure out what is going on here, but it seems like GenomicInterval does not work the same way as it used to.
How do I fix this?

PS: Is this project dead? The fact that wellington_footprints script is not working and no responses to previous issues hints at this.
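The TypeError above arises because sorting a collection of lists compares their elements, and Python 3 refuses to compare objects that define no ordering. One general way to make a class sortable is functools.total_ordering, which fills in the remaining comparison methods from __eq__ and __lt__. The class below is a hypothetical illustration, not pyDNase's GenomicInterval, and the choice of sort key (chromosome name, then start, then end) is an assumption. Note that chromosome names sort as strings here, so "chr10" comes before "chr2".

```python
from functools import total_ordering


@total_ordering
class OrderedInterval(object):
    """Hypothetical interval class; not pyDNase's GenomicInterval."""

    def __init__(self, chromosome, startbp, endbp):
        self.chromosome = chromosome
        self.startbp = startbp
        self.endbp = endbp

    def _key(self):
        return (self.chromosome, self.startbp, self.endbp)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())


per_chromosome = [
    [OrderedInterval("chr2", 100, 200)],
    [OrderedInterval("chr1", 300, 400), OrderedInterval("chr1", 50, 80)],
]

# Lists compare element by element, which now works because the
# elements themselves can be ordered.
ordered_lists = sorted(per_chromosome)
print([lst[0].chromosome for lst in ordered_lists])
```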

IndexError using wellington_footprints.py with Python 2.7.14

I have encountered the following error:

Reading BED File...
[################################] 0/34484 - 00:00:00
Traceback (most recent call last):
  File "/home/s1437643/conda/envs/py27/bin/wellington_footprints.py", line 95, in <module>
    regions = pyDNase.GenomicIntervalSet(clargs.regions)
  File "/home/s1437643/conda/envs/py27/lib/python2.7/site-packages/pyDNase/__init__.py", line 246, in __init__
    self.loadBEDFile(filename)
  File "/home/s1437643/conda/envs/py27/lib/python2.7/site-packages/pyDNase/__init__.py", line 283, in loadBEDFile
    if not self.__isBEDHeader(line):
  File "/home/s1437643/conda/envs/py27/lib/python2.7/site-packages/pyDNase/__init__.py", line 307, in __isBEDHeader
    if string[0] == "#":
IndexError: string index out of range

The command I used was:

wellington_footprints.py -fdr 0.05 -o ESC_Rep0 -p 24 -A <(cut -f 1,2,3 Results/Peaks/IDR/ESC_Rep0/ESC_Rep0_optimal.narrowPeak) Data/Samples/ESC_Rep0/ESC_Rep0_filtered.bam Results/Footprints/pyDNase/ESC_Rep0

I built pyDNase from the latest GitHub code (2017-11-17) using this command:

pip install git+git://github.com/jpiper/pyDNase.git

The version of Python I am using:

$ python --version
Python 2.7.14
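The traceback shows string[0] being evaluated on an empty string, which can happen for a blank line or when nothing is read from the input. Indexing an empty string raises IndexError, whereas str.startswith simply returns False. The sketch below shows a tolerant check along those lines. The helper name is hypothetical, and the real __isBEDHeader may test for more than a leading "#", which is not modelled here.

```python
def is_header_or_blank(line):
    """Hypothetical helper: True for blank lines and '#' comment lines."""
    stripped = line.strip()
    return stripped == "" or stripped.startswith("#")


lines = ["# comment\n", "\n", "", "chr1\t100\t200\n"]
records = [
    line.rstrip("\n").split("\t")
    for line in lines
    if not is_header_or_blank(line)
]
print(records)
```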

Migrate docs

pythonhosted.org is deprecated, so I need to move the docs to readthedocs.org.

Question about dnase_bias_estimator.py

Hello,
I have recently been using pyDNase to process my ATAC-seq data with Python 3.6, and I ran into the error below. I don't know how to handle it. Could you give me a hint?
Thx!

Determining transposition sites (roughly 60s per 1E6 reads)...
Traceback (most recent call last):
  File "/usr/local/bin/dnase_bias_estimator.py", line 85, in <module>
    bed_file_for_6mers = generate_6mer_bed(test_bam, genome_dic(genome))
  File "/usr/local/bin/dnase_bias_estimator.py", line 51, in generate_6mer_bed
    print("\t".join((str(i) for i in (chrom, startbp, endbp, 0, 0, strand))), file=outfile)
  File "/usr/lib/python3.6/tempfile.py", line 624, in func_wrapper
    return func(*args, **kwargs)
TypeError: a bytes-like object is required, not 'str'
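The tempfile frame in the traceback points to a likely cause: tempfile.NamedTemporaryFile opens its file in binary mode ("w+b") by default, and under Python 3 print cannot write str to a binary file. Opening the temporary file in text mode avoids the error. The sketch below demonstrates both cases. The helper name is hypothetical, and whether the rest of the script relies on the file being binary has not been checked.

```python
import tempfile


def write_bed_line(handle, fields):
    # Same pattern as the failing line: tab-join the fields and print to a file.
    print("\t".join(str(f) for f in fields), file=handle)


fields = ("chr1", 100, 106, 0, 0, "+")

# Default mode is binary, which reproduces the TypeError on Python 3.
with tempfile.NamedTemporaryFile() as binary_file:
    try:
        write_bed_line(binary_file, fields)
        binary_failed = False
    except TypeError as err:
        binary_failed = True
        print("binary mode:", err)

# Text mode accepts str.
with tempfile.NamedTemporaryFile(mode="w+") as text_file:
    write_bed_line(text_file, fields)
    text_file.seek(0)
    contents = text_file.read()
print(repr(contents))
```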

Error in wellington_footprints.py when reading bed file

Hello,

I get the following error when trying to run wellington_footprints.py:

  File "/home/barnekr1/.conda/envs/pydnase/bin/wellington_footprints.py", line 86, in <module>
    orderedbychr = [item for sublist in sorted(regions.intervals.values()) for item in sorted(sublist, key=lambda peak: peak.startbp)]
TypeError: '<' not supported between instances of 'GenomicInterval' and 'GenomicInterval'

This is running:
pydnase 0.2.6
python 3.6.2

Installation was done in an anaconda virtual environment with:
conda install -c bioconda pydnase

I am able to successfully run example_footprint_scores.py.

I believe this may be the same issue as #35, addressed by @dpryan79.
Should this fix work? Can these changes simply be manually made to files or is a fresh installation needed? Any advice is appreciated.

wellington footprints never finishes

Hi!

Whenever I run wellington on my data, I get "waiting on last 80 threads." as the last output

When the process eventually ends, the footprint output file is empty.

any help is appreciated. Thanks!

wellington_bootstrap questions

Hi there,
This is probably a very simple question but I wanted to make sure that I was correctly understanding how to use the wellington_bootstrap.py script.

The script appears to only take two bam files "treatment_bam" and "control_bam". Is the expectation that if I have more than one sample in each group I will merge the bam files? It would be great if there was a way to pass multiple files for each group since the merged files can get very large!
Thanks!
Jean

Better test coverage

Test coverage is pretty bad. This needs to be fixed, otherwise code refactoring becomes harder.

Instructions for disabling -fdrlimit with Wellington are contradictory, don't agree with source code

The FAQ at https://github.com/jpiper/pyDNase/blob/84ff43412752191b89218565cdab3eb3286ca2cf/docs/faqs.rst says "if you have low read depths you might need to adjust the -fdrlimit parameter to something less stringent like "-10" or "-5", which sets the mimimum amounts of evidence required to support the alternate hypothesis of there being a footprint. You can set this to 0 if you want to disable this feature altogether,"

However, the tutorial at https://github.com/jpiper/pyDNase/blob/84ff43412752191b89218565cdab3eb3286ca2cf/docs/tutorial.rst says "You can set -fdrlimit to -0.01 if you want to disable this feature altogether, "

Meanwhile, the source code at line 33 of https://github.com/jpiper/pyDNase/blob/241657206fdb43929a1c1648940554bae8b95d52/pyDNase/scripts/wellington_footprints.py says "default=-20,type=int)".

Since it's an int, that suggests I can't set the limit to -0.01; and I get an error stating it must be less than 0 if I set it to -0...
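The behaviour described can be reproduced with a minimal parser that mirrors the quoted declaration (default=-20, type=int). The parser below is a stand-in, not the script's own, and it does not model the "must be less than 0" check, which is only known from the error the poster reports. It shows that -0.01 is rejected outright by the int conversion, while -0 is accepted by argparse but becomes the integer 0.

```python
import argparse

# Stand-in parser mirroring the quoted declaration: default=-20, type=int.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("-fdrlimit", default=-20, type=int)


def try_parse(value):
    """Return the parsed integer, or None if argparse rejects the value."""
    try:
        return parser.parse_args(["-fdrlimit", value]).fdrlimit
    except SystemExit:
        return None


for value in ("-5", "-0.01", "-0"):
    print(value, "->", try_parse(value))
```

Under that declaration, -1 would be the least stringent negative value that can be passed. Whether that matches what the FAQ and tutorial intend is for the maintainer to confirm.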

Problem of BAMHandler and FOS

Hi!
I used pyDNase ver. 0.1.7.
I think pyDNase has a couple of problems:
first, in BAMHandler;
second, in how FOS is calculated.

BAMHandler

Following the Introduction, I ran

import pyDNase
reads = pyDNase.BAMHandler(pyDNase.example_reads())
reads["chr6,170863500,170863532,+"]

then

{'+': array([0,0,0,1,0,0,1,1,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0,1]),
 '-': array([0,10,1,0,1,0,4,9,0,1,0,2,1,0,0,0,0,0,3,0,6,3,0,0,0,1,1,1,3,0,3,6])}

I checked example.bam, and I think this output may be wrong. I take "chr6,170863500,170863532,+" to be in BED format, in which case the "+" array is shifted by +1 bp and the "-" array by -1 bp. I think the expected return value is the following. Can you confirm?

{'+': array([0,0,0,0,1,0,0,1,1,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1,0,0,1,1,0,0,0]),
 '-': array([10,1,0,1,0,4,9,0,1,0,2,1,0,0,0,0,0,3,0,6,3,0,0,0,1,1,1,3,0,3,6,0])}

How to Calculate FOS

leftReads   = float(sum(cutArray[:bgsize]))
centreReads = float(sum(cutArray[bgsize:-bgsize]))
rightReads  = float(sum(cutArray[-bgsize:]))
    try:
        return ( (centreReads+1) / leftReads ) + ( (centreReads+1)/rightReads)

This code is the FOS function in pyDNase.
Having read Neph et al. 2012 (Nature), I think we should use the averages of leftReads, centreReads, and rightReads, but the FOS function does not take averages.
Additionally, they describe a central component (6–40 nucleotides) and flanking components (3–10 nucleotides). The central component depends on the BED file supplied, so that is OK, but the default bgsize is 35, which I think is too large. What do you think?
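To make the difference concrete, the sketch below puts the sum-based calculation quoted above next to a mean-based variant following the poster's reading of Neph et al. 2012, in which the central and flanking terms are average tag counts. Both function names are hypothetical, neither handles flanks with zero cuts (a ZeroDivisionError would be raised), and this is an illustration of the two formulas rather than a proposed patch.

```python
def fos_from_sums(cut_array, bgsize):
    """FOS using summed cut counts, as in the excerpt quoted above."""
    left = float(sum(cut_array[:bgsize]))
    centre = float(sum(cut_array[bgsize:-bgsize]))
    right = float(sum(cut_array[-bgsize:]))
    return (centre + 1) / left + (centre + 1) / right


def fos_from_means(cut_array, bgsize):
    """FOS using mean cut counts per base in each component."""
    def mean(values):
        return float(sum(values)) / len(values)

    left = mean(cut_array[:bgsize])
    centre = mean(cut_array[bgsize:-bgsize])
    right = mean(cut_array[-bgsize:])
    return (centre + 1) / left + (centre + 1) / right


# Toy profile: 2 bp flanks around a 4 bp centre.
cuts = [4, 4, 1, 1, 1, 1, 2, 2]
print(fos_from_sums(cuts, 2))
print(fos_from_means(cuts, 2))
```

With sums, the score depends on how wide the centre is relative to the flanks. With means, each component is normalised by its own length, which is the point the poster raises.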

alignedread.aend is None sometimes and wellington_footprints.py crashes

Hi,
I have been running into a problem where alignedread.aend returns None occasionally and wellington_footprints.py exits with the following stack trace:

+ wellington_footprints.py -fdrlimit -5 /home/jer15/Dev/Saturn/Data/DNASE/peaks/relaxed/DNASE.K562.relaxed.bed /home/jer15/Dev/Saturn/Data/DNASE/bams/DNASE.K562.biorep2.techrep16.bam /home/jer15/Dev/Saturn/Data/DNASE/bams/footprints/K562/relaxed/2/16/
Reading BED File...
Calculating footprints...
Traceback (most recent call last):
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/bin/wellington_footprints.py", line 150, in <module>
    multiWellington(orderedbychr,reads, shoulder_sizes = clargs.shoulder_sizes ,footprint_sizes = clargs.footprint_sizes, FDR_cutoff=clargs.FDR_cutoff,FDR_iterations=clargs.FDR_iterations,bonferroni = clargs.bonferroni)
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/bin/wellington_footprints.py", line 140, in multiWellington
    fp = footprinting.wellington(i,reads,**kwargs)
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/footprinting/__init__.py", line 46, in __init__
    self.reads        = reads[self.interval]
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 192, in __getitem__
    return self.get_cut_values(vals)
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 182, in get_cut_values
    retval = self.__lookupReadsWithoutCache(startbp,endbp,chrom)
  File "/home/jer15/Dev/Saturn/python/virtualenv-py2/lib/python2.7/site-packages/pyDNase/__init__.py", line 122, in __lookupReadsWithoutCache
    a = int(alignedread.aend)
TypeError: int() argument must be a string or a number, not 'NoneType'

The docs for pysam say

aligned reference position of the read on the reference genome.

reference_end points to one past the last aligned residue. Returns None if not available (read is unmapped or no cigar alignment present).

I thought all the reads were aligned but I could be wrong (I am a bit new to read mapping and DNase-seq and the data has come from a third party).

Is there any way for wellington_footprints.py to handle these problems gracefully rather than crashing? Can it just ignore these reads? This only seems to happen for reads mapped to the reverse strand: alignedread.pos is never None.

And thanks for your nice software :)
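One way to ignore such reads, as the poster asks, is to test for None before using the alignment end. The sketch below shows the idea on a minimal stand-in for a pysam read, using the newer attribute names reference_start and reference_end (pos and aend in the traceback are the older aliases). The function name is hypothetical. Taking reference_end - 1 as the 5' end of a reverse-strand read follows from the pysam note quoted above that reference_end points one past the last aligned residue; pyDNase's own coordinate convention may differ.

```python
from collections import namedtuple

# Minimal stand-in for a pysam aligned read; only the fields used here.
Read = namedtuple("Read", ["reference_start", "reference_end", "is_reverse"])


def cut_positions(reads):
    """Yield (position, strand) for each read's 5' end, skipping reads with no end."""
    for read in reads:
        if read.is_reverse:
            if read.reference_end is None:
                continue  # no usable alignment end, so ignore this read
            yield read.reference_end - 1, "-"
        else:
            yield read.reference_start, "+"


reads = [Read(100, 136, False), Read(200, None, True), Read(300, 336, True)]
cuts = list(cut_positions(reads))
print(cuts)
```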

numpy package requirements

python 2.7.3

I was trying to install pyDNase using pip, and it was erroring out on the numpy version. While numpy was installed, pyDNase was failing the numpy requirement.

I corrected the issue with: pip install numpy --upgrade, which brought numpy to version 1.7.1 after which pyDNase installed normally.

It would be nice if you listed your numpy version requirement in your docs.

for reference:

[root@lebowski ~]# pip install numpy
Requirement already satisfied (use --upgrade to upgrade): numpy in /net/lebowski/vol1/sw/python/2.7.3/lib/python2.7/site-packages
Cleaning up...
[root@lebowski ~]# pip install pyDNase
Downloading/unpacking pyDNase
Downloading pyDNase-0.1.2.tar.gz (201kB): 201kB downloaded
Running setup.py egg_info for package pyDNase
Traceback (most recent call last):
  File "<string>", line 16, in <module>
  File "/tmp/pip_build_root/pyDNase/setup.py", line 9, in <module>
    raise ImportError("Due to a quirk with pip, pyDNase requires numpy to be installed before starting setup")
ImportError: Due to a quirk with pip, pyDNase requires numpy to be installed before starting setup
Complete output from command python setup.py egg_info:
Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/tmp/pip_build_root/pyDNase/setup.py", line 9, in <module>

    raise ImportError("Due to a quirk with pip, pyDNase requires numpy to be installed before starting setup")

ImportError: Due to a quirk with pip, pyDNase requires numpy to be installed before starting setup


Cleaning up...
Command python setup.py egg_info failed with error code 1 in /tmp/pip_build_root/pyDNase
Storing complete log in /root/.pip/pip.log

Thanks Much!
John N.

Empty bed file after running wellington_footprints.py

Hi,

This might be same type of question as #29 or #37.
I have ATAC seq data and whenever I run wellington on my data, I get "waiting on last 80 threads." as the last output.
The process seems to finish immediately, yielding an empty WellingtonFootprints.FDR.0.01.bed file.
I tried different FDR cutoffs, but nothing seems to work.

I installed
pip install pyDNase==0.3.0

I'm running
python 2.7

Any solution for this?
Thank you so much in advance.

Python 3 regression in `wellington_footprints.py`

The core scripts need to be tested on Python 3 as well; some of them don't work...

Traceback (most recent call last):
  File "/n/data2/bch/surg/hla/PMA_analysis_final/new_pyDNase/venv/bin/wellington_footprints.py", line 70, in xrange_from_string
    range_string = range(range_string[0],range_string[1],range_string[2])
TypeError: 'map' object is not subscriptable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/n/data2/bch/surg/hla/PMA_analysis_final/new_pyDNase/venv/bin/wellington_footprints.py", line 77, in <module>
    clargs.shoulder_sizes = xrange_from_string(clargs.shoulder_sizes)
  File "/n/data2/bch/surg/hla/PMA_analysis_final/new_pyDNase/venv/bin/wellington_footprints.py", line 74, in xrange_from_string
    raise ValueError
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/n/data2/bch/surg/hla/PMA_analysis_final/new_pyDNase/venv/bin/wellington_footprints.py", line 80, in <module>
    raise RuntimeError("shoulder and footprint sizes must be supplied as from,to,step")
RuntimeError: shoulder and footprint sizes must be supplied as from,to,step
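The first exception in the chain is the root cause: on Python 3, map returns a lazy iterator rather than a list, so it cannot be indexed. Wrapping the call in list() restores the Python 2 behaviour. The sketch below is a hypothetical rewrite of a "from,to,step" parser along those lines. It is not the script's actual xrange_from_string, and the sample sizes passed in are made up.

```python
def range_from_string(range_string):
    """Parse 'from,to,step' into a range; hypothetical rewrite for illustration."""
    try:
        # list() makes the result of map() indexable and unpackable on Python 3.
        start, stop, step = list(map(int, range_string.split(",")))
    except ValueError:
        raise ValueError("sizes must be supplied as from,to,step")
    return range(start, stop, step)


print(list(range_from_string("35,36,1")))
print(list(range_from_string("11,26,2")))
```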

wellington_bootstrap.py finishes with empty files

Hello,

I recently attempted differential footprinting for the first time with wellington_bootstrap.py. After running the output files were completely empty. There were no errors raised during the run.

Is this simply because there are no differential footprints that pass the default FDR threshold? This was run on approximately 156,000 peak regions so I was thinking it would find something.

Thanks in advance for any help.

pyDNase not compatible with Python 2.6.6

So, the documentation here says Python >= 2.6:

http://pythonhosted.org/pyDNase/installation.html#pre-installation-requirements

However, trying to import pyDNase in Python 2.6.6 gives the following syntax error:

import pyDNase
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "pyDNase/__init__.py", line 58
   self.cutCache    =  {i:{"+":{},"-":{}} for i in self.samfile.references}
                                            ^
SyntaxError: invalid syntax

I don't know if you prefer to change the documentation or the code, but either would be sufficient I think!
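The line in the traceback is a dict comprehension, which was added to the language in Python 2.7. If supporting 2.6 were the preferred route, the same dictionary can be built with dict() over a generator of key/value pairs. In the sketch below, the list of reference names is a made-up stand-in for self.samfile.references.

```python
references = ["chr1", "chr2"]  # stand-in for self.samfile.references

# Dict comprehension: needs Python 2.7 or later.
cache_comprehension = {i: {"+": {}, "-": {}} for i in references}

# dict() over a generator of pairs: also valid syntax on Python 2.6.
cache_generator = dict((i, {"+": {}, "-": {}}) for i in references)

print(cache_comprehension == cache_generator)
```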

error in wellington_footprints.py

Hi!

I just downloaded wellington, and when running wellington_footprints.py I was getting the error:

error at line 85:
'<' is not defined for GenomicInterval objects

I determined this was happening only when a BED file with multiple chromosomes was passed. The full line 85 was

orderedbychr = [item for sublist in sorted(regions.intervals.values()) for item in sorted(sublist, key=lambda peak: peak.startbp)]

but, after some testing, it seemed like the error was coming from

sorted(regions.intervals.values())

Based on the comment above that the goal is to iterate by chromosome, then position, I edited line 85 to be

orderedbychr = [item for sublist in sorted(regions.intervals.values(),key=lambda genomicIntervalList: genomicIntervalList[0].chromosome) for item in sorted(sublist, key=lambda peak: peak.startbp)]

This has overcome the bug and the program runs. I am posting this to both confirm that this is the intended result and that my output should be correct, and to give a bugfix if so.

Thanks!

Refactor footprinting module

The footprinting module needs refactoring for a number of reasons, including code clarity and general good programming practices. Ideally, we would like to make footprinting parallelisable, but due to how the footprinting is implemented, this isn't possible as one must pass an open file object (a BAMHandler) to Wellington, and this isn't pickleable.

No README.rst, generates install error

Error on install:

Traceback (most recent call last):
  File "setup.py", line 21, in <module>
    long_description=open('README.rst',"rt").read(),
IOError: [Errno 2] No such file or directory: 'README.rst'

Seems to be solved by replacing the line referencing 'README.rst' in setup.py. Suggest adding the file, or removing the line?!

Bad bam file?

Hello! I'm afraid that this may be more of a bam format question, but I just cannot figure it out. I got pyDNase running successfully with your example.bam and a few other bam files that I have, both single- and paired-end reads, but I can't get it to run on the bam file I want it to run on. Here are the first 2 lines of the sam file I have, called "DSpeek.sam":

HWI-ST700693:385:C3ATLACXX:8:1304:13600:22809 99 chr1 99965 60 36M = 100034 105 TCGGTCGCGCAAGATCGCTTCATTTGTTTCTAGAGT CCCFDFFFHHHHHIJIJJJJJJJJIJJJJJJJJIIB XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:36
HWI-ST700693:385:C3ATLACXX:5:2102:10641:35258 147 chr1 99965 60 36M = 99902 -99 TCGGTCGCGCAAGATCGCTTCATTTGTTTCTAGAGT GJIJJJJJJJJJJJJJIJJJJJJHHHHHFFFFFCCC XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:36

I make the bam by running "samtools view -bt <ref.fai> DSpeek.sam > DSpeek.bam"

Here is the error I'm getting:

import pyDNase
reads = pyDNase.BAMHandler("DSpeek.bam")
reads["chr1,1000,110000,+"]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/net/gs/vol3/software/modules-sw-python/2.7.3/pyDNase/0.1.5-1/Linux/RHEL6/x86_64/lib/python2.7/site-packages/pyDNase-0.1.6-py2.7-linux-x86_64.egg/pyDNase/__init__.py", line 166, in __getitem__
    retval = self.lookupReadsUsingCache(startbp,endbp,chrom)
  File "/net/gs/vol3/software/modules-sw-python/2.7.3/pyDNase/0.1.5-1/Linux/RHEL6/x86_64/lib/python2.7/site-packages/pyDNase-0.1.6-py2.7-linux-x86_64.egg/pyDNase/__init__.py", line 103, in lookupReadsUsingCache
    self.__addCutsToCache(chrom,i,i+self.CHUNK_SIZE)
  File "/net/gs/vol3/software/modules-sw-python/2.7.3/pyDNase/0.1.5-1/Linux/RHEL6/x86_64/lib/python2.7/site-packages/pyDNase-0.1.6-py2.7-linux-x86_64.egg/pyDNase/__init__.py", line 80, in __addCutsToCache
    a = int(alignedread.aend)
TypeError: int() argument must be a string or a number, not 'NoneType'

Anything obvious? I am new to python, so sorry for being dumb, but everything worked fine on other bam files, so I'm thinking it must be my file format somehow. Thank you!!!

Finding footprints (wellington_footprints.py)

Many peak callers exist, such as Hotspot, F-seq, and MACS2.
Their results are BED files without strand information ("+" or "-"),
but the pyDNase scripts need BED files with strand information.
For example, with dnase_average_profile.py I have to pay attention to strand when I use a BED file of regions around TF motifs, because a TF motif has a direction. But when finding footprints, I don't think the BED file needs real strand information, so for now I am not being too careful and simply add a placeholder strand:

cat hotspot.bed | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,"+" }' > withstand.hotspot.bed
wellington_footprints.py withstand.hotspot.bed hotspot.bam output_dir

So, wellington_footprints.py worked. Does this process have a problem?
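One detail worth noting about the awk one-liner: if the peak file has only three columns, $4 and $5 are empty, so the output has empty name and score fields. The sketch below does the same padding in Python and fills those columns with conventional placeholders instead. The function name and the placeholder values "." and "0" are choices made for this example, not something pyDNase is known to require. Like the awk command, it overwrites any strand already present in a sixth column.

```python
def add_placeholder_strand(line, strand="+"):
    """Pad a tab-separated BED line to six columns with a placeholder strand."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    kept = fields[:5]
    defaults = [".", "0", strand]  # fillers for columns 4 (name), 5 (score), 6 (strand)
    return "\t".join(kept + defaults[len(kept) - 3:])


print(add_placeholder_strand("chr1\t10\t20\n"))
print(add_placeholder_strand("chr1\t10\t20\tpeak1\t7\n"))
```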

For ATAC-seq bias correction, is naked DNA sequencing data necessary?

Hello,
I have recently been using pyDNase to process my ATAC-seq data. I read the Wellington-bootstrap paper, which says that the bias is corrected using observed cuts and expected counts. I also read the paper on how to calculate the "predicted count", which can be derived from either chromatin-derived DNase-seq or naked DNA. Which method does pyDNase support? Is naked DNA sequencing data necessary?
Thanks!

wellington_footprints.py doesn't work with python3

The print statements in wellington_footprints.py are python 2, so don't work with python3.

Here's a diff of the current version in the repository (wellington_footprints.py.bk) with one in which I commented out the print statements and rewrote them as functions. This revision works with python3 (but not python2 - perhaps keep 2 different versions in the repository?)

diff wellington_footprints.py wellington_footprints.py.bk 
104,105c104
< #print >> wigout, "track type=wiggle_0"
< print("track type=wiggle_0", file=wigout)

---
> print >> wigout, "track type=wiggle_0"
121,124c120,121
<     #print >> wigout, "fixedStep\tchrom=" + str(fp.interval.chromosome) + "\t start="+ str(fp.interval.startbp) +"\tstep=1"
<     print("fixedStep\tchrom=" + str(fp.interval.chromosome) + "\    t start="+ str(fp.interval.startbp) +"\tstep=1", file=wigout)
<     #print >> wigout , '\n'.join(map(str, fp.scores))
<     print('\n'.join(map(str, fp.scores)), file=wigout)

---
>     print >> wigout, "fixedStep\tchrom=" + str(fp.interval.chromosome) + "\t start="+ str(fp.interval.startbp) +"\tstep=1"
>     print >> wigout , '\n'.join(map(str, fp.scores))
129,130c126
<              #print >> fdrout, footprint
<              print(footprint, file=fdrout)

---
>              print >> fdrout, footprint
135,136c131
<             #print >> ofile, footprint
<             print(footprint, file=ofile)

---
>             print >> ofile, footprint

ATAC-seq handling question

Would it be possible to provide explanations for why loffset and roffset of BamHandler for ATACseq data needs to be set to -5 and 4, respectively?

Question about shifting alignment

Hi!
I have recently been using pyDNase to analyze some ATAC-seq data. It's a very convenient tool!
However, I noticed that in the original ATAC-seq paper it said: "all reads aligning to the + strand
were offset by +4 bp, and all reads aligning to the – strand were offset −5 bp." But in https://github.com/jpiper/pyDNase/blob/master/pyDNase/__init__.py the corresponding code is:

if ATAC:
    self.loffset = -5
    self.roffset = +4

I'm new to ATAC-seq but the code doesn't seem to agree with the ATAC-seq paper. Am I understanding wrong?
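For reference, the adjustment described in the quoted sentence can be written as a small function: the 5' position of a read on the + strand moves by +4 bp, and on the - strand by -5 bp. The sketch below implements only that sentence. The function name is hypothetical, and how loffset and roffset are applied inside pyDNase is not shown in the excerpt, so the differing signs there may simply reflect a different internal convention rather than a disagreement with the paper.

```python
def shift_atac_cut(position, strand):
    """Apply the offsets from the quoted sentence to a read's 5' position."""
    if strand == "+":
        return position + 4
    if strand == "-":
        return position - 5
    raise ValueError("strand must be '+' or '-'")


print(shift_atac_cut(1000, "+"))
print(shift_atac_cut(1000, "-"))
```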

matplotlib version

Hello,

Is there a plan to update pyDNase to support a newer version of matplotlib?

ERROR: pydnase 0.3.0 has requirement matplotlib<2.0.0, but you'll have matplotlib 3.1.1 which is incompatible.

best regards
