biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER

Home Page: http://bioepic.readthedocs.io

License: MIT License

Languages: Python 99.74%, Shell 0.26%
Topics: sicer-algorithm, bioinformatics, chip-seq, peak-caller, sicer, chip-seq-callers

epic's People

Contributors

daler, darwinawardwinner, endrebak, palmercd, rmsds


epic's Issues

Paired-end read support

Hi,
The HiChIP paper ( http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-280 ) describes how they added paired-end support to SICER:

For paired-end reads, the HiChIP pipeline keeps the first end and extends by the fragment length estimated from mapping positions of the two ends, rather than by the average fragment length of the library. Given the variability of fragment lengths across a complex genome like human genome, the use of actual coordinates of mapped pairs is expected to achieve better resolution in signal visualization. The bed file is then used to generate a bedGraph file by the genomeCoverageBed command from BEDTools.

Best wishes,

Michal
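
The core of the quoted approach fits in a few lines. A minimal sketch, assuming 6-column BEDPE input (chrom1, start1, end1, chrom2, start2, end2); this is illustrative, not epic's actual code:

def bedpe_to_fragment(line):
    """Collapse one BEDPE read pair into a single fragment interval,
    using the actual mapped coordinates of both ends rather than
    extending end 1 by the mean library fragment length."""
    f = line.rstrip().split("\t")
    chrom = f[0]
    start = min(int(f[1]), int(f[4]))
    end = max(int(f[2]), int(f[5]))
    return chrom, start, end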

Use unique filenames for --sum-bigwig option

Running epic on multiple datasets with the same output directory for --sum-bigwig will overwrite the output of a previous run on a different dataset. It looks like the output files just get called chip_sum.bw and input_sum.bw. Could you change this to use a unique name, perhaps by taking a --name parameter on the command line for an experiment name?
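
A minimal sketch of what the request amounts to, with --name as a hypothetical flag that does not exist in epic today:

import os

def sum_bigwig_paths(outdir, name):
    # Prefix the fixed chip_sum.bw/input_sum.bw names with an
    # experiment name so runs on different datasets do not collide.
    return (os.path.join(outdir, name + "_chip_sum.bw"),
            os.path.join(outdir, name + "_input_sum.bw"))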

Add instructions on how to add a new genome

I tried running epic on zebrafish data. For this I created a file:
/usr/local/lib/python2.7/dist-packages/epic/scripts/chromsizes/danRer7.chromsizes
containing chromosome sizes, and ran epic on paired-end data:

epic --treatment myReads.bedpe.bz2 \
    --control myInput.bedpe.bz2 \
    --number-cores 8 \
    --genome danRer7 \
    --effective_genome_length 0.9 \
    --paired-end
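
(For reference, a .chromsizes file of this kind is plain tab-separated text with one chromosome per line: name, then length in base pairs. The values below are placeholders, not real danRer7 lengths.)

chr1	59000000
chr2	61000000
chrM	17000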

But I ran into problems, which I don't understand.


Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Tue, 05 Jul 2016 17:32:56 )
0.9 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 164, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 45, in run_epic
    compute_background_probabilities(nb_chip_reads, args)
  File "/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_background_probabilites.py", line 48, in compute_background_probabilities
    boundary_contribution, genome_length_in_bins)
  File "/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
    current_scaled_score = int(round(score / BIN_SIZE))
OverflowError: cannot convert float infinity to integer

Could you please let me know how to proceed?

Piotr

Temporary Directory

I'm running into a problem where my /tmp folder runs out of space during a pipeline run. Is there an option to choose where epic stores its temporary files?
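
One possible workaround, assuming epic creates its temporary files through Python's tempfile module (which consults the TMPDIR environment variable before falling back to /tmp):

# Point this run's temporary files at a filesystem with more space.
TMPDIR=/path/to/big/scratch epic -t chip.bed -c input.bed -gn hg19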

Bam support

I think it is a bad idea for the below reasons. Feel free to suggest solutions:

You will probably rerun the analyses many times. Having to run a time-consuming conversion step (the most time-consuming one in the algorithm) each time would be silly. It is also IO-intensive, so parallel execution would not help much.

I am not just writing epic but also a lot of helper scripts for ChIP-Seq and differential ChIP-Seq. Adding a conversion step to BED in all of these before running the scripts would be a waste.

Also, where should I store the temporary bed files? Overflowing /tmp/ dirs is an eternal issue.

If I were to stream the data to bed using pipes, epic would not be fast anymore. I get a massive speedup from multiple cores if I use text files, presumably because the system knows it has the file in memory already. This is not the case if I start the pipe with bamToBed blabla | ...

There are many things that can go wrong when converting bam to bed, due to wonky bam files. I would get a bunch of github issues about "epic not being able to use my bam files" if I were to silently convert to bed within my programs.
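
For anyone who wants BAM input anyway, the conversion being described is a standard bedtools pre-processing step, run once outside epic:

# One-off conversion; epic then reads the BED file directly.
bedtools bamtobed -i sample.bam > sample.bed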

Anaconda Update

epic on Anaconda is still at 1.6. I'd like to include and cite it as part of a Docker container; unfortunately, I require the latest version of epic, which has the "-cs" and "-sbw" options.

Thank you! (And sorry for posting so much here)

Using dm3 makes epic crash

Hi, I am very new to epic (I installed it today), and when I try to run it with multiple cores I get the following error. (Running it on a single core appears unaffected so far; it is still running.) Please advise.

Pantelis Topalis

epic -t ../sorted_H3_APAA.bed -c ../sorted_H3_BiB.bed --number-cores 16 -gn dm3 -w 200 -g 3 -fs 150 -fdr 0.05 -egs 0.72 -sm APAA_BiB_matrix

epic -t ../sorted_H3_APAA.bed -c ../sorted_H3_BiB.bed --number-cores 16 -gn dm3 -w 200 -g 3 -fs 150 -fdr 0.05 -egs 0.72 -sm APAA_BiB_matrix (File: epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )

Binning ../sorted_H3_APAA.bed (File: run_epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )
Binning chromosomes 2L, 2LHet, 2R, 2RHet, 3L, 3LHet, 3R, 3RHet, 4, M, U, Uextra, X, XHet, YHet (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:05 )
Binning ../sorted_H3_BiB.bed (File: run_epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:07 )
Binning chromosomes 2L, 2LHet, 2R, 2RHet, 3L, 3LHet, 3R, 3RHet, 4, M, U, Uextra, X, XHet, YHet (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:07 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:19 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:22 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 764, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 715, in retrieve
    raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError


Multiprocessing exception:
...........................................................................
/usr/local/bin/epic in <module>()
    160     elif not args.effective_genome_length and args.paired_end:
    161         logging.info("Using paired end so setting readlength to 100.")
    162         args.effective_genome_length = get_effective_genome_length(args.genome,
    163                                                                    100)
    164
--> 165     run_epic(args)

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py in run_epic(args=Namespace(control=['../sorted_H3_BiB.bed'], effe...tment=['../sorted_H3_APAA.bed'], window_size=200))
     38     nb_chip_reads = get_total_number_of_reads(chip_merged_sum)
     39     nb_input_reads = get_total_number_of_reads(input_merged_sum)
     40
     41     merged_dfs = merge_chip_and_input(chip_merged_sum, input_merged_sum,
---> 42                                       args.number_cores)
        args.number_cores = 16

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in merge_chip_and_input(chip_dfs=[... 15 per-chromosome DataFrames trimmed ...], input_dfs=[... 15 per-chromosome DataFrames trimmed ...], nb_cpu=16)
     32     assert len(chip_dfs) == len(input_dfs)
     33
     34     logging.info("Merging ChIP and Input data.")
     35     merged_chromosome_dfs = Parallel(n_jobs=nb_cpu)(
     36         delayed(_merge_chip_and_input)(chip_df, input_df)
---> 37         for chip_df, input_df in zip(chip_dfs, input_dfs))
     38     return merged_chromosome_dfs

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=16), iterable=<generator object>)
--> 764             self.retrieve()

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Wed Jul 13 21:23:22 2016
PID: 33319                                   Python 2.7.11+: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
--> 127         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _merge_chip_and_input>
        args = (<chr3R ChIP DataFrame, 134035 rows x 3 columns>, <chr3R input DataFrame, 134479 rows x 3 columns>)

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in _merge_chip_and_input(chip_df=<chr3R ChIP DataFrame, 134035 rows x 3 columns>, input_df=<chr3R input DataFrame, 134479 rows x 3 columns>)
     14     chip_df_nb_bins = len(chip_df)
     15     merged_df = chip_df.merge(input_df,
     16                               how="left",
     17                               on=["Chromosome", "Bin"],
---> 18                               suffixes=[" ChIP", " Input"])
     19     merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
     20     merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
     21
     22     merged_df = merged_df.fillna(0)

[... intermediate pandas frames (DataFrame.merge -> pandas/tools/merge.py: merge, get_result, _get_join_info, _get_join_indexers) trimmed ...]

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in _factorize_keys(lk=memmap([       0,      200,      400, ..., 27897200, 27898400, 27898600]), rk=memmap([       0,      200,      400, ..., 27897200, 27898400, 27898600]), sort=False)
    711     rizer = klass(max(len(lk), len(rk)))
    712
--> 713     llab = rizer.factorize(lk)
    714     rlab = rizer.factorize(rk)

[... compiled pandas/hashtable.so frames trimmed ...]

ValueError: buffer source array is read-only

bedtools needs genome

bedtools bamtobed needs a genome file on recent (within the last few years?) versions of bedtools. Using BAM files as input is not possible with the bedtools versions available in bioconda due to this call to bedtools.

OverflowError: cannot convert float infinity to integer

Hi,

I'm using SICER in one of my projects, and I wanted to give epic a try. I'm using epic 0.0.6 (conda install). I downloaded the test.bed and control.bed files.

epic --treatment test.bed --control control.bed

And I'm getting the following error message:

Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Mon, 18 Jul 2016 17:07:42 )
2290813547.42 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
Traceback (most recent call last):
  File "/home/estelle/miniconda2/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.0.6', 'epic')
  File "/home/estelle/miniconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/home/estelle/miniconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script
  File "/home/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/EGG-INFO/scripts/epic", line 130, in <module>
  File "/pica/h1/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/epic/run/run_epic.py", line 38, in run_epic
  File "/pica/h1/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/epic/statistics/compute_background_probabilites.py", line 59, in compute_background_probabilities
  File "/pica/h1/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
OverflowError: cannot convert float infinity to integer

/Estelle

Make it possible to enter fa/fai file on command line instead of genome

Hello,
Instead of hard-coding chromosome names and lengths, could you, for example, use a FASTA index created by samtools?

def addGenomeData(input_filename):
    """Read chromosome names and lengths from a samtools FASTA index (.fai)."""
    genomeData = {}

    with open(input_filename) as i:
        for line in i:
            parts = line.rstrip().split('\t')
            if len(parts) < 2:
                # skip malformed lines instead of crashing on parts[1]
                continue
            # column 1 of a .fai file is the sequence name, column 2 its length
            genomeData[parts[0]] = int(parts[1])

    print genomeData
    print genomeData.keys()

if __name__ == "__main__":

    addGenomeData("/data//Bactrocera_tryoni/Bac.fasta.fai")

Or use pysam ( http://nullege.com/codes/search/pysam.faidx ), so the user only has to provide the FASTA file and pysam will create the FASTA index.
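
A minimal sketch of the pysam route (assuming pysam is installed; the path reuses the example above):

import pysam

# Creates Bac.fasta.fai next to the FASTA if it is missing, so the
# user only ever supplies the FASTA itself.
pysam.faidx("/data//Bactrocera_tryoni/Bac.fasta")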

If epic could be used without hard-coded genome sizes, then it would be possible to use it in Galaxy (usegalaxy.org).

Thank you in advance.

Mic

IndexError: list index out of range

I installed epic ver. 0.1.18 and no issues were reported during installation. However, when I try running it on any dataset, including the example dataset shipped with the software, I get the following error.

mnrusimh@bioserv:/nfs/analysis/epic$ epic --treatment test.bed --control control.bed

epic --treatment test.bed --control control.bed # epic_version: 0.1.18, pandas_version: 0.12.0 (File: epic, Log level: INFO, Time: Tue, 04 Oct 2016 09:45:41 )

Traceback (most recent call last):
  File "/home/mnrusimh/Applications/epic/bin/epic", line 5, in <module>
    pkg_resources.run_script('bioepic==0.1.18', 'epic')
  File "/nfs/bio/sw/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/nfs/bio/sw/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/home/mnrusimh/Applications/epic/lib/python2.7/site-packages/bioepic-0.1.18-py2.7.egg/EGG-INFO/scripts/epic", line 193, in <module>
    estimated_readlength = find_readlength(args)
  File "/home/mnrusimh/Applications/epic/lib/python2.7/site-packages/bioepic-0.1.18-py2.7.egg/epic/utils/find_readlength.py", line 36, in find_readlength
    names=["Start", "End"])
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 205, in _read
    return parser.read()
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
  File "parser.pyx", line 865, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8512)
  File "parser.pyx", line 1105, in pandas.parser.TextReader._get_column_name (pandas/parser.c:11684)
IndexError: list index out of range

-k FLAG

Can you add more information on the keep-duplicates (-k) flag? I get an error saying it requires an argument, but I'm not sure what the correct argument would be.

Default parameters for gap

Hi,

In the doc, the default parameter for the "window size" is 200, and the default parameter for "gap" is 3.

However, "gap" is supposed to be a multiple of "window size". Should you change the default gap value for 400, 600..? Or does that mean that by default the gap is 3 * 200 = 600?

Thanks,
/Estelle

How to direct output to a file

Hi,

Strangely, I cannot save the output to a file.

I cannot even find a parameter for naming an output file :(

The output just displays on the screen.

How can I direct the output to a file?

The command that I am using is

epic -t ChIPseq_H3K9me2.bed -c ChIPseq_input.bed -gn rn6

Thank you in advance.
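
epic prints its results to standard output (the log messages go to standard error), so a plain shell redirection, as used in other commands in these issues, saves them; the output filename here is arbitrary:

epic -t ChIPseq_H3K9me2.bed -c ChIPseq_input.bed -gn rn6 > H3K9me2_results.txt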

Bug in multiprocessing paired-end data

epic -pe -t examples/chr19_sample.bedpe   -c examples/chr19_input.bedpe

works, but

epic -cpu 25 -pe -t examples/chr19_sample.bedpe   -c examples/chr19_input.bedpe  --store-matrix H3K27me3.matrix

fails! I'll try to get to the bottom of this, but the error is in a different library, not epic.

# epic -cpu 25 -pe -t examples/chr19_sample.bedpe -c examples/chr19_input.bedpe --store-matrix H3K27me3.matrix
# epic -cpu 25 -pe -t examples/chr19_sample.bedpe -c examples/chr19_input.bedpe --store-matrix H3K27me3.matrix (File: epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:40 )
Using paired end so setting readlength to 100. (File: epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Using an effective genome fraction of 0.901962701202. (File: genomes, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Binning examples/chr19_sample.bedpe (File: run_epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Binning examples/chr19_input.bedpe (File: run_epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:54 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:55 )
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 130, in __call__
    return self.func(*args, **kwargs)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 72, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/utils/helper_functions.py", line 19, in _merge_chip_and_input
    suffixes=[" ChIP", " Input"])
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 4437, in merge
    copy=copy, indicator=indicator)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 39, in merge
    return op.get_result()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 217, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 353, in _get_join_info
    sort=self.sort, how=self.how)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 546, in _get_join_indexers
    llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 713, in _factorize_keys
    llab = rizer.factorize(lk)
  File "pandas/hashtable.pyx", line 859, in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715)
  File "stringsource", line 644, in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784)
  File "stringsource", line 345, in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
ValueError: buffer source array is read-only

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 392, in find_cookie
    line_string = line.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 24: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 139, in __call__
    tb_offset=1)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 373, in format_exc
    frames = format_records(records)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 274, in format_records
    for token in generate_tokens(linereader):
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 514, in _tokenize
    line = readline()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 265, in linereader
    line = getline(file, lnum[0])
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 16, in getline
    lines = getlines(filename, module_globals)
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 47, in getlines
    return updatecache(filename, module_globals)
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 136, in updatecache
    with tokenize.open(fullname) as fp:
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 456, in open
    encoding, lines = detect_encoding(buffer.readline)
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 433, in detect_encoding
    encoding = find_cookie(first)
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 397, in find_cookie
    raise SyntaxError(msg)
  File "<string>", line None
SyntaxError: invalid or missing encoding declaration for '/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/hashtable.cpython-35m-x86_64-linux-gnu.so'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.1.8', 'epic')
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/EGG-INFO/scripts/epic", line 165, in <module>
    run_epic(args)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/utils/helper_functions.py", line 55, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 810, in __call__
    self.retrieve()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 727, in retrieve
    self._output.extend(job.get())
  File "/local/home/endrebak/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
SyntaxError: invalid or missing encoding declaration for '/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/hashtable.cpython-35m-x86_64-linux-gnu.so'
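
For context, "ValueError: buffer source array is read-only" is characteristic of joblib memory-mapping large worker inputs and handing them over read-only. One possible workaround inside epic's merge step is sketched below; max_nbytes is a real joblib Parallel parameter (None disables the automatic memmapping), but whether it resolves this particular crash is an assumption:

from joblib import Parallel, delayed

# Pass max_nbytes=None so large DataFrames are pickled to workers
# instead of being memory-mapped read-only (untested workaround).
merged_chromosome_dfs = Parallel(n_jobs=nb_cpu, max_nbytes=None)(
    delayed(_merge_chip_and_input)(chip_df, input_df)
    for chip_df, input_df in zip(chip_dfs, input_dfs))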

Usage with multiple treatments and controls

If I have ChIP experiments on multiple different treatment groups (let's say dosages A, B, C, D and a control group), how should I input them into epic? Do I list the different treatments in a single command, and epic treats them separately? Or should I run epic for each treatment vs. the control?

That also brings me to some confusion about the wording of the control flag. Is it reads from an actual ChIP input (i.e., no-antibody chromatin DNA) or the IP of an untreated control group?

Effective Genome Size

As part of a pipeline, I am using your epic-effective script to calculate the effective genome size (EGS) prior to downstream analysis. Going through your arguments, I noticed that the -egs parameter requires an EGS between 0 and 1, as opposed to the typical ~2.8 billion used by MACS2 and deepTools for human hg38. I tried running epic with the 2.8 billion number and didn't run into any errors.

Is this just because the parameter information is outdated, or will it result in incorrect peak calling? I'd like to know whether I'll have to parse out extra information or if what I have already done is fine.

Thanks!
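
For reference, epic's -egs expects the effective genome size as a fraction of the full genome, so converting the familiar absolute number is a single division (the ~3.1 Gb hg38 total below is approximate):

egs_bp = 2.8e9        # absolute effective genome size (MACS2/deepTools style)
genome_bp = 3.1e9     # approximate total hg38 length
print(egs_bp / genome_bp)  # ~0.90, the kind of fraction -egs expects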

AssertionError: assert len(merged_df) == chip_df_nb_bins

Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 754, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 604, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 567, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 322, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 127, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 24, in _merge_chip_and_input
    assert len(merged_df) == chip_df_nb_bins
AssertionError

(from #30).

Concating dfs pandas error

Hi,
I have received the following error:

Concating dfs. (File: run_epic, Log level: INFO, Time: Wed, 24 May 2017 14:50:32 )
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/epic", line 219, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/bioepic-0.1.25-py2.7.egg/epic/run/run_epic.py", line 60, in run_epic
    df = pd.concat([df for df in dfs if not df.empty])
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 845, in concat
    copy=copy)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 878, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

I am unsure whether this is an error in pandas or in epic. Any ideas how to fix it?
Thanks
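
For context, that ValueError is raised by pandas itself whenever concat receives an empty sequence, which here means every per-chromosome DataFrame came back empty (for example, because no reads were counted). A minimal reproduction:

import pandas as pd

dfs = [pd.DataFrame()]  # every per-chromosome result is empty
pd.concat([df for df in dfs if not df.empty])
# ValueError: No objects to concatenate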

Score & Fold_change? What do they mean and how to caculate them?

Epic outputs the calculation result in delim format. The result columns include Score and Fold_change. I am not sure how they are derived from. Do you plan to explain them in details? I check the Sicer website and the software readme, it is not mentioned. It seems a common knowledge I missed.

Google Groups

Would it be possible to create a Google Group for questions?

I'd like to integrate automatic effective genome size estimation into my ChIP-seq pipeline, but I am having a bit of difficulty and can't find an appropriate place to post about it (besides maybe Biostars, but it's more of a programming question than a bioinformatics question).

OverflowError: Cannot convert float infinity to integer

Command executed:

  epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 > Control2_epic.bed

Resulting error:

Command error:
  # epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 (File: epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )

  gzip: stdout: Broken pipe
  Used first 10000 reads of Control2_treatment.bed.gz to estimate a median read length of 34.0
  Mean readlength: 33.3074, max readlength: 37, min readlength: 20. (File: find_readlength, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Using an effective genome fraction of 0.810858412293. (File: genomes, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Binning Control2_treatment.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:12 )
  Binning Control2_control.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 )
  Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 )
  Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 )
  Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 )
  2510169508.0 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  Traceback (most recent call last):
    File "/usr/local/anaconda2/bin/epic", line 4, in <module>
      __import__('pkg_resources').run_script('bioepic==0.1.6', 'epic')
    File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 461, in run_script
      self.require(requires)[0].run_script(script_name, ns)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 1194, in run_script
      execfile(script_filename, namespace, namespace)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/bioepic-0.1.6-py2.7.egg-info/scripts/epic", line 165, in <module>
      run_epic(args)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/run/run_epic.py", line 45, in run_epic
      compute_background_probabilities(nb_chip_reads, args)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_background_probabilites.py", line 49, in compute_background_probabilities
      boundary_contribution, genome_length_in_bins)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
      current_scaled_score = int(round(score / BIN_SIZE))
  OverflowError: cannot convert float infinity to integer

Any ideas?

Clarify Read Inputs and Differential Peak Calling

I hope you can clarify how reads are handled when supplying multiple treatment and control files to epic. For example, are multiple treatment files pooled (as in MACS), or are they treated independently?

Also, I have interest in differential peak calling between treated and non-treated ChIP samples with corresponding input controls. Does epic have this ability like SICER-df.sh, or is there a mechanism to reproduce this functionality?

Thanks for the great work; I am enjoying epic so far.

Multiprocessing exception at merging ChIP and Input

Unfortunately I need to report another problem:

Binning input.bedpe.bz2 (File: run_epic, Log level: INFO, Time: Sun, 07 Aug 2016 00:50:10 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, M (File: count_reads_in_windows, Log level: INFO, Time: Sun, 07 Aug 2016 00:50:10 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Sun, 07 Aug 2016 02:31:24 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 764, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 715, in retrieve
    raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/usr/local/bin/epic in <module>()
    160     elif not args.effective_genome_length and args.paired_end:
    161         logging.info("Using paired end so setting readlength to 100.")
    162         args.effective_genome_length = get_effective_genome_length(args.genome,
    163                                                                    100)
    164 
--> 165     run_epic(args)
    166 
    167 
    168 
    169 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py in run_epic(args=Namespace(control=['/mnt/biggles/csc_home/piotr/...Rer7.goodChr.sorted.bedpe.bz2'], window_size=200))
     37 
     38     nb_chip_reads = get_total_number_of_reads(chip_merged_sum)
     39     nb_input_reads = get_total_number_of_reads(input_merged_sum)
     40 
     41     merged_dfs = merge_chip_and_input(chip_merged_sum, input_merged_sum,
---> 42                                       args.number_cores)
        args.number_cores = 8
     43 
     44     score_threshold, island_enriched_threshold, average_window_readcount = \
     45         compute_background_probabilities(nb_chip_reads, args)
     46 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in merge_chip_and_input(chip_dfs=[... 26 per-chromosome DataFrames trimmed ...], input_dfs=[... 26 per-chromosome DataFrames trimmed ...], nb_cpu=8)
     32     assert len(chip_dfs) == len(input_dfs)
     33 
     34     logging.info("Merging ChIP and Input data.")
     35     merged_chromosome_dfs = Parallel(n_jobs=nb_cpu)(
     36         delayed(_merge_chip_and_input)(chip_df, input_df)
---> 37         for chip_df, input_df in zip(chip_dfs, input_dfs))
        chip_dfs = [... 26 per-chromosome DataFrames trimmed ...]
        input_dfs = [... 26 per-chromosome DataFrames trimmed ...]
     38     return merged_chromosome_dfs
     39 
     40 
     41 def get_total_number_of_reads(dfs):

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=8), iterable=<generator object <genexpr>>)
    759             if pre_dispatch == "all" or n_jobs == 1:
    760                 # The iterable was consumed all at once by the above for loop.
    761                 # No need to wait for async callbacks to trigger to
    762                 # consumption.
    763                 self._iterating = False
--> 764             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=8)>
    765             # Make sure that we get a last message telling us we are done
    766             elapsed_time = time.time() - self._start_time
    767             self._print('Done %3i out of %3i | elapsed: %s finished',
    768                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Sun Aug  7 02:31:27 2016
PID: 10981                                   Python 2.7.12: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    122     def __init__(self, iterator_slice):
    123         self.items = list(iterator_slice)
    124         self._size = len(self.items)
    125 
    126     def __call__(self):
--> 127         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _merge_chip_and_input>
        args = (       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns])
        kwargs = {}
        self.items = [(<function _merge_chip_and_input>, (       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns]), {})]
    128 
    129     def __len__(self):
    130         return self._size
    131 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in _merge_chip_and_input(chip_df=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], input_df=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns])
     13 
     14     chip_df_nb_bins = len(chip_df)
     15     merged_df = chip_df.merge(input_df,
     16                               how="left",
     17                               on=["Chromosome", "Bin"],
---> 18                               suffixes=[" ChIP", " Input"])
     19     merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
     20     merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
     21 
     22     merged_df = merged_df.fillna(0)

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py in merge(self=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], right=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
   4432               suffixes=('_x', '_y'), copy=True, indicator=False):
   4433         from pandas.tools.merge import merge
   4434         return merge(self, right, how=how, on=on, left_on=left_on,
   4435                      right_on=right_on, left_index=left_index,
   4436                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 4437                      copy=copy, indicator=indicator)
        copy = True
        indicator = False
   4438 
   4439     def round(self, decimals=0, *args, **kwargs):
   4440         """
   4441         Round a DataFrame to a variable number of decimal places.

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in merge(left=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], right=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
     34           suffixes=('_x', '_y'), copy=True, indicator=False):
     35     op = _MergeOperation(left, right, how=how, on=on, left_on=left_on,
     36                          right_on=right_on, left_index=left_index,
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy, indicator=indicator)
---> 39     return op.get_result()
        op.get_result = <bound method _MergeOperation.get_result of <pandas.tools.merge._MergeOperation object>>
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'
     42 
     43 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in get_result(self=<pandas.tools.merge._MergeOperation object>)
    212     def get_result(self):
    213         if self.indicator:
    214             self.left, self.right = self._indicator_pre_merge(
    215                 self.left, self.right)
    216 
--> 217         join_index, left_indexer, right_indexer = self._get_join_info()
        join_index = undefined
        left_indexer = undefined
        right_indexer = undefined
        self._get_join_info = <bound method _MergeOperation._get_join_info of <pandas.tools.merge._MergeOperation object>>
    218 
    219         ldata, rdata = self.left._data, self.right._data
    220         lsuf, rsuf = self.suffixes
    221 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _get_join_info(self=<pandas.tools.merge._MergeOperation object>)
    348                                     sort=self.sort)
    349         else:
    350             (left_indexer,
    351              right_indexer) = _get_join_indexers(self.left_join_keys,
    352                                                  self.right_join_keys,
--> 353                                                  sort=self.sort, how=self.how)
        self.sort = False
        self.how = 'left'
    354             if self.right_index:
    355                 if len(self.left) > 0:
    356                     join_index = self.left.index.take(left_indexer)
    357                 else:

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _get_join_indexers(left_keys=[array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])], right_keys=[array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200])], sort=False, how='left')
    541 
    542     # bind `sort` arg. of _factorize_keys
    543     fkeys = partial(_factorize_keys, sort=sort)
    544 
    545     # get left & right join labels and num. of levels at each location
--> 546     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
        llab = undefined
        rlab = undefined
        shape = undefined
        fkeys = <functools.partial object>
        left_keys = [array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])]
        right_keys = [array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200])]
    547 
    548     # get flat i8 keys from label lists
    549     lkey, rkey = _get_join_keys(llab, rlab, shape, sort)
    550 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _factorize_keys(lk=memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200]), rk=memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200]), sort=False)
    708         lk = com._ensure_object(lk)
    709         rk = com._ensure_object(rk)
    710 
    711     rizer = klass(max(len(lk), len(rk)))
    712 
--> 713     llab = rizer.factorize(lk)
        llab = undefined
        rizer.factorize = <built-in method factorize of pandas.hashtable.Int64Factorizer object>
        lk = memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])
    714     rlab = rizer.factorize(rk)
    715 
    716     count = rizer.get_count()
    717 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715), View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784) and View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
(three compiled Cython frames with no Python source lines)

ValueError: buffer source array is read-only
___________________________________________________________________________
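
The final ValueError is pandas refusing to factorize read-only merge keys; the memmap([...]) reprs in the frames above suggest joblib memmapped the large arrays it shipped to the workers. A hedged sketch of a workaround (not epic's actual fix; the merge call mirrors the one shown in the traceback):

import pandas as pd

def _merge_chip_and_input(chip_df, input_df):
    # Copying materializes writable arrays, so the Int64Factorizer no
    # longer sees a read-only memmap during the merge.
    chip_df, input_df = chip_df.copy(), input_df.copy()
    merged_df = chip_df.merge(input_df,
                              how="left",
                              on=["Chromosome", "Bin"],
                              suffixes=[" ChIP", " Input"])
    merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
    merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
    return merged_df.fillna(0)

Alternatively, passing max_nbytes=None to joblib's Parallel disables the automatic memmapping altogether.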

EPIC stops if treatment and control are both input files

Because of the way my pipeline is set up, it will typically also run Input bed vs Input bed. In MACS2 this is fine (no peaks are called), but epic exits with status 1:

Here is the output:

Command executed:

  epic --treatment Input_treatment.bed --control Input_control.bed --number-cores 3 --genome hg19 --fragment-size 150 > Input_epic.bed

Command exit status:
  1

Command output:
  (empty)

Command error:
      710 
      711     rizer = klass(max(len(lk), len(rk)))
      712 
  --> 713     llab = rizer.factorize(lk)
      714     rlab = rizer.factorize(rk)
      715 
      716     count = rizer.get_count()
      717 

  ...........................................................................
  /usr/local/anaconda2/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715), View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784) and View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
  (three compiled Cython frames with no Python source lines)

  ValueError: buffer source array is read-only
  ___________________________________________________________________________

Error calculating readlength

I'm getting

# epic -t chip_A-sortsam.bam -c input_A-sortsam.bam -fs 300 (File: epic, Log level: INFO, Time: Thu, 06 Oct 2016 06:56:24 )
cat: write error: Broken pipe
Used first 10000 reads of chip_A-sortsam.bam to estimate a median read length of nan
Mean readlength: nan, max readlength: nan, min readlength: nan. (File: find_readlength, Log level: INFO, Time: Thu, 06 Oct 2016 06:56:25 )
# epic -t chip_A-sortsam.bam -c input_A-sortsam.bam -fs 300
Traceback (most recent call last):
  File "/home/users/balter/miniconda2/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.1.17', 'epic')
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/bioepic-0.1.17-py2.7.egg-info/scripts/epic", line 190, in <module>
    closest_readlength = get_closest_readlength(estimated_readlength)
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/epic/utils/find_readlength.py", line 63, in get_closest_readlength
    if d == min_difference][0]
IndexError: list index out of range

BAM files:

ls -l
total 6622033
-rw-rw-r-- 1 balter CompBio 3387571391 Oct  6 05:59 chip_A-sortsam.bam
-rw-rw-r-- 1 balter CompBio 3387571391 Oct  6 06:34 input_A-sortsam.bam
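
For what it's worth, here is a minimal sketch (assuming pysam is available; this is not epic's own find_readlength code) that checks whether a BAM yields usable read lengths at all, using the same first-10000-reads idea as the log above:

import pysam

def median_readlength(bam_path, n=10000):
    # Collect query lengths from the first n alignments.
    lengths = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for i, read in enumerate(bam):
            if i >= n:
                break
            if read.query_length:  # skip records with no sequence
                lengths.append(read.query_length)
    if not lengths:
        raise ValueError("no read lengths found; is the BAM valid?")
    lengths.sort()
    return lengths[len(lengths) // 2]  # median without numpy

A median of nan, as in the log, means no lengths were collected at all, which is worth checking before epic runs; note in passing that the two BAMs listed above are byte-identical in size.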

Add integration test that tests with bam file.

Will create sam files and add them to the repo's example files. In the integration test I'll call samtools to convert them into bam, then run the test (see the sketch below). This avoids adding binary files to the repo.
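
A hypothetical sketch of that conversion step (samtools assumed on PATH; file names are placeholders):

import subprocess

def sam_to_bam(sam_path, bam_path):
    # samtools view -b re-encodes a SAM file as BAM.
    with open(bam_path, "wb") as out:
        subprocess.check_call(["samtools", "view", "-b", sam_path], stdout=out)

sam_to_bam("examples/test.sam", "test.bam")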

Effective genome size for Rnor6

Hi,
Could you please let me know how I can compute the effective genome size for rat (Rnor6) and pig?

And also, for broad marks like K9me2, what settings should I use?

Thank you
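
For the effective genome size itself, epic ships epic-effective (its flags appear in an issue further below), which computes the number from a genome FASTA. A hypothetical invocation, where rn6.fa and the read length 50 are placeholders for your own genome file and data:

epic-effective --read-length=50 --nb-cpu=4 rn6.fa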

custom chromosomes with . in name

Hi,

My chromosome names unfortunately contain full stops (chr1_v2.1, chr2_v2.1 etc). When I use the --chromsizes option with a file containing chromosome names and lengths, I get an error. When I remove the full stops the error goes away, but then the chromosome names are incorrect.

In the Genome.py script:

chromosome_lengths = [l.split() for l in open(chromsizes).readlines()]

appears to be doing more than just splitting on whitespace; the full stops are causing a problem here for some reason.
Any chance of a solution?

Thanks!
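
For reference, str.split() with no arguments splits only on runs of whitespace, so the line quoted above should keep the dots intact; the failure is presumably happening somewhere downstream (a guess, not verified). A minimal sketch of a dot-tolerant chromsizes parser, with a hypothetical function name:

def read_chromsizes(path):
    # Split each line on whitespace only, so names like "chr1_v2.1"
    # survive with their full stops intact.
    sizes = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes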

epic-effective cannot find jellyfish

Hi,

I have installed epic and jellyfish in a virtualenv:
virtualenv venv.epic_0.1.25
. /path/venv.epic_0.1.25/bin/activate
pip install bioepic
pip install jellyfish

When I try to run epic-effective as:
epic-effective --read-length=76 --nb-cpu=12 oviAri3_BASE.fa

I get the following error:
Temporary directory: /tmp/86174.1.C6320-512-haswell.q (File: effective_genome_size, Log level: INFO, Time: Tue, 14 Mar 2017 16:40:11 )
File analyzed: oviAri3_BASE.fa (File: effective_genome_size, Log level: INFO, Time: Tue, 14 Mar 2017 16:40:11 )
Genome length: 2587507083 (File: effective_genome_size, Log level: INFO, Time: Tue, 14 Mar 2017 16:40:11 )
/bin/sh: jellyfish: command not found
/bin/sh: jellyfish: command not found
Traceback (most recent call last):
  File "/mnt/fls01-home01/user/virtualenvs/venv.epic_0.1.25/bin/epic-effective", line 38, in <module>
    effective_genome_size(fasta, read_length, nb_cpu, tmpdir)
  File "/mnt/fls01-home01/user/virtualenvs/venv.epic_0.1.25/lib/python2.7/site-packages/epic/scripts/effective_genome_size.py", line 56, in effective_genome_size
    shell=True)
  File "/opt/gridware/depots/4baff5c5/el7/pkg/apps/python/2.7.8/gcc-4.8.5/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'jellyfish stats /tmp/86174.1.C6320-512-haswell.q/oviAri3_BASE.fa.jf' returned non-zero exit status 127
rm: cannot remove '/tmp/86174.1.C6320-512-haswell.q/oviAri3_BASE.fa.jf': No such file or directory

Thanks.
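
The two "command not found" lines are the key: epic-effective shells out to the Jellyfish k-mer counter binary, and the PyPI package installed by pip install jellyfish is an unrelated string-distance library that provides no such binary, so Jellyfish itself (e.g. from bioconda or built from source) must be on PATH. A sketch of the kind of guard that makes this failure explicit, assuming the same subprocess call as in the traceback:

import shutil
import subprocess

def jellyfish_stats(jf_file):
    # Fail early with a clear message if the k-mer counter is missing.
    # (shutil.which requires Python 3.3+.)
    if shutil.which("jellyfish") is None:
        raise SystemExit("jellyfish binary not found on PATH; install the "
                         "Jellyfish k-mer counter, not the PyPI 'jellyfish' "
                         "string-distance package")
    return subprocess.check_output("jellyfish stats " + jf_file, shell=True)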

Create bigwig files for UCSC genome browser display?

I'd like to create files for displaying in the genome browser.

Need to find out:

  1. Should I create one track with pooled data from all the ChIP files or one bigwig file per ChIP file?

I have never used the genome browser for displaying data, so pinging you @daler. Do you have an opinion or other related suggestions?
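
For the per-file option, a minimal hedged sketch with pyBigWig (an assumption; epic may well use a different writer), writing one 200 bp bin on chr1:

import pyBigWig

bw = pyBigWig.open("chip_sample1.bw", "w")
bw.addHeader([("chr1", 248956422)])  # list of (chromosome, length) pairs
bw.addEntries(["chr1"], [0], ends=[200], values=[3.0])  # one 200 bp bin
bw.close()

A pooled track would simply sum the per-file bin counts before writing, so the choice mostly comes down to whether per-sample variability matters for the display.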
