biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER

Home Page: http://bioepic.readthedocs.io

License: MIT License

Languages: Python 99.74%, Shell 0.26%
Topics: sicer-algorithm, bioinformatics, chip-seq, peak-caller, sicer, chip-seq-callers

epic's People

Contributors

daler, darwinawardwinner, endrebak, palmercd, rmsds


epic's Issues

Paired-end read support

Hi,
The HiChIP paper ( http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-280 ) describes how they added paired-end support to SICER:

For paired-end reads, the HiChIP pipeline keeps the first end and extends by the fragment length estimated from mapping positions of the two ends, rather than by the average fragment length of the library. Given the variability of fragment lengths across a complex genome like human genome, the use of actual coordinates of mapped pairs is expected to achieve better resolution in signal visualization. The bed file is then used to generate a bedGraph file by the genomeCoverageBed command from BEDTools.

Best wishes,

Michal
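
The core of the quoted approach fits in a few lines. A minimal sketch, assuming 6-column BEDPE input (chrom1, start1, end1, chrom2, start2, end2); this is illustrative, not epic's actual code:

def bedpe_to_fragment(line):
    """Collapse one BEDPE read pair into a single fragment interval,
    using the actual mapped coordinates of both ends rather than
    extending end 1 by the mean library fragment length."""
    f = line.rstrip().split("\t")
    chrom = f[0]
    start = min(int(f[1]), int(f[4]))
    end = max(int(f[2]), int(f[5]))
    return chrom, start, end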

Use unique filenames for --sum-bigwig option

Running epic on multiple datasets with the same output directory for --sum-bigwig will overwrite the output of a previous run on a different dataset. It looks like the output files just get called chip_sum.bw and input_sum.bw. Could you change this to use a unique name, perhaps by taking a --name parameter on the command line for an experiment name?
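
A minimal sketch of what the request amounts to, with --name as a hypothetical flag that does not exist in epic today:

import os

def sum_bigwig_paths(outdir, name):
    # Prefix the fixed chip_sum.bw/input_sum.bw names with an
    # experiment name so runs on different datasets do not collide.
    return (os.path.join(outdir, name + "_chip_sum.bw"),
            os.path.join(outdir, name + "_input_sum.bw"))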

Add instructions on how to add a new genome

I tried running epic on zebrafish data. For this I created a file:
/usr/local/lib/python2.7/dist-packages/epic/scripts/chromsizes/danRer7.chromsizes
containing chromosome sizes, and ran epic on paired-end data:

epic --treatment myReads.bedpe.bz2 \
    --control myInput.bedpe.bz2 \
    --number-cores 8 \
    --genome danRer7 \
    --effective_genome_length 0.9 \
    --paired-end
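
(For reference, a .chromsizes file of this kind is plain tab-separated text with one chromosome per line: name, then length in base pairs. The values below are placeholders, not real danRer7 lengths.)

chr1	59000000
chr2	61000000
chrM	17000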

But I ran into problems, which I don't understand.


Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Tue, 05 Jul 2016 17:32:56 )
0.9 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Tue, 05 Jul 2016 17:33:01 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 164, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 45, in run_epic
    compute_background_probabilities(nb_chip_reads, args)
  File "/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_background_probabilites.py", line 48, in compute_background_probabilities
    boundary_contribution, genome_length_in_bins)
  File "/usr/local/lib/python2.7/dist-packages/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
    current_scaled_score = int(round(score / BIN_SIZE))
OverflowError: cannot convert float infinity to integer

Could you please let me know how to proceed?

Piotr

Temporary Directory

I'm running into a problem where my /tmp folder runs out of space during a pipeline run. Is there an option to choose where epic stores its temporary files?
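
One possible workaround, assuming epic creates its temporary files through Python's tempfile module (which consults the TMPDIR environment variable before falling back to /tmp):

# Point this run's temporary files at a filesystem with more space.
TMPDIR=/path/to/big/scratch epic -t chip.bed -c input.bed -gn hg19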

Bam support

I think it is a bad idea for the below reasons. Feel free to suggest solutions:

You will probably rerun the analyses many times. Having to run a time-consuming conversion step (the most time-consuming one in the algorithm) each time would be silly. It is also IO-intensive, so parallel execution would not help much.

I am not just writing epic but also a lot of helper scripts for ChIP-Seq and differential ChIP-Seq. Adding a conversion step to BED in all of these before running the scripts would be a waste.

Also, where should I store the temporary bed files? Overflowing /tmp/ dirs is an eternal issue.

If I were to stream the data to bed using pipes, epic would not be fast anymore. I get a massive speedup from multiple cores if I use text files, presumably because the system knows it has the file in memory already. This is not the case if I start the pipe with bamToBed blabla | ...

There are many things that can go wrong when converting bam to bed, due to wonky bam files. I would get a bunch of github issues about "epic not being able to use my bam files" if I were to silently convert to bed within my programs.
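
For anyone who wants BAM input anyway, the conversion being described is a standard bedtools pre-processing step, run once outside epic:

# One-off conversion; epic then reads the BED file directly.
bedtools bamtobed -i sample.bam > sample.bed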

Anaconda Update

epic on Anaconda is still at 1.6. I'd like to include and cite it as part of a Docker container; unfortunately, I require the latest version of epic, which has the "-cs" and "-sbw" options.

Thank you! (And sorry for posting so much here)

Using dm3 makes epic crash

Hi, I am very new to epic (I installed it today), and when I try to run it with multiple cores I get the following error. (Running it on a single core appears unaffected so far; it is still running.) Please advise.

Pantelis Topalis

epic -t ../sorted_H3_APAA.bed -c ../sorted_H3_BiB.bed --number-cores 16 -gn dm3 -w 200 -g 3 -fs 150 -fdr 0.05 -egs 0.72 -sm APAA_BiB_matrix

epic -t ../sorted_H3_APAA.bed -c ../sorted_H3_BiB.bed --number-cores 16 -gn dm3 -w 200 -g 3 -fs 150 -fdr 0.05 -egs 0.72 -sm APAA_BiB_matrix (File: epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )

Binning ../sorted_H3_APAA.bed (File: run_epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )
Binning chromosomes 2L, 2LHet, 2R, 2RHet, 3L, 3LHet, 3R, 3RHet, 4, M, U, Uextra, X, XHet, YHet (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:05 )
Binning ../sorted_H3_BiB.bed (File: run_epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:07 )
Binning chromosomes 2L, 2LHet, 2R, 2RHet, 3L, 3LHet, 3R, 3RHet, 4, M, U, Uextra, X, XHet, YHet (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:07 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:19 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:22 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 764, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 715, in retrieve
    raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError


Multiprocessing exception:
...........................................................................
/usr/local/bin/epic in <module>()
    160     elif not args.effective_genome_length and args.paired_end:
    161         logging.info("Using paired end so setting readlength to 100.")
    162         args.effective_genome_length = get_effective_genome_length(args.genome,
    163                                                                    100)
    164
--> 165     run_epic(args)

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py in run_epic(args=Namespace(control=['../sorted_H3_BiB.bed'], effe...tment=['../sorted_H3_APAA.bed'], window_size=200))
     38     nb_chip_reads = get_total_number_of_reads(chip_merged_sum)
     39     nb_input_reads = get_total_number_of_reads(input_merged_sum)
     40
     41     merged_dfs = merge_chip_and_input(chip_merged_sum, input_merged_sum,
---> 42                                       args.number_cores)
        args.number_cores = 16

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in merge_chip_and_input(chip_dfs=[... 15 per-chromosome DataFrames trimmed ...], input_dfs=[... 15 per-chromosome DataFrames trimmed ...], nb_cpu=16)
     32     assert len(chip_dfs) == len(input_dfs)
     33
     34     logging.info("Merging ChIP and Input data.")
     35     merged_chromosome_dfs = Parallel(n_jobs=nb_cpu)(
     36         delayed(_merge_chip_and_input)(chip_df, input_df)
---> 37         for chip_df, input_df in zip(chip_dfs, input_dfs))
     38     return merged_chromosome_dfs

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=16), iterable=<generator object>)
--> 764             self.retrieve()

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Wed Jul 13 21:23:22 2016
PID: 33319                                   Python 2.7.11+: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
--> 127         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _merge_chip_and_input>
        args = (<chr3R ChIP DataFrame, 134035 rows x 3 columns>, <chr3R input DataFrame, 134479 rows x 3 columns>)

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in _merge_chip_and_input(chip_df=<chr3R ChIP DataFrame, 134035 rows x 3 columns>, input_df=<chr3R input DataFrame, 134479 rows x 3 columns>)
     14     chip_df_nb_bins = len(chip_df)
     15     merged_df = chip_df.merge(input_df,
     16                               how="left",
     17                               on=["Chromosome", "Bin"],
---> 18                               suffixes=[" ChIP", " Input"])
     19     merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
     20     merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
     21
     22     merged_df = merged_df.fillna(0)

[... intermediate pandas frames (DataFrame.merge -> pandas/tools/merge.py: merge, get_result, _get_join_info, _get_join_indexers) trimmed ...]

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in _factorize_keys(lk=memmap([       0,      200,      400, ..., 27897200, 27898400, 27898600]), rk=memmap([       0,      200,      400, ..., 27897200, 27898400, 27898600]), sort=False)
    711     rizer = klass(max(len(lk), len(rk)))
    712
--> 713     llab = rizer.factorize(lk)
    714     rlab = rizer.factorize(rk)

[... compiled pandas/hashtable.so frames trimmed ...]

ValueError: buffer source array is read-only

bedtools needs genome

bedtools bamtobed needs a genome file on recent (within the last few years?) versions of bedtools. Using BAM files as input is not possible with the bedtools versions available in bioconda due to this call to bedtools.

OverflowError: cannot convert float infinity to integer

Hi,

I'm using SICER in one of my projects, and I wanted to give epic a try. I'm using epic 0.0.6 (conda install). I downloaded the test.bed and control.bed files.

epic --treatment test.bed --control control.bed

And I'm getting the following error message:

Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Mon, 18 Jul 2016 17:07:42 )
2290813547.42 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Mon, 18 Jul 2016 17:07:42 )
Traceback (most recent call last):
  File "/home/estelle/miniconda2/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.0.6', 'epic')
  File "/home/estelle/miniconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 726, in run_script
  File "/home/estelle/miniconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 1491, in run_script
  File "/home/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/EGG-INFO/scripts/epic", line 130, in <module>
  File "/pica/h1/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/epic/run/run_epic.py", line 38, in run_epic
  File "/pica/h1/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/epic/statistics/compute_background_probabilites.py", line 59, in compute_background_probabilities
  File "/pica/h1/estelle/miniconda2/lib/python2.7/site-packages/bioepic-0.0.6-py2.7.egg/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
OverflowError: cannot convert float infinity to integer

/Estelle

Make it possible to enter fa/fai file on command line instead of genome

Hello,
Instead of hard-coding chromosome names and lengths, could you, for example, use a FASTA index created by samtools?

def addGenomeData(input_filename):
    """Read chromosome names and lengths from a samtools FASTA index (.fai)."""
    genomeData = {}

    with open(input_filename) as i:
        for line in i:
            parts = line.rstrip().split('\t')
            if len(parts) < 2:
                # skip malformed lines instead of crashing on parts[1]
                continue
            # column 1 of a .fai file is the sequence name, column 2 its length
            genomeData[parts[0]] = int(parts[1])

    print genomeData
    print genomeData.keys()

if __name__ == "__main__":

    addGenomeData("/data//Bactrocera_tryoni/Bac.fasta.fai")

Or use pysam ( http://nullege.com/codes/search/pysam.faidx ), so the user only has to provide the FASTA file and pysam will create the FASTA index.
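
A minimal sketch of the pysam route (assuming pysam is installed; the path reuses the example above):

import pysam

# Creates Bac.fasta.fai next to the FASTA if it is missing, so the
# user only ever supplies the FASTA itself.
pysam.faidx("/data//Bactrocera_tryoni/Bac.fasta")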

If epic could be used without hard-coded genome sizes, then it would be possible to use it in Galaxy (usegalaxy.org).

Thank you in advance.

Mic

IndexError: list index out of range

I installed epic ver. 0.1.18 and no issues were reported during installation. However, when I try running it on any dataset, including the example dataset shipped with the software, I get the following error.

mnrusimh@bioserv:/nfs/analysis/epic$ epic --treatment test.bed --control control.bed

epic --treatment test.bed --control control.bed # epic_version: 0.1.18, pandas_version: 0.12.0 (File: epic, Log level: INFO, Time: Tue, 04 Oct 2016 09:45:41 )

Traceback (most recent call last):
  File "/home/mnrusimh/Applications/epic/bin/epic", line 5, in <module>
    pkg_resources.run_script('bioepic==0.1.18', 'epic')
  File "/nfs/bio/sw/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/nfs/bio/sw/lib/python2.7/site-packages/distribute-0.6.34-py2.7.egg/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/home/mnrusimh/Applications/epic/lib/python2.7/site-packages/bioepic-0.1.18-py2.7.egg/EGG-INFO/scripts/epic", line 193, in <module>
    estimated_readlength = find_readlength(args)
  File "/home/mnrusimh/Applications/epic/lib/python2.7/site-packages/bioepic-0.1.18-py2.7.egg/epic/utils/find_readlength.py", line 36, in find_readlength
    names=["Start", "End"])
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 205, in _read
    return parser.read()
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 608, in read
    ret = self._engine.read(nrows)
  File "/nfs/bio/sw/lib/python2.7/site-packages/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1028, in read
    data = self._reader.read(nrows)
  File "parser.pyx", line 706, in pandas.parser.TextReader.read (pandas/parser.c:6745)
  File "parser.pyx", line 728, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:6964)
  File "parser.pyx", line 804, in pandas.parser.TextReader._read_rows (pandas/parser.c:7780)
  File "parser.pyx", line 865, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:8512)
  File "parser.pyx", line 1105, in pandas.parser.TextReader._get_column_name (pandas/parser.c:11684)
IndexError: list index out of range

-k FLAG

Can you add more information on the keep-duplicates (-k) flag? I get an error saying it requires an argument, but I'm not sure what the correct argument would be.

Default parameters for gap

Hi,

In the doc, the default parameter for the "window size" is 200, and the default parameter for "gap" is 3.

However, "gap" is supposed to be a multiple of "window size". Should you change the default gap value for 400, 600..? Or does that mean that by default the gap is 3 * 200 = 600?

Thanks,
/Estelle

How to direct output to a file

Hi,

Strangely, I cannot save the output to a file.

I cannot even find a parameter for naming an output file :(

The output just displays on the screen.

How can I direct the output to a file?

The command that I am using is

epic -t ChIPseq_H3K9me2.bed -c ChIPseq_input.bed -gn rn6

Thank you in advance.
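
epic prints its results to standard output (the log messages go to standard error), so a plain shell redirection, as used in other commands in these issues, saves them; the output filename here is arbitrary:

epic -t ChIPseq_H3K9me2.bed -c ChIPseq_input.bed -gn rn6 > H3K9me2_results.txt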

Bug in multiprocessing paired-end data

epic -pe -t examples/chr19_sample.bedpe   -c examples/chr19_input.bedpe

works, but

epic -cpu 25 -pe -t examples/chr19_sample.bedpe   -c examples/chr19_input.bedpe  --store-matrix H3K27me3.matrix

fails! I'll try to get to the bottom of this, but the error is in a different library, not epic.

# epic -cpu 25 -pe -t examples/chr19_sample.bedpe -c examples/chr19_input.bedpe --store-matrix H3K27me3.matrix
# epic -cpu 25 -pe -t examples/chr19_sample.bedpe -c examples/chr19_input.bedpe --store-matrix H3K27me3.matrix (File: epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:40 )
Using paired end so setting readlength to 100. (File: epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Using an effective genome fraction of 0.901962701202. (File: genomes, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Binning examples/chr19_sample.bedpe (File: run_epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Binning examples/chr19_input.bedpe (File: run_epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:54 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:55 )
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 130, in __call__
    return self.func(*args, **kwargs)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 72, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/utils/helper_functions.py", line 19, in _merge_chip_and_input
    suffixes=[" ChIP", " Input"])
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 4437, in merge
    copy=copy, indicator=indicator)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 39, in merge
    return op.get_result()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 217, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 353, in _get_join_info
    sort=self.sort, how=self.how)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 546, in _get_join_indexers
    llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 713, in _factorize_keys
    llab = rizer.factorize(lk)
  File "pandas/hashtable.pyx", line 859, in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715)
  File "stringsource", line 644, in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784)
  File "stringsource", line 345, in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
ValueError: buffer source array is read-only

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 392, in find_cookie
    line_string = line.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 24: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 139, in __call__
    tb_offset=1)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 373, in format_exc
    frames = format_records(records)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 274, in format_records
    for token in generate_tokens(linereader):
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 514, in _tokenize
    line = readline()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 265, in linereader
    line = getline(file, lnum[0])
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 16, in getline
    lines = getlines(filename, module_globals)
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 47, in getlines
    return updatecache(filename, module_globals)
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 136, in updatecache
    with tokenize.open(fullname) as fp:
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 456, in open
    encoding, lines = detect_encoding(buffer.readline)
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 433, in detect_encoding
    encoding = find_cookie(first)
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 397, in find_cookie
    raise SyntaxError(msg)
  File "<string>", line None
SyntaxError: invalid or missing encoding declaration for '/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/hashtable.cpython-35m-x86_64-linux-gnu.so'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.1.8', 'epic')
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/EGG-INFO/scripts/epic", line 165, in <module>
    run_epic(args)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/utils/helper_functions.py", line 55, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 810, in __call__
    self.retrieve()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 727, in retrieve
    self._output.extend(job.get())
  File "/local/home/endrebak/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
SyntaxError: invalid or missing encoding declaration for '/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/hashtable.cpython-35m-x86_64-linux-gnu.so'
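
For context, "ValueError: buffer source array is read-only" is characteristic of joblib memory-mapping large worker inputs and handing them over read-only. One possible workaround inside epic's merge step is sketched below; max_nbytes is a real joblib Parallel parameter (None disables the automatic memmapping), but whether it resolves this particular crash is an assumption:

from joblib import Parallel, delayed

# Pass max_nbytes=None so large DataFrames are pickled to workers
# instead of being memory-mapped read-only (untested workaround).
merged_chromosome_dfs = Parallel(n_jobs=nb_cpu, max_nbytes=None)(
    delayed(_merge_chip_and_input)(chip_df, input_df)
    for chip_df, input_df in zip(chip_dfs, input_dfs))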

Usage with multiple treatments and controls

If I have ChIP experiments on multiple different treatment groups (let's say dosages A, B, C, D and a control group), how should I input them into epic? Do I list the different treatments in a single command, and epic treats them separately? Or should I run epic for each treatment vs. the control?

That also brings me to some confusion about the wording of the control flag. Is it reads from an actual ChIP input (i.e., no-antibody chromatin DNA) or the IP of an untreated control group?

Effective Genome Size

As part of a pipeline, I am using your epic-effective script to calculate the effective genome size (EGS) prior to downstream analysis. Going through your arguments, I noticed that the -egs parameter requires an EGS between 0 and 1, as opposed to the typical ~2.8 billion used by MACS2 and deepTools for human hg38. I tried running epic with the 2.8 billion number and didn't run into any errors.

Is this just because the parameter information is outdated, or will it result in incorrect peak calling? I'd like to know whether I'll have to parse out extra information or if what I have already done is fine.

Thanks!
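
For reference, epic's -egs expects the effective genome size as a fraction of the full genome, so converting the familiar absolute number is a single division (the ~3.1 Gb hg38 total below is approximate):

egs_bp = 2.8e9        # absolute effective genome size (MACS2/deepTools style)
genome_bp = 3.1e9     # approximate total hg38 length
print(egs_bp / genome_bp)  # ~0.90, the kind of fraction -egs expects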

AssertionError: assert len(merged_df) == chip_df_nb_bins

Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 754, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 604, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 567, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python2.7/dist-packages/joblib/_parallel_backends.py", line 322, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 127, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 24, in _merge_chip_and_input
    assert len(merged_df) == chip_df_nb_bins
AssertionError

(from #30).

Concating dfs pandas error

Hi,
I have received the following error:

Concating dfs. (File: run_epic, Log level: INFO, Time: Wed, 24 May 2017 14:50:32 )
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/epic", line 219, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/bioepic-0.1.25-py2.7.egg/epic/run/run_epic.py", line 60, in run_epic
    df = pd.concat([df for df in dfs if not df.empty])
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 845, in concat
    copy=copy)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 878, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

I am unsure whether this is an error in pandas or in epic. Any ideas how to fix it?
Thanks
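
For context, that ValueError is raised by pandas itself whenever concat receives an empty sequence, which here means every per-chromosome DataFrame came back empty (for example, because no reads were counted). A minimal reproduction:

import pandas as pd

dfs = [pd.DataFrame()]  # every per-chromosome result is empty
pd.concat([df for df in dfs if not df.empty])
# ValueError: No objects to concatenate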

Score & Fold_change? What do they mean and how to caculate them?

Epic outputs the calculation result in delim format. The result columns include Score and Fold_change. I am not sure how they are derived from. Do you plan to explain them in details? I check the Sicer website and the software readme, it is not mentioned. It seems a common knowledge I missed.

Google Groups

Would it be possible to create a Google Group for questions?

I'd like to integrate automatic effective genome size estimation into my ChIP-seq pipeline, but I am having a bit of difficulty and can't find an appropriate place to post about it (besides maybe Biostars, but it's more of a programming question than a bioinformatics question).

OverflowError: Cannot convert float infinity to integer

Command executed:

  epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 > Control2_epic.bed

Resulting error:

Command error:
  # epic --treatment Control2_treatment.bed.gz --control Control2_control.bed.gz --genome hg19 --fragment-size 150 (File: epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )

  gzip: stdout: Broken pipe
  Used first 10000 reads of Control2_treatment.bed.gz to estimate a median read length of 34.0
  Mean readlength: 33.3074, max readlength: 37, min readlength: 20. (File: find_readlength, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Using an effective genome fraction of 0.810858412293. (File: genomes, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Binning Control2_treatment.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 11:56:47 )
  Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:12 )
  Binning Control2_control.bed.gz (File: run_epic, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 )
  Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:00:13 )
  Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 )
  Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 08 Sep 2016 12:03:40 )
  2510169508.0 effective_genome_length (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  200 window size (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  0 total chip count (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  0.0 average_window_readcount (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  1 island_enriched_threshold (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  4.0 gap_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  1.0 boundary_contribution (File: compute_background_probabilites, Log level: DEBUG, Time: Thu, 08 Sep 2016 12:03:40 )
  Traceback (most recent call last):
    File "/usr/local/anaconda2/bin/epic", line 4, in <module>
      __import__('pkg_resources').run_script('bioepic==0.1.6', 'epic')
    File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 461, in run_script
      self.require(requires)[0].run_script(script_name, ns)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/pkg_resources.py", line 1194, in run_script
      execfile(script_filename, namespace, namespace)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/bioepic-0.1.6-py2.7.egg-info/scripts/epic", line 165, in <module>
      run_epic(args)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/run/run_epic.py", line 45, in run_epic
      compute_background_probabilities(nb_chip_reads, args)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_background_probabilites.py", line 49, in compute_background_probabilities
      boundary_contribution, genome_length_in_bins)
    File "/usr/local/anaconda2/lib/python2.7/site-packages/epic/statistics/compute_score_threshold.py", line 24, in compute_score_threshold
      current_scaled_score = int(round(score / BIN_SIZE))
  OverflowError: cannot convert float infinity to integer

Any ideas?

Clarify Read Inputs and Differential Peak Calling

I hope you can clarify how reads are handled when supplying multiple treatment and control files to epic. For example, are multiple treatment files pooled (as in MACS), or are they treated independently?

Also, I have interest in differential peak calling between treated and non-treated ChIP samples with corresponding input controls. Does epic have this ability like SICER-df.sh, or is there a mechanism to reproduce this functionality?

Thanks for the great work; I am enjoying epic so far.

Multiprocessing exception at merging ChIP and Input

Unfortunately I need to report another problem:

Binning input.bedpe.bz2 (File: run_epic, Log level: INFO, Time: Sun, 07 Aug 2016 00:50:10 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, M (File: count_reads_in_windows, Log level: INFO, Time: Sun, 07 Aug 2016 00:50:10 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Sun, 07 Aug 2016 02:31:24 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 764, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 715, in retrieve
    raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError
___________________________________________________________________________
Multiprocessing exception:
...........................................................................
/usr/local/bin/epic in <module>()
    160     elif not args.effective_genome_length and args.paired_end:
    161         logging.info("Using paired end so setting readlength to 100.")
    162         args.effective_genome_length = get_effective_genome_length(args.genome,
    163                                                                    100)
    164 
--> 165     run_epic(args)
    166 
    167 
    168 
    169 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py in run_epic(args=Namespace(control=['/mnt/biggles/csc_home/piotr/...Rer7.goodChr.sorted.bedpe.bz2'], window_size=200))
     37 
     38     nb_chip_reads = get_total_number_of_reads(chip_merged_sum)
     39     nb_input_reads = get_total_number_of_reads(input_merged_sum)
     40 
     41     merged_dfs = merge_chip_and_input(chip_merged_sum, input_merged_sum,
---> 42                                       args.number_cores)
        args.number_cores = 8
     43 
     44     score_threshold, island_enriched_threshold, average_window_readcount = \
     45         compute_background_probabilities(nb_chip_reads, args)
     46 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in merge_chip_and_input(chip_dfs=[... 26 per-chromosome DataFrames trimmed ...], input_dfs=[... 26 per-chromosome DataFrames trimmed ...], nb_cpu=8)
     32     assert len(chip_dfs) == len(input_dfs)
     33 
     34     logging.info("Merging ChIP and Input data.")
     35     merged_chromosome_dfs = Parallel(n_jobs=nb_cpu)(
     36         delayed(_merge_chip_and_input)(chip_df, input_df)
---> 37         for chip_df, input_df in zip(chip_dfs, input_dfs))
        chip_dfs = [... 26 per-chromosome DataFrames trimmed ...]
        input_dfs = [... 26 per-chromosome DataFrames trimmed ...]
     38     return merged_chromosome_dfs
     39 
     40 
     41 def get_total_number_of_reads(dfs):

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=8), iterable=<generator object <genexpr>>)
    759             if pre_dispatch == "all" or n_jobs == 1:
    760                 # The iterable was consumed all at once by the above for loop.
    761                 # No need to wait for async callbacks to trigger to
    762                 # consumption.
    763                 self._iterating = False
--> 764             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=8)>
    765             # Make sure that we get a last message telling us we are done
    766             elapsed_time = time.time() - self._start_time
    767             self._print('Done %3i out of %3i | elapsed: %s finished',
    768                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ValueError                                         Sun Aug  7 02:31:27 2016
PID: 10981                                   Python 2.7.12: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    122     def __init__(self, iterator_slice):
    123         self.items = list(iterator_slice)
    124         self._size = len(self.items)
    125 
    126     def __call__(self):
--> 127         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func = <function _merge_chip_and_input>
        args = (       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns])
        kwargs = {}
        self.items = [(<function _merge_chip_and_input>, (       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns],         Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns]), {})]
    128 
    129     def __len__(self):
    130         return self._size
    131 

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in _merge_chip_and_input(chip_df=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], input_df=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns])
     13 
     14     chip_df_nb_bins = len(chip_df)
     15     merged_df = chip_df.merge(input_df,
     16                               how="left",
     17                               on=["Chromosome", "Bin"],
---> 18                               suffixes=[" ChIP", " Input"])
     19     merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
     20     merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
     21 
     22     merged_df = merged_df.fillna(0)

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py in merge(self=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], right=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
   4432               suffixes=('_x', '_y'), copy=True, indicator=False):
   4433         from pandas.tools.merge import merge
   4434         return merge(self, right, how=how, on=on, left_on=left_on,
   4435                      right_on=right_on, left_index=left_index,
   4436                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 4437                      copy=copy, indicator=indicator)
        copy = True
        indicator = False
   4438 
   4439     def round(self, decimals=0, *args, **kwargs):
   4440         """
   4441         Round a DataFrame to a variable number of decimal places.

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in merge(left=       Chromosome       Bin  Count
0            ... chr1  60348200      1

[412411 rows x 3 columns], right=        Chromosome       Bin  Count
0           ...chr1  60348200      7

[1119107 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
     34           suffixes=('_x', '_y'), copy=True, indicator=False):
     35     op = _MergeOperation(left, right, how=how, on=on, left_on=left_on,
     36                          right_on=right_on, left_index=left_index,
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy, indicator=indicator)
---> 39     return op.get_result()
        op.get_result = <bound method _MergeOperation.get_result of <pandas.tools.merge._MergeOperation object>>
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'
     42 
     43 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in get_result(self=<pandas.tools.merge._MergeOperation object>)
    212     def get_result(self):
    213         if self.indicator:
    214             self.left, self.right = self._indicator_pre_merge(
    215                 self.left, self.right)
    216 
--> 217         join_index, left_indexer, right_indexer = self._get_join_info()
        join_index = undefined
        left_indexer = undefined
        right_indexer = undefined
        self._get_join_info = <bound method _MergeOperation._get_join_info of <pandas.tools.merge._MergeOperation object>>
    218 
    219         ldata, rdata = self.left._data, self.right._data
    220         lsuf, rsuf = self.suffixes
    221 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _get_join_info(self=<pandas.tools.merge._MergeOperation object>)
    348                                     sort=self.sort)
    349         else:
    350             (left_indexer,
    351              right_indexer) = _get_join_indexers(self.left_join_keys,
    352                                                  self.right_join_keys,
--> 353                                                  sort=self.sort, how=self.how)
        self.sort = False
        self.how = 'left'
    354             if self.right_index:
    355                 if len(self.left) > 0:
    356                     join_index = self.left.index.take(left_indexer)
    357                 else:

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _get_join_indexers(left_keys=[array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])], right_keys=[array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200])], sort=False, how='left')
    541 
    542     # bind `sort` arg. of _factorize_keys
    543     fkeys = partial(_factorize_keys, sort=sort)
    544 
    545     # get left & right join labels and num. of levels at each location
--> 546     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
        llab = undefined
        rlab = undefined
        shape = undefined
        fkeys = <functools.partial object>
        left_keys = [array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])]
        right_keys = [array(['chr1', 'chr1', 'chr1', ..., 'chr1', 'chr1', 'chr1'], dtype=object), memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200])]
    547 
    548     # get flat i8 keys from label lists
    549     lkey, rkey = _get_join_keys(llab, rlab, shape, sort)
    550 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py in _factorize_keys(lk=memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200]), rk=memmap([    1400,     1600,     1400, ..., 60347800, 60348000, 60348200]), sort=False)
    708         lk = com._ensure_object(lk)
    709         rk = com._ensure_object(rk)
    710 
    711     rizer = klass(max(len(lk), len(rk)))
    712 
--> 713     llab = rizer.factorize(lk)
        llab = undefined
        rizer.factorize = <built-in method factorize of pandas.hashtable.Int64Factorizer object>
        lk = memmap([    1400,     1600,     1800, ..., 60347800, 60348000, 60348200])
    714     rlab = rizer.factorize(rk)
    715 
    716     count = rizer.get_count()
    717 

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas/hashtable.so in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715), View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784) and View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
(three compiled Cython frames with no Python source lines)

ValueError: buffer source array is read-only
___________________________________________________________________________
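
The final ValueError is pandas refusing to factorize read-only merge keys; the memmap([...]) reprs in the frames above suggest joblib memmapped the large arrays it shipped to the workers. A hedged sketch of a workaround (not epic's actual fix; the merge call mirrors the one shown in the traceback):

import pandas as pd

def _merge_chip_and_input(chip_df, input_df):
    # Copying materializes writable arrays, so the Int64Factorizer no
    # longer sees a read-only memmap during the merge.
    chip_df, input_df = chip_df.copy(), input_df.copy()
    merged_df = chip_df.merge(input_df,
                              how="left",
                              on=["Chromosome", "Bin"],
                              suffixes=[" ChIP", " Input"])
    merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
    merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
    return merged_df.fillna(0)

Alternatively, passing max_nbytes=None to joblib's Parallel disables the automatic memmapping altogether.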

EPIC stops if treatment and control are both input files

Because of the way my pipeline is set up, it will typically also run Input bed vs Input bed. In MACS2 this is fine (no peaks are called), but epic exits with status 1:

Here is the output:

Command executed:

  epic --treatment Input_treatment.bed --control Input_control.bed --number-cores 3 --genome hg19 --fragment-size 150 > Input_epic.bed

Command exit status:
  1

Command output:
  (empty)

Command error:
      710 
      711     rizer = klass(max(len(lk), len(rk)))
      712 
  --> 713     llab = rizer.factorize(lk)
      714     rlab = rizer.factorize(rk)
      715 
      716     count = rizer.get_count()
      717 

  ...........................................................................
  /usr/local/anaconda2/lib/python2.7/site-packages/pandas/hashtable.so in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715), View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784) and View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
  (three compiled Cython frames with no Python source lines)

  ValueError: buffer source array is read-only
  ___________________________________________________________________________

Error calculating readlength

I'm getting

# epic -t chip_A-sortsam.bam -c input_A-sortsam.bam -fs 300 (File: epic, Log level: INFO, Time: Thu, 06 Oct 2016 06:56:24 )
cat: write error: Broken pipe
Used first 10000 reads of chip_A-sortsam.bam to estimate a median read length of nan
Mean readlength: nan, max readlength: nan, min readlength: nan. (File: find_readlength, Log level: INFO, Time: Thu, 06 Oct 2016 06:56:25 )
# epic -t chip_A-sortsam.bam -c input_A-sortsam.bam -fs 300
Traceback (most recent call last):
  File "/home/users/balter/miniconda2/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.1.17', 'epic')
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/setuptools-23.0.0-py2.7.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/bioepic-0.1.17-py2.7.egg-info/scripts/epic", line 190, in <module>
    closest_readlength = get_closest_readlength(estimated_readlength)
  File "/home/users/balter/miniconda2/lib/python2.7/site-packages/epic/utils/find_readlength.py", line 63, in get_closest_readlength
    if d == min_difference][0]
IndexError: list index out of range

BAM files:

ls -l
total 6622033
-rw-rw-r-- 1 balter CompBio 3387571391 Oct  6 05:59 chip_A-sortsam.bam
-rw-rw-r-- 1 balter CompBio 3387571391 Oct  6 06:34 input_A-sortsam.bam
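
For what it's worth, here is a minimal sketch (assuming pysam is available; this is not epic's own find_readlength code) that checks whether a BAM yields usable read lengths at all, using the same first-10000-reads idea as the log above:

import pysam

def median_readlength(bam_path, n=10000):
    # Collect query lengths from the first n alignments.
    lengths = []
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for i, read in enumerate(bam):
            if i >= n:
                break
            if read.query_length:  # skip records with no sequence
                lengths.append(read.query_length)
    if not lengths:
        raise ValueError("no read lengths found; is the BAM valid?")
    lengths.sort()
    return lengths[len(lengths) // 2]  # median without numpy

A median of nan, as in the log, means no lengths were collected at all, which is worth checking before epic runs; note in passing that the two BAMs listed above are byte-identical in size.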

Add integration test that tests with bam file.

Will create sam files and add them to the repo's example files. In the integration test I'll call samtools to convert them into bam, then run the test (see the sketch below). This avoids adding binary files to the repo.
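
A hypothetical sketch of that conversion step (samtools assumed on PATH; file names are placeholders):

import subprocess

def sam_to_bam(sam_path, bam_path):
    # samtools view -b re-encodes a SAM file as BAM.
    with open(bam_path, "wb") as out:
        subprocess.check_call(["samtools", "view", "-b", sam_path], stdout=out)

sam_to_bam("examples/test.sam", "test.bam")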

Effective genome size for Rnor6

Hi,
Could you please let me know how I can compute the effective genome size for rat (Rnor6) and pig?

And also, for broad marks like K9me2, what settings should I use?

Thank you
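
For the effective genome size itself, epic ships epic-effective (its flags appear in an issue further below), which computes the number from a genome FASTA. A hypothetical invocation, where rn6.fa and the read length 50 are placeholders for your own genome file and data:

epic-effective --read-length=50 --nb-cpu=4 rn6.fa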

custom chromosomes with . in name

Hi,

My chromosome names unfortunately contain full stops (chr1_v2.1, chr2_v2.1 etc). When I use the --chromsizes option with a file containing chromosome names and lengths, I get an error. When I remove the full stops the error goes away, but then the chromosome names are incorrect.

In the Genome.py script:

chromosome_lengths = [l.split() for l in open(chromsizes).readlines()]

appears to be doing more than just splitting on whitespace; the full stops are causing a problem here for some reason.
Any chance of a solution?

Thanks!
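
For reference, str.split() with no arguments splits only on runs of whitespace, so the line quoted above should keep the dots intact; the failure is presumably happening somewhere downstream (a guess, not verified). A minimal sketch of a dot-tolerant chromsizes parser, with a hypothetical function name:

def read_chromsizes(path):
    # Split each line on whitespace only, so names like "chr1_v2.1"
    # survive with their full stops intact.
    sizes = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes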

epic-effective cannot find jellyfish

Hi,

I have installed epic and jellyfish in a virtualenv:
virtualenv venv.epic_0.1.25
. /path/venv.epic_0.1.25/bin/activate
pip install bioepic
pip install jellyfish

When I try to run epic-effective as:
epic-effective --read-length=76 --nb-cpu=12 oviAri3_BASE.fa

I get the following error:
Temporary directory: /tmp/86174.1.C6320-512-haswell.q (File: effective_genome_size, Log level: INFO, Time: Tue, 14 Mar 2017 16:40:11 )
File analyzed: oviAri3_BASE.fa (File: effective_genome_size, Log level: INFO, Time: Tue, 14 Mar 2017 16:40:11 )
Genome length: 2587507083 (File: effective_genome_size, Log level: INFO, Time: Tue, 14 Mar 2017 16:40:11 )
/bin/sh: jellyfish: command not found
/bin/sh: jellyfish: command not found
Traceback (most recent call last):
  File "/mnt/fls01-home01/user/virtualenvs/venv.epic_0.1.25/bin/epic-effective", line 38, in <module>
    effective_genome_size(fasta, read_length, nb_cpu, tmpdir)
  File "/mnt/fls01-home01/user/virtualenvs/venv.epic_0.1.25/lib/python2.7/site-packages/epic/scripts/effective_genome_size.py", line 56, in effective_genome_size
    shell=True)
  File "/opt/gridware/depots/4baff5c5/el7/pkg/apps/python/2.7.8/gcc-4.8.5/lib/python2.7/subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'jellyfish stats /tmp/86174.1.C6320-512-haswell.q/oviAri3_BASE.fa.jf' returned non-zero exit status 127
rm: cannot remove '/tmp/86174.1.C6320-512-haswell.q/oviAri3_BASE.fa.jf': No such file or directory

Thanks.
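
The two "command not found" lines are the key: epic-effective shells out to the Jellyfish k-mer counter binary, and the PyPI package installed by pip install jellyfish is an unrelated string-distance library that provides no such binary, so Jellyfish itself (e.g. from bioconda or built from source) must be on PATH. A sketch of the kind of guard that makes this failure explicit, assuming the same subprocess call as in the traceback:

import shutil
import subprocess

def jellyfish_stats(jf_file):
    # Fail early with a clear message if the k-mer counter is missing.
    # (shutil.which requires Python 3.3+.)
    if shutil.which("jellyfish") is None:
        raise SystemExit("jellyfish binary not found on PATH; install the "
                         "Jellyfish k-mer counter, not the PyPI 'jellyfish' "
                         "string-distance package")
    return subprocess.check_output("jellyfish stats " + jf_file, shell=True)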

Create bigwig files for UCSC genome browser display?

I'd like to create files for displaying in the genome browser.

Need to find out:

  1. Should I create one track with pooled data from all the ChIP files or one bigwig file per ChIP file?

I have never used the genome browser for displaying data, so pinging you @daler. Do you have an opinion or other related suggestions?
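
For the per-file option, a minimal hedged sketch with pyBigWig (an assumption; epic may well use a different writer), writing one 200 bp bin on chr1:

import pyBigWig

bw = pyBigWig.open("chip_sample1.bw", "w")
bw.addHeader([("chr1", 248956422)])  # list of (chromosome, length) pairs
bw.addEntries(["chr1"], [0], ends=[200], values=[3.0])  # one 200 bp bin
bw.close()

A pooled track would simply sum the per-file bin counts before writing, so the choice mostly comes down to whether per-sample variability matters for the display.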
