nboley / idr
License: GNU General Public License v2.0
IDR
Would be nice to have. Is there an accompanying paper?
Hi,
I want to use IDR to get highly repetitive peaks. For each biological repetition, we called peaks separately using SICER to get broad peaks. The output file of the software is similar to the BED file.
So, we run IDR with "--input-file-type bed". However, it failed to run successfully and the following error message is reported.
Traceback (most recent call last):
File "/home/yuandy/anaconda3/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/home/yuandy/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 651, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/yuandy/anaconda3/lib/python3.8/site-packages/pkg_resources/__init__.py", line 1448, in run_script
exec(code, namespace, namespace)
File "/home/yuandy/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
idr.idr.main()
File "/home/yuandy/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 839, in main
merged_peaks, signal_type = load_samples(args)
File "/home/yuandy/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 720, in load_samples
raise ValueError("For bed files --signal-type must either "\
ValueError: For bed files --signal-type must either be set to score or an index specifying the column to use.
How should we set parameters when running with a BED file, and are there any other requirements for the BED file itself?
Looking forward to your reply! Thanks!
README.md says
Get the current repo
wget https://github.com/nboley/idr/archive/2.0.2.zip
[...]
Command Line Arguments
[...]
--use-best-multisummit-IDR
Set the IDR value for a group of multi summit peaks (a group of peaks with the same chr/start/stop but different summits) to the best value across all of these peaks. This is a work around for peak callers that don't do a good job splitting scores across multi summit peaks (e.g. MACS). If set in conjunction with --plot two plots will be created - one with alternate summits and one without. Use this option with care.
This option is not present in v. 2.0.2.
2.0.2 has bugs as well. Why not point to 2.0.3 or simply ask to use git clone?
I wasted 2h+ on trying to run 2.0.2 which was crashing. See #44
Hi,
I am just running the manual code: idr --samples peak1 peak2 and this is the output:
Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [1.57 1.26 0.89 0.41]
Number of reported peaks - 50537/50537 (100.0%)
Number of peaks passing IDR cutoff of 0.05 - 0/50537 (0.0%)
I found the same issue using my own peaks. What could the problem be?
Thank you in advance,
Andrés
idr --version and idr --help both create an empty idrValues.txt in the current directory, which is probably not intended. More significantly, if you run idr --version or idr --help in a directory that already has an idrValues.txt file, it clobbers it.
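A file that appears even for --version or --help is typically opened while the argument parser is being defined rather than after the arguments are parsed. Whether idr does exactly this is an assumption; the sketch below only illustrates the general pattern of keeping the default as a path string and opening it once real work starts. The names `build_parser`, `run`, and `demo` are hypothetical.

```python
import argparse
import os
import tempfile


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    # Keep the default as a plain path string: nothing is opened or truncated here.
    parser.add_argument("--output-file", "-o", default="idrValues.txt")
    parser.add_argument("--version", action="version", version="demo 0.0")
    return parser


def run(argv, workdir):
    # --version and --help exit inside parse_args, before any file is touched.
    args = build_parser().parse_args(argv)
    out_path = os.path.join(workdir, args.output_file)
    with open(out_path, "w") as handle:
        handle.write("results\n")
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run([], tmp))
```

With this layout an existing output file is only overwritten by a run that actually produces results.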
Hi,
I'm hoping I can get some answers even though this github looks pretty deserted.
I ran idr using input files that look like this:
track type=narrowPeak name="peaks1.bed" description="peaks1.bed" nextItemButton=on
chr1 12237 15856 peaks1.bed_narrowPeak1 367 . 0.00000 0.00000 0.00000 905
chr1 16017 16585 peaks1.bed_narrowPeak2 90 . 0.00000 0.00000 0.00000 425
The fifth column is -log10(pvalue) of peaks.
I ran idr using the following command:
idr --samples peaks1.bed peaks2.bed \
--input-file-type bed \
--rank 5 \
--output-file peaks1_peaks2-idr \
--plot \
--log-output-file peaks1_peaks2.idr.log
The output I got is this:
chr3 122863912 122880767 . 1000 . 5.00 5.00 122863912 122880767 58149.00000 122864004 122880714 30237.00000
chr18 10456599 10467214 . 1000 . 5.00 5.00 10456599 10467214 49558.00000 10456626 10467173 20877.00000
What are the columns? This output does not have the same number of columns as the narrowpeak output file.
I ran IDR on the list of peaks where some of the summit values are -1, and I got the following error:
File "/mnt/silencer2/home/yanxiazh/.local/lib/python3.4/site-packages/idr-2.0.3-py3.4-linux-x86_64.egg/idr/idr.py", line 222, in merge_peaks_in_contig
all_intervals.sort()
TypeError: unorderable types: NoneType() < int()
It looks like the error is because "-1" is converted to None when the files were loaded. And sorting values containing "None" will result in an error in Python3. Is there a way to fix this error?
-Yanxiao
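The Python 3 behaviour described above can be reproduced in isolation: tuples that contain None cannot be ordered against tuples that contain an int in the same position. The sketch below is not idr's code; the `(start, stop, summit)` layout and the helper name `sort_intervals` are assumptions made for illustration. It shows one way to sort such records without comparing None to a number.

```python
def sort_intervals(intervals):
    """Sort (start, stop, summit) tuples; a missing summit (None) sorts first."""
    def key(interval):
        start, stop, summit = interval
        # The boolean keeps None and int summits in separate groups, so the
        # final element is only ever compared between two ints.
        return (start, stop, summit is not None, summit if summit is not None else 0)
    return sorted(intervals, key=key)


intervals = [(100, 200, 150), (100, 200, None), (50, 80, 60)]
print(sort_intervals(intervals))
# [(50, 80, 60), (100, 200, None), (100, 200, 150)]
```

Calling plain `sorted(intervals)` on the same list raises a TypeError in Python 3, which matches the error reported here.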
Hi Nathan,
I am wondering whether IDR can be installed on Windows 7. Are there steps I can follow for a smooth installation?
ajeet
Hi,
I am trying to run IDR with the --use-best-multisummit-IDR flag. When I do so, I get an error that the --use-best-multisummit-IDR argument is not recognized. The help function of idr-2.0.2 does not contain the --use-best-multisummit-IDR flag, while it is mentioned in the README.md here (in github). Can you please let me know if you have removed or replaced the flag with some other option?
In total there are 3 options that do not show up in the help function while they are still mentioned in the README.md (--dont-filter-peaks-below-noise-mean, --use-best-multisummit-IDR and --allow-negative-scores).
Thank you.
/Klev
When --rank is specified (p.value, signal.value or q.value), no peaks are called. If it is not specified, signal.value is used but for MACS2 called peaks p.value needs to be used.
There should be an option to plot the results.
Hi, I am using idr to process 2 replicate bed files but getting an index error. copying command and bash response
idr --samples dyadvscolrep1_c3.0_common.bed dyadvscolrep2_c3.0_common.bed
/usr/local/bin/idr --samples dyadvscolrep1_c3.0_common.bed dyadvscolrep2_c3.0_common.bed
Traceback (most recent call last):
File "/usr/local/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 743, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/local/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1498, in run_script
exec(code, namespace, namespace)
File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
idr.idr.main()
File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 839, in main
merged_peaks, signal_type = load_samples(args)
File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 703, in load_samples
for fp in args.samples]
File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 703, in <listcomp>
for fp in args.samples]
File "/usr/local/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 53, in load_bed
signal = float(data[signal_index])
IndexError: list index out of range
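The traceback ends at `signal = float(data[signal_index])`, which suggests that at least one line has fewer columns than the signal column idr is trying to read. A minimal pre-flight check, assuming whitespace-separated fields and a 1-based column number, could look like the sketch below. The helper `check_bed_column`, the skipped header prefixes, and the sample rows are illustrative assumptions, not part of idr.

```python
def check_bed_column(lines, column):
    """Return (line_number, problem) pairs for a 1-based signal column."""
    problems = []
    index = column - 1
    for number, line in enumerate(lines, start=1):
        # Skipping blank, comment, and track/browser lines is an assumption.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if index >= len(fields):
            problems.append((number, "only %d columns" % len(fields)))
            continue
        try:
            float(fields[index])
        except ValueError:
            problems.append((number, "non-numeric value %r" % fields[index]))
    return problems


sample = [
    "chr1\t100\t200\tpeak1\t367",
    "chr1\t300\t400",
    "chr1\t500\t600\tpeak3\t.",
]
print(check_bed_column(sample, 5))
# [(2, 'only 3 columns'), (3, "non-numeric value '.'")]
```

Running a check like this on both replicate files before calling idr shows which lines would trigger the IndexError.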
I've been reading the idr sources to complement my understanding of the paper by Li et al. One thing that caught my attention is the predefined parameter ranges in __init__.py. The paper doesn't mention any restrictions, so I would appreciate it if you could comment on why these are required and how the ranges were chosen for each parameter.
Hello,
I am having a weird issue with IDR.
I am running the command with two .narrowPeak files as input, p.value in --rank, and the --plot option on (code below). Another strange thing is that the option --output-file-type narrowPeak is not recognized.
idr --samples 03_macs2/${rep1}/${rep1}_peaks.sort.narrowPeak 03_macs2/${rep2}/${rep2}_peaks.sort.narrowPeak \
--input-file-type narrowPeak \
--rank p.value \
--soft-idr-threshold 0.1 \
-o 05_IDR/${condition}/${condition}.idr \
--log-output-file 05_IDR/${condition}/log_${condition}.idr.log \
--plot
The problem here is that the output file that IDR is returning is empty. The plot seems to be correct. Below there is the log file:
Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [1.52 1.18 0.81 0.73]
When I activate the option --only-merge-peaks it returns a list with all the merged peaks. However, all those peaks in the list have a score of 0 and a -log10(local_IDR_value) of 0. I am giving the log file below:
Number of reported peaks - 20206/20206 (100.0%)
Number of peaks passing IDR cutoff of 0.1 - 0/20206 (0.0%)
I know it is not a problem from the samples, because I tried with other files that were already analyzed by another bioinformatician that had a correct output (and I also tried with the code used to analyze them) and the error seems to be the same.
This is the error that is printed in the terminal:
Traceback (most recent call last):
File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.2', 'idr')
File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1462, in run_script
exec(code, namespace, namespace)
File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
idr.idr.main()
File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 774, in main
useBackwardsCompatibleOutput=args.use_old_output_format)
File "/hpcnfs/home/ieo5089/miniconda3/envs/idr/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If someone could help me solve the problem, it would be extremely useful for me. Thank you.
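The failing line in the traceback, `if localIDRs == None or IDRs == None:`, compares array-like objects to None with `==`. For NumPy arrays that comparison is elementwise, and the resulting array has no single truth value. The sketch below uses a small stand-in class (hypothetical, written only to avoid a NumPy dependency) to reproduce the mechanism and shows the identity check `is None`, which avoids it. Whether later idr releases changed this line is not stated in this thread.

```python
class ElementwiseArray:
    """Stand-in mimicking NumPy's elementwise `==` and ambiguous truth value."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return ElementwiseArray(value == other for value in self.values)

    def __bool__(self):
        if len(self.values) > 1:
            raise ValueError("truth value of an array with more than one element is ambiguous")
        return bool(self.values and self.values[0])


def results_missing(local_idrs, idrs):
    # `is None` tests identity and never calls __eq__, so it is safe for arrays.
    return local_idrs is None or idrs is None


arr = ElementwiseArray([0.1, 0.2])
print(results_missing(arr, arr))   # False
print(results_missing(None, arr))  # True
```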
An option to keep not only the coordinates but also the peaknames would be useful.
Hello,
Trying to use idr tool with this command:
/share/apps/idr/5fbd010/bin/idr
--samples
BRD2_minus_JQ1_1.MACS2_peaks.narrowPeak.IDR_ready.temp.Peak BRD2_minus_JQ1_2.MACS2_peaks.narrowPeak.IDR_ready.temp.Peak
--input-file-type narrowPeak
--rank p.value
--output-file test.merged.idr.bed
--soft-idr-threshold 0.05
--plot
--use-nonoverlapping-peaks
--peak-merge-method min
--verbose
Getting the following error:
Please see attached file:
TKXL.test.IDR.e293603.txt
It works fine without any errors if I drop the --use-nonoverlapping-peaks option.
My files are simple MACS2 narrowPeak. idr fails on these. There is an empty "idrValues.txt" file created
$ idr --samples s1.narrowPeak s2.narrowPeak
Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [1.99 1.40 0.91 0.57]
idr --samples s1.narrowPeak s2.narrowPeak
Traceback (most recent call last):
File "idr", line 10, in <module>
idr.idr.main()
File "idr.py", line 774, in main
useBackwardsCompatibleOutput=args.use_old_output_format)
File "idr.py", line 415, in write_results_to_file
if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
$ python3 --version
Python 3.7.2+
Does the program support IDR calculation for more than two samples?
When I enter three files for the --samples argument, I get the following error message:
idr: error: unrecognized arguments: '_file name associated with a third sample_'
Thanks,
Tal
When I run idr on peak lists produced by MACS on my data, I get some bizarre-looking plots:
The plots look like fractal versions of the typical IDR plots shown here: http://ccg.vital-it.ch/var/sib_april15/cases/landt12/idr.html
Do you have any idea what might be causing this? I'm invoking the script as:
idr --samples sample1.narrowPeak sample2.narrowPeak \
--peak-list oraclepeaks.narrowPeak --input-file-type narrowPeak \
--output-file idrValues.txt --output-file-type narrowPeak \
--log-output-file idr.log --plot --random-seed 1986
I can share my peak files if you want them.
The old plots (eg num of sig. peaks against IDR) were useful to see whether certain conditions (antibodies) give more reproducible peaks in replicates than others. Would be helpful if they can be included again.
When I call idr thusly: idr --verbose --samples $1 $2 --input-file-type narrowPeak --rank p.value -o $outdir/$outfile 2>$outdir/idr-errors.txt
IDR raises a Value Error about the column I'm using to rank my peaks:
Loading the peak files
Traceback (most recent call last):
File "/Users/zamparol/anaconda/envs/py3/lib/python3.5/site-packages/idr-2.0.3-py3.5-macosx-10.6-x86_64.egg/idr/idr.py", line 717, in load_samples
signal_index = int(args.rank) - 1
ValueError: invalid literal for int() with base 10: 'p.value'
I'm trying to rank by p.value of narrowPeak files. The usage string (and other issues in this repo) suggest that --rank p.value is the proper way to indicate ranking by P value. Any ideas why I'm seeing this?
My version:
(py3) mski1743:day4 zamparol$ idr --version
IDR 2.0.3
(py3) mski1743:day4 zamparol$ python --version
Python 3.5.2 :: Continuum Analytics, Inc.
Hi,
I'm pretty new to ChIP-Seq analysis and have been trying to install idr on my Mac. I downloaded the idr package along with Python 3.6.1 (Anaconda 4.4.0). However, when I tried the 'idr -h' command I received this error:
Traceback (most recent call last):
File "/Users/benflynn_1/anaconda3/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/Users/benflynn_1/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 744, in run_script
File "/Users/benflynn_1/anaconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 1499, in run_script
File "/Users/benflynn_1/anaconda3/lib/python3.6/site-packages/idr-2.0.3-py3.6-macosx-10.7-x86_64.egg/EGG-INFO/scripts/idr", line 8, in <module>
import idr.idr
ModuleNotFoundError: No module named 'idr.idr'
I have tried re-installing idr but am still receiving the same error. Does anyone know what has gone wrong and how to fix this as I can't find a solution with simple googling?
Cheers,
Ben
Hi,
I performed IDR analysis on peaks called by MACS2 with the following command line:
idr --samples ${peakDir}/rep1_sorted_peaks.narrowPeak ${peakDir}/rep2_sorted_peaks.narrowPeak \
--input-file-type narrowPeak \
--rank p.value \
--output-file regular_model-idr \
--plot \
--log-output-file regular_model.idr.log
The IDR results show the peak list ranked by pvalue, with signalValue and qvalue set to -1.
Now, I would like to retrieve the signalValue (~fold enrichment) for each peak region from the original MACS2 output, i.e.
Chr | Start | End | Name | scaledIDR | Strand | signalValue | pvalue | qvalue | peak | globalIDR | localIDR | rep1_Start | rep1_End | rep1_signalValue | rep1_summit | rep2_Start | rep2_End | rep2_signalValue | rep2_summit | fold_enrichment_rep1 | pvalue_1 | fold_enrichment_rep2 | pvalue_2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.2494 | 11.32801 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.42635 | 18.19401 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.15729 | 14.41833 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.14088 | 13.07617 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.27126 | 17.29216 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.23416 | 22.68965 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.26469 | 20.03755 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.7965 | 8.68942 | 2.35859 | 25.66975 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.2494 | 11.32801 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.42635 | 18.19401 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.15729 | 14.41833 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.14088 | 13.07617 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.27126 | 17.29216 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.23416 | 22.68965 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.26469 | 20.03755 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 2.47086 | 30.05785 | 2.35859 | 25.66975 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.94094 | 13.9285 | 2.2494 | 11.32801 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.94094 | 13.9285 | 2.42635 | 18.19401 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.94094 | 13.9285 | 2.15729 | 14.41833 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.94094 | 13.9285 | 2.14088 | 13.07617 |
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 | 1.94094 | 13.9285 | 2.27126 | 17.29216 |
... |
and the related IDR merged peak
Chr | Start | End | Name | scaledIDR | Strand | signalValue | pvalue | qvalue | peak | globalIDR | localIDR | rep1_Start | rep1_End | rep1_signalValue | rep1_summit | rep2_Start | rep2_End | rep2_signalValue | rep2_summit |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr10 | 38815703 | 38818793 | . | 288 | . | -1 | 8.68942 | -1 | 2374 | 0.012076 | 0.695235 | 38815703 | 38818793 | 8.68942 | 1449 | 38815732 | 38818791 | 11.32801 | 1501 |
I notice that the rep1_signalValue and rep2_signalValue fields report the min(pvalue) for each replicate, respectively, and that signalValue reports the min(pvalue) between the two replicates.
How should the signalValue/fold_enrichment be estimated for each IDR merged peak?
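The table above contains one row for every combination of overlapping rep1 and rep2 peaks, which is why the same merged region repeats. One possible way to collapse this, offered as a sketch and not as idr's own method, is to pick for each merged region the overlapping peak with the largest pValue in a given replicate and report that peak's signalValue. The code relies on the standard narrowPeak column order (signalValue in column 7, pValue in column 8); the helper name `best_overlapping_signal` and the selection rule are assumptions. The two rows are the peak_2017 and peak_2018 lines quoted in another report on this page.

```python
def best_overlapping_signal(merged, peaks):
    """Return the signalValue of the overlapping peak with the largest pValue.

    merged: (chrom, start, end) of the IDR merged peak.
    peaks:  narrowPeak rows of one replicate, each already split into fields.
    Returns None when no peak overlaps the merged region.
    """
    chrom, start, end = merged
    best = None
    for fields in peaks:
        p_chrom, p_start, p_end = fields[0], int(fields[1]), int(fields[2])
        if p_chrom != chrom or p_end <= start or p_start >= end:
            continue  # different contig, or no overlap (half-open intervals)
        signal, p_value = float(fields[6]), float(fields[7])
        if best is None or p_value > best[1]:
            best = (signal, p_value)
    return None if best is None else best[0]


rows = [
    "chr1 84274822 84275322 peak_2017 2392 . 46.13288 243.62022 239.28380 265".split(),
    "chr1 84275490 84276079 peak_2018 4054 . 84.39515 411.44415 405.43530 215".split(),
]
print(best_overlapping_signal(("chr1", 84274822, 84276079), rows))  # 84.39515
```

Taking the maximum or the mean signalValue across overlapping peaks would be equally defensible; the choice depends on what the downstream analysis needs.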
Hi,
I've successfully installed idr in my linux server.
When I call idr from the command line inside the idr-2.0.2/ directory it works fine, but when I call idr from any other directory I get an error:
Traceback (most recent call last):
File "/home/wszheng/Install/Python-3.6.1/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.2', 'idr')
File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1434, in run_script
exec(code, namespace, namespace)
File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 8, in <module>
import idr.idr
ModuleNotFoundError: No module named 'idr.idr'
Does anyone know what's the cause of this problem?
Hi,
I have run IDR and, looking at my output, I am trying to understand why I have peaks with IDR < 0.05 (based on column 11) whose score in column 5 does not match what is described in the manual:
peaks with an IDR of 0.05 have a score of int(-125*log2(0.05)) = 540
Is it global IDR or local IDR that is used here? Additionally, in my output column 12 is the beginning of the coordinates for the first replicate. Am I missing something here?
Thanks,
Meeta
B_pa_vs_C_pa-idr.txt
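The arithmetic in the quoted manual line can be checked directly. The sketch below assumes the formula is int(-125 * log2(IDR)); the cap at 1000 and the handling of an IDR of 0 are assumptions added for illustration. It does not settle whether column 5 is derived from the global or the local IDR.

```python
import math


def scaled_idr_score(idr):
    """Scaled score from the quoted formula, assuming a cap at 1000."""
    if idr <= 0:
        return 1000  # log2(0) is undefined; assume the cap applies
    return min(int(-125 * math.log2(idr)), 1000)


print(scaled_idr_score(0.05))  # 540
print(scaled_idr_score(1.0))   # 0
```

Under this formula a score of 540 or more corresponds to an IDR of 0.05 or less, so comparing column 5 against 540 is one way to see which IDR column the score tracks in a given output file.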
Hi,
I'm confused about how to get reproducible peaks from replicates of histone data. IDR is recommended in the ENCODE pipeline for TF ChIP-seq data. What about histone data? Can I use IDR for the same task?
I'm looking forward to your reply.
Thanks~
Hanwen
I've been facing an issue with a subset of my ChIP-seq data. The idr output, in the "old output format" seems to change the peak boundaries of the input files. Here's an example -
idr --samples peaks1.narrowPeak peaks2.narrowPeak --use-old-output-format -o test-overlapped-peaks.txt
One of the peaks output in test-overlapped-peaks.txt is chr1 84274822 84276079:
chr1 84274822 84276079 . 130.52803 84274772 84275909 . 108.03337 0.00000 0.00000 .
But when I run through my original input (from peaks1.narrowPeak), I found that the actual peak coordinates were - chr1 84274822 84275322
Out of 4477 peaks in my input files, I found that a total of 219 peak boundaries have a wrong end point, like the peak shown above. Note that peak merging is off in the command that I ran.
Curiously, when I looked at the input replicate which had this peak (peaks1.narrowPeak), here is what I saw -
chr1 84274822 84275322 peak_2017 2392 . 46.13288 243.62022 239.28380 265
chr1 84275490 84276079 peak_2018 4054 . 84.39515 411.44415 405.43530 215
Note that peak_2017 (chr1:84274822-84275322) is present as chr1:84274822-84276079 in the IDR output. 84276079 is the end point of the next peak, i.e. peak_2018.
Also, I found that the IDR values reported in the output shown above are 0. Is there any way you could increase the number of significant places displayed? Perhaps displaying the -log10 values instead?
I tried running the idr command with --input-file-type narrowPeak, but I get the same output.
Thanks for this great tool.
We are finding that the value of IDR changes based on the order of peak1 and peak2 in the --samples argument. For example, "--samples peak1 peak2" may give an IDR of 3.538 while "--samples peak2 peak1" gives an IDR of 3.544. Is this expected behavior, and may I ask why it occurs?
Many thanks.
I installed the latest IDR (2.0.3) with:
python3 setup.py install --/mnt/data0/lizhidan/software/idr-2.0.3
then, ran:
idr
got:
File "/home/lizhidan/.local/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3254, in <module>
def _initialize_master_working_set():
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3237, in _call_aside
f(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 3266, in _initialize_master_working_set
working_set = WorkingSet._build_master()
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 584, in _build_master
ws.require(__requires__)
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 901, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 787, in resolve
raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'idr==2.0.3' distribution was not found and is required by the application
Can someone tell how to fix it?
I was running idr on rep1 and rep2 and found that some of the output for rep2 does not match the rep2 narrowPeak file.
I'm copying one of the offending lines below. As you can see, the rep2 end location and summit are not correct, while the rep1 values are. This makes me worry about the results. I'm hoping someone can respond. Thanks!
idr file:
chr7 29413398 29416918 . 1000 . -1 213.82509 -1 2529 2.46 2.73 29413398 29416918 620.03809 2548 29413672 29416512 213.82509 702
and here is the associated peak in rep2 narrowPeak
chr7 29413672 29414923 rep2 2138 . 20.03013 213.82509 209.88286 676
FYI, the associated peak in rep1 narrowPeak is correct.
chr7 29413398 29416918 rep1 6200 . 25.59149 620.03809 615.40747 2548
Hi,
I'm working with a relatively small number of peaks (~100, in contrast to most use cases, which will have thousands or more), due to a small genome size and relatively specific binding behavior. I am not familiar with the details of the statistical background model (though I get the overall point). Do I have to consider anything special when interpreting my results?
Can you recommend any literature for the statistical background?
Cheers,
Ben
I have downloaded clip_seq data narrow Peak files from ENCODE and am trying to perform IDR directly on them.
I call the command:
idr --samples rep1.peak rep2.peak -i 0.05 --rank p.value --input-file-type narrowPeak
I first get the warning:
'FutureWarning: comparison to None
will result in an elementwise object comparison in the future.
if localIDRs == None or IDRs == None:'
Then immediately get the error:
' mean(x.summit-x.start for x in m_pk.pks[key+1])
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int''
I have tested these same commands on traditional chip-seq data and on the example data provided by IDR. Does anyone have any suggestions?
Hi, I'm running idr-2.0.2 in a virtual environment installed with python3.6 and I'm getting the following error running the test command (I also got this error on a real dataset).
Could this be a specific issue with running in a virtual environment?
Thanks,
-Dave
(IDR) [oliverd@aphrodite bin]$ idr --samples ../tests/data/peak1 ../tests/data/peak1
Initial parameter values: [0.10 1.00 0.20 0.50]
Final parameter values: [0.05 0.20 0.99 0.99]
/athena/hssgenomics/scratch/programs/IDR/bin/idr --samples ../tests/data/peak1 ../tests/data/peak1
Traceback (most recent call last):
File "/athena/hssgenomics/scratch/programs/IDR/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.2', 'idr')
File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/pkg_resources/__init__.py", line 739, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1500, in run_script
exec(code, namespace, namespace)
File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
idr.idr.main()
File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/idr/idr.py", line 774, in main
useBackwardsCompatibleOutput=args.use_old_output_format)
File "/athena/hssgenomics/scratch/programs/IDR/lib/python3.6/site-packages/idr-2.0.2-py3.6-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Hi,
I ran idr with the test narrowPeak files and my own narrowPeak files. Both runs ended with an error:
[wszheng@guru idr-2.0.2]$ idr --samples tests/data/peak1 tests/data/peak2
WARNING: Cython does not appear to be installed.- falling back to much slower python method.
Initial parameter values: [0.10 1.00 0.20 0.50]
/home/wszheng/Install/Python-3.6.1/bin/idr --samples tests/data/peak1 tests/data/peak2
Traceback (most recent call last):
File "/home/wszheng/Install/Python-3.6.1/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.2', 'idr')
File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 654, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/wszheng/Install/Python-3.6.1/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1434, in run_script
exec(code, namespace, namespace)
File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr.egg-info/scripts/idr", line 10, in <module>
idr.idr.main()
File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/idr.py", line 759, in main
fix_mu=args.fix_mu, fix_sigma=args.fix_sigma )
File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/idr.py", line 391, in fit_model_and_calc_idr
fix_mu=fix_mu, fix_sigma=fix_sigma)
File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/optimization.py", line 468, in estimate_model_params
fix_mu=fix_mu, fix_sigma=fix_sigma)
File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/optimization.py", line 420, in EMP_with_pseudo_value_algorithm
z1 = compute_pseudo_values(r1, theta[0], theta[1], theta[3])
File "/home/wszheng/Install/Source_binaries/idr-2.0.2/idr/utility.py", line 46, in py_compute_pseudo_values
-10, 10, EPS ) )
TypeError: py_cdf_i() takes 6 positional arguments but 7 were given
Can anyone help me figure out this problem? Thanks very much.
Hi!
I want to use IDR on peak files obtained using the HMMRATAC pipeline.
From this pipeline, we obtained a list of peaks in gappedPeak format (score in the 13th column), where a higher score indicates a better call.
How can I use IDR to assess reproducibility between replicates?
How can I change the column that IDR uses to find the score, given that a higher score indicates a better call?
The files are as follows:
chr1 630114 630158 HighCoveragePeak_0 . . 0 0 255,0,0 1 44 0 -1 -1 -1
I ran the following command:
idr --verbose --samples MKN74R2_sorted.bed MKN74R3_sorted.bed --input-file-type bed --rank 13 --output-file MKN74-idr --plot --log-output-file MKN74.idr.log
And obtained the following error:
/home/cjose/Software/anaconda3/bin/idr --verbose --samples MKN74R2_sorted.bed MKN74R3_sorted.bed --input-file-type bed --rank 13 --output-file MKN74-idr --plot --log-output-file MKN74.idr.log
Traceback (most recent call last):
File "/home/cjose/Software/anaconda3/bin/idr", line 4, in
import('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/pkg_resources/init.py", line 665, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/pkg_resources/init.py", line 1463, in run_script
exec(code, namespace, namespace)
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in
idr.idr.main()
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 840, in main
merged_peaks, signal_type = load_samples(args)
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 732, in load_samples
f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 732, in <listcomp>
f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
File "/home/cjose/Software/anaconda3/lib/python3.8/site-packages/idr-2.0.3-py3.8-linux-x86_64.egg/idr/idr.py", line 56, in load_bed
raise ValueError("Invalid Signal Value: {:e}".format(signal))
ValueError: Invalid Signal Value: -1.000000e+00
Thank you in advance for your help.
Cheers,
Celina
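The `Invalid Signal Value: -1.000000e+00` message is raised in `load_bed` for the value read from the ranking column, presumably because it is negative. In the sample row quoted above, the 13th field is -1 (and so is the 14th, so the result is the same whether the index is counted from 0 or from 1). One possible workaround, not confirmed by the maintainers, is to drop rows without a usable score before running idr. A minimal sketch, assuming whitespace-separated columns; `split_by_signal` and the second example row are hypothetical:

```python
def split_by_signal(lines, column):
    """Split BED-like rows into (kept, dropped) by the value in `column` (1-based).

    A row is kept only if that field parses as a non-negative number.
    """
    kept, dropped = [], []
    for line in lines:
        # Skip blank lines, comments and track headers entirely.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.split()
        try:
            value = float(fields[column - 1])
        except (IndexError, ValueError):
            # Missing or non-numeric field: treat the row as unusable.
            dropped.append(line)
            continue
        if value >= 0:
            kept.append(line)
        else:
            dropped.append(line)
    return kept, dropped


rows = [
    # Row quoted in the post: field 13 is -1.
    "chr1 630114 630158 HighCoveragePeak_0 . . 0 0 255,0,0 1 44 0 -1 -1 -1",
    # Made-up row with a positive score in field 13, for illustration only.
    "chr1 700000 700500 Peak_1 . . 700100 700400 255,0,0 1 300 100 12.5 -1 -1",
]
kept, dropped = split_by_signal(rows, column=13)
print(len(kept), "kept;", len(dropped), "dropped")
```

Whether discarding such rows is appropriate depends on what the unscored peaks represent, so the dropped rows are returned rather than silently thrown away.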
Hello,
I have been getting different errors when running IDR.
If I run IDR v2.0.2, I get the following error:
./idr --input-file-type bed --rank 7 --plot --verbose --samples peaks_TGATCA_clean.bed peaks_PrNotT_clean.bed
Loading the peak files
Merging peaks
Ranking peaks
Initial parameter values: [0.10 1.00 0.20 0.50]
Fitting the model parameters
Iter 0 1.57e+00 4.57e-01 [0.09023906 0.71172818 0.98999476 0.98043616]
Iter 1 2.36e-01 3.29e-01 [0.1093229 0.49997562 0.98999476 0.9755989 ]
Iter 2 1.56e-01 2.29e-01 [0.1093229 0.34967449 0.98999476 0.97019152]
Iter 3 1.12e-01 1.59e-01 [0.1093229 0.24417437 0.98999476 0.96406317]
Iter 4 4.48e-02 6.87e-02 [0.1093229 0.20000658 0.98999476 0.96466309]
Iter 5 2.89e-03 3.72e-03 [0.1093229 0.20000658 0.98999476 0.9675544 ]
Iter 6 1.44e-03 1.86e-03 [0.1093229 0.20000658 0.98999476 0.96899706]
Iter 7 7.46e-04 9.61e-04 [0.1093229 0.20000658 0.98999476 0.96974351]
Iter 8 3.85e-04 4.96e-04 [0.1093229 0.20000658 0.98999476 0.9701287 ]
Iter 9 2.01e-04 2.58e-04 [0.1093229 0.20000658 0.98999476 0.97032952]
Iter 10 1.06e-04 1.36e-04 [0.1093229 0.20000658 0.98999476 0.97043521]
Iter 11 5.60e-05 7.20e-05 [0.1093229 0.20000658 0.98999476 0.97049116]
Iter 12 2.97e-05 3.82e-05 [0.1093229 0.20000658 0.98999476 0.97052088]
Iter 13 1.58e-05 2.04e-05 [0.1093229 0.20000658 0.98999476 0.9705367 ]
Iter 14 8.42e-06 1.08e-05 [0.1093229 0.20000658 0.98999476 0.97054513]
Iter 15 4.49e-06 5.78e-06 [0.1093229 0.20000658 0.98999476 0.97054962]
Iter 16 2.39e-06 3.08e-06 [0.1093229 0.20000658 0.98999476 0.97055201]
Iter 17 0.00e+00 0.00e+00 [0.1093229 0.20000658 0.98999476 0.97055201]
Finished running IDR on the datasets
Final parameter values: [0.11 0.20 0.99 0.97]
Writing results to file
./idr --input-file-type bed --rank 7 --plot --verbose --samples peaks_TGATCA_clean.bed peaks_PrNotT_clean.bed
Traceback (most recent call last):
File "./idr", line 10, in <module>
idr.idr.main()
File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 774, in main
useBackwardsCompatibleOutput=args.use_old_output_format)
File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.2-py3.7-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If I run IDR v2.0.3, I get:
./idr --input-file-type bed --rank 7 --plot --verbose --samples ENCFF437QMO.bed ENCFF025CWO.bed
Traceback (most recent call last):
File "./idr", line 10, in <module>
idr.idr.main()
File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 828, in main
merged_peaks, signal_type = load_samples(args)
File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 726, in load_samples
f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 726, in <listcomp>
f1, f2 = [load_bed(fp, signal_index) for fp in args.samples]
File "/rhome/icoimbra/.conda/envs/py3_env/lib/python3.7/site-packages/idr-2.0.3-py3.7-linux-x86_64.egg/idr/idr.py", line 64, in load_bed
float(data[6]), float(data[7]), float(data[8])
ValueError: could not convert string to float: 'chr1_minus_88541_827795'
My BED files were generated using grit (following the script from the ENCODE RAMPAGE protocol) and are organized as follows:
chr1 629361 629492 TSS_chr1_minus_13397_827795_pk1 1000 - 15.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk1 1.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0$
chr1 629580 629610 TSS_chr1_minus_13397_827795_pk2 1000 - 17.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk2 2.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0$
chr1 629639 629939 TSS_chr1_plus_629080_634923_pk1 1000 + 1646.0 chr1_plus_629080_634923 chr1_plus_629080_634923 TSS_chr1_plus_629080_634923_pk1 270.0,0.0,0.0,1.0,79.0,6.0,4.0,2.0,3.0,23.0,8.0,97.0,0.$
chr1 629752 629867 TSS_chr1_minus_13397_827795_pk3 1000 - 37.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk3 2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,1.0$
chr1 629968 630009 TSS_chr1_minus_13397_827795_pk4 1000 - 97.0 chr1_minus_13397_827795 chr1_minus_13397_827795 TSS_chr1_minus_13397_827795_pk4 41.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0$
I am using Python 3.7.6
How do I work around this?
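The 2.0.3 traceback shows `load_bed` calling `float()` on `data[6]`, `data[7]` and `data[8]`, that is, on the 7th to 9th whitespace-separated fields. In the grit output above, the 8th and 9th fields hold identifiers rather than numbers, which matches the `could not convert string to float` error. A small check like the following can flag such rows before running idr. It is only a diagnostic sketch: `non_numeric_columns` is a hypothetical helper, and how the file should be reshaped for idr is not settled in this thread.

```python
def non_numeric_columns(line, columns=(7, 8, 9)):
    """Return the 1-based column numbers in `columns` that do not parse as floats."""
    fields = line.split()
    bad = []
    for col in columns:
        try:
            float(fields[col - 1])
        except (IndexError, ValueError):
            # Either the column is missing or it holds non-numeric text.
            bad.append(col)
    return bad


# First nine columns of the first row quoted in the post.
row = (
    "chr1 629361 629492 TSS_chr1_minus_13397_827795_pk1 1000 - 15.0 "
    "chr1_minus_13397_827795 chr1_minus_13397_827795"
)
print(non_numeric_columns(row))
```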
For multi-summit peaks, add an option to set the IDR value to the maximum value across all summits.
Hi,
I am trying to run IDR on a set of 3 replicates and I get the following error:
idr: error: unrecognized arguments: K9AcY4_2/MACS2_K9acY4_2/K9acY4_2_macs2_peaks.broadPeak
This is the third replicate. If I run it for 2 replicates things are working, but not for more than 2.
Any advice on the issue?
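The `unrecognized arguments` message for the third file, together with the two-replicate run working, suggests that `--samples` takes exactly two files. One workaround, which is an assumption here rather than advice from the maintainers, is to run idr once for every pair of replicates. The sketch below only builds the command lines; `pairwise_idr_commands`, the replicate file names and the output prefix are hypothetical, and `broadPeak` is assumed to be an accepted `--input-file-type` value.

```python
from itertools import combinations


def pairwise_idr_commands(replicates, file_type, prefix):
    """Build one idr argument list per unordered pair of replicate files."""
    commands = []
    # enumerate() gives each replicate a 1-based label for the output name.
    for (i, first), (j, second) in combinations(enumerate(replicates, start=1), 2):
        output = f"{prefix}_rep{i}_vs_rep{j}.txt"
        commands.append([
            "idr",
            "--samples", first, second,
            "--input-file-type", file_type,
            "--output-file", output,
        ])
    return commands


replicates = ["rep1.broadPeak", "rep2.broadPeak", "rep3.broadPeak"]
commands = pairwise_idr_commands(replicates, "broadPeak", "K9ac_idr")
for command in commands:
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)`. How the pairwise results are then combined into a final peak set is a separate decision that this sketch does not address.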
Hi,
Does anyone know how to uninstall idr on a MAC OS X?
Cheers,
Ben
The code, as it is, does not work with bed files. Changing line 45 of idr.py should do the trick. Currently, that line says "data[5]". I believe it should be "data[3]", as you want the strand information.
It seems that the argument --use-nonoverlapping-peaks is never used. Instead, the code checks for an argument "--use_nonoverlapping_peaks". When I corrected this and ran it on two replicates where there are non-overlapping peaks, I got the following error:
Traceback (most recent call last):
File "/cluster/zeng/code/research/software/miniconda/envs/idr/bin/idr", line 4, in <module>
__import__('pkg_resources').run_script('idr==2.0.3', 'idr')
File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 750, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1527, in run_script
exec(code, namespace, namespace)
File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/EGG-INFO/scripts/idr", line 10, in <module>
idr.idr.main()
File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 876, in main
useBackwardsCompatibleOutput=args.use_old_output_format)
File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 482, in write_results_to_file
merged_peak, IDR, localIDR, output_file_type, signal_type)
File "/cluster/zeng/code/research/software/miniconda/envs/idr/lib/python3.6/site-packages/idr-2.0.3-py3.6-linux-x86_64.egg/idr/idr.py", line 343, in build_idr_output_line_with_bed6
rv.append( "%i" % min(x.start for x in m_pk.pks[key]))
ValueError: min() arg is an empty sequence
Maybe this functionality hasn't been tested thoroughly due to the mismatched naming? Could you advise?
Hello! After installing idr-2.0.2 and running idr --samples tests/data/peak1 tests/data/peak2, I get this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
How can I fix this? I look forward to your reply, thanks.
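The tracebacks quoted in the other reports of this error point at the line `if localIDRs == None or IDRs == None:` in `write_results_to_file`. With recent NumPy versions, comparing an array to `None` with `==` is done element by element, and `or` then asks for the truth value of the resulting array, which raises this `ValueError`. An identity test with `is None` avoids that. The sketch below reproduces the behaviour in isolation; the function names and sample values are hypothetical, and it is not a patch taken from the idr repository.

```python
import numpy as np


def results_missing_buggy(local_idrs, idrs):
    # Element-wise comparison: for arrays with several elements, `or` needs
    # a single truth value and NumPy raises ValueError.
    return local_idrs == None or idrs == None  # noqa: E711


def results_missing(local_idrs, idrs):
    # Identity comparison never inspects the array contents.
    return local_idrs is None or idrs is None


local_idrs = np.array([0.1, 0.5, 0.9])
idrs = np.array([0.05, 0.2, 0.6])

try:
    results_missing_buggy(local_idrs, idrs)
    buggy_raised = False
except ValueError as exc:
    buggy_raised = True
    print("buggy check failed:", exc)

print("fixed check:", results_missing(local_idrs, idrs))
```

Editing the installed `idr.py` in the same way (replacing `== None` with `is None` on the reported line) is a plausible local workaround.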
Add option to log to file and unify the logging code (avoid prints in sub-modules)
I am getting an error while installing IDR.
command used: python3 setup.py install
Error:
Traceback (most recent call last):
File "setup.py", line 2, in <module>
import numpy
ImportError: No module named 'numpy'
I have also installed numpy, and pip reports:
"Requirement already satisfied: numpy in /usr/local/lib/python2.7/dist-packages/numpy-1.11.1-py2.7-linux-x86_64.egg"
I still get the same error. Any idea how to solve it?
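The pip message above points at a Python 2.7 `dist-packages` directory, while the install command uses `python3`. Each interpreter has its own package directories, so numpy being present for one does not make it importable from the other. A quick way to check is to run a short script with the same interpreter that runs `setup.py`. The sketch below is only a diagnostic; `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be found by the interpreter running this script."""
    return importlib.util.find_spec(name) is not None


print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])
print("numpy importable:", module_available("numpy"))
```

If this prints `False` when run as `python3 check.py`, installing numpy for that interpreter (for example with `python3 -m pip install numpy`) would be the next thing to try.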
Hi
When I ran IDR on ATAC-seq data, I repeatedly encountered the error below. What causes it?
code:
idr --samples sample1_peaks.narrowPeak sample2_peaks.narrowPeak
--input-file-type narrowPeak
--rank p.value
--output-file $workdir/IDR/rep_idr
--plot
--log-output-file $workdir/IDR/rep.idr.log
error:
File "/mypathtodir/site-packages/idr-2.0.2-py3.8-linux-x86_64.egg/idr/idr.py", line 415, in write_results_to_file
if localIDRs == None or IDRs == None:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Thanks !
Best wishes
Zoe
For certain histone modifications, ENCODE uses the gappedPeak representation. Right now IDR doesn't support using gappedPeak files - is there a workaround for this? For example, converting a gappedPeak file to narrowPeak file? Or running IDR on the broadPeak file and selecting features in the gappedPeak file which intersect with it?