gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.

Home Page: http://pycbc.org

License: GNU General Public License v3.0

Python 97.21% Shell 0.67% HTML 0.48% CSS 0.07% JavaScript 0.07% Dockerfile 0.05% C++ 0.55% Cython 0.91%
astronomy physics gravity ligo gravitational-waves pycbc analysis python black-hole neutron-star gwastro virgo signal-processing open-science cosmic-explorer einstein-telescope lisa

pycbc's Introduction

GW150914

PyCBC is a software package used to explore astrophysical sources of gravitational waves. It contains algorithms to analyze gravitational-wave data, detect coalescing compact binaries, and perform Bayesian inference on gravitational-wave data. PyCBC was used in the first direct detection of gravitational waves and is used in flagship analyses of LIGO and Virgo data.

PyCBC is collaboratively developed by the community and is led by a team of GW astronomers with the aim of building accessible tools for gravitational-wave data analysis.

The PyCBC home page is at http://pycbc.org

Documentation is automatically built from the latest master version.

Detailed installation instructions for PyCBC are available in the documentation.

Want to get going using PyCBC?

Quick Installation

pip install pycbc

To test the code on your machine:

pip install pytest "tox<4.0.0"
tox
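
As a quick sanity check that an installation works, the following minimal sketch generates an example signal and matched-filters its two polarizations against each other (the approximant and parameters here are arbitrary choices, and lalsimulation is assumed to be installed):

from pycbc.waveform import get_td_waveform
from pycbc.filter import match
from pycbc.psd import aLIGOZeroDetHighPower

# Generate the two polarizations of an example binary-black-hole signal.
hp, hc = get_td_waveform(approximant="SEOBNRv4_opt", mass1=30, mass2=30,
                         delta_t=1.0/4096, f_lower=20)

# Match them against each other over an analytic aLIGO design-sensitivity PSD.
psd = aLIGOZeroDetHighPower(len(hp) // 2 + 1, 1.0 / hp.duration, 20)
m, idx = match(hp, hc, psd=psd, low_frequency_cutoff=20)
print("match between h+ and hx: %.3f" % m)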

If you use any code from PyCBC in a scientific publication, then please see our citation guidelines for more details on how to cite PyCBC algorithms and programs.

For the citation of the pycbc library, please use a BibTeX entry and DOI for the appropriate release of the PyCBC software (or the latest available release). A BibTeX entry and DOI for each release are available from Zenodo.

pycbc's People

Contributors

a-r-williamson, ahnitz, arthurtolley, bema-aei, bhooshan-gadre, cdcapano, cmbiwer, cyberface, dfinstad, duncan-brown, duncanmmacleod, garethcabourndavies, jakeb245, josh-willis, lppekows, lpsinger, maxtrevor, micamu, pannarale, praveen-mnl, prayush, soumide1102, spxiwh, stevereyes01, sum33it, tapaimarton, tdent, titodalcanton, veronica-villa, wushichao

pycbc's Issues

Changing value of a readonly variable

Bumped into this using an old ini file: once I cleaned up and fixed the ini file, I did not hit this problem again. Having said that, it looks like line 106 of timeseries.py tries to change the value of a read-only variable, which should not be happening.

Traceback (most recent call last):
File "/home/francesco.pannarale/opt/pycbc//bin/pycbc_inspiral" line 120, in
segments = strain_segments.fourier_segments()
File "/home/francesco.pannarale/opt/pycbc/lib/python2.6/site-packages/pycbc/strain.py", line 587, in fourier_segments
freq_seg = make_frequency_series(self.strain[seg_slice])
File "/home/francesco.pannarale/opt/pycbc/lib/python2.6/site-packages/pycbc/types/timeseries.py", line 106, in getitem
index.start += len(self)
TypeError: readonly attribute
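
For reference, Python slice objects are immutable, so the assignment to index.start will always raise this error. A minimal sketch of one possible fix (not the actual PyCBC patch) is to build a new slice instead of mutating the old one:

# Hypothetical helper for __getitem__: slice attributes are read-only, so
# construct a new slice with the wrapped start/stop values rather than
# assigning to index.start in place.
def _wrap_slice(index, length):
    start, stop = index.start, index.stop
    if start is not None and start < 0:
        start += length
    if stop is not None and stop < 0:
        stop += length
    return slice(start, stop, index.step)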

hdf results page task list

The hdf based coinc result pages are severely lacking at the moment. Let's come together and count the ways (and hopefully fix some of them too!). Please comment with what should be added to this list, if you are working on some new plot, or are interested in helping out.

  • check for missing titles, axes, captions
  • include minifollowups
  • include range plots
  • meta information is missing (Ian?)
  • ifar plots
  • tables of loudest background
  • tables of loudest missed injections
  • use UTC time in addition to GPS wherever it is used
  • cross links to alog and detchar summary pages
  • omega scans????
  • site map
  • single detector histograms
More tasks to add to the list:
  • Units on all plots
  • segment and veto information plots
  • Use green for L1 and red for H1 on all plots
  • Add an overflow bin for extreme outliers on histograms
  • Fix x-axis limit on histograms
  • Change "without little dogs" label in legend to "closed-box background"; explain in caption
  • Define injected decisive distance in caption
  • Show linear and log scale for found/missed plot
  • is there any template bank information we can show on the summary page?
  • Show summary page at a commissioner meeting/telecon
  • Caption explanation that walks an LSC user who only knows the Sensemon range through how it relates to the sensitive distance plots
  • Add 1.4-1.4 line to sensitive distance plot on summary page
  • Figure out the scaling factor between L1 and H1 SNRs, since the detectors are not co-located
  • Add an expand all accordion button
  • Put H1,L1 ASD/range on the same plot. Use the assigned IFO colors on the plots
  • plot p-values for sigma values on cumulative rate vs. stat plot
  • Include a mini-followup of the loudest events; the loudest events should be in a dropdown under the results tab

Last example version:
https://sugar-jobs.phy.syr.edu/~ahnitz/projects/paper/testing/w1/w1_test3/html/

The current example ini files are broken in pycbc workflow

The current example ini files seem to be broken in the current master of pycbc. Running them gives the error shown below. I'm not really sure what this means ... I worry that the bitwise operation performed between two segment lists here may not be compatible with the lal.LIGOTimeGPS type. If so, that would be bad. Thoughts?

2015-06-16 18:04:53,187:INFO : Leaving split output files module.
2015-06-16 18:04:53,187:INFO : Entering injection module.
2015-06-16 18:04:53,196:INFO : Leaving injection module.
2015-06-16 18:04:53,196:INFO : Entering time slides setup module.
2015-06-16 18:04:53,604:INFO : Entering matched-filtering setup module.
2015-06-16 18:04:53,605:INFO : Adding matched-filter jobs to workflow.
2015-06-16 18:04:53,605:INFO : Setting up matched-filtering for H1.
Traceback (most recent call last):
File "./pycbc_make_coinc_workflow", line 123, in
tags = [tag])
File "/home/spxiwh/lscsoft_git/executables_master/lib/python2.7/site-packages/PyCBC-65a9dd-py2.7.egg/pycbc/workflow/matched_filter.py", line 119, in setup_matchedfltr_workflow
compatibility_mode=compatibility_mode)
File "/home/spxiwh/lscsoft_git/executables_master/lib/python2.7/site-packages/PyCBC-65a9dd-py2.7.egg/pycbc/workflow/matched_filter.py", line 221, in setup_matchedfltr_dax_generated
compatibility_mode=compatibility_mode)
File "/home/spxiwh/lscsoft_git/executables_master/lib/python2.7/site-packages/PyCBC-65a9dd-py2.7.egg/pycbc/workflow/jobsetup.py", line 263, in sngl_ifo_job_setup
useSplitLists=True)
File "/home/spxiwh/lscsoft_git/executables_master/lib/python2.7/site-packages/PyCBC-65a9dd-py2.7.egg/pycbc/workflow/core.py", line 867, in find_outputs_in_range
overlap_windows = [abs(i.segment_list & currsegment_list) for i in overlap_files]
TypeError: in method 'LIGOTimeGPS___gt__', argument 2 of type 'LIGOTimeGPS *'
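
For context, the failing line is computing the coincident livetime between two segment lists. A minimal sketch of that operation with plain float boundaries (using the ligo.segments package, previously glue.segments) works fine, which supports the suspicion that the failure only appears when the boundaries mix types such as lal.LIGOTimeGPS:

from ligo.segments import segment, segmentlist

# Two toy segment lists with plain float boundaries.
a = segmentlist([segment(0.0, 100.0), segment(200.0, 300.0)])
b = segmentlist([segment(50.0, 250.0)])

# The same operation as core.py line 867: total length of the intersection.
print(abs(a & b))  # -> 100.0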

Add file names (and checksum?) to the attributes of hdf files

Currently, STATMAP, TRIGGERMERGE, and HDFINJFIND files make references to other files without actually recording those files' names anywhere. For example, STATMAP files have 'template_id' in their foreground and background fields, which points to a template in a BANKHDF file, but the name/location of the BANKHDF file is not stored anywhere in the file. Ditto 'trigger_idX' (which point to single-detector ids) and TRIGGERMERGE files. Likewise, HDFINJFIND files include an injection_id which points to an injection in an INJECTION xml file, but don't record which file. (These are particularly annoying, since there are multiple INJECTION files in a run, and the HDFINJFIND file names don't exactly match the INJECTION file names.) This means that if a user wants to do something with both coincident results and single-detector results, they just have to know which file goes with what. Also, if you want to write a program that uses information from multiple files, you have to have the user provide all the files on the command line, e.g., 'coinc-files STATMAP --bank-file BANKFILE', etc.

This also isn't very safe from a review standpoint. There's no way to ensure that, for instance, the TRIGGERMERGE file that a STATMAP file used on creation is the same TRIGGERMERGE file currently sitting in the run directory.

The request is that the files on which each hdf file depends get recorded in the attributes of that file, along with a checksum. In particular: STATMAP files record the BANKHDF and TRIGGERMERGE files that they use; HDFINJFIND files record the BANKHDF, TRIGGERMERGE, INJECTION*xml, and STATMAP files; TRIGGERMERGE files record the BANKHDF file.

As I'm not too familiar with what codes generate each of these files, I'm just throwing this out there right now to see if anyone wants to take this on. If not, I'll come back to it and try to do it myself.
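
A minimal sketch of what the requested bookkeeping could look like (the helper names below are hypothetical, not existing PyCBC functions):

import hashlib
import h5py

def md5sum(path, blocksize=1 << 20):
    # Stream the file so large TRIGGERMERGE files need not fit in memory.
    md5 = hashlib.md5()
    with open(path, 'rb') as fileobj:
        for block in iter(lambda: fileobj.read(blocksize), b''):
            md5.update(block)
    return md5.hexdigest()

def record_dependencies(output_hdf, input_paths):
    # Store dependency file names and checksums as attributes of the output file.
    with h5py.File(output_hdf, 'a') as hdf:
        hdf.attrs['input_files'] = [path.encode() for path in input_paths]
        hdf.attrs['input_md5'] = [md5sum(path).encode() for path in input_paths]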

Accounting tags in staging/cleanup jobs

Jobs without accounting_group tags are now blocked on Atlas. I found out the hard way that some .sub files in PyCBC workflows don't yet have the tag:

[tito@atlas6 ~/er7/runs/week1/local]$ find . -type f -name \*.sub -exec grep -L accounting_group "{}" \;
./create_dir_er7_uberbank_week1_0_local.sub
./subdax_main_ID0000001.sub
./subdax_finalization_ID0000002.sub
./cleanup_er7_uberbank_week1_0_local.sub
./er7_uberbank_week1-0.dag.condor.sub
./main_ID0000001.000/create_dir_main_0_local.sub
./main_ID0000001.000/stage_in_remote_local_0_1.sub
./main_ID0000001.000/stage_in_remote_local_1_1.sub
./main_ID0000001.000/stage_in_remote_local_0_0.sub
./main_ID0000001.000/stage_in_remote_local_1_0.sub
./main_ID0000001.000/stage_out_local_local_0_1.sub
./main_ID0000001.000/stage_out_local_local_0_0.sub
./main_ID0000001.000/stage_out_local_local_1_1.sub
./main_ID0000001.000/stage_out_local_local_1_0.sub
./main_ID0000001.000/stage_out_local_local_2_0.sub
./main_ID0000001.000/stage_out_local_local_2_1.sub
./main_ID0000001.000/stage_out_local_local_3_0.sub
./main_ID0000001.000/stage_out_local_local_3_1.sub
./main_ID0000001.000/main-0.dag.condor.sub
./main_ID0000001.000/stage_out_local_local_4_0.sub
./main_ID0000001.000/stage_out_local_local_5_0.sub
./main_ID0000001.000/stage_out_local_local_5_1.sub
./main_ID0000001.000/stage_out_local_local_6_1.sub
./main_ID0000001.000/stage_out_local_local_6_0.sub
./main_ID0000001.000/stage_out_local_local_7_1.sub
./main_ID0000001.000/stage_out_local_local_7_0.sub

I don't know Pegasus enough to fix this quickly, but I hope it's an easy fix for Alex or Ian.

Datafind AT_RUNTIME_MULTIPLE_CACHES and AT_RUNTIME_SINGLE_CACHES methods are broken

I think my recent change to support a backup datafind server has broken the two datafind methods that return cache files. This is not used by the all-sky workflow any more, but the GRB pipeline might be using it. I think the fix will be quick, but I may not be able to fix and test it while at the meeting. Just noting that this issue probably exists, so be aware of it for now if you are using these methods. Let me know if fixing this becomes urgent; otherwise I will fix it next week.

correlate parallel broken when used in simple usage

I have reverted the correlate function (not the Correlator, so pycbc_inspiral still uses it), so the match function will use the old correlate_inline.

(from Riccardo)

Hi All,

I'm trying to compute overlaps in pycbc.
While in the past it has been as straightforward as

from pycbc.waveform import get_td_waveform
from pycbc.filter import match

hp, hc = get_td_waveform(coa_phase=phiRef, delta_t=dT, mass1=m1,
                         mass2=m2, f_lower=fLow, distance=dist, inclination=inc,
                         approximant=apprxStr)
overlap, idxM = match(hp, hc, low_frequency_cutoff=fLow)

using present master version I hit the following error:

/tmp/1000_python27_compiled/sc_ce6a0f08bc653d0518ec8e7d5816a62311.cpp:
In function 'PyObject* compiled_func(PyObject*, PyObject*)':
/tmp/1000_python27_compiled/sc_ce6a0f08bc653d0518ec8e7d5816a62311.cpp:1070:82:
error: cannot convert 'std::complex' to
'std::complex' for argument '1' to 'void
ccorrf_parallel(std::complex, std::complex,
std::complex*, uint32_t, uint32_t)'
ccorrf_parallel(htilde, stilde, qtilde, (uint32_t) arrlen,
(uint32_t) segsize);

       ^

/tmp/1000_python27_compiled/sc_ce6a0f08bc653d0518ec8e7d5816a62311.cpp:
In function 'PyObject* compiled_func(PyObject*, PyObject*)':
/tmp/1000_python27_compiled/sc_ce6a0f08bc653d0518ec8e7d5816a62311.cpp:1070:82:
error: cannot convert 'std::complex' to
'std::complex' for argument '1' to 'void
ccorrf_parallel(std::complex, std::complex,
std::complex*, uint32_t, uint32_t)'
ccorrf_parallel(htilde, stilde, qtilde, (uint32_t) arrlen,
(uint32_t) segsize);

       ^

Traceback (most recent call last):
File "checkEOBNRv2.py", line 54, in
overlap, idxM = match(hpM_tilde, hpM_tilde, low_frequency_cutoff=fLow)
File "/home/riccardo/lvc/lscsoft/opt/EOBNRv2review/pycbc/lib/python2.7/site-packages/pycbc/filter/matchedfilter.py",
line 677, in match
high_frequency_cutoff, v1_norm, out=_snr)
File "/home/riccardo/lvc/lscsoft/opt/EOBNRv2review/pycbc/lib/python2.7/site-packages/pycbc/filter/matchedfilter.py",
line 557, in matched_filter_core
correlate(htilde[kmin:kmax], stilde[kmin:kmax], _qtilde[kmin:kmax])
File "", line 2, in correlate
File "/home/riccardo/lvc/lscsoft/opt/EOBNRv2review/pycbc/lib/python2.7/site-packages/pycbc/scheme.py",
line 172, in scheming_function
return schemed_fn(*args, **kwds)
File "/home/riccardo/lvc/lscsoft/opt/EOBNRv2review/pycbc/lib/python2.7/site-packages/pycbc/filter/simd_correlate.py",
line 400, in correlate_parallel
support_code = corr_support, auto_downcast = 1)
File "/usr/lib/python2.7/dist-packages/scipy/weave/inline_tools.py",
line 361, in inline
**kw)
File "/usr/lib/python2.7/dist-packages/scipy/weave/inline_tools.py",
line 491, in compile_function
verbose=verbose, **kw)
File "/usr/lib/python2.7/dist-packages/scipy/weave/ext_tools.py",
line 373, in compile
verbose=verbose, **kw)
File "/usr/lib/python2.7/dist-packages/scipy/weave/build_tools.py",
line 297, in build_extension
raise e
scipy.weave.build_tools.CompileError: error: Command "c++ -pthread
-fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -fPIC
-I/usr/lib/python2.7/dist-packages/scipy/weave
-I/usr/lib/python2.7/dist-packages/scipy/weave/scxx
-I/usr/lib/python2.7/dist-packages/numpy/core/include
-I/usr/include/python2.7 -c
/tmp/1000_python27_compiled/sc_ce6a0f08bc653d0518ec8e7d5816a62311.cpp
-o /tmp/scipy-riccardo-XzpoDg/python27_intermediate/compiler_8855277b295f576c423c618665ded9a0/tmp/1000_python27_compiled/sc_ce6a0f08bc653d0518ec8e7d5816a62311.o

-march=native -O3 -w -fopenmp" failed with exit status 1

Can anyone tell me what I am doing wrong?
Cheers,

Riccardo

P.S. I'm sorry I can't retrieve the git hash of the pycbc that worked
for me, but it should be master from last April or so.

Need to think about how to handle pegasus executable entries for different tags

This issue only affects MPI running on Stampede, so it's not urgent, but I'm putting it here so as not to lose the issue.

Larne, Duncan,

After some further thought on the request to use the same executable entry for each injection run, I realized that my simple suggestion to merge injection executable was wrong, and will not work in general for the injection inspiral jobs.

I am cc'ing Ian, as this may involve significant changes to the way the pycbc workflow modules conceptually work.

The model that we have used for pycbc workflow is that it would be built from a top-level set of components that are essentially procedural, but modular, where each would add some useful amount of work to the workflow in an independent manner, and only share information by explicit lists of result files, ensuring that the data dependencies are clearly visible at the highest level.

Executables are only instantiated and used within the context of a single workflow function. The model we have used is that they are independent generators of jobs. As such, in principle (through ini file configuration), although not, I believe, in active use, one could use a different physical executable for each call to a setup function.

To recap, the main problems we are having, as discovered by running on Stampede, are the following.

  1. Each ifo/injection combination gets a different entry in the transformation catalog.
    a) An executable is staged for each entry even if the PFN is the same (it is renamed on the remote site)
    b) Horizontal clustering is the main mode of analysis and operates at the level of the transformation catalog, meaning that it can only cluster jobs made by a single executable instance at the moment, which is a problem in the case of a large number of injection sets. With label-based clustering we can easily reduce to a few jobs (1 per ifo is a trivial ini file option), but we can't granularly control the number of clustered jobs with a single option.

  2. There is a general conceptual mismatch between the Dax3.Executable class and the pycbc Executable class. The worry is that this could cause us to move out of step with Pegasus development and cause further issues down the line.

Larne, Duncan, is that a fair understanding of what the problems are? Please point out any other issues.

The current model has a number of consequences that make it difficult to quickly implement the model you are requesting, though certainly not unfeasible.

These are the main technical issues.

  1. Setup functions do not share executable instances. Currently, they only share files, file lists, data products, etc. The idea was that this is something that should not be exposed at the top-level as it has no bearing on the actual connection of data driven plumbing.

  2. Executable instances are viewed as generators. This means they keep track of the common options for the executable (derived from the ini file), and of information such as which output folder the files they generate will be stored in, etc., which would need to be separated out.

We could explicitly instantiate executables in the top-level workflow and pass them to the setup functions. We would also need to move the common-option, output-folder, etc. logic into the node instantiation functions. This would require changes to nearly all parts of the code: information we would normally pass to the executable would need to be passed to the node creation instead. These are mostly local changes, however, as typically the executable is generated once and then called many times to generate jobs. We would have to think very carefully about how this would affect the configuration file hooks; while I think this change could be made without changing the way the configuration file addresses different tasks, that requires some thought and verification. What I am struggling with is the idea that one passes the executable instance around between setup function calls. To me the current agreement about what goes into and comes out of a workflow setup function is very clear, and this would break that agreement. Do we do this for all setup functions, or do we treat the inspiral ones as a special case?

If this is a change worth making, then we would need to plan for it, and it certainly can't be done until after O1. In the meantime, are there any show-stoppers that prevent us from using label-based clustering for now?

-Alex

I'm missing the context here (this appears to be the result of some
previous runs which I haven't heard about!), so my answer probably
doesn't make sense. But ....

Is the problem that you want to cluster over different injection runs? As Alex says, this could be done with label clustering. However, why would you want to do that? Even in our largest workflows we have never run more than 200 injection sets. You could use horizontal clustering to cluster all inspiral jobs in each injection run together, giving 200 jobs. What is the problem with that?

Maybe the issue is gwf file staging? Maybe you want a node to copy a bunch of data files and then have a bunch of inspiral jobs from different inspiral runs analyse the same data files? If that is the goal then you need label clustering; horizontal clustering will just cluster jobs at random.

Cheers
Ian

coinc_time option doesn't analyze some coincident time

The COINC_TIME option is implemented by taking the intersection of the two detectors' science time and treating that as the science time instead, so filtering jobs only get tiled during this time. This is fine, except in the case where a coincident segment is shorter than the analysis length, but the science segments that made it up were longer. In this case, the data would not be analyzed.

If anyone has a chance to fix this, please let me know, and go ahead. Otherwise, simply take note.
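
A toy illustration of the corner case (using the ligo.segments package; the numbers are made up): each detector's science segment is long, but the coincident piece is shorter than the analysis length and so nothing would be analyzed.

from ligo.segments import segment, segmentlist

analysis_length = 256  # seconds per filtering job (example value)

h1_science = segmentlist([segment(0, 1000)])
l1_science = segmentlist([segment(800, 2000)])

coinc = h1_science & l1_science                 # [segment(800, 1000)] -> 200 s
usable = segmentlist(s for s in coinc if abs(s) >= analysis_length)
print(coinc, usable)                            # usable is empty, so no jobs get tiled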

Sec_to_year function in pycbc_coinc_statmap

This function hardcodes a value of 3.15569e7 in the executable as the transformation between seconds and years.

Constants like this should be defined in one place (use the values in lal), and it should be clear whether we are defining a year as:

lal.YRSID_SI
lal.YRTROP_SI

or

lal.YRJUL_SI

Cheers
Ian
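
A sketch of the suggested change, taking the constant from lal and making the choice of year an explicit, visible default (the Julian year is shown, but that choice is exactly what needs to be decided):

import lal

def sec_to_year(seconds, seconds_per_year=lal.YRJUL_SI):
    # lal also provides YRSID_SI (sidereal) and YRTROP_SI (tropical);
    # whichever is chosen should be used consistently everywhere.
    return seconds / seconds_per_year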

pycbc.events.veto.indices_within_times can blow up RAM usage

The function pycbc.events.veto.indices_within_times can explode in RAM usage in cases where large numbers of triggers are present.

Specifically, the issue is that it (in pycbc_coinc_statmap) is testing for trigger idxes around a set of foreground triggers. It does not ensure that the idxes returned are unique, so if you have a large number of foreground triggers you will get a lot of idxes, but each will show up many times, leading to an N**2 blow-up of RAM. Somehow this function should include a uniqueness test when generating the list of integer idxes. Locally I am using something like:

chunk = 100000  # chunk size; good for my use-case, but maybe not always
stacked = numpy.array([], dtype=int)
nchunks = (len(left) + chunk - 1) // chunk
for i in range(nchunks):
    start_idx = i * chunk
    end_idx = min((i + 1) * chunk, len(left))
    curr_left = left[start_idx:end_idx]
    curr_right = right[start_idx:end_idx]
    curr_stacked = numpy.hstack([numpy.r_[s:e] for s, e in zip(curr_left, curr_right)])
    stacked = numpy.unique(numpy.hstack([stacked, curr_stacked]))
return tsort[stacked]
# The original code
# return tsort[numpy.hstack(numpy.r_[s:e] for s, e in zip(left, right))]

But this hardcodes a chunk size of 100000 that is good for my use-case, but maybe not always. I don't know a magic numpy way of including uniqueness in the list comprehension.

..... And a gripe. We are doing a lot of optimizing of the inspiral code to make it run faster. But I have never really been hit by run time issues since the hard work to make the chisq really fast and optimize the main FFT computation. We are all managing to complete ER7 runs quickly on different clusters. However, I have hit memory issues on a number of occasions. I think our optimization tests need to also consider and balance RAM usage. For example a clustering algorithm that is really fast, but keeps more triggers than the algorithm used in coh_PTF is probably going to be an issue for RAM usage before the speedup is useful.

Show config files in HTML reports from HDF workflows

Hi Chris,

it would be useful to have the merged .ini file visible from your pycbc_make_html_page reports from HDF workflows. Is that something you can do easily? If not, what's the best/quickest/easiest way to implement it?

Add a mechanism for validating ini file versions

Incompatibilities between ini file versions and code versions have been the bane of our existence for years. I suggest adding a section like

[version]
pycbc = 1.1.1

that pycbc_make_*_workflow can check against its own version information and tell the user whether or not the ini file is valid for the version of the workflow generator.
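
A minimal sketch of the proposed check (the section name is the one suggested above; the comparison policy, exact match versus minimum version, would still need to be agreed):

from pycbc import version as pycbc_version

def check_ini_version(cp):
    """cp is the ConfigParser that has already read the workflow ini file."""
    if not cp.has_option('version', 'pycbc'):
        raise ValueError("ini file has no [version] section with a 'pycbc' entry")
    required = cp.get('version', 'pycbc')
    if required != pycbc_version.version:
        raise ValueError("ini file was written for pycbc %s but this is %s"
                         % (required, pycbc_version.version))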

Inspiral will not produce triggers if cluster-window > injection-window

I found that if the cluster-window argument is larger than the injection-window argument, pycbc_inspiral will produce no triggers. You can find an example command to make this happen on atlas8, at: /home/cdcapano/overlapStudies/early_aligo/flow30Hz/M_4_50_bbh/ians_new_banks/geom_bank_6_1100_stoch_filled_2step/pycbc_workflow/run_mkl/test/run_inspiral.sh. In that script, you'll see that cluster-window is set to 4, while injection-window is set to 1. That command will produce no triggers. If I change injection-window to 3.9, I still get no triggers; if I set it to 4.1, I get triggers. To ensure that it is something to do with the cluster window, if I set the injection-window to 3.9, but change the cluster window to 3.5, I get triggers.

I'm not sure what's going on, but it appears to be due to something in threshold_and_cluster.

allow tarballed / zipped files in results pages

In the case of an indeterminate number of output files from a plotting / result page generation code, where it is not reasonable to specify the files individually, we would like to standardize on generating a single tar'd file. Note that this would also simplify Pegasus file handling.

This request is to ensure that these files are properly handled in the pycbc_make_html_page script. It will untar these files and render the resulting output as normal. If the tar'd file contains its own index.html, this will also be respected and not overwritten.

PyCBC install instructions are confusing

When running

[dbrown@seaview ~]$ pip install pycbc --user

I get the error:

Downloading/unpacking mpld3>=0.3git (from pycbc)
Could not find a version that satisfies the requirement mpld3>=0.3git (from pycbc) (from versions: 0.0.1, 0.1, 0.2)
Some insecure and unverifiable files were ignored (use --allow-unverified mpld3 to allow).

I looked at pypa/pip#1423 which discusses this issue, but I couldn't resolve it with any combination of --allow-unverified and --allow-all-external that I could figure out.

pip 1.5.6

Using empty lists as default arguments

Looking at our landscape.io page, there are a lot of functions that use an empty list as a default argument, e.g. tags=[].

I don't think we've run into a problem with this before, but it is not best practice. The default list [] is created once, when the function is defined, not each time it is called. A small experiment in ipython shows this:

In [1]: def print_tags(tags=[]):
   ...:     print tags
   ...:     tags.append(1)
In [2]: print_tags()
[]
In [3]: print_tags()
[1]
In [4]: print_tags()
[1, 1]

A page that describes the gotcha is here: https://pythonconquerstheuniverse.wordpress.com/2012/02/15/mutable-default-arguments/

A simple alternative is to pass None as the default and then, inside the function, create an empty list if the argument is None.
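
Spelled out, that alternative looks like this:

def print_tags(tags=None):
    # A fresh list is created on every call, so successive calls no longer
    # see each other's appended values.
    if tags is None:
        tags = []
    print(tags)
    tags.append(1)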

Handling of tags in SplitXXX codes

The handling of tags in the splitinspinj and splittmpltbank codes is inconsistent:

https://github.com/ligo-cbc/pycbc/blob/master/pycbc/workflow/jobsetup.py#L1117

https://github.com/ligo-cbc/pycbc/blob/master/pycbc/workflow/jobsetup.py#L1461

These codes are a little different from others as they inherit tags from the parent jobs and then add a split-number tag. But in some cases you may want to split the parent file more than once using different numbers of split files (i.e. for injection analysis the bank could be denser than for full data; this is most relevant in GRB where sliding is done in full data).

Alex, what do you think is the best way to handle this? Should we add an extra_tags option (or something) in the __init__ of these classes and append it to the parent's tags and the split-number tag?

Having these two share function code (and be adjacent in the code) would also be good ... Maybe the executables could even be combined?

Single detector trigger info is missing from results page

It was suggested that it would be nice to have single detector SNRs in the page_foreground output alongside coincident information. I'm just opening this issue to let people know I am working on this as part of dumping the loudest events to XML format for upload to gracedb at box opening.

merge segment handling functions

Ok, Ian, here is the todo list. Please let me know if there is something else I am missing.

Task List

  • move science segments to workflow-science
  • move veto creation to workflow-vetoes
  • add min-segment-length back to science time generation
  • make veto function return segment files, and the relevant segment names (still multiple segment files)
  • update exes to take SEGMENT_NAME on the command line (for the ones that don't)
    • pycbc_coinc_findtrigs
    • pycbc_plot_params_vs_singles
    • pycbc_page_snrchi
    • pycbc_page_coinc_snrchi
    • Any others ?????
  • make veto function return single file with all the cumulative lists
  • update and test pycbc_make_hdf_coinc_workflow for this model + downstream setup functions
  • update and test pycbc_make_coinc_workflow for this model + downstream setup functions
  • add new option for pregenerated science time
  • add new option for pregenerated veto files
  • update documentation and example ini files

Out of order dependencies in PyCBC build

It looks like the dependencies in the PyCBC build are out of order. If I run:

[dbrown@sugar-dev3 ~]$ virtualenv /home/dbrown/projects/pycbc
[dbrown@sugar-dev3 ~]$ source /home/dbrown/projects/pycbc/bin/activate
(pycbc)[dbrown@sugar-dev3 ~]$ pip install "numpy>=1.6.4" unittest2
(pycbc)[dbrown@sugar-dev3 ~]$ pip install -e git+https://github.com/ligo-cbc/pycbc#egg=pycbc --process-dependency-links

In the build I see:

Running setup.py bdist_wheel for h5py
running build_ext
File "setup_build.py", line 140, in run
from Cython.Build import cythonize
ImportError: No module named Cython.Build


Failed building wheel for h5py

Later I see:

Running setup.py bdist_wheel for Cython

Then later I see:

Running setup.py install for h5py

and my guess is that the install cleans up anything that failed in the earlier build.
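
Until the ordering is fixed, a plausible workaround (an assumption based on the traceback above, not a documented instruction) is to install Cython into the virtualenv by hand before installing PyCBC, so the h5py wheel can build on the first pass:

(pycbc)[dbrown@sugar-dev3 ~]$ pip install Cython
(pycbc)[dbrown@sugar-dev3 ~]$ pip install -e git+https://github.com/ligo-cbc/pycbc#egg=pycbc --process-dependency-links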

PyCBC needs a mini-followup page

This page is to keep track of what is needed and what is implemented for the mini-followup page in PyCBC. So that it reproduces what we had in ihope, it should at least have:

  • Table showing the summary of the combined statistics of the trigger (FAR, combined NewSNR, time-delay between single-detector triggers)
  • Table showing single-detector trigger information (SNR, NewSNR, chisq/dof, m1, m2, mchirp, eta, mtotal, mass ratio, template duration, end time in GPS seconds, UTC, and site local time, effective distance). Also show the injection ID, if it is an injection.
  • Time series (+/- 10 second) of the single detector triggers SNR after applying CAT1 + gating and CAT2 vetoes.
  • Time series (+/- 10 second) of the single detector triggers NewSNR after applying CAT1 + gating and CAT2 vetoes.
  • Time series (+/- 10 second) of the single detector SNR for the loudest trigger in each detector.
  • Time series (+/- 10 second) of the single detector NewSNR for the loudest trigger in each detector.
  • Version information for the code used to produce the mini-followup.

Each event should have

  • Link to the alog for the day in which the event occurred.
  • Link to the daily summary page for which the event occurred.
  • Link to the CBC daily pages for which the event occurred.
  • Links to the glitch grams for the summary page plots for the hour in which the event occurred.

Add any other items that you can think of below:

  • Spectrograms of the data around the trigger.

pycbc_inspiral does not work with --psd-file option

First run: run pycbc_inspiral with the --psd-estimation option and output the PSD.
Second run: use the PSD generated in the first run to refilter the data with the --psd-file option.

We expect the results from both runs to be very similar to each other. However, in the second case no triggers are produced.

The necessary tarball (containing the test data set and the template bank, both pycbc_inspiral scripts with the command-line options used, the output PSD, and the output triggers in both cases) can be found on Atlas at miram.cabero/PyCBC/SNRLoss/test/temp/testpsd.tar.gz

software injection into a TimeSeries of zeroes is not tapered as expected

I was doing software injections into a TimeSeries of zeroes.

I'm using pycbc.inject.InjectionSet.apply (ie. what we use to do software injections in pycbc_inspiral). But the EOBNRv2 waveform that is supposed to be tapered looks like: https://sugar-jobs.phy.syr.edu/~cbiwer/tmp/nottapered.png

I did a little digging and here's what I've looked at so far. The function call that actually generates the waveform timeseries is lalsimulation.SimDetectorStrainREAL8TimeSeries (called via pycbc.detector.project_wave).

lalsimulation.SimDetectorStrainREAL8TimeSeries adds these little bits of noise at the start of the waveform timeseries. I saved the output in inject.py and it is coming from this function call. There is some padding and interpolation that happens here, which maybe needs a closer look.

Then this waveform timeseries is tapered with SimInspiralREAL8WaveTaper (called via pycbc.waveform.utils.taper_timeseries). The algorithm in SimInspiralREAL8WaveTaper is to taper up to the second peak in the waveform, so these little bits of noise add a bunch of little peaks and SimInspiralREAL8WaveTaper tapers over just a couple of noise peaks.

Note that if the waveform just abruptly ends then there is no problem with SimInspiralREAL8WaveTaper; it works as it's supposed to. However, these little noise bits cause a problem.

For the plot above the following injection file on sugar was used: /home/cbiwer/projects/pycbc_test_inject/issue/injection.xml.gz
And the following script on sugar to make the plot: /home/cbiwer/projects/pycbc_test_inject/issue/pycbc_test_inject

Maybe I'm doing something weird here but I'm a bit suspicious about the taper of our injections now.

CPU matched filter ignores number of threads argument

When I use the matched filter engine with the processing scheme set to CPU on Atlas, it ignores the num threads argument that is set in scheme.from_cli. For instance, if I run pycbc_inspiral with:

pycbc_inspiral {some arguments}  --processing-scheme cpu

Several parallel threads will get launched when it hits the filtering part (observed by running htop while the job was running). This is true both on the head node and on the cluster nodes. If I manually try to set the number of threads with:

pycbc_inspiral {some arguments}  --processing-scheme cpu:1

It still ignores it. Oddly, however, if I manually export OMP_NUM_THREADS prior to running, then the number of threads does respect what I set it to (though it still ignores whatever from_cli sets). For instance, if I do:

export OMP_NUM_THREADS=3
pycbc_inspiral {some arguments}  --processing-scheme cpu:1

The number of parallel threads that are launched is 3. If I repeat the same thing but manually set OMP_NUM_THREADS to 1, the number of threads is 1.

I don't know why this is happening. If I set the processing scheme to mkl, it does respect the number of threads that are set in from_cli. Looking at scheme.py, I see that MKLScheme inherits from CPUScheme, so the code that sets the environment must be working properly. Is it possible that OpenMP somehow doesn't respect Python's os.environ?

In any event, until this is fixed, 'cpu' should not be used (at least on atlas; maybe it's ok on another cluster), as it can cause a job to hog resources when running in a dag. I realize mkl is probably better anyway, but cpu is the default that is used if processing-scheme is not specified. This can cause issues for new users.
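
One possible explanation (an assumption, not verified): OpenMP typically reads OMP_NUM_THREADS only when its runtime first initializes, so a value written into os.environ after the compiled extensions have loaded may simply be ignored. A quick way to test that hypothesis:

import os
# Set the variable before any OpenMP-using pycbc extension is imported ...
os.environ['OMP_NUM_THREADS'] = '1'

# ... and only then import the filtering code. If the job now stays on one
# thread, the problem is the ordering of the environment setting rather than
# OpenMP ignoring os.environ as such.
import pycbc.filter  # noqa: E402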

Fallback return value of get_waveform_filter_length_in_time() and time clustering

pycbc.waveform.get_waveform_filter_length_in_time() currently returns None if an approximant does not have an entry in _filter_time_lengths. This breaks --cluster-method template in pycbc_inspiral, effectively disabling clustering. Tom and I discovered this after erroneously using TaylorF2 rather than SPAtmplt templates.

What's a sensible behavior here? Should an error be raised when using --cluster-method template with an approximant which cannot report its duration? Or should we have a meaningful fallback duration in get_waveform_filter_length_in_time()?
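
A sketch of the first option, failing loudly rather than silently disabling clustering (whether a fallback duration would be preferable is exactly the open question):

from pycbc import waveform

def filter_length_or_raise(**params):
    # Wrapper sketch: refuse to continue if the approximant cannot report
    # its duration, instead of returning None and disabling clustering.
    length = waveform.get_waveform_filter_length_in_time(**params)
    if length is None:
        raise ValueError("approximant %s cannot report its duration, so "
                         "--cluster-method template cannot be used"
                         % params.get('approximant'))
    return length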

missing documentation

I've opened this issue to record the major sections of missing documentation. Please post additional items (or write the documentation).

  • hdf file format (Alex)
  • hdf workflow instructions (Alex)
  • examples of calculating waveform matches

Need exact version of dependencies for O1 release

Most of our PyPI dependencies say e.g. numpy >= 1.6.4 so we get at least that version, but may get newer versions (containing changes or bugs). For the O1 release and review, we should specify exact versions of all of the dependencies.

pycbc_inspiral psd-output option gives zero strain values

When running pycbc_inspiral with the psd-output option, the resulting PSD has many zeros for the strain value. Luckily these are all at the start of the file produced, so you can go into the file and delete them all before using this PSD. I was getting issues when trying to run pycbc_inspiral using a fixed PSD with these zeros in place (since the code needs to take the log of these values).
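
A sketch of the manual workaround described above, assuming the PSD was written as a two-column (frequency, strain) ASCII file:

import numpy

freq, psd = numpy.loadtxt('psd.txt', unpack=True)
keep = psd > 0  # drop the zero bins at the start of the file
numpy.savetxt('psd_nonzero.txt', numpy.column_stack([freq[keep], psd[keep]]))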

cannot use the --frame-type command with pycbc_inspiral

I'm trying to use the --frame-type option with pycbc_inspiral and I'm receiving an error.

The command I ran on sugar was:

cd /home/cbiwer/src/cbccalibration_exe_dev/examples/workflow/er6_t1/work/main_ID0000001
pycbc_inspiral  --segment-end-pad 64 --segment-length 256 --pad-data 8 --sample-rate 4096 --segment-start-pad 64 --psd-segment-stride 128 --psd-inverse-length 16 --filter-inj-only  --psd-segment-length 256 --processing-scheme cpu --snr-threshold 5.5 --cluster-method template --approximant SPAtmplt --psd-estimation median --maximization-interval 30 --strain-high-pass 30 --order 7 --chisq-bins 16 --channel-name L1:OAF-CAL_DARM_DQ --low-frequency-cutoff 40 --gps-start-time 1102660099 --gps-end-time 1102662147 --trig-start-time 1102660091 --trig-end-time 1102662155 --output  110266/L1-INSPIRAL_KAPPA_A_0.975-1102660091-2064.xml.gz  --frame-type L1_RDS  --bank-file  L1-TMPLTBANK_SNGL-1102660163-1920.xml.gz  --user-tag KAPPA_A_0.975

The error was:

Traceback (most recent call last):
  File "/home/cbiwer/pycbc/taper_fix_20150811/bin/pycbc_inspiral", line 4, in <module>
    __import__('pkg_resources').run_script('PyCBC===ee0cef', 'pycbc_inspiral')
  File "/home/cbiwer/.local/lib/python2.6/site-packages/pkg_resources/__init__.py", line 735, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/cbiwer/.local/lib/python2.6/site-packages/pkg_resources/__init__.py", line 1652, in run_script
    exec(code, namespace, namespace)
  File "/home/cbiwer/pycbc/taper_fix_20150811/lib/python2.6/site-packages/PyCBC-ee0cef-py2.6.egg/EGG-INFO/scripts/pycbc_inspiral", line 186, in <module>
    gwstrain = strain.from_cli(opt, DYN_RANGE_FAC)
  File "/home/cbiwer/pycbc/taper_fix_20150811/lib/python2.6/site-packages/PyCBC-ee0cef-py2.6.egg/pycbc/strain.py", line 64, in from_cli
    end_time=opt.gps_end_time+opt.pad_data)
  File "/home/cbiwer/pycbc/taper_fix_20150811/lib/python2.6/site-packages/PyCBC-ee0cef-py2.6.egg/pycbc/frame.py", line 287, in query_and_read_frame
    paths = frame_paths(frame_type, start_time, end_time)
  File "/home/cbiwer/pycbc/taper_fix_20150811/lib/python2.6/site-packages/PyCBC-ee0cef-py2.6.egg/pycbc/frame.py", line 251, in frame_paths
    gpsend=end_time)
  File "build/bdist.linux-x86_64/egg/glue/datafind.py", line 232, in find_times
  File "build/bdist.linux-x86_64/egg/glue/datafind.py", line 124, in _requestresponse
RuntimeError: Server returned code 404: Not Found404 Not Found

The server has not found anything matching the Request-URI.

Opening an issue to try and figure this out.

Incompatibility between different LIGOTimeGPS

I hit this error trying to analyze ER7 data. For some reason, a call to pycbc.events.veto.segments_to_file() is given a segment list made of segments with a variety of time types, even inside a single segment. It looks like this is the result of arithmetic operations between segment lists of different types. One of the segments happens to have a lal.LIGOTimeGPS type. This raises a ValueError when the time is cast to a glue.ligolw.lsctables.LIGOTimeGPS in segments_to_file() (apparently the cast is not allowed). I suppose the proper solution would be for the cast to succeed (fixing glue?). But maybe PyCBC can be more robust about such problems. What do people think? Can we just switch to using lal.LIGOTimeGPS everywhere?
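
Until the underlying cast is fixed, a defensive sketch of the suggested normalization (assuming lal and the ligo.segments package) is to coerce every boundary to a single time type before the list is written out:

from lal import LIGOTimeGPS
from ligo.segments import segment, segmentlist

def to_lal_gps(seglist):
    # Rebuild the list so every boundary is a lal.LIGOTimeGPS, whatever
    # mixture of time types the upstream arithmetic produced.
    return segmentlist(segment(LIGOTimeGPS(float(s[0])), LIGOTimeGPS(float(s[1])))
                       for s in seglist)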

The --psd-model in pycbc_inspiral is broken

There is currently a type mismatch in pycbc_inspiral if using the --psd-model option. Specifically the strain and templates will be single precision, but the PSD will be double precision. This will then cause a TypeError when calculating the sigma for a template.

I have made a simple fix locally for this (forcing the psd module to return float32), but I'm not sure of the right place to fix this in master. Should the PSD module take a "precision" option and convert to that precision if needed, as the strain module does? That seems the best solution to me. Let me know if that's the right thing to do and I can prepare a patch.

Add PSD option explanation for hwinj instructions

Ben has brought to my attention that I didn't document all the PSD options on the hardware injection instructions page for pycbc_generate_hwinj, e.g. --psd-estimation, etc. At the moment they're just listed under pycbc_generate_hwinj --help.

I assign myself this issue to add them to the docs page.

Executable to determine coincident livetime

One of the things we'll have to do fairly regularly in O1 (and beyond) is to wait for enough coincident livetime to be accumulated before cutting the next analysis block. Can we quickly put together a simple script to calculate this given a veto definer and the segment names, etc?

Basic features

  • take gps start/end time
  • account for applying vetoes
  • account for minimum science segment length

Bonus features

  • Report the maximum FAR we can estimate from the available time (include number of background bins?)
  • handle more than two detectors (needed for beyond O1)

Please respond if you can put this together, or if there is already a good way to get this information.
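
A rough sketch of the core calculation (using the ligo.segments package; the science and veto segment lists are assumed to be loaded already, and the function below is hypothetical):

from ligo.segments import segmentlist

def coincident_livetime(science, vetoes, min_seg_length=0):
    """science and vetoes both map an ifo name to a segmentlist."""
    analysable = {}
    for ifo, segs in science.items():
        segs = (segs - vetoes.get(ifo, segmentlist())).coalesce()
        analysable[ifo] = segmentlist(s for s in segs if abs(s) >= min_seg_length)
    # Intersect across however many detectors are given (two or more).
    ifos = list(analysable)
    coinc = analysable[ifos[0]]
    for ifo in ifos[1:]:
        coinc = coinc & analysable[ifo]
    return abs(coinc)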

add useful mkl error message

Currently, if you try to run with MKL support and it isn't available, most executables don't give a useful error message, instead failing with some cryptic error. I've seen this come up multiple times now, so this is my reminder to fix it.

--processing-scheme mkl is not working

Using pycbc_inspiral --processing-scheme mkl does not appear to work on master.

On sugar-dev3:

source ~/lalsuite/heads/lalsuite-v6.29_20150520_dfc6a/etc/lscsoftrc
source ~/pycbc/master_20150626/etc/pycbc-user-env.sh
cd /home/cbiwer/projects/pycbc_test_arg_fix/gw/work/main_ID0000001
pycbc_inspiral  --segment-end-pad 16 --cluster-method window --low-frequency-cutoff 40 --pad-data 8 --cluster-window 1.0 --sample-rate 4096 --injection-window 1.0 --segment-start-pad 64 --psd-segment-stride 8 --psd-inverse-length 16 --filter-inj-only  --psd-segment-length 16 --processing-scheme mkl --snr-threshold 5.0 --segment-length 512 --approximant SPAtmplt --newsnr-threshold 5.0 --psd-estimation median --keep-loudest-num 100 --strain-high-pass 30 --keep-loudest-interval 2 --order 7 --chisq-bins 128 --channel-name L1:LDAS-STRAIN --gps-start-time 966388166 --gps-end-time 966390406 --trig-start-time 966388230 --trig-end-time 966389982 --output  96638/L1-INSPIRAL_FULL_DATA-966388230-1752.hdf  --frame-files  96638/L-T1200307_V4_EARLY_RECOLORED_V2-966385664-4096.gwf   96638/L-T1200307_V4_EARLY_RECOLORED_V2-966389760-4096.gwf  --bank-file  BNS_NonSpin_30Hz_earlyaLIGO.xml --verbose

Gives:

Traceback (most recent call last):
  File "/home/cbiwer/pycbc/master_20150626/bin/pycbc_inspiral", line 4, in <module>
    __import__('pkg_resources').run_script('PyCBC===e95439', 'pycbc_inspiral')
  File "/home/cbiwer/.local/lib/python2.6/site-packages/pkg_resources/__init__.py", line 735, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/cbiwer/.local/lib/python2.6/site-packages/pkg_resources/__init__.py", line 1652, in run_script
    exec(code, namespace, namespace)
  File "/home/cbiwer/pycbc/master_20150626/lib/python2.6/site-packages/PyCBC-e95439-py2.6.egg/EGG-INFO/scripts/pycbc_inspiral", line 190, in <module>
    segments = strain_segments.fourier_segments()
  File "/home/cbiwer/pycbc/master_20150626/lib/python2.6/site-packages/PyCBC-e95439-py2.6.egg/pycbc/strain.py", line 596, in fourier_segments
    freq_seg = make_frequency_series(self.strain[seg_slice])
  File "/home/cbiwer/pycbc/master_20150626/lib/python2.6/site-packages/PyCBC-e95439-py2.6.egg/pycbc/filter/matchedfilter.py", line 274, in make_frequency_series
    fft(vec, vectilde)   
  File "/home/cbiwer/pycbc/master_20150626/lib/python2.6/site-packages/PyCBC-e95439-py2.6.egg/pycbc/fft/__init__.py", line 182, in fft
    thebackend = backends_dict[backends_list[0]]
IndexError: list index out of range

If I remove the --processing-scheme it keeps running.

Workflow/pegasus has no way to supply multi-ifo input arguments

At the moment, as can be seen on

https://github.com/ligo-cbc/pycbc/blob/master/pycbc/workflow/psd.py#L52

there is no neat way to supply input for the multiple-detector input argument formatting. I looked into this a while ago for pycbc_multi_inspiral but found an underlying issue with Pegasus that prevented this. Just posting this here to remind me (or anyone else), and I'll try to update with an example of how I think this should be fixed (but which may not actually work).
