nanograv / enterprise

ENTERPRISE (Enhanced Numerical Toolbox Enabling a Robust PulsaR Inference SuitE) is a pulsar timing analysis code, aimed at noise analysis, gravitational-wave searches, and timing model analysis.

Home Page: https://enterprise.readthedocs.io

License: MIT License

Makefile 0.68% Python 87.69% Jupyter Notebook 11.63%

enterprise's People

Contributors

aarchiba, aarondjohnson, ark0015, bencebecsy, bvgoncharov, danielreardon, hazboun6, jellis18, josephjsimon, nihanpol, paulthebaker, pennucci, pyup-bot, siyuan-chen, stevertaylor, svigeland, tcromartie, vallis, vhaasteren


enterprise's Issues

TypeError in data tutorial

When running data.ipynb or just copy/pasting into ipython from the data tutorial, the 'efac1' class instance's method 'sample' doesn't appear to recognize the argument 'size'; executing efac1.sample() returns one sample, but I cannot pass it 'size=5':
TypeError Traceback (most recent call last)
in ()
10
11 # return 5 samples from this prior distribution
---> 12 print(efac1.sample(size=5))

TypeError: sample() got an unexpected keyword argument 'size'

Summary or description for Prior class

As I was testing I found that I wanted an easy way to see what prior I was using. It may be nice to try to come up with some kind of summary method for the Prior class that displays the name of the distribution and maybe the parameters (i.e. mean, std for gaussian or limits on uniform, etc.).

We don't need this now but I wanted to start an issue so we don't forget.
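One way such a summary might look is a __repr__ on a thin wrapper around a frozen scipy.stats distribution. The PriorSketch class below is hypothetical and is not enterprise's Prior; it only illustrates exposing the distribution name and its parameters, together with a sample method that accepts size.

```python
import scipy.stats


class PriorSketch(object):
    """Hypothetical stand-in for a prior wrapping a frozen scipy.stats RV."""

    def __init__(self, rv):
        self._rv = rv

    def sample(self, size=1, random_state=None):
        # forward size so that sample(size=5) returns five draws
        return self._rv.rvs(size=size, random_state=random_state)

    def __repr__(self):
        # a frozen RV keeps its distribution object and the args/kwds it was built with
        name = self._rv.dist.name
        args = ", ".join(str(a) for a in self._rv.args)
        kwds = ", ".join("{}={}".format(k, v) for k, v in sorted(self._rv.kwds.items()))
        params = ", ".join(p for p in (args, kwds) if p)
        return "{}({})".format(name, params)


# uniform on [0.5, 1.5]; the limits are arbitrary illustration values
efac_prior = PriorSketch(scipy.stats.uniform(loc=0.5, scale=1.0))
summary = repr(efac_prior)
draws = efac_prior.sample(size=5, random_state=0)
print(summary)
```

Keeping the summary in __repr__ means it shows up automatically in an interactive session, which is where the question "what prior am I using?" usually comes up.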

Change Pulsar to run if user has PINT, but not libstempo

Currently the user gets a printed message that says 'ERROR: Must have libstempo package installed!' if the import call for libstempo raises an ImportError exception. There are a number of user types who do not need libstempo installed, including those dealing with gamma-ray or x-ray timing data, since libstempo (really TEMPO2) can't deal with these types of TOAs.

It looks like the only change to the working code would need to be here and would just be changing:

    if pint:
        toas = list(filter(lambda x: isinstance(x, toa.TOAs), args))
        model = list(filter(lambda x: isinstance(x, TimingModel), args))

    t2pulsar = list(filter(lambda x: isinstance(x, t2.tempopulsar), args))

to

    if pint:
        toas = list(filter(lambda x: isinstance(x, toa.TOAs), args))
        model = list(filter(lambda x: isinstance(x, TimingModel), args))
    if tempo2:
        t2pulsar = list(filter(lambda x: isinstance(x, t2.tempopulsar), args))

Besides that we'd just add a truthful error message. Something like: raise ValueError('Must have either PINT or libstempo installed.')

Does anyone (@vallis @paulthebaker @stevertaylor ) know of other places in the code that might require libstempo/TEMPO2?
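A minimal sketch of the "either backend" check described above. The helper names module_available and require_backend are hypothetical; only the error message is taken from the proposal.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported, False on ImportError."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def require_backend(have_pint, have_tempo2):
    """Raise only when neither timing backend is usable."""
    if not (have_pint or have_tempo2):
        raise ValueError("Must have either PINT or libstempo installed.")
    # report which backends can be used
    flags = (("pint", have_pint), ("libstempo", have_tempo2))
    return [name for name, ok in flags if ok]


have_pint = module_available("pint")
have_tempo2 = module_available("libstempo")
```

With flags like these, the tempopulsar filtering would only run when have_tempo2 is true, and require_backend would be called once before any data are loaded.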

Add frequentist statistics to enterprise

Allow users to compute GWB optimal statistic and CW Fe, Fp, and eccentric-Fe within enterprise. They can already piece together the components, but it would be good to especially have the latter CW functions explicit.

implement arbitrary priors (1D)

We can use scipy.stats.rv_histogram() to implement arbitrary priors. Prior can already take an arbitrary RV instance, but Parameter is set up to use class factories.

scipy.stats.rv_histogram returns an RV instance with pdf, cdf, rvs, etc. methods. In addition to working with explicit histograms, it can take in an arbitrary function sampled at discrete points to use as the PDF.

hist = np.histogram(samples)
my_RV = scipy.stats.rv_histogram(hist)

Note that numpy.histogram returns (weights, bin edges), with the edges including both endpoints, so for n weights there are n+1 edges. An RV can be generated from an arbitrary function in this way:

xs = np.linspace(left, right, 1000)
data = my_func(xs[:-1])  # generate 1 fewer point...
my_RV = scipy.stats.rv_histogram((data, xs))
my_prior = Prior(my_RV)

To generate an enterprise.Parameter we need a class factory. The obvious use is for empirical priors, so in a new branch on my fork I've done this as a class called Empirical.
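The recipe above can be checked end to end with a self-contained sketch. Here my_func is a hypothetical unnormalized density (a Gaussian shape) and the limits are arbitrary; the function is evaluated at bin centers rather than left edges, which is a small variation on the snippet above.

```python
import numpy as np
import scipy.stats


def my_func(x):
    """Hypothetical unnormalized density, used only for illustration."""
    return np.exp(-0.5 * x ** 2)


left, right = -5.0, 5.0
edges = np.linspace(left, right, 1001)      # n + 1 edges
centers = 0.5 * (edges[:-1] + edges[1:])    # n evaluation points
weights = my_func(centers)                  # n weights

# rv_histogram normalizes the weights into a proper PDF
my_RV = scipy.stats.rv_histogram((weights, edges))

draws = my_RV.rvs(size=5, random_state=0)
total = my_RV.cdf(right)
peak = my_RV.pdf(0.0)
```

Because the histogram is piecewise constant, the resolution of edges sets how faithfully sharp features of the target function are reproduced.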

Towards distinguishing CW sources on GWB

During the astrophysics busyweek, we discussed a paper idea to study the prospects for distinguishing CW sources from the stochastic background.

For phase 2 of this, I would like to use an RJMCMC to adaptively choose the number of CW sources (by frequency and sky location). This could even get crazy and adaptively fit an unknown number of anisotropic multipole moments and CW sources.

I'm opening this issue to make sure we think about this sort of functionality as we move forward. I don't know of any out of the box, widely used RJ samplers. @jellis18, your RJSampler from PAL2 is BayesWave inspired, right? It could handle this sort of thing.

How to save instantiated `PTA`s

I really think we need some way to save and re-load instantiated PTA instances, e.g. pta.save(outfile) to save and pta = load_pta(infile) to re-load. I'm not sure exactly how to do this, but it would be a huge time saver and would also let us know exactly what model was run for each analysis.

using same PSR distance parameter for multiple CW signals

When a user initializes two (or more) CW signals to search simultaneously, each CW carries its own set of pulsar distance parameters. These parameters should be collected so that all CWs use the same pulsar distance for each pulsar.

To my knowledge, this was discovered by @caitlinawitt during the IPTA MDC2. It will be important for when @bencebecsy incorporates pulsar terms in his N*CW work.

Prior parameter groupings

We need functionality for non-separable priors. Currently we assume that p(a,b,c) = p(a)p(b)p(c) but that is not always the case.

fix pulsar distance table and try to pull distance from DM distance in timing code

I noticed a small bug in the way that the pulsar class gets pulsar distances from the file. If the pulsar name does not have a B or J then it won't recognize it. For example it won't know that 1713+0747 is J1713+0747. I can make that fix pretty easily.

Also, right now by default if the pulsar distance is not in that table then it uses 1 and 0.2 for the value and standard deviation of the distance. Perhaps it would be better to try to get the DM value from tempo2...
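A minimal sketch of the name fix, assuming the distance table behaves like a dict keyed by full pulsar name. The function lookup_distance and the table entry are hypothetical; the default of 1 with standard deviation 0.2 is the one quoted above.

```python
DEFAULT_DIST = (1.0, 0.2)  # default value and standard deviation quoted above


def lookup_distance(name, table, default=DEFAULT_DIST):
    """Find (distance, uncertainty) for `name`, tolerating a missing B/J prefix."""
    candidates = [name]
    if name[:1] not in ("B", "J"):
        # e.g. "1713+0747" should also match "J1713+0747"
        candidates.extend(["J" + name, "B" + name])
    for key in candidates:
        if key in table:
            return table[key]
    # nothing matched: fall back to the default instead of failing
    return default


# made-up table entry, for illustration only
table = {"J1713+0747": (1.2, 0.1)}

found = lookup_distance("1713+0747", table)
missing = lookup_distance("FAKE01", table)
```

Returning the default at the end of the function also covers arbitrarily named (for example simulated) pulsars.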

How to do upper limits (i.e. how to put linear priors on log parameters)?

I have just started testing and wanted to re-create the 9-year upper limit. However, we currently don't have a good way to put linear priors on log parameters. Of course we could hardcode something but I think we should have the parameter somehow know if the parameter is log or not.

Any thoughts?
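For reference, a prior that is uniform in A but expressed on x = log10(A) has density p(x) = ln(10) 10^x / (10^b - 10^a) on [a, b]. The sketch below implements that density and a matching sampler as free functions; the function names are hypothetical and this is not a statement of how enterprise implements it.

```python
import numpy as np


def linear_exp_logpdf(x, a, b):
    """Log-density of x = log10(A) when A is uniform on [10**a, 10**b]."""
    x = np.asarray(x, dtype=float)
    inside = (x >= a) & (x <= b)
    logp = np.log(np.log(10.0)) + x * np.log(10.0) - np.log(10.0 ** b - 10.0 ** a)
    # zero probability outside the prior range
    return np.where(inside, logp, -np.inf)


def linear_exp_sample(a, b, size=1, random_state=None):
    """Draw A uniformly on [10**a, 10**b] and return log10(A)."""
    rng = np.random.default_rng(random_state)
    return np.log10(rng.uniform(10.0 ** a, 10.0 ** b, size=size))


samples = linear_exp_sample(-18.0, -12.0, size=1000, random_state=0)
```

Most of the prior mass sits near the upper bound, which is the behavior wanted for an upper limit: the sampler is not rewarded for wandering to arbitrarily small amplitudes.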

Repeat parameters

Ok, I have spent way, way too long on this and figured I'd see what you all think. All I'm trying to do is come up with a robust way of dealing with repeat parameters. For example, say we want to include two wavelets (or two of any type of signal). Right now the parameters would be repeated and it would all get confused. I'd like a way of having it make the parameter names with _1, _2, etc. appended so we can at least tell the difference. This becomes extra tricky when we have pre-initialized parameters that are meant to be shared across pulsars. In this case we need to make sure that we return the same parameter instance corresponding to the name, so par must be the same instance for all pulsars, and likewise par_1.

I thought I had figured it out with this

class Parameter(object):
    """Parameter base class."""

    _names = []
    _calls = {}
    _call_counts = {}

    def __init__(self, name):
        if name in Parameter._names:
            self.name = name + '_{}'.format(Parameter._names.count(name))
        else:
            self.name = name

        Parameter._names.append(name)
        Parameter._calls[self.name] = self

    @classmethod
    def clear_registry(cls):
        cls._names = []
        cls._call_counts = {}

    def get_logpdf(self, value):
        return self._prior.logpdf(value)

    def get_pdf(self, value):
        return self._prior.pdf(value)

    def sample(self, size=1, random_state=None):
        return self._prior.sample(size, random_state)

    # this trick lets us pass an instantiated parameter to a signal;
    # the parameter will refuse to be renamed and will return itself
    def __call__(self, name):
        cnt = Parameter._call_counts.get(self.name, 0)
        Parameter._call_counts[self.name] = cnt + 1
        pname = self.name if not cnt else self.name + '_' + str(cnt)
        if pname in Parameter._calls:
            return Parameter._calls[pname]
        else:
            new = self.__class__(pname)
            Parameter._calls[pname] = new
            return new

but this seems very dangerous since the registries (_names, etc) accumulate. I made it so that it would clear this registry after a SignalCollection is initialized which would work in practice but this could still fail if we initialized single Signals and not a SignalCollection.

Any thoughts?
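One possible alternative to class-level registries is a registry object that lives only as long as one model build, so nothing accumulates between builds. The sketch below is hypothetical (ParameterRegistry, start_pulsar and DummyParameter are illustration names, not enterprise API); it restarts the repeat counters for each pulsar while handing back the same instance for a given unique name.

```python
class ParameterRegistry(object):
    """Hypothetical registry scoped to one model build (not enterprise API)."""

    def __init__(self):
        self._counts = {}
        self._shared = {}

    def start_pulsar(self):
        # repeat counters restart for each pulsar; shared instances persist
        self._counts = {}

    def unique_name(self, name):
        # first use keeps the bare name; repeats become name_1, name_2, ...
        cnt = self._counts.get(name, 0)
        self._counts[name] = cnt + 1
        return name if cnt == 0 else "{}_{}".format(name, cnt)

    def shared(self, name, factory):
        # build the parameter once per unique name, then return the same object
        uname = self.unique_name(name)
        if uname not in self._shared:
            self._shared[uname] = factory(uname)
        return self._shared[uname]


class DummyParameter(object):
    """Stand-in for a real parameter class."""

    def __init__(self, name):
        self.name = name


registry = ParameterRegistry()
per_pulsar = []
for psr in ("psr_a", "psr_b"):
    registry.start_pulsar()
    # two signals of the same type, each asking for the shared parameter "par"
    per_pulsar.append([registry.shared("par", DummyParameter) for _ in range(2)])
```

Since the registry is an ordinary object, initializing single Signals without a SignalCollection would just mean creating (and discarding) a registry explicitly.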

Enterprise not compatible with numpy 1.14.0

Sarah Vigeland and I were just troubleshooting a problem I was encountering when trying to run the continuous wave search code. We found that the latest version of numpy (1.14.0) doesn't seem to be compatible with enterprise. Here is a look at some of the documentation on that latest numpy release:
https://github.com/numpy/numpy/releases

It sort of looked like from the error messages that I was getting that something was wrong with the np.einsum function, which appears to be under the "highlights" as one of the things that just changed in this release. (Sorry, I didn't take a screen-shot of the actual error message before I closed the terminal)

Get `PINT` working

I think we should re-factor Pulsar a bit so that it can use either libstempo or PINT. I think we should do this by making an ltinterface class as @vhaasteren had started a while ago.

Factors of 2pi needed to properly normalize likelihood

Factors of 2*pi are not included in the determinant terms of the enterprise likelihood. This is fine when, e.g., we keep the number of red noise components the same. But if we vary that number, or use different GP basis kernels, then likelihood values cannot be compared across models in a model selection scenario.
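To make the missing constant concrete, the sketch below evaluates a plain (unmarginalized) Gaussian log-likelihood with and without the -(n/2) ln(2 pi) term and compares it with scipy's fully normalized value. The function gaussian_lnlike and the random test matrices are illustration only; which dimension counts enter enterprise's marginalized likelihood is not worked out here.

```python
import numpy as np
import scipy.linalg
import scipy.stats


def gaussian_lnlike(r, C, include_2pi=True):
    """Gaussian log-likelihood of residuals r with covariance C."""
    cf = scipy.linalg.cho_factor(C)
    chi2 = r @ scipy.linalg.cho_solve(cf, r)
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))
    lnl = -0.5 * (chi2 + logdet)
    if include_2pi:
        # the normalization term that is easy to drop
        lnl -= 0.5 * len(r) * np.log(2.0 * np.pi)
    return lnl


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
C = A @ A.T + 4.0 * np.eye(4)   # symmetric positive definite test covariance
r = rng.standard_normal(4)

reference = scipy.stats.multivariate_normal(mean=np.zeros(4), cov=C).logpdf(r)
with_2pi = gaussian_lnlike(r, C)
without_2pi = gaussian_lnlike(r, C, include_2pi=False)
```

The two values differ by a constant that depends only on the dimension, so the omission is harmless for parameter estimation at fixed dimension and matters only when dimensions differ between the models being compared.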

How to replace old prior.py references

Not sure if this is the right place for this, but I recently updated my enterprise code and none of my programs or scripts work as they used the old prior.py code. Is there any reference for how to refactor code to use the new way of defining priors?

Normalization for DM in utils.createfourierdesignmatrix_dm

Hi All,

I have recently started to use Enterprise, and while resolving a discrepancy between my results and another person's results (from PAL2) I noticed the following. In utils.createfourierdesignmatrix_dm we create a Fourier design matrix for DM variation noise. But it looks like there is a to-do item: # TODO: should we use a different normalization.

Here are my suggestions:

  1. Replace line 134 with the following:
    Dm = 4.15e3/(freqs**2)
  2. Add a comment:
    # 4.15e3 = K^(-1), where K is from Eq. 23 in L. Lentati paper: https://academic.oup.com/mnras/article/437/3/3004/2907742

Do you think these are worth implementing?
I would be happy to git-push these changes.

PTA class and spatially correlated signals

We need a PTA class that can hold >=1 pulsar and will know how to construct spatially correlated "phi" matrices. This class is what will be given to the Likelihood. It should have enough abstraction so that the Likelihood doesn't need to know if you are using spatial correlations or not (i.e. the API is as similar as possible for all different kinds of signal and pulsar configurations).

@vallis, you got this.

Start usage documentation

We have enough code (at least once PR #54 is merged) that we can start to write some usage documentation. There are various ways to do this with Jupyter notebooks that can be converted to rst. I will have a go at doing this and using the notebooks will kill 2 birds with one stone so we have nice documentation and notebooks for new users to play with!

use list comprehension instead of map / filter

The code mixes list comprehensions and map / filter with lambda statements. I find list comprehension to be much more readable, and lambda statements are generally slower than their alternatives.

I say we:

  • replace all map and filter calls with list comprehension.
  • eliminate lambda statements wherever possible
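For comparison, the two styles side by side on plain built-in types (the input values are arbitrary):

```python
args = (1, "a", 2.5, "b", 3)

# filter + lambda, the style used in parts of the code
strings_filter = list(filter(lambda x: isinstance(x, str), args))

# equivalent list comprehension
strings_comp = [x for x in args if isinstance(x, str)]

# map + lambda versus a comprehension
doubled_map = list(map(lambda x: 2 * x, [1, 2, 3]))
doubled_comp = [2 * x for x in [1, 2, 3]]
```

The comprehension form also avoids the extra list(...) wrapper that Python 3 requires around map and filter.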

Non-residual datatypes

We should be thinking about how we will handle datatypes different than residuals, e.g., DM data for cyclic spectroscopy timing (with DM uncertainties possibly correlated with TOA uncertainties), profile intensity data for profile-domain timing, etc.

Pulsar can easily host other vectors, but we will probably have to extend/replace the PTA object.

put reqs in `setup.py`

When setup.py runs it doesn't check if any requirements are satisfied. We use requirements.txt and requirements_dev.txt to set up the testing environment for Travis. These do not actually enforce anything for a user install.

If a user installs enterprise either using pip with git+ or from source the requirements won't be checked during the install process.

Consistent function naming in utils

Sometimes you have:
create_ABC versus createABC versus createAB_C

The get_* functions seem more consistently named. Probably best to make them create_ABC to be consistent.

no `scikit-sparse` dependency

It may be good not to require scikit-sparse as a hard dependency since it is only used in special cases (i.e. certain ECORR settings and spatial correlations). We could potentially just make a wrapper like I have done for mpi4py in PAL2 that will still allow the code to run but will use the standard scipy library instead. Something like

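import numpy as np
import scipy.linalg


class cholesky(object):
    """Dense fallback offering the same call signature."""

    def __init__(self, x):
        # factor once; reuse for solves and the log-determinant
        self.cf = scipy.linalg.cho_factor(x)

    def __call__(self, y):
        return scipy.linalg.cho_solve(self.cf, y)

    def logdet(self):
        # log|x| = 2 * sum(log(diag(L))) for x = L L^T
        return np.sum(2 * np.log(np.diag(self.cf[0])))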

That way we could import this if scikit-sparse is not found and it would have the same signature. We would of course output a warning if this were used but at least it would make installing and running easier.

What do you think?

How to do generic Fourier signals?

We already have the ability to use pretty much whatever spectrum we want through the FourierBasisGP class factory and we can do lots of different selections. However, right now the basis is fixed (i.e. for something like DM we would need a completely new signal factory or at least a lot of copied code).

I've been working on a few things to make this more flexible and I'd like to know what you think. I've defined a new factory (the name could change):

def Fourier(func, ftype, **func_kwargs):
    
    class Fourier(object):
        
        def __init__(self, psr):
            self._psr = psr
            
        def __call__(self, **kwargs):
            kwargs.update(func_kwargs)
            return selection_func(func)(self._psr, **kwargs)
        
        @property
        def ftype(self):
            return ftype
    
    return Fourier

This uses some of the fancier stuff that we are using from the selection; that is, if the underlying func's arguments are attributes of the Pulsar class then it will use them. This way we can still use generic functions without having to specifically write them to read in the Pulsar class (just make the arguments attributes of the Pulsar class).

So this could interact with FourierBasisGP as follows

pl = base.Function(utils.powerlaw, log10_A=parameter.Uniform(-18,-12), gamma=parameter.Uniform(1,7))
basis_dm = Fourier(utils.createfourierdesignmatrix_dm, 'dm')
basis_red = Fourier(utils.createfourierdesignmatrix_red, 'red')
rn = FourierBasisGP(pl, basis_red, selection=selection)
dm = FourierBasisGP(pl, basis_dm, selection=selection)

for DM and red power-law noise. Basically it would now require a spectrum and a basis. The type part of it is needed so that we could have different kinds. The cool thing is that we can also set arguments on the outside, so we could write, e.g., Fourier(utils.createfourierdesignmatrix_red, 'red', fmin=1e-9, fmax=1e-7).

What do you all think?

How to do wideband TOAs

I've started to think of how we are going to do wideband TOAs. In this case we have both TOAs and DM measurements per band per epoch. In PAL2 I just appended the extra DM measurements onto the residuals and constructed the components as follows

d = \begin{pmatrix} dt \\ dx \end{pmatrix}, \quad T = \begin{pmatrix} T' & K \\ 0 & \tilde{K} \end{pmatrix}, \quad N = \begin{pmatrix} N_t & 0 \\ 0 & N_x \end{pmatrix}

where d is the data vector (dt and dx are the time residuals and DM residuals, respectively), T is the overall basis matrix with T' the standard TOA basis matrix (not including DMX), K is the DMX design matrix, \tilde{K} is the DMX design matrix again but without the 1/nu^2 scaling, and N is the new white noise covariance matrix, which is just the normal TOA uncertainties (N_t) with the DM uncertainties (N_x) appended.

Once I had this, the likelihood is the same as before, since everything is just a product of these three components. In enterprise the signal is built up from multiple components instead of defining T "by hand" as was done in PAL2, so this approach is a bit harder.

I'm not quite sure of an elegant way to do this. Any suggestions?

Parameterization and units of PSD (Fourier design matrix)

I'd like to raise an issue that has been bothering me in all our previous codes for a while: we have always been using the incorrect units for the PSD, as first defined in Lentati et al. (2013). Enterprise is the place to do it properly. However, there are some caveats that have to do with numerical stability, so let me explain.

The function createfourierdesignmatrix_red contains the relevant code. We always create the Fourier Design Matrix (F) as consisting of basis functions that are essentially of the form cos(x), so with a range of [-1,1]. Doing this places the binsize of the PSD (1/T) in the definition of the PSD (diagonal of phi), which makes the units of phi sec^-2. The actual PSD is of course independent of the binning, and should have units of sec^-3. I would propose the following:
Place the binsize in the definition of F, so that the PSD becomes independent of the binning, and has proper units.

The caveats I can see are the following:

  • the mode parameters ('a', the ones that one typically analytically marginalizes over), now no longer have the units of sec, which makes them less well-interpretable. However, since they pretty much always have a modeled prior on them, I don't think this is an issue. When sampling, this is therefore not really an issue.
  • When using complicated PSD parameterizations with different sized binning at different frequencies (like in van Haasteren & Vallisneri, 2014), the matrix F no longer scales with the original F (with 1/T included), since different columns are scaled differently. However, I don't see a real reason for this to cause numerical instability.
  • In a way, the binning is actually a model parameter, which makes F dependent on a model parameter. However, covertly that is already the case in the current function, since the binning also requires one to set the PSD sampling frequencies, which are already passed to createfourierdesignmatrix.
  • The results of the package would have to be scaled to compare with previous results. However, in my memory, there aren't many significant results that have been presented in papers that use the units of sec^-2. Those are mostly just internal values.
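The proposal amounts to moving the bin size 1/T from phi into F, which leaves the covariance F phi F^T unchanged. The sketch below checks that identity numerically for equal-width bins; the basis function, the power-law PSD and all numerical values are hypothetical illustration choices, not the enterprise implementation.

```python
import numpy as np


def fourier_basis(toas, nmodes, Tspan):
    """Sine/cosine basis with unit amplitude, plus the frequency of each column."""
    f = np.arange(1, nmodes + 1) / Tspan
    F = np.zeros((len(toas), 2 * nmodes))
    F[:, ::2] = np.sin(2 * np.pi * toas[:, None] * f)
    F[:, 1::2] = np.cos(2 * np.pi * toas[:, None] * f)
    return F, np.repeat(f, 2)


def psd(f, amp=1.0, gamma=3.0):
    """Hypothetical power-law PSD, used only for illustration."""
    return amp * f ** (-gamma)


Tspan = 10.0  # arbitrary time unit
toas = np.linspace(0.0, Tspan, 200)
F, f = fourier_basis(toas, 5, Tspan)

# current convention: bin size 1/T folded into phi
cov_current = (F * (psd(f) / Tspan)) @ F.T

# proposed convention: bin size folded into F, so phi is the PSD itself
F_scaled = F / np.sqrt(Tspan)
cov_proposed = (F_scaled * psd(f)) @ F_scaled.T

max_diff = np.max(np.abs(cov_current - cov_proposed))
```

With unequal bin widths the scaling would be applied column by column (each column multiplied by the square root of its own bin width), which is the situation the second caveat refers to.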

How do we subclass signals?

Currently all signals are defined by class factories of the form

def SignalName(*args, **kwargs):
    class SignalName(Signal):
        def __init__(self, psr):
            # define things here
            pass

    return SignalName

The question is how can we subclass the specific signal class? For example we have FourierBasisGP signals for red noise but a DMFourierBasisGP factory would be identical except the F matrix and parameter names would be different. We definitely need some way of doing this, otherwise we are going to be re-writing a lot of code.

Any thoughts?

Import names into `enterprise`?

I like the logical arrangement of submodules within enterprise, but it leads to some verbosity in analysis scripts. For instance, having to import enterprise.white_signals as white_signals, and then write white_signals.MeasurementNoise, etc.

We could alleviate this to different degrees: e.g.:

  • Import all names into enterprise. I think the risk of collisions is actually slight, but one may prefer the structuredness of separating parameters from signals, etc.
  • Import all signal names into enterprise.signal, so users don't need to think about the type of a signal.
  • Import all utility functions into enterprise proper.

cache_call messes up likelihood if more than one CW source is added

TLDR: It looks like cache_call does something funky if the PTA object has more than one CW signal in it and the likelihood is called multiple times (which, of course, always happens in MCMCs). For now, I have a workaround of changing the default limit to 1 in cache_call (here: https://github.com/nanograv/enterprise/blob/master/enterprise/signals/signal_base.py#L853), but it would be nice if someone familiar with cache_call could look into this and figure out how to solve this properly.

Details:
I ran on Linux, with an enterprise version sometime from earlier this year. I set up my PTA object like this:

n_source = 2

efac = parameter.Constant(1.0)
ef = white_signals.MeasurementNoise(efac=efac)
tm = gp_signals.TimingModel(use_svd=True)

base_model = ef + tm

cws = []
for i in range(n_source):
    log10_fgw = parameter.Uniform(np.log10(3.5e-9), -7)(str(i)+'_'+'log10_fgw')
    log10_mc = parameter.Constant(np.log10(5e9))(str(i)+'_'+'log10_mc')
    cos_gwtheta = parameter.Uniform(-1, 1)(str(i)+'_'+'cos_gwtheta')
    gwphi = parameter.Uniform(0, 2*np.pi)(str(i)+'_'+'gwphi')
    phase0 = parameter.Uniform(0, 2*np.pi)(str(i)+'_'+'phase0')
    psi = parameter.Uniform(0, np.pi)(str(i)+'_'+'psi')
    cos_inc = parameter.Uniform(-1, 1)(str(i)+'_'+'cos_inc')
    log10_h = parameter.LinearExp(-18, -13)(str(i)+'_'+'log10_h')
    cw_wf = models.cw_delay(cos_gwtheta=cos_gwtheta, gwphi=gwphi, log10_mc=log10_mc,
                 log10_h=log10_h, log10_fgw=log10_fgw, phase0=phase0,
                 psi=psi, cos_inc=cos_inc, tref=53000*86400)
    cws.append(models.CWSignal(cw_wf, psrTerm=False, name='cw'+str(i)))

s = base_model
for i in range(n_source):
    s = s + cws[i]

model = []
for p in psrs:
    model.append(s(p))

pta = signal_base.PTA(model)

Then, I calculate the likelihood difference when moving a bit away from a given point:

xx = {'0_cos_gwtheta':np.cos(np.pi/3),
      '0_cos_inc':np.cos(1.0),
      '0_gwphi':4.5,
      '0_log10_fgw':np.log10(8e-9),
      '0_log10_h':np.log10(7.5e-15),
      '0_phase0':1.0,
      '0_psi':1.0,
      '1_cos_gwtheta':np.cos(np.pi/2),
      '1_cos_inc':np.cos(2.0),
      '1_gwphi':1.0,
      '1_log10_fgw':np.log10(2e-8),
      '1_log10_h':np.log10(5e-15),
      '1_phase0':2.0,
      '1_psi':0.5}

xx_keys = xx.keys()
print(xx_keys)
dx = 0.1
for key in xx_keys:
    xx_mod = copy.deepcopy(xx)
    xx_mod[key] += dx
    delta_L = pta.get_lnlikelihood(xx_mod) - pta.get_lnlikelihood(xx)
    print(delta_L)

The first time I run it, I get the right answer:

-1.38655732293
-0.674862583255
-0.217860197201
-51.636416802
-2.66131148511
-0.424086374253
-1.72225094947
0.0709663308226
-0.252515423083
-0.958126336074
-0.490308497469
-0.7501117713
-0.173086848721
0.0540416986587

The second time I run the same code, I get different values:

0.0
0.703805200599
-0.214432425753
-51.6128221575
-2.66575064577
-0.415693103816
-1.7059718191
0.0169246321639
-0.252515423083
-0.958126336074
-0.490308497469
-0.7501117713
-0.173086848721
0.0540416986587

If I change the limit to 1 in cache_call as described above, I get the same results every time I call this test, identical to those I get the first time.

Standard naming of parameters

We should standardize on a convention, e.g., PSRNAME_SIGNALNAME_PARNAME_SELECTION; or should it be (more logical, but perhaps less readable) PSRNAME_SELECTION_SIGNALNAME_PARNAME?

pta.summary() should return a string

Currently PTA.summary() prints metadata to stdout. This makes it awkward when one wants to save this to a file as a record of a run.

It would be better if this returned a string that could be printed or written to file as necessary.
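A minimal sketch of the return-a-string pattern. The function summary_text and the parameter descriptions are hypothetical; the real summary contents would come from the PTA object.

```python
import os
import tempfile


def summary_text(model_name, params):
    """Return the summary as a string instead of printing it."""
    lines = ["Model: {}".format(model_name), "Parameters ({}):".format(len(params))]
    lines.extend("  {}: {}".format(name, params[name]) for name in sorted(params))
    return "\n".join(lines)


# hypothetical parameter descriptions, for illustration only
text = summary_text("noise_run", {"efac": "Constant(1.0)", "log10_A": "Uniform(-18, -12)"})
print(text)

# the same string can be written to a file as a record of the run
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summary.txt")
    with open(path, "w") as fh:
        fh.write(text + "\n")
    with open(path) as fh:
        saved = fh.read()
```

Callers who liked the old behavior lose nothing: print(pta.summary()) would reproduce it.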

Pulsar dependent parameters

We have talked a bit about this before but it is time to get on it. At the moment, Parameters are basically defined generically with some bounds (or mean, std for gaussian) and then initialized with a name (for pulsar and/or selection). We can also initialize them before being passed to a signal so that they are pulsar independent. However, for some cases (pulsar timing parameters and pulsar distances come to mind) we want a nice way of setting the hyper-parameters (i.e. bounds or gaussian means and variances, etc) from pulsar information. It is possible to do this by pre-initializing these parameters beforehand but then we would need to define separate signal classes for every pulsar.

Maybe we can make some sort of wrapper or container class for parameters of this type?

Any ideas?

Parameter and Signal classes

In terms of overall design I think that we should have a Parameter class similar to PINT Parameters with priors attached as they do there. We should then have a Signal class, again similar to PINT models which will have a setup method that reads in Pulsar class instances, or at least attributes. This way we can make a collection of signals independent of the pulsars. It is only when the signals are evaluated (or setup) that they interact with the pulsar.

In terms of the model, I think we should have a PulsarModel and a PTAModel where the pulsar model is for single pulsars (i.e., noise, timing model, etc.) and the PTA model is for common signals (i.e., GWs, ephemeris, clocks). We should probably focus on the PulsarModel first as a PTAModel would just be a collection of PulsarModel instances + some methods to compute PTA covariances and such.

For the PulsarModel class I think it should read in the Pulsar object and a collection of Signal objects similar to the PINT residual class. I haven't fully thought through how likelihoods would be incorporated into this. Of course they should be part of the PulsarModel but maybe that should be sub-classed for different kinds of likelihood (i.e., marginalized vs fully hierarchical).

For now I'm attaching a Milestone to have a basic working version of PulsarModel finished by the beginning of the busy week.

Speed up basis matrix column checking

Currently we always go through every column of every basis matrix to check if any columns are identical. This is very useful since it allows us to use an arbitrary number (and value) of frequencies for different red noise processes without using more columns in the overall basis matrix than we need. For example, if we are using the same 30 frequencies for red noise and GWB then the overall basis matrix should only have 60 columns (sine and cosine double the count) and not 120.

This all works fine but the method that we are currently using is very brute force and very slow. It takes several minutes for the newer data sets. This is not terrible since we only have to do it once but it is still a bit annoying.

Does anyone have any thoughts on how to speed this up?
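One candidate is to let numpy find duplicate columns by sorting rather than by pairwise comparison. The sketch below uses np.unique along axis=1 and keeps first occurrences in their original order; the helper unique_columns is hypothetical, it detects only exactly identical columns, and whether it is fast enough on the newer data sets has not been measured here.

```python
import numpy as np


def unique_columns(T):
    """Return T with duplicate columns removed, keeping first occurrences in order."""
    _, first = np.unique(T, axis=1, return_index=True)
    keep = np.sort(first)
    return T[:, keep], keep


rng = np.random.default_rng(0)
F = rng.standard_normal((500, 60))   # e.g. 30 frequencies, sine and cosine
T_full = np.hstack([F, F])           # red noise and GWB sharing the same basis

T_reduced, keep = unique_columns(T_full)
```

The returned keep indices can be used to map each signal's columns onto the reduced basis.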

Use fixed BayesEphem parameters by extending Parameter classes

I've been thinking about using a fixed BayesEphem model for single pulsar noise runs. These are for model selection to determine what sort of RN model to use for that pulsar. I don't want an ephemeris error to be picked up so it seems like a more complicated model is needed. It may not matter, but I'd like to find out...

There are two options:

  1. fix ephemeris params to mean values from full PTA GWB analysis
  2. use informative priors on ephemeris params from full PTA GWB analysis output by fitting a normal distr to the chains

(this is only marginally cheating since most of the BE parameter info is coming from the other N-1 pulsars)

I can write a modified PhysicalEphemerisSignal class factory for each case... but I would prefer to extend the existing Parameter classes to handle it. I may need some help.

Method 1

I want to do something like

eph = deterministic_signals.PhysicalEphemerisSignal(
                frame_drift_rate=parameter.Constant()('frame_drift_rate'),
                d_jupiter_mass=parameter.Constant()('d_jupiter_mass'),
                d_saturn_mass=parameter.Constant()('d_saturn_mass'),
                d_uranus_mass=parameter.Constant()('d_uranus_mass'),
                d_neptune_mass=parameter.Constant()('d_neptune_mass'),
                jup_orb_elements=parameter.Constant(size=6)('jup_orb_elements'),
                use_epoch_toas=True)

then use PTA.set_default_params() to populate them.

If we modify Constant to take a size input (like Normal and Uniform), then the above should just work.

Method 2

To do (2) we need to allow a size > 1 parameter to take a list of distribution parameters instead of one. Currently, we do things like this

jup_orb_elements=parameter.Uniform(-0.05, 0.05, size=6)('jup_orb_elements')

to get 6 orbital parameters with the same prior.

I would like to do this

joe_mean = [-0.0072, -0.0036, -0.0099, -0.0099, 0.0015, 0.0150]
joe_std = [0.0068, 0.0082, 0.0059, 0.0084, 0.0054, 0.0098]
jup_orb_elements=parameter.Normal(joe_mean, joe_std, size=6)('jup_orb_elements')

to get 6 parameters with different priors (although the same underlying distribution). The size could even be inferred from the size of the input arrays.

We could also allow

parameter.Normal([a, b, c], 1)

to get three parameters with different means, but the same standard deviation.
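The broadcasting behavior described above can be sketched with scipy.stats.norm, which already accepts array-valued loc and scale. The helper make_normal is hypothetical and returns a frozen scipy distribution rather than an enterprise Parameter; the mean and standard deviation values are the ones listed above.

```python
import numpy as np
import scipy.stats


def make_normal(mu, sigma, size=None):
    """Frozen normal with per-element mean/std; size is inferred when omitted."""
    shape = np.broadcast(np.atleast_1d(mu), np.atleast_1d(sigma)).shape
    if size is not None:
        # raises ValueError if size is inconsistent with the inputs
        shape = np.broadcast_shapes(shape, (size,))
    loc = np.broadcast_to(np.asarray(mu, dtype=float), shape)
    scale = np.broadcast_to(np.asarray(sigma, dtype=float), shape)
    return scipy.stats.norm(loc=loc, scale=scale)


joe_mean = [-0.0072, -0.0036, -0.0099, -0.0099, 0.0015, 0.0150]
joe_std = [0.0068, 0.0082, 0.0059, 0.0084, 0.0054, 0.0098]

# six parameters with different priors, size inferred from the inputs
prior = make_normal(joe_mean, joe_std)
draws = prior.rvs(size=len(joe_mean), random_state=0)
joint_logpdf = prior.logpdf(draws).sum()

# different means, one shared standard deviation
shared_std = make_normal([0.0, 1.0, 2.0], 1.0)
```

Summing the element-wise logpdf gives the joint prior because the elements are treated as independent.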

Test out sparse matrix speed for full PTA band diagonal matrix

In the current codes (i.e. PAL2, NX01, piccard) we use a scheme where we invert individual npsr x npsr blocks of the full PTA phi matrix. This is very fast but it requires a lot of bookkeeping. We would like to try out some sparse matrix algebra to see if inverting this band diagonal matrix via sparse methods is comparable to the block method.

Unless the sparse matrix approach is significantly slower we should start with this method in the PTA code base, but we should allow for generic matrix classes that expose solve methods. That way, the user can specify which method to use but the API remains the same.

@stevertaylor, can you look into this?
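A starting point for such a test, assuming the phi matrix has been ordered so that the npsr x npsr blocks sit on the diagonal. The block sizes and the random symmetric positive definite blocks are made up for illustration; a real comparison would time both paths with something like timeit on realistic sizes.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg


def random_spd(n, rng):
    """Random symmetric positive definite matrix for testing."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)


rng = np.random.default_rng(1)
npsr, nblock = 5, 8
blocks = [random_spd(npsr, rng) for _ in range(nblock)]
b = rng.standard_normal(npsr * nblock)

# block method: solve each npsr x npsr block separately
x_block = np.concatenate(
    [np.linalg.solve(B, b[i * npsr:(i + 1) * npsr]) for i, B in enumerate(blocks)]
)

# sparse method: assemble the full matrix once and use a sparse solver
phi = scipy.sparse.block_diag(blocks, format="csc")
x_sparse = scipy.sparse.linalg.spsolve(phi, b)
```

Both paths take a right-hand side and return a solution, so they could sit behind a common solve method as suggested above.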

Developers tutorial/guide for enterprise

I'm interested in building some analysis method into enterprise so it's usable with all the functionality that already exists. I have no idea whether this is feasible, but in any case I would have to dig a little deeper into the workings of enterprise than the tutorials provide. I have started digging into the code behind some of the things I've come across in the tutorials, but I couldn't see the wood for the trees (A decorator that makes a decorator that returns a class factory? O.o)
There currently isn't a tutorial that helps with this, but maybe this is something that can be made. Alternatively, what is a good place to start digging?

Allow for arbitrarily named pulsars in the BasePulsar class

If an arbitrary pulsar (in my case, a simulated pulsar with a random name) is given to the Pulsar class and the tempopulsar.name does not begin with a J or a B, then the "if" statement beginning at line 69 of pulsar.py will cause pdist to be referenced before assignment. I know this can be fixed just by adding a J or a B in front of the pulsar name, but for the sake of allowing for ease of use with simulated pulsars, it might be worth adding additional "if" statements to allow for arbitrary names.

Should use logging

We should probably use the logging module to give warnings and errors where needed in the code as we move forward.
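A minimal sketch of what this could look like with the standard library. The logger name, the function check_libstempo and the message text are hypothetical; a package module would normally call logging.getLogger(__name__).

```python
import logging

# hypothetical logger name, for illustration only
logger = logging.getLogger("enterprise_sketch")


def check_libstempo(available):
    """Log a warning instead of printing when an optional backend is missing."""
    if not available:
        logger.warning("libstempo not installed; TEMPO2-based loading is disabled.")
    return available


if __name__ == "__main__":
    # applications, not libraries, decide where log output goes
    logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
    check_libstempo(False)
```

Leaving handler configuration to the calling script lets users silence, redirect or save the messages without touching library code.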

bundle subset of 11yr data for testing

Currently, enterprise comes with all of the NG 9yr and IPTA MDC1 data. The 9yr data also contains PAL2 compatible noisefiles. We should bundle enterprise with the minimal recent data needed for testing.

  • a few 11yr .par and .tim files
  • matching enterprise compatible .json noisefiles

Currently, we test Pulsar on one MDC1 pulsar to test "IPTA style" data. Is this even necessary?

PPTA DR1e J1909-3744

Hello - Just wondering if anyone had managed to run the PPTA pulsars through enterprise/ptmcmcsampler? I've found that trying to fit for red/white noise models on the 1909 DR1e dataset causes it to get "stuck" on an unfeasible likelihood and no longer accepts any new samples (the chain file just stays constant). I imagine this would have been tried before, but otherwise I can put together an example to show what I mean.
