probcomp / cgpm

Library of composable generative population models which serve as the modeling and inference backend of BayesDB.

License: Apache License 2.0

Python 99.66% Shell 0.34%
probabilistic-programming data-analysis bayesian-inference tabular-data machine-learning

cgpm's People

Contributors

avinson, axch, gregory-marton, leocasarsa, riastradh-probcomp, schaechtle, zane


cgpm's Issues

experiment: chain_cmi - Was gpmcc trained on a different data generator?

I wanted to check whether the error in the gpmcc_cmi results was due to too few samples at the conditioned value, so I made the following plots. It seems that the data on which gpmcc was trained are different from those produced by simulate_abc.

[Plot: simulate_ABC(N=250)]

[Plot: engine._X]

[Plot: engine.simulate(-1, [0,1,2]), 4 samples for each of the 60 states]

Resolve matplotlib warnings from example.py

Running ipython -i example.py gives:

MatplotlibDeprecationWarning: The use of 0
(which ends up being the _last_ sub-plot) is deprecated in 1.4 and will
raise an error in 1.5

and

/usr/local/lib/python2.7/dist-packages/matplotlib/backend_bases.py:2399: MatplotlibDeprecationWarning: Using default event loop until function
specific to this GUI is implemented
  warnings.warn(str, mplDeprecation)

To get a traceback (and a chance to drop into pdb), turn the warnings into errors:

In [1]: import warnings
In [2]: warnings.filterwarnings('error')

The third warning

/usr/local/lib/python2.7/dist-packages/matplotlib/collections.py:590:
FutureWarning: elementwise comparison failed; returning scalar instead,
but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):

is a matplotlib issue, fixed in 1.5.0

Do column parameters of uncollapsed columns need to be transitioned before hypers?

As it stands, the row transition of the random forest might leave the internal regressor in a state where it cannot compute logpdf_marginal: observations have been incorporated (so counts[x] > 0), but the internal regressor was never retrained to use those counts.

Options:

  • Patch the computation of calc_predictive_logp to detect this case and fall back to the uniform outlier model. Definitely not ideal, since we want hyperparameter transitions to account for the regressor and not just the outlier model.
  • Enforce the transition order as specified above (params and then hyperparams) for the uncollapsed dimensions.
  • Brainstorm more about alternatives.

Design and implement conditional component models

Typically discriminative. The conditional component model is fundamentally different from a foreign predictor.

  • In a foreign predictor, the latents of CC are unknown to kepler_law.
    MODEL period GIVEN apogee, perigee USING kepler_law
  • In a conditional component model
    MODEL eccentricity GIVEN * USING regression
    where * are the variables in the same view.

Control all entropy in gpmcc by passing around rng explicitly

Currently each state relies on setting the global rng via np.random.seed. Every function that does any random computation must instead take an rng object and use it for its random bits. The change will be quite comprehensive, and requires updating:

  • All the simulate and sample_params methods in classes that implement DistributionGpm.
  • All the simulate_* in utils.general (used during inference).
  • All the simulate_* in state (use for queries).
  • All the simulate_* in utils.test (used for synthetic data generation).

There are probably more locations, which a git-grep or similar search will need to identify.
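The target calling convention could be sketched as follows (the function names and hyperparameter shapes here are illustrative, not the actual cgpm API): every random procedure takes an explicit rng argument instead of touching global state.

```python
import numpy as np

# Hypothetical sketch of threading an explicit rng through random
# procedures instead of relying on global np.random state.
def sample_params(hypers, rng):
    # e.g. a Normal component drawing its mean from the prior
    return {'mu': rng.normal(hypers['m'], hypers['s'])}

def simulate(params, n, rng):
    # all random bits come from the caller-supplied rng
    return rng.normal(params['mu'], 1.0, size=n)

rng = np.random.default_rng(42)
params = sample_params({'m': 0.0, 's': 1.0}, rng)
x = simulate(params, 5, rng)
```

Because the rng is explicit, two runs seeded identically reproduce each other exactly, and independent states can hold independent generators.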

Design and implement interface for MixtureGpm.

Dim, View, and State are all mixtures of other GPMs.

  • Dim is a mixture of DistributionGpm.
  • View is a multivariate Dim (an approximation of a joint distribution over variables which are marginally DistributionGpms). A single Dim in a View is a standard univariate DP mixture of DistributionGpm.
  • State is a mixture of Dim, organizing each mixture component into a View.

At this stage I am inclined to eliminate the superficial difference between Dim and View, and require every use of even a single Dim to be wrapped in a View. Why? Because inference code written for Dim cannot be reused in View, while the reverse is possible.

State._transition_column should propose m-1 new views if not singleton

See: https://github.com/probcomp/gpmcc/blob/master/gpmcc/state.py#L247

According to the auxiliary variable algorithm (Algorithm 8, Neal 2000), there are supposed to be a total of m new views, including the current one if it is a singleton, such that the Gibbs step is stationary on the conditional distribution over the extended space of k + m views.

In the case of m = 1 the code does the right thing, because m - 1 = 0 new views are created when the current view is a singleton; but this is incorrect for general m, and in particular for the function's default argument, which is apparently m = 3.
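The intended rule can be stated in a few lines (a sketch of the counting logic only, not the actual State._transition_column code): the Gibbs step always considers m auxiliary views in total, and a singleton current view is reused as one of them.

```python
# Sketch of the auxiliary-view count in Neal's Algorithm 8: always m
# auxiliary views in total; if the column currently occupies a singleton
# view, that view is reused as one auxiliary, so only m - 1 fresh views
# are instantiated.
def num_new_views(is_singleton, m):
    return m - 1 if is_singleton else m

# m = 1 with a singleton: zero fresh views, matching the current code.
assert num_new_views(is_singleton=True, m=1) == 0
# The default m = 3 should create 2 fresh views for a singleton, not 0.
assert num_new_views(is_singleton=True, m=3) == 2
assert num_new_views(is_singleton=False, m=3) == 3
```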

Unify state.transition to use max(N,S) for better inference control

Quoting @madeleineth

@fsaad from that code, it looks like it'd actually be simpler to merge the time and iterations code into a single loop, so the kernel execution and progress code is not repeated. Maybe a few helper functions would make it cleaner. And by merging the loops, you get rid of the artificial distinction. Something like:
    while True:
        p = _proportion_done(N, S, ...)
        if p >= 1:
            break
        _show_progress(p, ...)
        _transition(...)
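A runnable version of that loop might look like the following. The helper names and budget semantics are assumptions (progress is taken as the maximum of the iteration fraction and the elapsed-time fraction, so whichever budget is exhausted first ends inference):

```python
import time

# Hypothetical sketch: merge the iteration-count (N) and wall-clock (S,
# seconds) loops into one, as suggested above.
def _proportion_done(N, S, iters, start):
    p_iter = float(iters) / N if N else 0.0
    p_time = (time.time() - start) / S if S else 0.0
    return max(p_iter, p_time)

def transition(kernel, N=None, S=None):
    if N is None and S is None:
        N = 1  # run a single step if no budget is given
    iters, start = 0, time.time()
    while True:
        p = _proportion_done(N, S, iters, start)
        if p >= 1:
            break
        kernel()
        iters += 1
    return iters
```

With only N set the loop runs exactly N kernel applications; with only S set it runs until the time budget is spent; with both, it stops at whichever limit is hit first.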

make serialization json-friendly

@fsaad I would like Engine.to_metadata and Engine.from_metadata to generate/accept plain-old-python-objects (not numpy) so that in gpmcc_metamodel, I can serialize/deserialize with json.{loads,dumps} instead of my current _engine_to_json and _json_to_engine hacks.
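One way to achieve this (a sketch, not the actual Engine code) is a recursive converter that strips numpy types from the metadata before handing it to json:

```python
import json
import numpy as np

# Hypothetical converter: recursively replace numpy arrays and scalars
# with plain Python objects so metadata round-trips through
# json.dumps / json.loads.
def to_plain(obj):
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return obj

# Illustrative metadata dict with numpy values mixed in.
metadata = {'X': np.arange(3), 'alpha': np.float64(1.5), 'cctypes': ['normal']}
blob = json.dumps(to_plain(metadata))
```

If to_metadata applied such a converter on the way out, from_metadata could accept the json.loads output directly, with no engine-specific serialization hacks.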

Remove rowids from dim.

Needs a significant refactoring but essential for SMC implementation. Dims should be wrapped in a View to unify the location of the inference algorithm.

Moreover dim is a GPM which exposes its latents anyway in its queries.

Enable column transitions with a conditional model

Currently disabled, need to ensure the following cases

Invariants to maintain

  • Conditional models are never initialized into singletons.
  • Conditional models are not proposed to move into singletons in the column CRP.
  • Conditional models are not isolated (become singletons) when columns in their Views are being transitioned.

Correctly computing the transition probability

  • When an unconditional column is proposed to a View containing a conditional column, we need to compute both the marginal probability of the unconditional column under the View's clustering and the change in marginal logp of the conditional column after retraining under the proposal.

suspicious code in simple_predictive_probability

This variable name does not look right:

    is_observed = row > state.n_rows                                        
    if is_observed:                                                         
        logp = _simple_predictive_probability_unobserved(state, col, x)     
    else:                                                                   
        logp = _simple_predictive_probability_observed(state, row, col, x)  

Line 271 of gpmcc/utils/sampling.py. Presumably row > state.n_rows identifies a hypothetical (unobserved) row, so the variable should be named is_unobserved; note also that the comparison should probably be row >= state.n_rows, since the observed rows are 0 through state.n_rows - 1.

Unfix seeds for random procedures

Evaluating procedures such as simulate and mutual_information twice produces exactly the same result. Passing seeds to them manually is not a workable solution when they must be evaluated many times in a loop.
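One possible fix (a sketch using modern numpy, not the cgpm internals) is to derive an independent stream per call site from a single root seed, so repeated calls produce fresh draws while the overall experiment stays reproducible:

```python
import numpy as np

# Hypothetical sketch: spawn independent child streams from one root
# SeedSequence instead of fixing a single global seed.
root = np.random.SeedSequence(2024)
streams = [np.random.default_rng(child) for child in root.spawn(3)]

# Each call site uses its own stream, so repeated evaluations differ.
draws = [rng.normal(size=4) for rng in streams]
```

Spawning is deterministic given the root seed, so rerunning the whole script regenerates exactly the same set of distinct streams.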

Initializing a StateGPM (and State.incorporate_dim) with a conditional model?

Initializing a Dim without the regression metadata will crash (the entry points are State.incorporate_dim and State.__init__ with a conditional model).

The ViewGPM is smart enough to take the metadata from its unconditional dims and append that metadata to the conditional Dim's metadata before creating it.

Possible solutions: like State.update_cctypes, delegate all initializing of Dim to the enclosing View. Requires refactoring.

Create test for simulate of DistributionGpms.

For instance, the posterior predictive of Normal-Gamma Normal is a Student t. Do we achieve identical simulation if we instead simulate rho ~ Gamma(Hn), mu ~ Normal(rho,Hn), then x ~ Normal(mu,rho) where Hn are the posterior hyperparameters?
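The example above can be turned into a concrete two-sample check (a sketch; the function names and the generic test harness are assumptions, not the cgpm test suite). For the Normal with a Normal-Gamma prior, the posterior predictive is a Student t with df = 2a, location m, and scale² = b(k+1)/(ak), where (m, k, a, b) are the posterior hyperparameters:

```python
import numpy as np
from scipy import stats

def simulate_hierarchical(m, k, a, b, n, rng):
    # rho ~ Gamma(a, rate=b); mu ~ Normal(m, 1/(k*rho)); x ~ Normal(mu, 1/rho)
    rho = rng.gamma(shape=a, scale=1.0 / b, size=n)
    mu = rng.normal(m, np.sqrt(1.0 / (k * rho)))
    return rng.normal(mu, np.sqrt(1.0 / rho))

def simulate_predictive(m, k, a, b, n, rng):
    # Student t predictive: df = 2a, loc = m, scale^2 = b*(k+1)/(a*k)
    scale = np.sqrt(b * (k + 1.0) / (a * k))
    return m + scale * rng.standard_t(df=2.0 * a, size=n)

rng = np.random.default_rng(0)
x1 = simulate_hierarchical(0.0, 1.0, 2.0, 2.0, 50000, rng)
x2 = simulate_predictive(0.0, 1.0, 2.0, 2.0, 50000, rng)
# A two-sample KS test should not reject: the two simulators agree.
ks = stats.ks_2samp(x1, x2)
```

A generic version of this test would iterate over every DistributionGpm, comparing its simulate output against the analytically known posterior predictive wherever one exists.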

Update the particle learning dimension for new hyperparameter inference scheme.

Currently almost completely broken after massive changes. More generally, we need to extend the current implementation of SMC on a single dimension to SMC on an entire crosscat state. Talk with vkm about the design of this inference algorithm.

To get the particle demo running, check out the particle-demo tag; see README.md for details.

Make sure the tests run.

Actually, rewrite the whole test suite. The best test we currently have is test_simulate, which should be made generic. I refrained from touching the tests until I had rewritten the source to a point where updating them made sense, since the intermediate GPM interfaces were not complete.
