probcomp / cgpm
Library of composable generative population models, which serve as the modeling and inference backend of BayesDB.
License: Apache License 2.0
Should be in the StateGPM.
Required for some solid MML datatype experiments I am designing.
To capture cluster-specific dependence.
Empirically showing that a fixed ordering of the data in a Gibbs sweep converges faster -- permuting columns yet to be tested.
Metrics to follow.
Currently each DistributionGpm has its own method for plotting its univariate distribution; it should take perhaps half an hour to unify that code for the continuous and discrete settings. Low priority.
Running ipython -i example.py gives:

    MatplotlibDeprecationWarning: The use of 0 (which ends up being the
    _last_ sub-plot) is deprecated in 1.4 and will raise an error in 1.5

and

    /usr/local/lib/python2.7/dist-packages/matplotlib/backend_bases.py:2399:
    MatplotlibDeprecationWarning: Using default event loop until function
    specific to this GUI is implemented
      warnings.warn(str, mplDeprecation)
Can drop into pdb by converting warnings into errors, which gives a traceback:

    In [1]: import warnings
    In [2]: warnings.filterwarnings('error')
The third warning

    /usr/local/lib/python2.7/dist-packages/matplotlib/collections.py:590:
    FutureWarning: elementwise comparison failed; returning scalar instead,
    but in the future will perform elementwise comparison
      if self._edgecolors == str('face'):

is a matplotlib issue, fixed in 1.5.0.
As it stands, the row transition of the random forest might leave the internal regressor in a state where it cannot compute logpdf_marginal: observations were incorporated (so counts[x] > 0) but the internal regressor was never transitioned to use those counts.
Options:
Separately for each state.
When doing State.transition, keep a record of the number of iterations each kernel has achieved.
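A minimal sketch of the bookkeeping, assuming a hypothetical State.transition that dispatches to named kernels (the class, kernel names, and method signatures here are illustrative, not cgpm's actual API):

```python
# Sketch of the proposed bookkeeping, with illustrative kernel names
# (not cgpm's actual API): State.transition records how many iterations
# each kernel has completed.
class State:
    def __init__(self):
        self.iterations = {
            'alpha': 0, 'view_alphas': 0, 'column_params': 0,
            'column_hypers': 0, 'rows': 0, 'columns': 0,
        }

    def transition(self, N=1, kernels=None):
        kernels = kernels if kernels is not None else list(self.iterations)
        for _ in range(N):
            for kernel in kernels:
                self._run_kernel(kernel)      # stand-in for the real sweep
                self.iterations[kernel] += 1  # per-kernel iteration record

    def _run_kernel(self, kernel):
        pass  # the actual Gibbs/MH kernel would run here


state = State()
state.transition(N=5, kernels=['rows', 'columns'])
```

Keeping the counters on the state itself means the record survives serialization along with the rest of the metadata.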
It seems to work on uncollapsed beta and truncated normal models, but the code is a mess.
More generally, implementation of MML. Task blocked on #38
Look into why the switch from > to <.
Currently implemented partial BQL in sampling.py and some in engine.py. Rewrite this whole mess into query.py.
Probably numpy style.
Using pool.close or pool.terminate.
The MEng proposal probably presents a more accurate view of GPMs, though. Will probably put in both.
Typically discriminative. The conditional component model is fundamentally different from a foreign predictor.
MODEL period GIVEN apogee, perigee USING kepler_law
MODEL eccentricity GIVEN * USING regression
Currently each state relies on setting the global rng using np.random.seed. We need each and every function that does some random computation to take in an rng object and use it for random bits. The change will be quite comprehensive, and requires updating:

- the simulate and sample_params methods in classes that implement DistributionGpm
- simulate_* in utils.general (used during inference)
- simulate_* in state (used for queries)
- simulate_* in utils.test (used for synthetic data generation)

There are probably more locations, which a git-grep or other procedure needs to identify.
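The target pattern is to thread an explicit rng through every stochastic call rather than touching global seed state. A stdlib sketch (cgpm would pass a numpy RandomState instead of random.Random, and the function names here are illustrative):

```python
import random

def simulate_bernoulli(p, rng):
    # Draw from the caller-supplied rng, never from global state.
    return 1 if rng.random() < p else 0

def simulate_many(p, n, rng):
    return [simulate_bernoulli(p, rng) for _ in range(n)]

# Two rngs with the same seed yield identical streams, so results are
# reproducible without np.random.seed-style global mutation.
draws_a = simulate_many(0.3, 10, random.Random(42))
draws_b = simulate_many(0.3, 10, random.Random(42))
```

Because the rng is an argument, two states in the same process can run inference concurrently with independent, reproducible streams.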
The logic for uncollapsed samplers looks broken. Need to read and rewrite very carefully.
Currently sitting in bdbcontrib, and not accessible to CMML.
Dim, View, and State are all mixtures of other GPMs:

- Dim is a mixture of DistributionGpm.
- View is a multivariate Dim (an approximation of a joint distribution of variables which are marginally DistributionGpms). A single Dim in a View is a standard univariate DP mixture of DistributionGpm.
- State is a mixture of Dim, organizing each mixture component into a View.

At this stage I am inclined to eliminate the topical difference between Dim and View, and require all uses of even a single Dim to be wrapped in a View. Why? Because we cannot reuse inference code written for Dim in View, while the reverse is true.
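The proposed collapse can be sketched as follows: a lone Dim becomes a one-column View, so the inference loop lives only on View (all class and attribute names here are illustrative stand-ins, not cgpm's code):

```python
class Dim:
    """Stand-in for a univariate mixture of DistributionGpm."""
    def __init__(self, name):
        self.name = name
        self.sweeps = 0

class View:
    """Joint over its member dims; owns the only inference loop."""
    def __init__(self, dims):
        self.dims = dims

    def transition(self, N=1):
        for _ in range(N):
            for dim in self.dims:
                dim.sweeps += 1  # stand-in for the real per-dim sweep

# A lone Dim becomes the degenerate one-column View, so there is a
# single code path for inference.
lone = View([Dim('x')])
lone.transition(N=3)
```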
See: https://github.com/probcomp/gpmcc/blob/master/gpmcc/state.py#L247
According to the auxiliary variable algorithm (Algorithm 8, Neal 2000), there are supposed to be a total of m new views, including the current one if it is a singleton, such that the Gibbs step is stationary on the conditional distribution over the extended space of k + m views.
In the case of m = 1, the code does the right thing, because m - 1 = 0 new views are created when the current view is a singleton; but this is incorrect for general m, and in particular for the function's default argument, which is apparently m = 3.
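Concretely, the number of freshly instantiated auxiliary views must depend on whether the current view is a singleton, so that the extended space always contains exactly m auxiliary components. A sketch of the counting rule (helper name is mine, not the source's):

```python
def n_new_aux_views(is_singleton, m):
    """Number of brand-new auxiliary views to instantiate in a Gibbs step
    of Neal's Algorithm 8: when the current view is a singleton it is
    reused as one of the m auxiliary components, so only m - 1 fresh
    views are created; otherwise all m are fresh."""
    assert m >= 1
    return m - 1 if is_singleton else m
```

With m = 1 both branches behave as the existing code does (0 fresh views for a singleton), which is why the bug only surfaces for m > 1.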
For categorical values in an open set.
Recreated this duplicate issue to maintain the original Bitbucket repo issue numbers.
Quoting @madeleineth
@fsaad from that code, it looks like it'd actually be simpler to merge the time and iterations code into a single loop, so the kernel execution and progress code is not repeated. Maybe a few helper functions would make it cleaner. And by merging the loops, you get rid of the artificial distinction. Something like:
    while True:
        p = _proportion_done(N, S, ...)
        if p >= 1:
            break
        _show_progress(p, ...)
        _transition(...)
@fsaad I would like Engine.to_metadata and Engine.from_metadata to generate/accept plain old Python objects (not numpy) so that in gpmcc_metamodel, I can serialize/deserialize with json.{loads,dumps} instead of my current _engine_to_json and _json_to_engine hacks.
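One way to satisfy this, assuming the metadata is built from dicts, lists/tuples, and numpy arrays or scalars, is a recursive converter that duck-types on the .tolist attribute numpy exposes (a sketch, not the actual Engine code):

```python
import json

def to_plain(obj):
    """Recursively convert metadata into plain Python objects so that
    json.dumps accepts it. numpy arrays and scalars expose .tolist."""
    if hasattr(obj, 'tolist'):          # numpy array or numpy scalar
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(v) for v in obj]
    return obj

metadata = {'alpha': 1.5, 'assignments': [(0, 1), (1, 0)]}
roundtrip = json.loads(json.dumps(to_plain(metadata)))
```

Note that tuples come back as lists after the round trip, so from_metadata must not rely on tuple-ness.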
Simple bulk operations in the State that take in multiple queries. Saves rebuilding a State for each query in the engine, for instance.
Needs a significant refactoring, but essential for the SMC implementation. Dims should be wrapped in a View to unify the location of the inference algorithm. Moreover, Dim is a GPM which exposes its latents in its queries anyway.
Currently disabled; need to ensure the following:

- Invariants to maintain.
- Correctly computing the transition probability.
This variable name does not look right:

    is_observed = row > state.n_rows
    if is_observed:
        logp = _simple_predictive_probability_unobserved(state, col, x)
    else:
        logp = _simple_predictive_probability_observed(state, row, col, x)

Line 271 of gpmcc/utils/sampling.py.
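A hedged sketch of the rename: a row index past the incorporated rows is hypothetical, so the flag should say "unobserved" and the branches then read correctly. The helper bodies below are stubs invented for illustration; only the renamed flag mirrors the snippet above:

```python
def _stub_unobserved(state, col, x):
    return 'unobserved'

def _stub_observed(state, row, col, x):
    return 'observed'

def predictive_probability(state, row, col, x):
    # Rows beyond those already incorporated are hypothetical, i.e.
    # unobserved; the flag name now matches the branch it selects.
    is_unobserved = row > state.n_rows
    if is_unobserved:
        return _stub_unobserved(state, col, x)
    return _stub_observed(state, row, col, x)

class FakeState:
    n_rows = 10
```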
Many of the operations of the distributions in dists/ are duplicated code that can be reused. Define the interface, and generic implementations of the duplicated methods.
For correctness of naming.
Evaluating procedures such as simulate and mutual_information twice produces exactly the same result. Passing seeds to them manually is not an optimal solution if we must evaluate them multiple times in a loop.
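One fix, assuming each query takes an rng argument, is to spawn a fresh child rng per call from a parent stream, rather than passing one fixed seed into the loop (stdlib sketch; the simulate stub is illustrative):

```python
import random

def simulate(n, rng):
    # Stand-in for a stochastic query like State.simulate.
    return [rng.random() for _ in range(n)]

parent = random.Random(7)
# Fresh child seed per call: successive calls differ from each other,
# yet the whole sequence is reproducible from the parent seed alone.
results = [simulate(3, random.Random(parent.getrandbits(32)))
           for _ in range(2)]
```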
Initializing a Dim without the regression metadata will crash (entry points are State.incorporate_dim and State.__init__ with a conditional model). The View GPM is smart enough to take the metadata from its unconditional dims and append that important metadata to the conditional Dim's metadata before creating it. Possible solution: as with State.update_cctypes, delegate all initialization of Dim to the enclosing View. Requires refactoring.
For instance, the posterior predictive of Normal-Gamma Normal is a Student t. Do we achieve identical simulation if we instead simulate rho ~ Gamma(Hn), mu ~ Normal(rho,Hn), then x ~ Normal(mu,rho) where Hn are the posterior hyperparameters?
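The equivalence can be checked empirically. Assuming posterior hyperparameters Hn = (m, k, a, b), the compound draw rho ~ Gamma(a, rate b), mu ~ Normal(m, 1/(k*rho)), x ~ Normal(mu, 1/rho) should match the Student-t predictive t_{2a}(m, b(k+1)/(a*k)) in its first two moments (stdlib sketch; note gammavariate takes a scale, hence 1/b):

```python
import math
import random

def sample_predictive(m, k, a, b, rng):
    # rho ~ Gamma(shape=a, rate=b); random.gammavariate takes a scale.
    rho = rng.gammavariate(a, 1.0 / b)
    mu = rng.normalvariate(m, math.sqrt(1.0 / (k * rho)))
    return rng.normalvariate(mu, math.sqrt(1.0 / rho))

rng = random.Random(1)
m, k, a, b = 1.0, 2.0, 3.0, 2.0
xs = [sample_predictive(m, k, a, b, rng) for _ in range(50000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
# Student-t predictive: df = 2a = 6, location = m = 1.0,
# scale^2 = b*(k+1)/(a*k) = 1.0, so variance = scale^2 * df/(df-2) = 1.5.
```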
The posterior predictive is actually just a Beta Negative Binomial with r = 1. Work in progress on 1226-fsaad-dist-sampling branch, as well as other fixes to samplers.
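This one is similarly checkable by simulation: assuming a Beta(a, b) posterior, draw p ~ Beta(a, b) and then x ~ Geometric(p), counting failures before the first success; the marginal is BetaNegativeBinomial(r=1, a, b), whose mean is b/(a-1) for a > 1 (stdlib sketch, names mine):

```python
import math
import random

def sample_geometric_failures(p, rng):
    # Failures before the first success, sampled by inversion:
    # P(X >= x) = (1-p)^x  =>  X = floor(log U / log(1-p)).
    u = rng.random()
    return int(math.floor(math.log(u) / math.log(1.0 - p)))

def sample_predictive_geom(a, b, rng):
    p = rng.betavariate(a, b)
    return sample_geometric_failures(p, rng)

rng = random.Random(3)
a, b = 5.0, 2.0
xs = [sample_predictive_geom(a, b, rng) for _ in range(50000)]
mean = sum(xs) / len(xs)  # BNB(r=1, a, b) mean: b/(a-1) = 0.5
```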
Currently almost completely broken after massive changes. More generally, need to extend the current implementation of SMC on a single dimension to SMC on an entire crosscat state. Talk with vkm on the design of this inference algorithm.
To get the particle demo running, check out the particle-demo tag; see README.md for details.
Actually, rewrite the whole test suite. The best test we currently have is test_simulate, which should be made generic. I refrained from touching the tests until I rewrote the source to a point where it made sense to update them, since the intermediate GPM interfaces were not complete. It seems that singleton_logp is no longer required as well?