
bracketjohn / kerndisc


Automatic Kernel Discovery for Gaussian Processes

Python 100.00%
discovery gaussian-processes imputation interpolation machine-learning structure


kerndisc's Issues

Simplify Kernels and Remove Redundancy

This also enables us to generate textual descriptions.

DoD

  • Ensure that kerndisc never retries unnecessary models

Tasks

  • Create criteria for when a model can be evaluated:
        current_models = [model.simplified() for model in current_models]
        current_models = ff.remove_duplicates(current_models)

with simplified being defined as:

    def simplified(self):
        k = self.copy()
        k_prev = None
        while k_prev != k:
            k_prev = k.copy()
            k = k.collapse_additive_idempotency()
            k = k.collapse_multiplicative_idempotency()
            k = k.collapse_multiplicative_identity()
            k = k.collapse_multiplicative_zero()
            k = k.canonical()
        return k
  • Remove duplicates after simplification
  • remove duplicates that were already scored (#22)
  • Testing
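The duplicate-removal step after simplification could look roughly like the following sketch. `canonical_key`, the flat string representation, and the `already_scored` collection are all illustrative assumptions, not existing kerndisc API:

```python
# Hypothetical sketch: deduplicate kernel expressions via a canonical key,
# so kerndisc never re-scores a model it has already seen (cf. #22).

def canonical_key(expression: str) -> str:
    """Normalize a flat sum-of-terms expression, e.g. 'rbf + constant'."""
    terms = sorted(t.strip() for t in expression.split('+'))
    return ' + '.join(terms)

def remove_duplicates(expressions, already_scored=()):
    """Drop expressions whose canonical form was already seen or scored."""
    seen = {canonical_key(e) for e in already_scored}
    unique = []
    for expr in expressions:
        key = canonical_key(expr)
        if key not in seen:
            seen.add(key)
            unique.append(expr)
    return unique
```

Keying on a canonical form rather than the raw string is what makes order-insensitive duplicates (e.g. `rbf + constant` vs. `constant + rbf`) collapse into one candidate.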

Add simplification

DoD

  • Models are automatically simplified after extension

Tasks

  • Implement simplification of kernels with regard to:
    def simplified(self):
        k = self.copy()
        k_prev = None
        while k_prev != k:
            k_prev = k.copy()
            k = k.collapse_additive_idempotency()
            k = k.collapse_multiplicative_idempotency()
            k = k.collapse_multiplicative_identity()
            k = k.collapse_multiplicative_zero()
            k = k.canonical()
        return k

Users should be able to pass a grammar to `discover`

Currently a user has to add a new file to the grammars package and modify existing source code in order to create and use a new grammar.

This is not very library-like and should be easier. Instead, `discover` (or some related method in kernDisc) should accept a user-supplied grammar that is then used for expansion.
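One way this could look is sketched below: `discover` takes any callable that maps a kernel expression to its candidate successors. The signatures and the toy string-based expansion are assumptions for illustration, not the existing kernDisc API:

```python
# Hypothetical sketch: let users inject a grammar as a plain callable,
# instead of adding a module to the grammars package.

def default_expand(expression):
    """Toy expansion: extend each expression with two base kernels."""
    return [f'({expression}) + {base}' for base in ('rbf', 'linear')]

def discover(expression, grammar=default_expand, max_depth=2):
    """Breadth-first expansion driven by the user-supplied grammar callable."""
    frontier = [expression]
    for _ in range(max_depth):
        frontier = [child for expr in frontier for child in grammar(expr)]
    return frontier
```

A user-defined grammar would then just be another function passed as `grammar=...`, with no changes to kernDisc source required.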

Add stopping criteria

DoD

  • Have selectable stopping criteria for search

Tasks

  • Implement stopping criteria
    • max_depth
    • didn't improve by more than ... percent (no_improvment)
    • ...
  • Test
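Selectable criteria could be modeled as small factories that all share one signature, as in this sketch. The names, the `(depth, scores)` signature, and the assumption that higher scores are better are all illustrative choices, not existing kerndisc code:

```python
# Hypothetical sketch of selectable stopping criteria for the search loop.

def max_depth(depth_limit):
    """Stop once the search has reached the given depth."""
    def criterion(depth, scores):
        return depth >= depth_limit
    return criterion

def no_improvement(min_gain=0.01):
    """Stop when the best score improved by less than `min_gain` (fractional)."""
    def criterion(depth, scores):
        if len(scores) < 2:
            return False
        previous, latest = scores[-2], scores[-1]
        return (latest - previous) / abs(previous) < min_gain
    return criterion

def should_stop(criteria, depth, scores):
    """Search halts as soon as any active criterion fires."""
    return any(criterion(depth, scores) for criterion in criteria)
```

Because every criterion has the same shape, users could pass any subset (or their own) to the search without touching its internals.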

Add automatic textual description

AS implements the ABCD framework for textual description/explanation of the kernels it generates. We also need this for simplified kernel expressions.

Relies on #11

DoD

  • Have ability to describe a simplified kernel expression

Tasks

  • Add explanation package to explain and describe kernels
  • Have explanation syntax and texts as described by Duvenaud et al.

Introduce simple graphing capabilities

Graphing is often necessary to see what's going on and to manually tune the search.

Currently this has to be set up by hand every time; a built-in feature for it might be nice.

Tasks

  • Evaluate whether this is a good new feature
  • Extend tasks here

Create the tasks necessary to complete `kernDisc`

DoD

  • Have all new tasks in kerndisc repository

Tasks

  • Break down kernDisc construction into subtasks
  • Create subtasks
  • Specify DoD and tasks in the new subtasks
  • Go through the current automated-statistician
    • jot down its abilities
    • specify the subset to realize in this new, smaller library

Add random restarts

DoD

  • Have random restarts functionality, if necessary

Tasks

  • Implement functionality
    • Select n models at random to have them restart
      • initialise model with random standard deviation, likelihood
  • Write tests
  • Look further into:
def add_random_restarts(models, n_rand=1, sd=4, data_shape=None):
    new_models = []
    for a_model in models:
        for (kernel, likelihood, mean) in zip(add_random_restarts_single_k(a_model.kernel, n_rand=n_rand, sd=sd, data_shape=data_shape),
                                              add_random_restarts_single_l(a_model.likelihood, n_rand=n_rand, sd=sd, data_shape=data_shape),
                                              add_random_restarts_single_m(a_model.mean, n_rand=n_rand, sd=sd, data_shape=data_shape)):
            new_model = a_model.copy()
            new_model.kernel = kernel
            new_model.likelihood = likelihood
            new_model.mean = mean
            new_models.append(new_model)
    return new_models

Have functioning prototype of `kerndisc`

DoD

  • Have functioning first version of kernDisc

Tasks

  • Finish all base tasks in this repo
  • Have this run successfully on some time series
    • Make results of first run accessible

Add jitter

DoD

  • Apply nondeterministic jitter to models

Tasks

AS does the following:

import numpy as np

def add_jitter_k(kernels, sd=0.1):
    '''Adds random noise to all parameters - empirically observed to help when optimiser gets stuck'''
    for k in kernels:
        k.load_param_vector(k.param_vector + np.random.normal(loc=0., scale=sd, size=k.param_vector.size))
    return kernels

def add_jitter(models, sd=0.1):
    for a_model in models:
        a_model.kernel = add_jitter_k([a_model.kernel], sd=sd)[0]
    return models
  • Implement here
    • Test

Sort is missing as part of simplification

This currently does not hold:

def test_simplify_order(are_asts_equal):
    """Test whether order is also irrelevant."""
    ast_one = Node(gpflow.kernels.Sum, full_name='Sum')
    Node(gpflow.kernels.RBF, parent=ast_one, full_name='rbf')
    Node(gpflow.kernels.Constant, parent=ast_one, full_name='constant')

    ast_two = Node(gpflow.kernels.Sum, full_name='Sum')
    Node(gpflow.kernels.Constant, parent=ast_two, full_name='constant')
    Node(gpflow.kernels.RBF, parent=ast_two, full_name='rbf')

    simpl_one = simplify(ast_one)
    simpl_two = simplify(ast_two)

    assert are_asts_equal(simpl_one, simpl_two)

To fix this:

  • sort has to be implemented,
  • conftest.py::are_asts_equal has to get the following addition:
        if lvl_order_ast_one != lvl_order_ast_two:
            return False

    This could be part of another if block; maybe we don't need to check order every time.
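The missing `sort` step could work roughly as below: recursively order the children of commutative nodes (`Sum`, `Product`) by name so both ASTs from the test reach the same canonical form. The `Node` class here is a minimal stand-in for the real AST nodes, not kerndisc's implementation:

```python
# Hypothetical sketch of the missing sort step in `simplify`.

class Node:
    """Minimal stand-in for the real AST node type."""
    def __init__(self, full_name, children=()):
        self.full_name = full_name
        self.children = list(children)

def sort_ast(node):
    """Return a copy of the AST with commutative children in name order."""
    children = [sort_ast(child) for child in node.children]
    if node.full_name in ('Sum', 'Product'):
        children.sort(key=lambda child: child.full_name)
    return Node(node.full_name, children)
```

Sorting only under commutative operators is important: child order under non-commutative structures (e.g. changepoint operators) must be preserved.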

Move to truly Bayesian approach

It might be better to use a Bayesian approach in combination with auto diff.

Tasks

  • Evaluate whether this improves performance
  • Implement Bayesian parameter initialization
  • Test this

Update to newer gpflow version

The new gpflow version changed its kernel hierarchy: a Sum or Product kernel now has a `kernels` attribute instead of holding the parts of said Sum or Product as direct children. `kernels` can then be used to get the children.

This is breaking for kernDisc and is currently not supported.
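If support for both layouts were wanted, a small accessor could bridge them, as in this sketch. The `kernels` attribute matches the newer gpflow description above; the `children` fallback name for the older layout is an assumption:

```python
# Hypothetical compatibility shim for the gpflow kernel-hierarchy change.

def child_kernels(kernel):
    """Return the sub-kernels of a combination kernel, old or new gpflow."""
    if hasattr(kernel, 'kernels'):
        # newer gpflow: Sum/Product store their parts under `.kernels`
        return list(kernel.kernels)
    # assumed older layout: parts as direct children (attribute name is a guess)
    return list(getattr(kernel, 'children', []))
```

Routing all traversal through one such accessor would confine the breaking change to a single function instead of every place kernDisc walks a kernel tree.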

Remove nan scored

DoD

  • Go through model scores after training and remove NaN scores

Tasks

  • Implement this functionality
  • Test
  • Check whether AS does anything more than this -> it does not
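The functionality itself is small; a minimal sketch, assuming scores live in a dict mapping model names to floats (the data layout is an assumption):

```python
# Minimal sketch: drop models whose training produced a NaN score.
import math

def remove_nan_scored(scores):
    """Return only the entries whose score is a real number."""
    return {name: score for name, score in scores.items()
            if not math.isnan(score)}
```

Note that `score != score` would work equally well as a NaN check, but `math.isnan` states the intent explicitly.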

Implement own version of Duvenaud's Grammar

Some things occurred during development and evaluation that might be improvements on the grammar of Duvenaud et al. These improvements also don't seem to be covered in Lloyd et al.

It might be good to implement them in a modified version of Duvenaud's grammar. Some small changes that move away from Duvenaud's design were already made; these must then also be moved over.

Tasks

  • Create new _grammar_duvenaud_modified.py grammar
  • Implement/Move over base kernel exclusion
    (rationale: Stops GPs from just taking constant or white as the "best" result)
  • Maybe make initial full expansion part of this grammar? Current implementation as part of discover might be unreasonable

TODO: Add more improvements noticed during development here.

Out-of-bounds optimising functionality

DoD

  • Apply a 0-1-like prior to OOB optimization
  • Have documented somewhere what this means

Tasks

  • Look what AS does here
  • Write down their procedure
  • Implement
    • Test

Add time series scaling to preprocessing

Very sparse time series that also stretch over a large time interval seem to be affected; one example has 39 data points over a duration of about 45k minutes (~750 h).

This leads to:

  • kerndisc tending to find constant and white kernels (as an end result) to deal with this,
  • bad interpolations (the function returns to some constant value in between the conditioned-on points)

To tackle this behavior, kerndisc should be able to rescale X to some interval during preprocessing.

This rescaling should keep the same relative distances between points.

Tasks

  • Implement rescaling option for preprocessing
    • Allow to rescale to some arbitrary, desired interval
    • Allow automatic rescaling (e.g., [0, ..., 49] for n=49 data points)
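The rescaling step could be an affine map, which preserves relative distances by construction. A minimal sketch, assuming X is a 1-D numpy array and that the automatic target interval is `[0, n - 1]` for `n` data points, as in the task above:

```python
# Minimal sketch of the preprocessing rescaling option.
import numpy as np

def rescale_x(x, target=None):
    """Affinely map x onto `target`; defaults to [0, n - 1] for n points."""
    x = np.asarray(x, dtype=float)
    lo, hi = (0.0, len(x) - 1.0) if target is None else target
    span = x.max() - x.min()
    if span == 0:
        # degenerate input: all points coincide
        return np.full_like(x, lo)
    # affine map: relative distances between points are preserved
    return lo + (x - x.min()) * (hi - lo) / span
```

For the sparse example above, 39 points over ~45k minutes would land on a compact interval like [0, 38], which should keep the optimiser away from degenerate constant/white solutions.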

Implement Precedence Ordering in `describe`

Currently describe returns its description in an arbitrary order for all sub-components. This should be improved to take some kind of metric into account. Examples include:

  • Sorting by reduction of some error,
  • sorting by the depth at which a component was found (if this is identifiable),
  • ....

Transition to `gpflow` Naming

Currently there is a kind of duality in the naming used internally:

gpflow has its own naming scheme, which capitalizes kernel names; it even mixes camel case (ArcCosine) and all-caps (RBF). kerndisc currently abstracts this in the _kernels module, which maintains a dict of kernel_name.lower(): kernel_class mappings. This is useful for general coding, but also for grammar definition: _grammar_duvenaud allows kernels to be excluded from being a base_kernel. These can be passed by passing:

grammar_kwargs={'base_kernels_to_exclude': ['constant']}

as a parameter to discover. Here it is easier for a user to just use lower-case kernel names than to remember all the different casing options.

Whether this justifies deviating from gpflow is questionable, though.
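The mapping described above can be sketched in a few lines; the placeholder classes stand in for the actual gpflow kernel classes, and `exclude` is an illustrative helper, not existing API:

```python
# Sketch of the lowercase-name mapping maintained by the _kernels module.

class RBF: ...
class ArcCosine: ...
class Constant: ...

# kernel_name.lower() -> kernel_class, hiding gpflow's mixed casing
KERNELS = {cls.__name__.lower(): cls for cls in (RBF, ArcCosine, Constant)}

def exclude(names_to_exclude):
    """Resolve user-supplied lowercase names, dropping excluded base kernels."""
    return {name: cls for name, cls in KERNELS.items()
            if name not in names_to_exclude}
```

With this, `grammar_kwargs={'base_kernels_to_exclude': ['constant']}` needs no knowledge of whether gpflow spells the class `Constant`, `RBF`, or `ArcCosine`.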

Find better naming for `SPECIAL_KERNELS`

SPECIAL_KERNELS are the cp and cw kernels, currently accessible as SPECIAL_KERNELS, which doesn't make a lot of sense. They are closer to structural kernels or something similar.

Fix install errors that are sometimes experienced

There seem to be lazy_load errors and deprecation warnings after a fresh installation right now. This is most likely fixed by addressing #44.

Will wait for more people to experience this before fixing, as this library is in low-maintenance mode right now.

kerndisc alpha

This epic can be closed once kerndisc is ready for alpha.

Closing this will also mark the transition of this project from an internal, self-managed storyboard to an open-source issue/participation style system.

Remove duplicates

DoD

  • Automatically remove duplicates from search space

Tasks

  • Look into the difference between redundancies and duplicates in AS
    • Add more tasks here if there is a difference

Implement `additive_form` from AS

DoD

  • Have method that simplifies kernels into additive form
  • Method should return canonical representations

Tasks

  • Implement additive form method
    • Test
  • Add option to enable this
  • Re-remove duplicates afterwards

Re-add best model(s) to search space

DoD

  • Have re-add functionality
  • Observe its performance compared to usual performance

Tasks

It seems like AS never really used this functionality, as best_models is never set to anything other than None.

  • Implement this
  • Write tests
  • Evaluate its performance gain vs search time
        if best_models is not None:
            for a_model in best_models:
                current_models = current_models + [a_model.copy()] + ff.add_jitter_to_models([a_model.copy() for dummy in range(exp.n_rand)], exp.jitter_sd)

Improve AST handling in `simplify`

Every sub-function of simplify currently uses Python's deepcopy in order to return an actual object.

It should be discussed whether this is necessary, or whether hiding these lower levels of the simplification API should be taken to its full extent. That would mean creating a single deep copy in simplify and working on that copy afterwards.

Write scoring and training for `kerndisc`

DoD

  • Have working model evaluation
  • Have working training function

Tasks

  • Find way to score models
    • Can be ML at first
  • Implement scoring
  • Implement training, invoke training as defined by gpflow
  • Determine parameter procedure
  • Think about converting to a truly Bayesian environment
