caomw / bhattacharrya Goto Github PK
View Code? Open in Web Editor NEWThis project forked from teamdandelion/bhattacharrya
Implementing the recursive Bhattacharrya kernel
This project forked from teamdandelion/bhattacharrya
Implementing the recursive Bhattacharrya kernel
Implements the Bhattacharrya Kernel in Python and Matlab. The Bhattacharrya Kernel: The Bhattacharrya Kernel is described by Kondor in "A Kernel Between Sets of Vectors" (Kondor & Jebara, http://machinelearning.wustl.edu/mlpapers/paper_files/icml2003_KondorJ03.pdf). The kernel takes a base kernel k, two datasets D1 and D2, where each dataset is a set of vectors in R^r (r-dimensional space). The Bhattacharrya kernel then maps the vectors in these two sets into a Hilbert space using the base kernel and calculates the Bhattacharrya overlap between the two datasets in the Hilbert space. For the mathematics, see "A Kernel Between Sets of Vectors". The Algorithm: The program in "bhatta.py" is built so it can take a number of datasets and compute the Bhattacharrya Kernel matrix between them, or can compute Bhattacharrya evaluations between individual datasets. Because a lot of the computation is independent to each individual dataset, the Bhatta_Manager class instantiates a Dataset object for each dataset, which precomputes the eigenvector decomposition, local kernel matrix, etc. The MemorizedBhatta class operates with a similar principle but takes new Datasets on the fly, so that not all the Datasets need to be known at object instantiation. Proving Correctness: Since the mathematics are fairly opaque it can be difficult to prove correctness of the Bhattacharrya algorithm. However, we can test the correctness of the algorithm using the polynomial kernel. Since the polynomial kernel maps to a known and easily computed Hilbert space, we can directly map datasets into the polynomial kernel's Hilbert space and manually test the Bhattacharrya overlap, and compare this result to the Bhattacharrya algorithm evaluation with a polynomial base kernel. If the results are identical for several random datasets, it is extremely likely that the algorithm is producing the correct result. Furthermore, calculating the direct Bhattacharrya overlap in a known space and mapping data into the polynomial-kernel Hilbert space are both very simple operations. "bhatta.py" contains the kernel. This is the main file "basis.py" contains some helper functions "bhatta_plots.py" is a utility for vizualising kernel evaluations between datasets "mnist.py" handles loading the MNIST digit database and vectorizing it into a form the Bhattacharrya kernel can handle "bhatta_poly" contains utilities to explicitly discover the polynomial feature space, for bugtesting the kernel and confirming correctness "Matlab/" contains Matlab versions of the kernel; may not be correct atm. The multi_poly_bhatta.matlab file is useful for comparing the closed-form polynomial RKHS version of the algorithm to empirical and PCA evaluations. Here's an email I sent to Risi which is relevant: Hey Risi, I've done some more work organizing and commenting the Bhattacharrya code, as well as adding a readme. The key insight I had while working on this project was figuring out how to test the correctness of a Bhattacharrya kernel implementation. It's very straightforward to calculate the Bhattacharrya overlap between two distributes in a known, regular space, and it's straightforward to map any dataset into the reproducing Hilbert space for the polynomial kernel. So, to test a kernelized Bhattacharrya implementation, take some data, map it into the polynomial kernel rkhs, directly compute the Bhattacharrya overlap in that space, and compare to what the Bhattacharrya kernel algorithm gives for the same data with a polynomial base kernel. This testing method has another significant benefit, in that you can test individual components of Bhattacharrya algorithm. Since there is a 1-to-1 mapping between components (covariance matrices and determinants) between the Bhattacharrya evaluation in the polynomial RKHS and the Bhattacharrya algorithm with a base kernel, if there is an error in the algorithm you can isolate which component is failing. Through this method, I discovered that the math as described in the paper doesn't quite work out: the mixed determinant and covariance terms break. I think this had to do with the proposed method of constructing a shared basis using Gram-Schmidt orthogonalization failing. I don't quite remember why the issue surfaced, but I do know that I fixed it and the fix is reflected in the Python algorithm. Matlab was particularly helpful in debugging the algorithm. Take a look for example at multi_poly_bhatta.m. It generates a bunch of random datasets, and computes Phi (the polynomial RKHS) for them. It then generates the closed-form Bhattacharrya evaluation RKHS data, as well as the empirical and PCA Bhattacharrya algorithm evaluations using just the kernel matrix. So you can use it to compare the different Bhattacharrya components in the kernelized and closed form versions.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.