Comments (5)
is this still an issue? in the MATLAB version i dealt with nans as follows:
1.) remove all observations that contain nans
2.) compute the PCA transform from the remaining (nan-free) observations
3.) apply that PCA transform to the original (nan-included) data
4.) now the transformed data has nans in the same observations as the original data, and is the same size as the original data
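a minimal numpy sketch of the four steps above, in case it helps. this is not the MATLAB code and not what hypertools currently does; the function name `pca_with_nans` and the `ndims` argument are made up for illustration, and it assumes observations are rows:

```python
import numpy as np

def pca_with_nans(data, ndims=2):
    """Fit PCA on nan-free rows only, then transform every row (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # step 1: keep only the observations (rows) that contain no nans
    complete = ~np.isnan(data).any(axis=1)
    clean = data[complete]
    mean = clean.mean(axis=0)
    # step 2: principal axes from the SVD of the centered, nan-free rows
    _, _, vt = np.linalg.svd(clean - mean, full_matrices=False)
    components = vt[:ndims]
    # step 3: apply the transform to the original data
    # step 4: nans propagate through the product, so any row that had a nan
    # comes out as all-nan and the number of rows is unchanged
    return (data - mean) @ components.T
```

note that a row with even a single nan ends up entirely nan after the transform, which is the main thing ppca would improve on.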
another (fancier! better!) option is to use ppca (probabilistic pca): https://github.com/allentran/pca-magic
ppca is robust to nans and can even fill in missing data.
from hypertools.
not resolved yet! happy to implement either one or both. PPCA sounds neat, but i'm not confident i fully understand how it works.
how about: if any(np.isnan(x).any() for x in data), use PPCA; else use PCA?
from hypertools.
with a warning message
from hypertools.
sure...that'll be faster than always using ppca. (are you thinking the warning will go with ppca, i.e. something like "missing data: inexact solution"?)
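a rough sketch of that check, assuming `data` is a list of numeric arrays. `has_missing` and `choose_reducer` are hypothetical names, not existing hypertools functions, and the returned strings are only placeholders for whichever PCA/PPCA implementations end up being used:

```python
import warnings
import numpy as np

def has_missing(data):
    """True if any array in the list contains at least one nan."""
    return any(np.isnan(np.asarray(x, dtype=float)).any() for x in data)

def choose_reducer(data):
    """Pick 'ppca' (with a warning) when nans are present, otherwise 'pca'."""
    if has_missing(data):
        # warning text taken from the suggestion in this thread
        warnings.warn("missing data: inexact solution")
        return "ppca"
    return "pca"
```

the nan check is a single pass over the data, so the common nan-free case still goes straight to plain PCA.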
the ppca paper is here: https://www.microsoft.com/en-us/research/publication/probabilistic-principal-component-analysis/
we can also talk about it with a whiteboard sometime and/or go through the paper together. the basic idea is:
1.) principal components (the "feature space") come from a unit Gaussian: p(z) = N(z | 0, I)
2.) the data come from weighted combinations of features, plus some noise: p(x | z) = N(x | Wz + mu, S), where S is a diagonal covariance matrix: some constant, sigma^2, times the identity matrix
Then, given x (the data), we have to estimate z (the components) and W (the weights) using Bayes' rule. (For details, see the paper above.)
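to make the model concrete, here is a small numpy sketch under some assumptions: it samples from steps 1 and 2, fits W, mu and sigma^2 with the closed-form maximum-likelihood solution from the paper (which only applies to complete data, no nans), and computes the posterior mean of z given x. the missing-data case needs an iterative (EM-style) fit, which is not shown here, and the function names are made up for illustration:

```python
import numpy as np

def sample_ppca(W, mu, sigma2, n, rng):
    """Draw n observations from the generative model (steps 1 and 2)."""
    d, q = W.shape
    z = rng.standard_normal((n, q))                         # z ~ N(0, I)
    noise = np.sqrt(sigma2) * rng.standard_normal((n, d))   # N(0, sigma^2 I)
    return z @ W.T + mu + noise, z

def fit_ppca(x, q):
    """Closed-form maximum-likelihood fit for complete data; requires q < x.shape[1]."""
    mu = x.mean(axis=0)
    cov = np.cov(x - mu, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]          # sort eigenvalues, largest first
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[q:].mean()                # average of the discarded eigenvalues
    # W is only identified up to a rotation; this picks the unrotated solution
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, mu, sigma2

def posterior_mean(x, W, mu, sigma2):
    """E[z | x] = M^{-1} W^T (x - mu), with M = W^T W + sigma^2 I."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (x - mu).T).T
```

fitting on data drawn with `sample_ppca` should recover sigma^2 approximately, and W up to a rotation of the latent space.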
from hypertools.
OK cool, thanks for the explanation. and yes, something like that for a warning
from hypertools.