Comments (6)
Almost all the problems I've encountered while maintaining openTSNE have been pynndescent-related. As such, I would be very hesitant to add anything that ties openTSNE even closer to pynndescent.
However, it should be fairly simple to achieve what you want. I've made it so that the neighbors parameter can be an instance of openTSNE.nearest_neighbors.KNNIndex, so you can just copy over the NNDescent class and remove the check that disallows custom metrics. You can probably just change check_metric to return True, and I'd guess that would do it.
from opentsne.
Sorry, I wasn't clear. I'm not suggesting you tie openTSNE closer to pynndescent. Passing a callable metric is a standard feature for many packages. scikit-learn also allows it for many of their algorithms, including their implementation of TSNE. Adding a check for whether metric is callable is fairly straightforward. If you're short of time/motivation then I can make a PR once I have time to look through the code. You're of course free to reject it, but I think this is a reasonable feature to have for a t-SNE implementation and would be potentially useful to quite a few people.
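As a rough illustration of the kind of check meant here, the sketch below accepts either a known metric name or any callable. The helper name `validate_metric` and the list of supported names are hypothetical and are not taken from openTSNE's code.

```python
# Hypothetical helper: not part of openTSNE. The supported names are
# placeholders chosen only for illustration.
SUPPORTED_METRICS = ("euclidean", "manhattan", "cosine")


def validate_metric(metric):
    """Return `metric` unchanged if it is a callable or a supported name."""
    if callable(metric):
        # User-defined distance function: pass it through as-is.
        return metric
    if isinstance(metric, str) and metric in SUPPORTED_METRICS:
        return metric
    raise ValueError(
        f"Unsupported metric {metric!r}; expected a callable or one of "
        f"{SUPPORTED_METRICS}"
    )
```

With a check like this, a call such as `validate_metric(my_distance)` would let a custom function through, while a misspelled metric name would still raise a clear error.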
I'm trying to use openTSNE to apply these methods for analyzing animal behavior:
https://royalsocietypublishing.org/doi/10.1098/rsif.2014.0672
as an example analysis for my recently developed package:
https://github.com/jgraving/behavelet
Applying t-SNE to these data requires using KL-divergence as a distance metric between normalized time-frequency vectors. If accomplishing that requires that users subclass and modify your code to get it working then I will likely just find something else, as I'd like it to be straightforward for people to use.
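For reference, a KL-divergence "distance" between two normalized vectors could be sketched as follows. This is a minimal NumPy version under stated assumptions: the function name, the `eps` smoothing term, and the renormalization step are illustrative choices, not something taken from the linked paper or package.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two non-negative vectors.

    `eps` is an assumed smoothing constant that avoids log(0) and division
    by zero; both vectors are renormalized to sum to 1 after smoothing.
    """
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Note that KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; it is not a metric in the strict mathematical sense.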
Again, I appreciate the work you've done on this. I'm not trying to be pushy. Just wanted to let you know that openTSNE would be more widely useful if it had this feature.
from opentsne.
Hmm, I'm all for custom distance metrics, but the "numba-compiled callable" threw me off a little bit. I'm not very familiar with numba, so is it possible to call numba-compiled functions from regular old Python code? What I'm getting at is that openTSNE supports both exact neighbor search via scikit-learn and approximate neighbor search via pynndescent, and I'd like to keep some consistency between the two. My concern was that if the function is numba-compiled, it wouldn't be possible to use that callable in the scikit-learn BallTree.
Putting all numba-related things aside, adding custom distance metrics was definitely on my to-do list, but I've been and will likely be short on time for a while, so if you're willing to open a PR, I'd be more than happy to take a look. The changes should be fairly minimal anyhow.
from opentsne.
Ah, I see. Sorry, I assumed you were familiar with numba as it's one of the main dependencies for pynndescent. numba is a library that JIT-compiles pure Python/numpy functions to make them faster, so a numba-compiled function works just like any other Python function in practice. scikit-learn's BallTree accepts user-defined functions, which includes numba-compiled functions. Here is an example:
from sklearn.neighbors import BallTree
from numba import njit
import numpy as np


@njit(fastmath=True)
def l1(x, y):
    # Manhattan (L1) distance between two 1-D arrays
    return np.sum(np.abs(x - y))


# Build the tree with the compiled function as a user-defined metric
tree = BallTree(np.random.normal(size=(1000, 10)), metric=l1)
distances, indices = tree.query(np.random.normal(size=(100, 10)), k=5)
The only issue I can see is if the function isn't numba compiled and it's passed to pynndescent then pynndescent will throw an uninformative error message from numba. Whether or how to deal with this might require some thought, or we could just make it clear in the docstring and leave this for pynndescent to deal with. I assume that a user savvy enough to pass a custom metric would expect that error messages are a possibility.
I'll take a look at the code and submit a PR with the changes.
from opentsne.
The only issue I can see is if the function isn't numba compiled and it's passed to pynndescent then pynndescent will throw an uninformative error message from numba.
I would hope that numba has some kind of way to check if a function is compiled. Then if it isn't, we could do that within the pynndescent wrapper to avoid potential errors. A decorator is just a function, after all.
After a bit of searching, I found it should be fairly straightforward to check if a function is an instance of the Dispatcher type and, if it isn't, to just call numba.njit()(f) on it.
from opentsne.
Fixed via #94.
from opentsne.