Comments (7)

pavlin-policar commented on August 17, 2024

I haven't tried this code myself, and it's hard to tell since you haven't posted the error message, but it seems to me you're creating a numpy array containing str objects here

data = np.array(data[1:], dtype=str)

When you subset the matrix here

X = data[:, 1:]

it will still probably have dtype str.

Perhaps just casting it to a float64 will do the trick, like this

X = data[:, 1:].astype(np.float64)

Otherwise, I can't see anything obviously wrong with this code. If this is indeed the problem, it seems strange to me that scikit-learn handles it, as I'd rather have an explicit failure on a string matrix than an implicit conversion.
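
For anyone following along, here is a minimal sketch of the cast suggested above. The `rows` list is made up and only stands in for a CSV that was read as text; it is not the data from this issue.

```python
import numpy as np

# Hypothetical rows standing in for a CSV read as text:
# a header row, then one label column followed by feature columns.
rows = [
    ["label", "f1", "f2"],
    ["a", "0.5", "1.5"],
    ["b", "2.0", "3.25"],
]

data = np.array(rows[1:], dtype=str)  # every cell is stored as a string
X_str = data[:, 1:]                   # slicing keeps the string dtype
X = X_str.astype(np.float64)          # explicit cast to a numeric matrix

print(X_str.dtype.kind)  # 'U' means a unicode string dtype
print(X.dtype)           # float64
```

Checking `X.dtype` before calling `fit` is a cheap way to rule this cause in or out.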

from opentsne.

sbembenek18 commented on August 17, 2024

I agree - there's something I am missing.

I've reworked it using pandas and have tried 3 different data sets. I can only get Open TSNE to work with 1 of the 3 sets, while sklearn TSNE works with all 3 sets. I suspect I am missing something here.

To be sure, I was able to get the example sets to run with Open TSNE. I've attached the code and data sets.

tsne_pandas_version_v0.ipynb.tar.gz

DataSets.tar.gz

pavlin-policar commented on August 17, 2024

Could you please paste the code and the error you're getting here on GH?

sbembenek18 commented on August 17, 2024

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE 

data = pd.read_csv('./data_full_set_10_clusters_35_pca.csv')
#data = pd.read_csv('./data_full_set_10_clusters_35_pca.csv') # This set works.

from openTSNE import TSNE

tsne = TSNE(
    perplexity=24.33,
    metric="euclidean",
    n_jobs=8,
    random_state=42,
    verbose=True,
)

tsne_result = tsne.fit(data.iloc[:,1:]) # Error here

data[['x','y']] = tsne_result

target_names = data['Cluster'].unique()

#Plot
colors = 'r', 'g', 'b', 'c', 'm', 'y', 'k', 'gray', 'orange', 'purple'
fig, ax = plt.subplots(figsize=(8, 6))
for color, label in zip(colors, target_names):
    selected = data[data.Cluster.eq(label)]
    ax.scatter(x='x', y='y', data=selected, c=color, label=label)
ax.legend(bbox_to_anchor=(1, 0.5), loc='center left', frameon=False)
plt.show()

Error ==>

--------------------------------------------------------------------------------
TSNE(n_jobs=8, perplexity=24.33, random_state=42, verbose=True)
--------------------------------------------------------------------------------
===> Finding 72 nearest neighbors using Annoy approximate search using euclidean distance...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /data/Apps/anaconda3/envs/open-tsne/lib/python3.11/site-packages/pandas/core/indexes/base.py:3802, in Index.get_loc(self, key, method, tolerance)
   3801 try:
-> 3802     return self._engine.get_loc(casted_key)
   3803 except KeyError as err:

File /data/Apps/anaconda3/envs/open-tsne/lib/python3.11/site-packages/pandas/_libs/index.pyx:138, in pandas._libs.index.IndexEngine.get_loc()

File /data/Apps/anaconda3/envs/open-tsne/lib/python3.11/site-packages/pandas/_libs/index.pyx:165, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:5745, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:5753, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[14], line 14
      4 from openTSNE import TSNE
      6 tsne = TSNE(
      7     perplexity=24.33,
      8     metric="euclidean",
   (...)
     11     verbose=True,
     12 )
---> 14 tsne_result = tsne.fit(data.iloc[:,1:])

sbembenek18 commented on August 17, 2024

What I am finding is that I get the error whenever the data set has more than 999 rows. Here are two more data sets to check with, one with 999 rows (works) and another with 1000 rows (error).

data_1000_records_10_clusters_35_pca.csv

data_999_records_10_clusters_35_pca.csv

pavlin-policar commented on August 17, 2024

This seems to be related to pandas. See #182.

If this is indeed the case, you can simply extract the numpy matrix from the pandas dataframe like so

tsne_result = tsne.fit(data.iloc[:,1:].values)

This is technically not a bug, since openTSNE doesn't officially support pandas dataframes. I didn't want to drag that dependency into the requirements, and the solution is simply to use .values instead.
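
To illustrate why a DataFrame can produce a `KeyError: 0` where a numpy array would not, here is a small sketch. The DataFrame and its column names are invented for the example, and I am only assuming that some code path indexes the input with an integer; I have not traced where that happens inside openTSNE.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV in this thread:
# one label column followed by feature columns.
df = pd.DataFrame({
    "Cluster": ["a", "b", "c"],
    "pc1": [0.1, 0.2, 0.3],
    "pc2": [1.0, 2.0, 3.0],
})

features = df.iloc[:, 1:]

# On a DataFrame, features[0] is a column-label lookup, not "row 0",
# so it fails when no column is named 0.
try:
    features[0]
    raised = False
except KeyError:
    raised = True

X = features.values  # plain numpy array; features.to_numpy() is equivalent
first_row = X[0]     # positional row access works on the array

print(raised)                     # True
print(type(X).__name__, X.shape)
```

Passing `X` rather than `features` to `tsne.fit` avoids the label-based lookup entirely.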

sbembenek18 commented on August 17, 2024

Yes, I can confirm this solves it (and agree this is not a bug in OpenTSNE). Thanks!
