Hi, I am using SparkLinearSVC. The code is as follows: <div clas

ValueError: This solver needs samples of at least 2 classes in the data about sparkit-learn HOT 4 OPEN

lensacom commented on May 27, 2024

ValueError: This solver needs samples of at least 2 classes in the data

from sparkit-learn.

Comments (4)

kszucs commented on May 27, 2024 1

Sadly this is a bug indeed. Sparkit trains sklearn's linear models in parallel, then averages them in a reduce step. There is at least one block, which contains only one of the labels. To check try the following:

train_Z[:, 'y']._rdd.map(lambda x: np.unique(x).size).filter(lambda x: x < 2).count()

To resolve You could randomize the train data to avoid blocks with one label, but this is still waiting for a clever solution.

from sparkit-learn.

mrshanth commented on May 27, 2024

Thanks

from sparkit-learn.

jaydee92 commented on May 27, 2024

I believe I found a workaround for this. Considering these problems tend to happen to highly imbalanced datasets, I would suggest using StratifiedShuffleSplit, and alter the train_size or test_size ratio as an alternative as seen below:

for trainRatio in np.arange(0.05, 1, 0.05):
    split = StratifiedShuffleSplit(n_splits=2, train_size=trainRatio)
    for trainIdx, testIdx in split.split(X, y):
        Xtrain, Xtest = X[trainIdx], X[testIdx]
        ytrain, ytest = y[trainIdx], y[testIdx]
        model = someModel()
        model.fit(Xtrain, ytrain)
        pred = model.predict(Xtest)

from sparkit-learn.

YoannCheung commented on May 27, 2024

Sadly this is a bug indeed. Sparkit trains sklearn's linear models in parallel, then averages them in a reduce step. There is at least one block, which contains only one of the labels. To check try the following:
train_Z[:, 'y']._rdd.map(lambda x: np.unique(x).size).filter(lambda x: x < 2).count()
To resolve You could randomize the train data to avoid blocks with one label, but this is still waiting for a clever solution.

Can't believe that this bug is still not fixed! Sad!

from sparkit-learn.

Recommend Projects

ValueError: This solver needs samples of at least 2 classes in the data about sparkit-learn HOT 4 OPEN

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs