Comments (2)
The decision_function method is not implemented yet. BTW, it's pretty straightforward. Here is sklearn's version:
class LinearClassifierMixin(ClassifierMixin):
    """Mixin for linear classifiers.

    Handles prediction for sparse and dense X.
    """

    def decision_function(self, X):
        """Predict confidence scores for samples.

        The confidence score for a sample is the signed distance of that
        sample to the hyperplane.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = (n_samples, n_features)
            Samples.

        Returns
        -------
        array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes)
            Confidence scores per (sample, class) combination. In the binary
            case, confidence score for self.classes_[1] where >0 means this
            class would be predicted.
        """
        if not hasattr(self, 'coef_') or self.coef_ is None:
            raise NotFittedError("This %(name)s instance is not fitted "
                                 "yet" % {'name': type(self).__name__})

        X = check_array(X, accept_sparse='csr')

        n_features = self.coef_.shape[1]
        if X.shape[1] != n_features:
            raise ValueError("X has %d features per sample; expecting %d"
                             % (X.shape[1], n_features))

        scores = safe_sparse_dot(X, self.coef_.T,
                                 dense_output=True) + self.intercept_
        return scores.ravel() if scores.shape[1] == 1 else scores
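To make the arithmetic concrete, here is a dependency-free sketch of the same computation on plain Python lists. It is not sklearn's code: the function name `decision_scores` and the toy numbers are made up for illustration. Note that the raw score w·x + b equals the geometric signed distance only up to a factor of 1/||w||.

```python
def decision_scores(X, coef, intercept):
    """Plain-list stand-in for X @ coef.T + intercept (hypothetical helper).

    X         : list of n_samples rows, each with n_features floats
    coef      : list of n_classes weight rows (a single row in the binary case)
    intercept : list of n_classes floats
    """
    n_features = len(coef[0])
    scores = []
    for row in X:
        if len(row) != n_features:
            raise ValueError("X has %d features per sample; expecting %d"
                             % (len(row), n_features))
        # one score per weight row: dot(row, weights) + bias
        scores.append([sum(x * w for x, w in zip(row, weights)) + b
                       for weights, b in zip(coef, intercept)])
    # binary case: a single column of scores is flattened, like ravel()
    if len(coef) == 1:
        return [s[0] for s in scores]
    return scores


X = [[1.0, 2.0], [-1.0, 0.5]]

# binary: a positive score means the second class would be predicted
binary = decision_scores(X, coef=[[2.0, -1.0]], intercept=[0.5])

# multiclass: one column of scores per class
multi = decision_scores(X,
                        coef=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        intercept=[0.0, 0.0, -1.0])
```

With these numbers `binary` has one score per sample and `multi` has one row per sample with three columns, which mirrors the two return shapes described in the docstring.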
We need to create a Spark version of LinearClassifierMixin that simply maps sklearn's decision_function method over the RDD, something like this:
class SparkLinearClassifierMixin(LinearClassifierMixin, SparkBroadcasterMixin):
    """Mixin for linear classifiers.

    Handles prediction for sparse and dense X.
    """

    # broadcastable variables, possibly larger arrays
    __transient__ = ['coef_', 'intercept_']

    def decision_function(self, X):
        check_rdd(X, (sp.spmatrix, np.ndarray))
        # super() must name this class so that lookup continues to
        # LinearClassifierMixin.decision_function, the local sklearn version
        mapper = self.broadcast(
            super(SparkLinearClassifierMixin, self).decision_function,
            X.context)
        return X.map(mapper)
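The idea of mapping a local scoring function over blocks can be tried without Spark. In this rough sketch a plain list of blocks stands in for the RDD, the built-in `map` stands in for `RDD.map`, and `functools.partial` stands in for shipping the fitted parameters to the workers. `score_block` and the numbers are hypothetical, and none of this is sparkit-learn's API.

```python
from functools import partial


def score_block(block, coef, intercept):
    """Score every row of one block with a binary linear model (toy helper)."""
    return [sum(x * w for x, w in zip(row, coef)) + intercept
            for row in block]


# two blocks of samples, standing in for the partitions of a block RDD
blocks = [
    [[1.0, 2.0], [-1.0, 0.5]],
    [[0.0, 0.0]],
]

# bind the "fitted" parameters once, then apply the same function per block
mapper = partial(score_block, coef=[2.0, -1.0], intercept=0.5)
per_block = list(map(mapper, blocks))

# scoring block by block gives the same numbers as scoring everything at once
flat = [score for block in per_block for score in block]
whole = mapper([row for block in blocks for row in block])
```

Because each row is scored independently, the result keeps the block structure of the input, and flattening it matches a single local call on all rows.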
Finally, extend SparkLinearSVC to support the functionality above:
class SparkLinearSVC(LinearSVC, SparkLinearClassifierMixin, SparkLinearModelMixin):
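With this kind of multiple inheritance it helps to check which decision_function actually gets called. The toy classes below only borrow the names from this thread; their bodies are stand-ins, not the real sklearn or sparkit-learn classes. The sketch shows where the Spark mixin lands in the method resolution order and which class `super()` has to name to reach the local implementation.

```python
class ClassifierMixin(object):
    pass


class LinearClassifierMixin(ClassifierMixin):
    def decision_function(self, X):
        # stand-in for the local sklearn computation
        return ("local", X)


class SparkLinearClassifierMixin(LinearClassifierMixin):
    def decision_function(self, X):
        # naming this class makes lookup continue with the next class in
        # the MRO, which is LinearClassifierMixin
        local = super(SparkLinearClassifierMixin, self).decision_function
        return [local(block) for block in X]


class LinearSVC(LinearClassifierMixin):
    pass


class SparkLinearSVC(LinearSVC, SparkLinearClassifierMixin):
    pass


mro_names = [cls.__name__ for cls in SparkLinearSVC.__mro__]
result = SparkLinearSVC().decision_function(["block0", "block1"])

# naming LinearClassifierMixin instead would start the lookup after it,
# where no class defines decision_function
skipped = hasattr(super(LinearClassifierMixin, SparkLinearSVC()),
                  "decision_function")
```

Here the Spark mixin sits after LinearSVC but before LinearClassifierMixin in `mro_names`, so the distributed version wins, and `skipped` comes out as False because lookup that starts after LinearClassifierMixin finds nothing.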
We plan to implement it in the next few weeks, but as always, contributions are appreciated :)
from sparkit-learn.
@mrshanth I saw you've implemented the decision function support. Would you make a pull request, please? :)