
Comments (9)

vruusmann commented on August 20, 2024

You have two issues combined in this report.

First, about the predicted probability values being all 0. This is an OutputField mis-placement issue - recent versions of SkLearn2PMML package try to "optimize" the layout of PMML documents by moving "common" OutputField elements from the child model level (eg. /PMML/MiningModel/Segmentation/RegressionModel) to the parent model level (eg. /PMML/MiningModel).

The JPMML-Evaluator library gets confused by this, concludes that it is dealing with a non-probabilistic model, and therefore does not provide predicted probability values at all.

Second, the precision of predicted probability values changing (eg. from 12 decimal places to 7 decimal places) when switching between language environments (direct Python, Scikit-Learn over Python, PMML).

This looks like a Python runtime configuration issue, something related to "console display preferences". The PMML engine is definitely not performing any explicit rounding; it always returns float32 values for single-precision models (ie. Model@mathContext="float") and float64 values for double-precision models (ie. Model@mathContext="double").
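To illustrate the float32 vs. float64 point, here is a minimal sketch using only the Python standard library. The probability value and the helper name to_float32 are made up for illustration; nothing here comes from the PMML engine itself. It shows that the same number carries roughly seven significant digits once it has passed through single precision, without any rounding being applied by anyone.

```python
import struct

def to_float32(value):
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", value))[0]

p64 = 0.123456789012      # hypothetical predicted probability (float64)
p32 = to_float32(p64)     # what a single-precision model would hold

# repr() shows every digit needed to round-trip the stored value,
# so the float32 result looks "noisy" beyond the 7th significant digit.
print(repr(p64))
print(repr(p32))

# Formatted to 7 significant digits, the two values print identically.
print(format(p64, ".7g"), format(p32, ".7g"))
```

If the Python-side numbers show 12 digits and the PMML-side numbers show 7, checking the model's mathContext and the display formatting is probably a better first step than looking for rounding logic.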

from sklearn2pmml.

vruusmann commented on August 20, 2024

Recent versions of SkLearn2PMML package try to "optimize" the layout of PMML documents by moving "common" OutputField elements from the child model level to the parent model level.

As a workaround, I'd suggest that you export the model ensemble as sklearn2pmml.ensemble.EstimatorChain (instead of a sklearn2pmml.ensemble.SelectFirstClassifier).

Keep a parallel SelectFirstEstimator object when you need to perform predict_proba(X) in Python environment.

Something like this:

from sklearn2pmml import sklearn2pmml
from sklearn2pmml.ensemble import EstimatorChain, SelectFirstClassifier

# Shared steps - all SkLearn2PMML ensemble model types expect the same step layout
steps = [...]

# Fit and export
pmml_estimator = EstimatorChain(steps, multioutput = False)
pmml_estimator.fit(X, y)

# The output file name is a placeholder
sklearn2pmml(pmml_estimator, "ensemble.pmml")

# Predict probabilities
py_estimator = SelectFirstClassifier(steps)
# The steps have already been fitted, so the object is ready for predict_proba(X) as-is
py_estimator.predict_proba(X)


vruusmann commented on August 20, 2024

The ultimate fix would be to add EstimatorChain.predict_proba(X) method for non-multioutput cases, where all child estimators are subclasses of ClassifierMixin.


liuhuanshuo commented on August 20, 2024

Keep a parallel SelectFirstEstimator object when you need to perform predict_proba(X) in Python environment.

In fact, the reason I chose sklearn2pmml for the conversion is that I want to call the PMML file from Java for prediction.

Therefore, keeping a separate prediction path in the Python environment is not a very reasonable approach for me.

Is there really no other way to save the SelectFirstClassifier so that the PMML file returns the predicted probability values?


vruusmann commented on August 20, 2024

I'm thinking about refactoring SkLearn2PMML ensemble models in the following way:

  1. Add EstimatorChain.predict_proba(X) method, which will work only in multioutput = False mode, when all child models are classifiers.
  2. Make SelectFirstClassifier and SelectFirstRegressor subclasses of EstimatorChain(multioutput = False).

Right now you could fix the SelectFirstClassifier PMML file by relocating OutputField elements from the top-level back to member model-level. Takes less than a minute to do in text editor. But it's very difficult to give you the instructions for doing so via GitHub.
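For readers who would rather script the relocation than edit the file by hand, the following is a rough sketch of one way to read the instruction above. It is not taken from SkLearn2PMML or JPMML. It assumes that the OutputField elements to move are the ones with feature="probability", that they sit in the Output element of the top-level MiningModel, and that every member model should receive its own copy, placed in an Output element right after its MiningSchema. The function name relocate_output_fields is made up for this example.

```python
import copy
import xml.etree.ElementTree as ET

def relocate_output_fields(pmml_text):
    """Move probability OutputFields from the top-level MiningModel
    into an Output element of every member model (assumed layout)."""
    root = ET.fromstring(pmml_text)
    # Reuse whatever PMML namespace the document declares, e.g. "{http://...}"
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        # Serialize with a default namespace instead of "ns0:" prefixes
        ET.register_namespace("", ns[1:-1])

    top_model = root.find(ns + "MiningModel")
    if top_model is None:
        return pmml_text
    top_output = top_model.find(ns + "Output")
    segmentation = top_model.find(ns + "Segmentation")
    if top_output is None or segmentation is None:
        return pmml_text

    prob_fields = [
        field for field in top_output.findall(ns + "OutputField")
        if field.get("feature") == "probability"
    ]

    for segment in segmentation.findall(ns + "Segment"):
        for model in segment:
            schema = model.find(ns + "MiningSchema")
            if schema is None:
                continue  # a predicate element such as <True/>, not a model
            output = model.find(ns + "Output")
            if output is None:
                # Place the new Output directly after MiningSchema
                output = ET.Element(ns + "Output")
                model.insert(list(model).index(schema) + 1, output)
            for field in prob_fields:
                output.append(copy.deepcopy(field))

    # Drop the relocated fields (and an emptied Output) from the top level
    for field in prob_fields:
        top_output.remove(field)
    if len(top_output) == 0:
        top_model.remove(top_output)

    return ET.tostring(root, encoding="unicode")
```

The function takes the PMML document as text and returns the modified text, so it can be applied to a copy of the exported file and the result compared against the original before it is used for scoring.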


liuhuanshuo commented on August 20, 2024

Hi, Villu

Nice to see you pushed a new version.

But it seems that this problem is not fixed in the new version? After I save the pipeline containing SelectFirstClassifier as a PMML file, the predicted values are still all 0.

Right now you could fix the SelectFirstClassifier PMML file by relocating OutputField elements from the top-level back to member model-level. Takes less than a minute to do in text editor.

So I need to modify the generated pmml now?

But it's very difficult to give you the instructions for doing so via GitHub.

I think I understand what you mean: do I need to move the OutputField below to another location?

Could you give a simple explanation? Since the operation only takes a minute for you, I don't think it would be very complicated.


vruusmann commented on August 20, 2024

After I save the pipeline containing SelectFirstClassifier as a pmml file, the predicted values are still all 0

The fix was aimed at the sklearn2pmml.ensemble.EstimatorChain model type, which now provides the predict_proba(X) method. Please consider migrating from SelectFirstClassifier(steps) to EstimatorChain(steps, multioutput = False).

The latest SkLearn2PMML 0.91.0 release was about completely refactoring Python-to-PMML translation functionality (affecting ExpressionTransformer, the step predicates of SelectFirstEstimator, EstimatorChain and RuleSetClassifier).

The new translator has some crazy new capabilities (which I will blog about shortly). Also, the Python side evaluation should be 10x faster, because the expression/predicate is "precompiled" once, and then reused across all rows.
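As a generic illustration of the "precompile once, reuse across rows" idea, here is a small sketch built on Python's built-in compile() and eval(). It is not the SkLearn2PMML implementation; the predicate string and the row data are invented for the example.

```python
# Hypothetical rows and predicate, for illustration only
rows = [{"x": 1.0}, {"x": None}, {"x": 3.5}]
predicate = "x is not None and x > 2"

# Parse and compile the expression a single time...
code = compile(predicate, "<predicate>", "eval")

# ...then evaluate the compiled code object once per row,
# exposing only the row's own fields to the expression.
results = [eval(code, {"__builtins__": {}}, row) for row in rows]

print(results)  # [False, False, True]
```

Passing the predicate string directly to eval() for every row would re-parse it each time, which is the repeated work that compiling up front avoids.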

@liuhuanshuo The pandas.isnull(X) nullability check is now also supported in the predicate context. There shouldn't be any module import issues anymore.


liuhuanshuo commented on August 20, 2024

I looked at a number of correctly working PMML files, noted where their OutputField elements are located, and tried some modifications.

I found that just moving the OutputField elements in front of the LocalTransformations element seems to work.

But this only ensures that the predicted values are not all 0. There are still problems with the predictions for rows that contain null values.


liuhuanshuo commented on August 20, 2024

Surprised that we both replied at the same time. I will look into your suggestion first.

