
Comments (12)

vruusmann commented on August 20, 2024

> The fix is expected to ship as SkLearn2PMML 0.98.1.

I have now prepared and released SkLearn2PMML package version 0.98.1 to PyPI:
https://pypi.org/project/sklearn2pmml/0.98.1/

I tested CalibratedClassifierCV with XGBClassifier, LGBMClassifier and GradientBoostingClassifier, in both ensemble=True (default) and ensemble=False modes. All clean!
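
For context on the two modes, here is a minimal plain-Python sketch of how CalibratedClassifierCV combines its outputs at prediction time, assuming the behaviour described in the scikit-learn documentation (this is not the converter code). With ensemble=True there is one (classifier, calibrator) pair per cross-validation fold and their calibrated probabilities are averaged; with ensemble=False there is a single classifier refitted on all data plus one calibrator. The "pairs" below are hypothetical constant functions standing in for fitted estimators.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in: a fitted base estimator followed by its calibrator,
# reduced to a function that maps one feature row to P(positive class).
CalibratedPair = Callable[[Sequence[float]], float]


def predict_positive_proba(pairs: List[CalibratedPair], row: Sequence[float]) -> float:
    """Average the calibrated positive-class probabilities of all pairs.

    ensemble=True: one pair per CV fold, so this is a genuine average.
    ensemble=False: exactly one pair, so this returns that pair's output.
    """
    if not pairs:
        raise ValueError("at least one calibrated pair is required")
    return sum(pair(row) for pair in pairs) / len(pairs)


# Made-up outputs for three folds (ensemble=True, cv=3)
fold_pairs: List[CalibratedPair] = [lambda row: 0.60, lambda row: 0.70, lambda row: 0.80]
# Made-up output of the single refitted model (ensemble=False)
single_pair: List[CalibratedPair] = [lambda row: 0.65]

sample = [5.1, 3.5, 1.4, 0.2]
print(predict_positive_proba(fold_pairs, sample))
print(predict_positive_proba(single_pair, sample))
```

A converter therefore has to handle both a list of N pairs and a single pair; the averaging rule is the same in both cases.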

vruusmann commented on August 20, 2024

Also, I can see that you're using a Python code template, where you train a plain sklearn.pipeline.Pipeline object, and then wrap it into a sklearn2pmml.pipeline.PMMLPipeline object using the make_pmml_pipeline() utility function.

This wrapping is no longer needed in the latest SkLearn2PMML package versions.

You can pass the original sklearn.pipeline.Pipeline object directly to the sklearn2pmml() utility function, and it will work. Only use the make_pmml_pipeline() utility function if you want to attach extra information to the PMMLPipeline object afterwards, such as customized feature/label names, model transformations (e.g. compact vs. non-compact decision tree structures), embedded verification data, etc.

That is, you can do this now:

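from sklearn.pipeline import Pipeline
from sklearn2pmml import sklearn2pmml

calibrated_xgb_pipeline = Pipeline(...)
calibrated_xgb_pipeline.fit(X_train, y_train)

# No make_pmml_pipeline() wrapping needed
sklearn2pmml(calibrated_xgb_pipeline, "./calibrated_xgb.pmml")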

vruusmann commented on August 20, 2024

In the meantime, you can conduct local experiments as follows:

  1. Dump your pipeline in Pickle data format using joblib.dump(calibrated_xgb_pipeline, "pipeline.pkl")
  2. Check out the HEAD revision of the JPMML-SkLearn repository, and build it into a command-line executable as explained in the README file (https://github.com/jpmml/jpmml-sklearn#installation).
  3. Use the newly built command-line app to perform the conversion: java -jar pmml-sklearn-example-executable-1.7-SNAPSHOT.jar --pkl-input pipeline.pkl --pmml-output pipeline.pmml
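
Step 3 can also be driven from Python. The sketch below only wraps the command line quoted above with the standard subprocess module; it assumes the jar from step 2 sits in the working directory under the name shown (DEFAULT_JAR is a hypothetical location, adjust it to your actual build output) and that step 1 has already produced pipeline.pkl.

```python
import shlex
import subprocess
from typing import List

# Assumption: the executable jar built in step 2 is in the current directory
# under this name; the path and version suffix may differ in your build.
DEFAULT_JAR = "pmml-sklearn-example-executable-1.7-SNAPSHOT.jar"


def conversion_command(pkl_path: str, pmml_path: str, jar: str = DEFAULT_JAR) -> List[str]:
    """Build the step-3 command line as an argument list (no shell involved)."""
    return ["java", "-jar", jar, "--pkl-input", pkl_path, "--pmml-output", pmml_path]


def convert(pkl_path: str, pmml_path: str, jar: str = DEFAULT_JAR) -> None:
    """Run the converter and raise with its stderr if it exits non-zero."""
    cmd = conversion_command(pkl_path, pmml_path, jar)
    print("Running:", shlex.join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # The Java app reports failures (e.g. stack traces) on stderr
        raise RuntimeError(f"conversion failed:\n{result.stderr}")
```

Usage would be convert("pipeline.pkl", "pipeline.pmml"), which requires a Java runtime on the PATH.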

vruusmann commented on August 20, 2024

I've updated the title to reflect my preliminary working hypothesis.

XGBoost is a third-party library (i.e. not part of the Scikit-Learn library itself), and perhaps its integration with CalibratedClassifierCV is a bit incomplete. The latter is a rather complex meta-estimator, whose internal business logic differs depending on the wrapped estimator.

I'll take a look at it over the weekend. In the meantime, can you confirm or refute my working hypothesis by replacing XGBClassifier with a native Scikit-Learn estimator such as GradientBoostingClassifier? It must be a non-linear model, because linear models take a different branch in the CalibratedClassifierCV business logic. I believe the conversion will succeed then?

puifais commented on August 20, 2024

Thank you for such a quick response.

I just tried CalibratedClassifierCV with GradientBoostingClassifier. Unfortunately, I got the same error:

Standard output is empty
Standard error:
Sep 01, 2023 4:52:47 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: The estimator object of the final step (Python class sklearn.pipeline.Pipeline) does not specify the number of outputs. Assuming a single output
Sep 01, 2023 4:52:47 PM sklearn2pmml.pipeline.PMMLPipeline initTargetFields
WARNING: Attribute 'sklearn2pmml.PMMLPipeline.target_fields' is not set. Assuming [y] as the name of target fields
Exception in thread "main" java.lang.IllegalArgumentException
	at sklearn.calibration.CalibratedClassifier.encodeModel(CalibratedClassifier.java:159)
	at sklearn.Estimator.encode(Estimator.java:137)
	at sklearn.Estimator.encode(Estimator.java:166)
	at sklearn.calibration.CalibratedClassifierCV.encodeModel(CalibratedClassifierCV.java:61)
	at sklearn.Estimator.encode(Estimator.java:137)
	at sklearn.Composite.encodeModel(Composite.java:166)
	at sklearn.pipeline.PipelineClassifier.encodeModel(PipelineClassifier.java:110)
	at sklearn.Estimator.encode(Estimator.java:137)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:186)
	at com.sklearn2pmml.Main.run(Main.java:80)
	at com.sklearn2pmml.Main.main(Main.java:65)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[406], line 1
----> 1 sklearn2pmml(calibrated_gbm_pipeline, './test.pmml')

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/sklearn2pmml/__init__.py:309, in sklearn2pmml(estimator, pmml_path, with_repr, java_home, java_opts, user_classpath, dump_flavour, debug)
    307 			print("Standard error is empty")
    308 	if retcode:
--> 309 		raise RuntimeError("The SkLearn2PMML application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams")
    310 finally:
    311 	if debug:

RuntimeError: The SkLearn2PMML application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams

Interestingly, I tried this with RandomForestClassifier and LogisticRegression and they both worked!

vruusmann commented on August 20, 2024

This problem affects boosting-style ensemble models such as XGBClassifier and GradientBoostingClassifier. It does not affect bagging-style ensemble models such as RandomForestClassifier, and elementary models such as DecisionTreeClassifier.
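
To illustrate the structural difference for a binary target, here is a simplified plain-Python sketch. It assumes textbook aggregation rules rather than the exact internals of any library, and all numbers are made up: a boosting ensemble adds up raw per-tree scores into a log-odds decision value and only then applies a sigmoid, whereas a bagging ensemble averages per-tree probabilities directly. For GradientBoostingClassifier, that raw value is what decision_function(X) returns.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def boosting_decision(init_score: float, learning_rate: float, tree_scores: Sequence[float]) -> float:
    """Boosting-style raw score: initial log-odds plus the shrunken sum of tree outputs."""
    return init_score + learning_rate * sum(tree_scores)


def boosting_proba(init_score: float, learning_rate: float, tree_scores: Sequence[float]) -> float:
    """Probability is obtained only after the raw score is squashed by a sigmoid."""
    return sigmoid(boosting_decision(init_score, learning_rate, tree_scores))


def bagging_proba(tree_probas: Sequence[float]) -> float:
    """Bagging-style: average the per-tree positive-class probabilities."""
    return sum(tree_probas) / len(tree_probas)


# Made-up per-tree outputs for a single sample
print(boosting_decision(0.0, 0.1, [2.0, 1.5, -0.5]))
print(boosting_proba(0.0, 0.1, [2.0, 1.5, -0.5]))
print(bagging_proba([1.0, 0.0, 1.0, 1.0]))
```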

Here's my test script:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y = True)
# Convert from multiclass to binary
y = (y == 1)

#classifier = DecisionTreeClassifier()
classifier = RandomForestClassifier()
#classifier = GradientBoostingClassifier()

calib_classifier = CalibratedClassifierCV(classifier)
calib_classifier.fit(X, y)
print(calib_classifier)

from sklearn2pmml import sklearn2pmml

sklearn2pmml(calib_classifier, "CalibratedClassifierCV.pmml")

And note: the sklearn2pmml() utility function now also accepts the CalibratedClassifierCV object as-is, so there is no need to wrap it into an sklearn.pipeline.Pipeline object at all!

puifais commented on August 20, 2024

I see. So how do you suggest I create a calibrated XGBoost PMML object in this case?

vruusmann commented on August 20, 2024

> So how do you suggest I create a calibrated XGBoost PMML object in this case?

I suggest that you wait for the next SkLearn2PMML package release, which should support your original code example as-is.

I've got something working locally already (XGBClassifier and LGBMClassifier). Adding support for GradientBoostingClassifier is much more difficult, because it follows a different code path (the calibrator component works with decision_function(X) results rather than predict_proba(X) results).
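
The two code paths can be sketched in plain Python. This assumes scikit-learn's documented preference for decision_function over predict_proba when choosing the calibrator's input, and uses sigmoid (Platt) calibration as the calibrator; the model classes and the fitted parameters a and b are hypothetical stand-ins, not real estimators.

```python
import math


class BoostedModel:
    """Hypothetical stand-in for an estimator exposing decision_function (like GradientBoostingClassifier)."""

    def decision_function(self, row):
        return 0.3  # made-up raw log-odds score

    def predict_proba(self, row):
        p = 1.0 / (1.0 + math.exp(-0.3))
        return [1.0 - p, p]


class ProbaOnlyModel:
    """Hypothetical stand-in for an estimator that only offers predict_proba."""

    def predict_proba(self, row):
        return [0.4, 0.6]


def calibrator_input(model, row) -> float:
    """Score fed to the calibrator: decision_function if available, else P(positive)."""
    if hasattr(model, "decision_function"):
        return model.decision_function(row)
    return model.predict_proba(row)[1]


def platt_sigmoid(score: float, a: float, b: float) -> float:
    """Sigmoid (Platt) calibration: P(y=1) = 1 / (1 + exp(a * score + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


a, b = -2.0, 0.1  # made-up "fitted" Platt parameters
for model in (BoostedModel(), ProbaOnlyModel()):
    score = calibrator_input(model, row=None)
    print(type(model).__name__, score, platt_sigmoid(score, a, b))
```

The converter has to reproduce whichever score the calibrator was actually fitted on, which is why the decision_function path needs its own handling.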

You'll be notified automatically once I've pushed my code to GitHub.

puifais commented on August 20, 2024

Understood. Thank you for your help.

vruusmann commented on August 20, 2024

I've published some code changes, which implement support for boosting-style ensemble models.

There are some more things that I'd like to polish and verify before packaging this code as a new SkLearn2PMML package version. Should be all done in one week or so.

puifais commented on August 20, 2024

I updated the package with pip install sklearn2pmml --upgrade and successfully installed sklearn2pmml-0.98.0. I tried sklearn2pmml(calibrated_xgb_pmml_pipeline, './calibrated_xgb.pmml') again, but unfortunately I still got the same error.

vruusmann commented on August 20, 2024

> I installed sklearn2pmml-0.98.0 .. but unfortunately still got the same error.

The relevant code changes have been pushed to the JPMML-SkLearn repository (see jpmml/jpmml-sklearn@6d83051 and jpmml/jpmml-sklearn@afde496).

However, these changes have not yet been packaged as new JPMML-SkLearn and SkLearn2PMML releases.

The fix is expected to ship as SkLearn2PMML 0.98.1.
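
To check whether the locally installed package already includes the fix, a small standard-library sketch like the following could help. It assumes the fix ships in 0.98.1 as stated above, and it only understands plain numeric release strings: any non-numeric component (e.g. a pre-release tag) stops parsing, which errs on the side of reporting "no fix".

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release expected to contain the fix, per this thread
FIXED_IN: Tuple[int, ...] = (0, 98, 1)


def parse_release(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of an 'X.Y.Z' string into a tuple."""
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def has_fix(release: str) -> bool:
    return parse_release(release) >= FIXED_IN


def installed_release() -> Optional[str]:
    try:
        return version("sklearn2pmml")
    except PackageNotFoundError:
        return None


current = installed_release()
if current is None:
    print("sklearn2pmml is not installed")
elif has_fix(current):
    print(current, "should include the fix")
else:
    print(current, "predates the fix; upgrade with pip install sklearn2pmml --upgrade")
```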
