Comments (7)

vruusmann commented on August 20, 2024

First, you need to introduce an appropriate converter class into the JPMML-SkLearn project:

  1. Create a subclass of sklearn.Transformer, and implement the conversion "business logic" in the Transformer#encodeFeatures(List, List, FeatureMapper) method.
  2. Register this class with org.jpmml.sklearn.PickleUtil.

There's not much documentation about this. Here's an example of implementing a converter class for Scikit-Learn's FunctionTransformer transformation type:
jpmml/jpmml-sklearn@5c4a181

Then, build your modified JPMML-SkLearn project with Apache Maven, and drop the resulting JAR file into the sklearn2pmml /resources/ directory. Currently, you would be replacing jpmml-sklearn-1.1.4.jar with your own jpmml-sklearn-1.1-SNAPSHOT.jar.

from sklearn2pmml.

vruusmann commented on August 20, 2024

Edited the title of this issue to reflect the real user need.

Actually, one shouldn't be making custom converter classes part of the JPMML-SkLearn library. They could be isolated into a standalone (mini-)project, which depends on the JPMML-SkLearn library (and other Java libraries).

Building this (mini-)project should produce a JAR file that is suitable for dropping into the sklearn2pmml /resources/ directory. At the moment, the problem is that there is no way of informing org.jpmml.sklearn.PickleUtil about those newly dropped-in converter classes, because the list of supported converters is hard-coded.

A solution would be to introduce some sort of "sklearn2pmml extension module metadata" mechanism. For example, the JAR file could contain a properties file META-INF/sklearn2pmml.properties, which lists the names of new converter classes.

geoHeil commented on August 20, 2024

Is there a simpler solution, e.g. something similar to http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.FunctionTransformer.html, where an arbitrary function can be registered?

vruusmann commented on August 20, 2024

Class sklearn.preprocessing.FunctionTransformer supports a limited number of 1-parameter NumPy ufuncs: https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/java/sklearn/preprocessing/FunctionTransformer.java#L79

As you can see, in order to support a ufunc, you still need to write the conversion "business logic" in Java, and (re-)build a modified JPMML-SkLearn library.

vruusmann commented on August 20, 2024

Could you perhaps share your Python Transformer class? Maybe I can suggest a simple and easy way of automatically translating it to PMML then.

For example, the JPMML-R library now includes a general-purpose R-to-PMML expression conversion functionality:

iris.rf = randomForest(Species ~ . + I(log(Sepal.Length / Sepal.Width) + 1), data = iris)

It should be possible to build a similar Python-to-PMML expression converter. Of course, the trouble is that you cannot refer to fields by name in Scikit-Learn; you would have to use field references such as $1, $2, ..., $n instead.
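To make the idea concrete, here is a minimal sketch of what such an expression converter might look like. It is not the JPMML implementation: it walks a Python expression with the standard ast module and emits nested PMML Apply, FieldRef and Constant elements. The names translate, to_pmml, BINARY_OPS and FUNCTIONS are hypothetical, the set of supported functions is an arbitrary assumption, and x1, x2 stand in for the $1, $2 field references, since $1 is not a valid Python identifier.

```python
import ast
import xml.etree.ElementTree as ET

# Python binary operators mapped to PMML built-in function names.
BINARY_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

# Assumed whitelist of 1-argument functions; Python's "log" becomes PMML's "ln".
FUNCTIONS = {"log": "ln", "exp": "exp", "sqrt": "sqrt", "abs": "abs"}


def to_pmml(node):
    """Recursively convert an ast node into a PMML expression element."""
    if isinstance(node, ast.Expression):
        return to_pmml(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        apply = ET.Element("Apply", function=BINARY_OPS[type(node.op)])
        apply.append(to_pmml(node.left))
        apply.append(to_pmml(node.right))
        return apply
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS
            and len(node.args) == 1 and not node.keywords):
        apply = ET.Element("Apply", function=FUNCTIONS[node.func.id])
        apply.append(to_pmml(node.args[0]))
        return apply
    if isinstance(node, ast.Name):
        # A bare name is treated as a field reference (x1, x2, ...).
        return ET.Element("FieldRef", field=node.id)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        constant = ET.Element("Constant")
        constant.text = repr(node.value)
        return constant
    raise ValueError("Unsupported expression element: " + ast.dump(node))


def translate(expression):
    """Parse a Python expression string and return PMML markup as a string."""
    tree = ast.parse(expression, mode="eval")
    return ET.tostring(to_pmml(tree), encoding="unicode")


# Python counterpart of the R formula term above.
print(translate("log(x1 / x2) + 1"))
```

Anything outside the whitelist (for example x1 ** 2) raises a ValueError instead of being translated silently, which keeps the sketch honest about what it does not cover.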

geoHeil commented on August 20, 2024

Unfortunately, sharing the code will not be possible, but I can explain the actions that are performed:

  • filtering some customer groups (no state involved)
  • handling NaN values (state required for imputation)
  • generating some features (no state involved)

The whole pipeline is a bit similar to http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html where the cleaning of the data is encapsulated in its own transformer.
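The difference between the stateful and stateless steps listed above can be illustrated with a plain-Python sketch. This is not the actual (unshared) code: MeanImputer and add_ratio are hypothetical names, mean imputation and the ratio feature are assumed examples, and the classes only mimic scikit-learn's fit/transform convention without depending on the library.

```python
import math


class MeanImputer:
    """Hypothetical stateful step: fit() learns column means, transform() applies them."""

    def fit(self, rows):
        n_cols = len(rows[0])
        self.means_ = []
        for j in range(n_cols):
            observed = [row[j] for row in rows if not math.isnan(row[j])]
            # Fall back to 0.0 when a column has no observed values at all.
            self.means_.append(sum(observed) / len(observed) if observed else 0.0)
        return self

    def transform(self, rows):
        # Replace every NaN with the mean learned for its column.
        return [
            [self.means_[j] if math.isnan(value) else value
             for j, value in enumerate(row)]
            for row in rows
        ]


def add_ratio(rows):
    """Hypothetical stateless step: append column 0 divided by column 1."""
    return [row + [row[0] / row[1]] for row in rows]


nan = float("nan")
train = [[1.0, 2.0], [nan, 4.0], [3.0, nan]]
imputer = MeanImputer().fit(train)
print(add_ratio(imputer.transform(train)))
```

The stateless function can be described by an expression alone, whereas the imputer's learned means_ would have to be carried over into the exported model.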

vruusmann commented on August 20, 2024

Starting from the JPMML-SkLearn library version 1.2.0, the PickleUtil utility class will scan all the JAR files in the application classpath for the META-INF/sklearn2pmml.properties file, and if found, will automatically register all the listed converter classes with the runtime. For example, here is the list of built-in converter classes: https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/resources/META-INF/sklearn2pmml.properties
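The scanning behaviour described above can be approximated in Python to check what an extension JAR would contribute. This is only an illustration of the idea, not the Java code in PickleUtil: the function name read_converter_mappings is hypothetical, and the sketch assumes each entry in the properties file is a simple "key = value" line.

```python
import zipfile

PROPERTIES_PATH = "META-INF/sklearn2pmml.properties"


def read_converter_mappings(jar_paths):
    """Collect key/value entries from the properties file of each JAR, if present."""
    mappings = {}
    for jar_path in jar_paths:
        # A JAR file is a ZIP archive, so the standard zipfile module can read it.
        with zipfile.ZipFile(jar_path) as jar:
            if PROPERTIES_PATH not in jar.namelist():
                continue  # This JAR does not declare any converter classes.
            text = jar.read(PROPERTIES_PATH).decode("iso-8859-1")
        for line in text.splitlines():
            line = line.strip()
            # Skip blank lines and properties-style comments.
            if not line or line.startswith(("#", "!")):
                continue
            key, _, value = line.partition("=")
            mappings[key.strip()] = value.strip()
    return mappings
```

Calling read_converter_mappings(["/path/to/extensions.jar"]) on a packaged extension is a quick way to confirm that the properties file ended up at the expected location inside the archive.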

You can implement your own Transformer, Selector and Estimator converter classes as appropriate. When done, create a corresponding META-INF/sklearn2pmml.properties file, and package everything as a regular JAR file.

During conversion, you can add a list of JAR files to the application classpath using the newly introduced user_classpath argument:

sklearn2pmml(estimator, mapper, "estimator_mapper.pmml", user_classpath = ["/path/to/extensions.jar"])
