Comments (6)

vruusmann commented on August 20, 2024

What version of JPMML are you using?

First, the conversion from double to float was not available in older versions because it loses numeric precision. However, it has been enabled since September 2015 (e.g. see jpmml/jpmml-evaluator@2079ebc).

Please note that SkLearn uses 32-bit floating-point values for representing tree split conditions. Therefore, it is absolutely necessary to use the same datatype in PMML also, because otherwise some splits may be evaluated incorrectly.
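
As a rough illustration of why the datatype matters, here is a minimal sketch using only the Python standard library. The threshold is the 32-bit value of 2.6, which prints as 2.5999999046325684 when widened to a double (the same number that appears as a split value in the PMML quoted later in this thread). The input value 2.6 and the helper name `to_float32` are assumptions chosen for the example.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (returned as a Python float)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Split threshold as a 32-bit tree would store it, widened back to a double.
threshold = to_float32(2.6)

# Hypothetical input value that arrives as a 64-bit double.
petal_length = 2.6

# Compared as doubles, the input is slightly above the threshold.
as_double = petal_length <= threshold

# Compared after casting the input to 32 bits, the two values are equal.
as_float = to_float32(petal_length) <= threshold

print(threshold, as_double, as_float)
```

With the input left as a double the record takes the "greater than" branch, whereas after the cast to float it takes the "less or equal" branch, which is the kind of mismatch described above.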

Second, you're working with a classification-type problem, where the final class probability distribution (i.e. that of the top-level MiningModel element) is calculated by applying the average aggregation function over all member class probability distributions (i.e. those of the nested TreeModel elements). JPMML exposes that information to interested parties through the org.jpmml.evaluator.HasProbability interface.
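
The averaging step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the JPMML implementation; the function name `average_probabilities` and the three member distributions are made up for the example.

```python
from typing import Dict, List

def average_probabilities(members: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-class probabilities over all member models."""
    # Collect every class label seen in any member distribution.
    classes = sorted({label for member in members for label in member})
    return {
        label: sum(member.get(label, 0.0) for member in members) / len(members)
        for label in classes
    }

# Hypothetical class probability distributions of three member trees for one record.
tree_outputs = [
    {"0": 1.0, "1": 0.0, "2": 0.0},
    {"0": 0.75, "1": 0.25, "2": 0.0},
    {"0": 0.0, "1": 1.0, "2": 0.0},
]

ensemble = average_probabilities(tree_outputs)

# The predicted class is the one with the highest averaged probability.
predicted = max(ensemble, key=ensemble.get)
print(ensemble, predicted)
```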

Both of your problems can be solved by upgrading to the latest JPMML-Evaluator library version, which is 1.2.11 at the moment. Also, the upgrade should give you a considerable performance boost.

from sklearn2pmml.

koinadn commented on August 20, 2024

Thank you for the quick response.

That was a good call. I was using JPMML 1.1.17. However, after upgrading to 1.2.11, a new error occurs as soon as evaluation reaches the first field name:

org.jpmml.evaluator.MissingFieldException: x3
    at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:150)
    at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:64)
    at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:106)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:63)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluateNode(TreeModelEvaluator.java:188)
    at org.jpmml.evaluator.TreeModelEvaluator.handleTrue(TreeModelEvaluator.java:205)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluateTree(TreeModelEvaluator.java:146)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluateClassification(TreeModelEvaluator.java:118)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluate(TreeModelEvaluator.java:87)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:349)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:176)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:143)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:115)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:110)
    at com.ea.eadp.risk.service.pmml.impl.PMMLEvaluatorImpl.evaluate(PMMLEvaluatorImpl.java:111)
    at com.ea.eadp.risk.service.pmml.impl.PMMLLoadTestServiceImpl.evaluate(PMMLLoadTestServiceImpl.java:167)
    at com.ea.eadp.risk.service.pmml.impl.PMMLLoadTestServiceImpl.runLoadTest(PMMLLoadTestServiceImpl.java:99)
    at com.ea.eadp.risk.service.pmml.impl.PMMLLoadTestServiceImpl.runLoadTest(PMMLLoadTestServiceImpl.java:31)
    at com.ea.eadp.test.jpmml.Program.main(Program.java:22)

However, the PMML (as generated from the code above) does include the names as derived fields:


    <DataDictionary>
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="0"/>
            <Value value="1"/>
            <Value value="2"/>
        </DataField>
        <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
        <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
        <DataField name="Petal.Length" optype="continuous" dataType="double"/>
        <DataField name="Petal.Width" optype="continuous" dataType="double"/>
    </DataDictionary>
    <MiningModel functionName="classification">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="Sepal.Length"/>
            <MiningField name="Sepal.Width"/>
            <MiningField name="Petal.Length"/>
            <MiningField name="Petal.Width"/>
        </MiningSchema>
        <Output>
            <OutputField name="probability_0" feature="probability" value="0"/>
            <OutputField name="probability_1" feature="probability" value="1"/>
            <OutputField name="probability_2" feature="probability" value="2"/>
        </Output>
        <LocalTransformations>
            <DerivedField name="x1" optype="continuous" dataType="float">
                <FieldRef field="Sepal.Length"/>
            </DerivedField>
            <DerivedField name="x2" optype="continuous" dataType="float">
                <FieldRef field="Sepal.Width"/>
            </DerivedField>
            <DerivedField name="x3" optype="continuous" dataType="float">
                <FieldRef field="Petal.Length"/>
            </DerivedField>
            <DerivedField name="x4" optype="continuous" dataType="float">
                <FieldRef field="Petal.Width"/>
            </DerivedField>
        </LocalTransformations>
        <Segmentation multipleModelMethod="average">
            <Segment id="1">
                <True/>
                <TreeModel functionName="classification" splitCharacteristic="binarySplit">
                    <MiningSchema>
                        <MiningField name="Sepal.Width"/>
                        <MiningField name="Petal.Length"/>
                        <MiningField name="Petal.Width"/>
                    </MiningSchema>
                    <Node id="1">
                        <True/>
                        <Node id="2" score="0" recordCount="54.0">
                            <SimplePredicate field="x3" operator="lessOrEqual" value="2.5999999046325684"/>

Any idea on the cause of this?

Thanks!

vruusmann commented on August 20, 2024

This is a legitimate bug now. The derived field x3 depends on the user-supplied field Petal.Length, but the latter is not "imported" by the MiningSchema element of the TreeModel element.
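
The kind of inconsistency described here can be checked mechanically. Below is a minimal sketch using Python's `xml.etree.ElementTree`. The PMML fragment is a hypothetical, namespace-free, trimmed-down document written for this example (it is not the poster's actual file), and the function name `find_unimported_fields` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the nested TreeModel uses derived field x3,
# but its MiningSchema does not import the underlying Petal.Length field.
PMML = """
<MiningModel functionName="classification">
  <LocalTransformations>
    <DerivedField name="x3" optype="continuous" dataType="float">
      <FieldRef field="Petal.Length"/>
    </DerivedField>
    <DerivedField name="x4" optype="continuous" dataType="float">
      <FieldRef field="Petal.Width"/>
    </DerivedField>
  </LocalTransformations>
  <Segmentation multipleModelMethod="average">
    <Segment id="1">
      <True/>
      <TreeModel functionName="classification">
        <MiningSchema>
          <MiningField name="Petal.Width"/>
        </MiningSchema>
        <Node id="1">
          <True/>
          <Node id="2" score="0">
            <SimplePredicate field="x3" operator="lessOrEqual" value="2.5999999046325684"/>
          </Node>
          <Node id="3" score="1">
            <SimplePredicate field="x4" operator="lessOrEqual" value="1.75"/>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
  </Segmentation>
</MiningModel>
"""

def find_unimported_fields(mining_model: ET.Element) -> dict:
    """Map each segment id to (derived field, source field) pairs its MiningSchema does not import."""
    # Derived field name -> name of the field it references.
    sources = {
        derived.get("name"): derived.find("FieldRef").get("field")
        for derived in mining_model.findall("LocalTransformations/DerivedField")
    }
    problems = {}
    for segment in mining_model.findall("Segmentation/Segment"):
        tree = segment.find("TreeModel")
        imported = {f.get("name") for f in tree.findall("MiningSchema/MiningField")}
        used = {p.get("field") for p in tree.iter("SimplePredicate")}
        missing = sorted(
            (name, sources[name])
            for name in used
            if name in sources and sources[name] not in imported
        )
        if missing:
            problems[segment.get("id")] = missing
    return problems

problems = find_unimported_fields(ET.fromstring(PMML))
print(problems)
```

A real PMML file declares an XML namespace on its root element, so the element paths above would need to be namespace-qualified before running this against one.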

Probably, this happens because your DataFrameMapper object does not specify any transformations for input fields.

koinadn commented on August 20, 2024

Thanks again for the quick response.

I tested the same code with the original PCA transformation in the example:

import sklearn_pandas
from sklearn.decomposition import PCA

iris_mapper = sklearn_pandas.DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], PCA(n_components = 3)),
    ("Species", None)
])

The PMML was consumed and evaluated properly, so that does seem to be the cause of the error.

I do believe there are cases where no transformations would be used, so support for that would be nice to have.

Thank you.

vruusmann commented on August 20, 2024

The conversion produces an invalid PMML document, because your DataFrameMapper object contains a duplicate mapping for the Petal.Width field (and no mapping for the Petal.Length field). If this typo is corrected, then the mapping to the None transform works as intended.
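
A typo of this kind can be caught on the Python side before conversion. The following is a minimal sketch that inspects a DataFrameMapper-style list of (columns, transformer) tuples. The feature list is a hypothetical reconstruction of the typo described above, and `check_mapping_columns` is an illustrative helper, not part of sklearn_pandas or JPMML-SkLearn.

```python
from collections import Counter

def check_mapping_columns(features, expected_columns):
    """Return (duplicates, unmapped) for a DataFrameMapper-style feature list."""
    mapped = []
    for columns, _transformer in features:
        # A mapping may name a single column or a list of columns.
        mapped.extend([columns] if isinstance(columns, str) else columns)
    counts = Counter(mapped)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    unmapped = sorted(set(expected_columns) - set(mapped))
    return duplicates, unmapped

# Hypothetical feature list: Petal.Width appears twice, Petal.Length not at all.
features = [
    ("Sepal.Length", None),
    ("Sepal.Width", None),
    ("Petal.Width", None),  # typo: meant to be "Petal.Length"
    ("Petal.Width", None),
    ("Species", None),
]
expected = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]

duplicates, unmapped = check_mapping_columns(features, expected)
print(duplicates, unmapped)
```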

I've updated the JPMML-SkLearn library to do extra sanity checking along those lines: jpmml/jpmml-sklearn@7d0578a

tbayrak commented on August 20, 2024

Hi,

I tried the same code as above to create a PMML file, but got the following error:
TypeError: The pipeline object is not an instance of PMMLPipeline

Any suggestions? Thanks
