Random Forest Conversions and Consumption about sklearn2pmml HOT 6 CLOSED

jpmml commented on August 20, 2024

Hello, I'm having a few issues in testing a random forest classifier.

Comments (6)

vruusmann commented on August 20, 2024

What version of JPMML are you using?

First, the conversion from double to float was not available in older versions, because it loses numeric precision. However, it has been enabled since September 2015 (see, e.g., jpmml/jpmml-evaluator@2079ebc).

Please note that SkLearn uses 32-bit floating-point values for representing tree split conditions. Therefore, the PMML document must use the same data type; otherwise, some splits may be evaluated incorrectly.
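To make the precision argument concrete, here is a minimal Python sketch (standard library only; `to_float32` is a hypothetical helper, not part of SkLearn or JPMML). Rounding 2.6 to 32 bits yields the same threshold value, 2.5999999046325684, that appears in the SimplePredicate later in this thread, and the same input value 2.6 then lands on opposite sides of that split depending on whether it is compared in double or float precision.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# A split threshold held as a 32-bit float: the float32 value closest to 2.6
threshold = to_float32(2.6)
print(threshold)  # 2.5999999046325684

x = 2.6  # the same input value, held as a 64-bit double

# Comparing in double precision: 2.6 is slightly larger than the threshold
print(x <= threshold)  # False
# Casting the input to float first, matching the 32-bit threshold: equal, so <= holds
print(to_float32(x) <= threshold)  # True
```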

Second, you're working with a classification-type problem, where the final class probability distribution (i.e. that of the top-level MiningModel element) is calculated by applying the average aggregation function over the class probability distributions of all member models (i.e. the nested TreeModel elements). JPMML uses the interface org.jpmml.evaluator.HasProbability to expose that information to interested parties.
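The averaging step itself is plain arithmetic. This is a minimal Python sketch, assuming each member tree returns a class probability distribution keyed by the Species values "0", "1" and "2"; `average_probabilities` and the member distributions are hypothetical illustrations, not JPMML's implementation.

```python
from typing import Dict, List


def average_probabilities(member_distributions: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-class probabilities over all member models (multipleModelMethod="average")."""
    if not member_distributions:
        raise ValueError("at least one member distribution is required")
    classes = sorted({c for dist in member_distributions for c in dist})
    n = len(member_distributions)
    # A class absent from a member's distribution counts as probability 0 for that member
    return {c: sum(dist.get(c, 0.0) for dist in member_distributions) / n for c in classes}


# Hypothetical distributions returned by three member TreeModels for one record
members = [
    {"0": 1.0, "1": 0.0, "2": 0.0},
    {"0": 0.5, "1": 0.5, "2": 0.0},
    {"0": 0.0, "1": 1.0, "2": 0.0},
]
print(average_probabilities(members))  # {'0': 0.5, '1': 0.5, '2': 0.0}
```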

Both of your problems can be solved by upgrading to the latest JPMML-Evaluator library version, which is 1.2.11 at the moment. Also, the upgrade should give you a considerable performance boost.

koinadn commented on August 20, 2024

Thank you for the quick response.

That was a good call. I was using JPMML 1.1.17. However, after upgrading to 1.2.11, a new error occurs as soon as evaluation hits the first field name:

org.jpmml.evaluator.MissingFieldException: x3
    at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:150)
    at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:64)
    at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:106)
    at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:63)
    at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:51)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluateNode(TreeModelEvaluator.java:188)
    at org.jpmml.evaluator.TreeModelEvaluator.handleTrue(TreeModelEvaluator.java:205)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluateTree(TreeModelEvaluator.java:146)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluateClassification(TreeModelEvaluator.java:118)
    at org.jpmml.evaluator.TreeModelEvaluator.evaluate(TreeModelEvaluator.java:87)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:349)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:176)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:143)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:115)
    at org.jpmml.evaluator.MiningModelEvaluator.evaluate(MiningModelEvaluator.java:110)
    at com.ea.eadp.risk.service.pmml.impl.PMMLEvaluatorImpl.evaluate(PMMLEvaluatorImpl.java:111)
    at com.ea.eadp.risk.service.pmml.impl.PMMLLoadTestServiceImpl.evaluate(PMMLLoadTestServiceImpl.java:167)
    at com.ea.eadp.risk.service.pmml.impl.PMMLLoadTestServiceImpl.runLoadTest(PMMLLoadTestServiceImpl.java:99)
    at com.ea.eadp.risk.service.pmml.impl.PMMLLoadTestServiceImpl.runLoadTest(PMMLLoadTestServiceImpl.java:31)
    at com.ea.eadp.test.jpmml.Program.main(Program.java:22)

However, the PMML (as generated from the code above) does include the names as derived fields:


    <DataDictionary>
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="0"/>
            <Value value="1"/>
            <Value value="2"/>
        </DataField>
        <DataField name="Sepal.Length" optype="continuous" dataType="double"/>
        <DataField name="Sepal.Width" optype="continuous" dataType="double"/>
        <DataField name="Petal.Length" optype="continuous" dataType="double"/>
        <DataField name="Petal.Width" optype="continuous" dataType="double"/>
    </DataDictionary>
    <MiningModel functionName="classification">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="Sepal.Length"/>
            <MiningField name="Sepal.Width"/>
            <MiningField name="Petal.Length"/>
            <MiningField name="Petal.Width"/>
        </MiningSchema>
        <Output>
            <OutputField name="probability_0" feature="probability" value="0"/>
            <OutputField name="probability_1" feature="probability" value="1"/>
            <OutputField name="probability_2" feature="probability" value="2"/>
        </Output>
        <LocalTransformations>
            <DerivedField name="x1" optype="continuous" dataType="float">
                <FieldRef field="Sepal.Length"/>
            </DerivedField>
            <DerivedField name="x2" optype="continuous" dataType="float">
                <FieldRef field="Sepal.Width"/>
            </DerivedField>
            <DerivedField name="x3" optype="continuous" dataType="float">
                <FieldRef field="Petal.Length"/>
            </DerivedField>
            <DerivedField name="x4" optype="continuous" dataType="float">
                <FieldRef field="Petal.Width"/>
            </DerivedField>
        </LocalTransformations>
        <Segmentation multipleModelMethod="average">
            <Segment id="1">
                <True/>
                <TreeModel functionName="classification" splitCharacteristic="binarySplit">
                    <MiningSchema>
                        <MiningField name="Sepal.Width"/>
                        <MiningField name="Petal.Length"/>
                        <MiningField name="Petal.Width"/>
                    </MiningSchema>
                    <Node id="1">
                        <True/>
                        <Node id="2" score="0" recordCount="54.0">
                            <SimplePredicate field="x3" operator="lessOrEqual" value="2.5999999046325684"/>

Any idea on the cause of this?

Thanks!

vruusmann commented on August 20, 2024

This is a legitimate bug now. The derived field x3 depends on the user-supplied field Petal.Length, but the latter is not "imported" by the MiningSchema element of the TreeModel element.

This probably happens because your DataFrameMapper object does not specify any transformations for the input fields.
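A consistency check along the lines of this diagnosis can be sketched with Python's xml.etree.ElementTree. It assumes the layout shown above (DerivedField elements under the top-level LocalTransformations, TreeModel elements under Segment elements) and reports predicate fields whose source fields are not listed in the TreeModel's own MiningSchema. The function names and the trimmed PMML document are hypothetical illustrations, not the user's actual file or a JPMML API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set, Tuple


def local_name(tag: str) -> str:
    # Drop a "{namespace}" prefix so the check works with or without a PMML xmlns
    return tag.rsplit("}", 1)[-1]


def children(elem: ET.Element, name: str) -> List[ET.Element]:
    # Direct child elements with the given local name
    return [child for child in elem if local_name(child.tag) == name]


def find_unimported_fields(mining_model: ET.Element) -> List[Tuple[str, str, str]]:
    """Return sorted (segment id, predicate field, missing source field) triples."""
    # Map each top-level DerivedField to the set of fields its FieldRef elements read
    derived: Dict[str, Set[str]] = {}
    for lt in children(mining_model, "LocalTransformations"):
        for df in children(lt, "DerivedField"):
            derived[df.get("name")] = {
                ref.get("field") for ref in df.iter() if local_name(ref.tag) == "FieldRef"
            }
    problems: Set[Tuple[str, str, str]] = set()
    for segmentation in children(mining_model, "Segmentation"):
        for segment in children(segmentation, "Segment"):
            for tree in children(segment, "TreeModel"):
                # Fields "imported" by this TreeModel's own MiningSchema
                imported = {
                    mf.get("name")
                    for ms in children(tree, "MiningSchema")
                    for mf in children(ms, "MiningField")
                }
                for pred in tree.iter():
                    if local_name(pred.tag) != "SimplePredicate":
                        continue
                    field = pred.get("field")
                    # A derived field needs its source fields; a raw field needs itself
                    for source in derived.get(field, {field}):
                        if source not in imported:
                            problems.add((segment.get("id"), field, source))
    return sorted(problems)


# Hypothetical, trimmed document: x3 reads Petal.Length, but the
# TreeModel's MiningSchema does not list Petal.Length
PMML = """<MiningModel functionName="classification">
  <LocalTransformations>
    <DerivedField name="x3" optype="continuous" dataType="float">
      <FieldRef field="Petal.Length"/>
    </DerivedField>
  </LocalTransformations>
  <Segmentation multipleModelMethod="average">
    <Segment id="1">
      <True/>
      <TreeModel functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="Sepal.Width"/>
          <MiningField name="Petal.Width"/>
        </MiningSchema>
        <Node id="1">
          <True/>
          <Node id="2" score="0">
            <SimplePredicate field="x3" operator="lessOrEqual" value="2.5999999046325684"/>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
  </Segmentation>
</MiningModel>"""

root = ET.fromstring(PMML)
print(find_unimported_fields(root))  # [('1', 'x3', 'Petal.Length')]
```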

koinadn commented on August 20, 2024

Thanks again for the quick response.

I tested the same code with the original PCA transformation in the example:

from sklearn.decomposition import PCA
import sklearn_pandas

iris_mapper = sklearn_pandas.DataFrameMapper([
    (["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"], PCA(n_components=3)),
    ("Species", None)
])

The resulting PMML was consumed and evaluated properly, so the missing transformations do seem to be the cause of the error.

I do believe there are cases where no transformations are needed, so supporting that would be nice to have.

Thank you.

vruusmann commented on August 20, 2024

The conversion produces an invalid PMML document, because your DataFrameMapper object contains a duplicate mapping for the Petal.Width field (and no mapping for the Petal.Length field). If this typo is corrected, then the mapping to None transform works as intended.

I've updated the JPMML-SkLearn library to do extra sanity checking along those lines: jpmml/jpmml-sklearn@7d0578a
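The kind of typo described above (a duplicate mapping for Petal.Width and no mapping for Petal.Length) can be caught with a simple check. This is a minimal, standard-library-only Python sketch: `check_mapping` is a hypothetical helper, it does not reproduce the check added in jpmml/jpmml-sklearn@7d0578a, and the feature list is an illustrative reconstruction rather than the original poster's code. It assumes DataFrameMapper-style `(columns, transformer)` tuples, where the columns part is either a single name or a list of names.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple, Union

Columns = Union[str, Sequence[str]]


def check_mapping(
    features: Iterable[Tuple[Columns, object]], expected: Sequence[str]
) -> Tuple[List[str], List[str]]:
    """Return (duplicated columns, unmapped columns) for DataFrameMapper-style tuples."""
    counts = Counter()
    for columns, _transformer in features:
        # A feature selects either one column name or a list of column names
        names = [columns] if isinstance(columns, str) else list(columns)
        counts.update(names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    missing = [name for name in expected if name not in counts]
    return duplicates, missing


# Hypothetical mapping containing the kind of typo described above
features = [
    ("Sepal.Length", None),
    ("Sepal.Width", None),
    ("Petal.Width", None),  # typo: should be "Petal.Length"
    ("Petal.Width", None),
    ("Species", None),
]
expected = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
print(check_mapping(features, expected))  # (['Petal.Width'], ['Petal.Length'])
```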

tbayrak commented on August 20, 2024

Hi,

I've tried the same code above to create a PMML file but got the following error:
TypeError: The pipeline object is not an instance of PMMLPipeline

Any suggestions? Thanks
