Hi, Not sure if this is intended behavior, but the following initialization fails

LookupTransformer TypeError when default value is not exactly the type from the mapping about sklearn2pmml HOT 2 CLOSED

danmar3 commented on August 20, 2024

LookupTransformer TypeError when default value is not exactly the type from the mapping

from sklearn2pmml.

Comments (2)

vruusmann commented on August 20, 2024

This type check is intended to block situations, where the mapped values and the default values are of completely different data type. For example, the former are floats, and then the latter is an int or str.

Mis-matching data types can cause some hard-to-debug issues, because they render differently when printed to text documents. Very rare (affects only a small number of values), but when it does happen, it's very tricky to debug.

Howver, in the this issue, I see that you're suggesting that the type check should be relaxed, so that in addition to strict type equality check, also type-is-a-subtype-of checks should be enabled. The case in point is that numpy.float64 should be considered a subclass of builtins.float?

import numpy

print(type(1.0)) # <class 'float'>
print(type(numpy.float64(1.0))) # <class 'numpy.float64'>

print(type(1.0) == type(numpy.float64(1.0))) # False

Makes sense. Will fix in the next release, but need to study Python's builti-ns vs Numpy type compatibility first.

from sklearn2pmml.

vruusmann commented on August 20, 2024

I've been thinking about this issue, and I've decided to keep the current behaviour unchanged (ie. "won't fix").

It still makes sense to require absolute type identity both on the key side and on the value side. Once we start mixing types, it becomes possible that the dict lookup-behaviour breaks down (ie. a float dict key may or may not match a numpy.float64 lookup key), or that the returned values become variable (ie. sometimes you get float values, some other time numpy.float64 values) - the effects will cascade down the pipeline, and break some unsuspecting step much farther down.

If there's a chance of mixed type objects - most notably a mix of Python and Numpy scalars - then they should be normalized to either representation. I'd personally unpack Numpy scalars using the following helper:

def _normalize(x):
  if hasattr(x, "item"):
    return x.item()
  return x

raw_mapping = ...
raw_default_value = ...

transformer = LookupTransformer(mapping = {k : _normalize(v) for k, v in raw_mapping.items()}, default_value = _normalize(raw_default_value))

from sklearn2pmml.

Recommend Projects

LookupTransformer TypeError when default value is not exactly the type from the mapping about sklearn2pmml HOT 2 CLOSED

Comments (2)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs