
kiraplenkin / woe_scoring

Stars: 8 · Watchers: 3 · Forks: 4 · Size: 178 KB

Monotone Weight Of Evidence Transformer and LogisticRegression model with scikit-learn API

License: MIT License

Language: Python (100.00%)
Topics: woe, woebinning, monotonic, scorecard, scorecard-model, scorecards

woe_scoring's Introduction

Code style: black

WOE-Scoring


Quickstart

  1. Install the package:
pip install woe-scoring
  2. Use WOETransformer:
import pandas as pd
from woe_scoring import WOETransformer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("titanic_data.csv")
train, test = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["Survived"]
)

special_cols = [
    "PassengerId",
    "Survived",
    "Name",
    "Ticket",
    "Cabin",
]

cat_cols = [
    "Pclass",
    "Sex",
    "SibSp",
    "Parch",
    "Embarked",
]

encoder = WOETransformer(
    max_bins=8,
    min_pct_group=0.1,
    diff_woe_threshold=0.1,
    cat_features=cat_cols,
    special_cols=special_cols,
    n_jobs=-1,
    merge_type="chi2",
)

encoder.fit(train, train["Survived"])
encoder.save_to_file("train_dict.json")

encoder.load_woe_iv_dict("train_dict.json")
encoder.refit(train, train["Survived"])

enc_train = encoder.transform(train)
enc_test = encoder.transform(test)

model = LogisticRegression()
model.fit(enc_train, train["Survived"])
test_proba = model.predict_proba(enc_test)[:, 1]
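Conceptually, a fitted WOE encoder maps each bin (or category) of a feature to its weight of evidence. The sketch below is a hypothetical helper, `fit_woe_map`, not the library's implementation: it handles a single already-binned column, assumes target 1 counts as "bad" and target 0 as "good", and adds 0.5 to each count so empty cells do not produce infinite values. It also returns the column's information value (IV).

```python
import numpy as np
import pandas as pd


def fit_woe_map(x, y, smoothing=0.5):
    """Hypothetical helper: WOE per category plus the column's IV.

    Assumes y == 1 marks a "bad" and y == 0 a "good" observation.
    """
    df = pd.DataFrame({"x": x, "y": y})
    counts = df.groupby("x")["y"].agg(bad="sum", total="count")
    counts["good"] = counts["total"] - counts["bad"]

    # Share of all goods / all bads that falls into each category (smoothed counts)
    good_share = (counts["good"] + smoothing) / counts["good"].sum()
    bad_share = (counts["bad"] + smoothing) / counts["bad"].sum()

    woe = np.log(good_share / bad_share)
    iv = float(((good_share - bad_share) * woe).sum())
    return woe.to_dict(), iv


# Learn the mapping on training data, then apply it to any column with .map()
woe_map, iv = fit_woe_map(
    ["male", "male", "male", "female", "female", "female"],
    [1, 1, 0, 0, 0, 1],
)
encoded = pd.Series(["female", "male"]).map(woe_map)
print(woe_map, iv)
```

Categories with relatively more bads get negative WOE under this sign convention; some scorecard tools use the opposite sign.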
  3. Use CreateModel:
import pandas as pd
from woe_scoring import CreateModel
from sklearn.model_selection import train_test_split

df = pd.read_csv("titanic_data.csv")
train, test = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["Survived"]
)

special_cols = [
    "PassengerId",
    "Survived",
    "Name",
    "Ticket",
    "Cabin",
]

model = CreateModel(
    max_vars=5,
    special_cols=special_cols,
    selection_method="sfs",
    model_type="sklearn",
    gini_threshold=5.0,
    n_jobs=-1,
    random_state=42,
    class_weight="balanced",
    cv=3,
)
model.fit(train, train["Survived"])
test_proba = model.predict_proba(test[model.feature_names_])

print(model.coef_, model.intercept_)
print(model.feature_names_)
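This README does not spell out how `CreateModel` selects variables. As a rough analogy only, the sketch below chains similar steps with plain scikit-learn, assuming that `gini_threshold=5.0` is a minimum univariate Gini in percent, that `selection_method="sfs"` means sequential forward selection, and that Gini is computed as 2 * AUC - 1. It runs on synthetic data instead of WOE-encoded features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def gini(y_true, score):
    """Gini coefficient as commonly used for scorecards: 2 * AUC - 1."""
    return 2 * roc_auc_score(y_true, score) - 1


# Synthetic stand-in for a WOE-encoded training set
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=3, random_state=42
)
max_vars, gini_threshold = 5, 5.0  # mirrors the CreateModel arguments above

# Step 1 (assumption): drop features whose univariate |Gini| in percent is too low
keep = [j for j in range(X.shape[1]) if abs(gini(y, X[:, j])) * 100 > gini_threshold]

# Step 2: sequential forward selection of up to max_vars features with 3-fold CV
base = LogisticRegression(class_weight="balanced")
n_select = min(max_vars, len(keep) - 1)  # SFS must select fewer features than it gets
if n_select >= 1:
    sfs = SequentialFeatureSelector(
        base, n_features_to_select=n_select, direction="forward", cv=3
    )
    sfs.fit(X[:, keep], y)
    selected = [keep[j] for j in np.flatnonzero(sfs.get_support())]
else:
    selected = keep

# Step 3: refit on the selected features and report coefficients and Gini
final = base.fit(X[:, selected], y)
train_gini = gini(y, final.predict_proba(X[:, selected])[:, 1])
print(final.coef_, final.intercept_, selected, round(train_gini, 3))
```

The Gini here is measured on the training data; a held-out set gives a fairer estimate.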

woe_scoring's People

Contributors

kiraplenkin


woe_scoring's Issues

Change cat-binning

q_list = list(sorted(set(q_list)))

new_bins = [copy.deepcopy(bad_rates[0]["bin"])]
start = 1
for i in range(len(q_list) - 1):
    for n in range(start, len(bad_rates)):
        if (bad_rate_list[n] >= q_list[i]) & (
            bad_rate_list[n] < q_list[i + 1]
        ):
            try:
                new_bins[i] += bad_rates[n]["bin"]
                start += 1
            except IndexError:
                new_bins.append([])
                new_bins[i] += bad_rates[n]["bin"]
                start += 1
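The snippet above collects categories into groups whose bad rates fall between consecutive quantiles. A simpler formulation is sketched below as a hypothetical helper, `group_categories_by_bad_rate`, assuming the bad rates arrive as a plain `{category: bad_rate}` dict (the issue works with a list of dicts that carry a `"bin"` key). It assigns each category to a quantile bucket with `np.searchsorted`, so every rate, including the largest, lands in exactly one group and no `try`/`except IndexError` bookkeeping is needed.

```python
import numpy as np


def group_categories_by_bad_rate(bad_rates, n_groups):
    """Hypothetical helper: merge categories whose bad rates share a quantile bucket.

    bad_rates: dict mapping category -> bad rate.
    Returns a list of category lists, ordered by increasing bad rate.
    """
    cats = sorted(bad_rates, key=bad_rates.get)
    rates = np.array([bad_rates[c] for c in cats], dtype=float)

    # Quantile edges over the observed rates; np.unique removes repeated edges
    edges = np.unique(np.quantile(rates, np.linspace(0, 1, n_groups + 1)))

    # Bucket index per rate; the maximum rate falls into the last bucket
    codes = np.searchsorted(edges[1:-1], rates, side="right")

    groups = {}
    for cat, code in zip(cats, codes):
        groups.setdefault(int(code), []).append(cat)
    return [groups[code] for code in sorted(groups)]


print(group_categories_by_bad_rate({"a": 0.10, "b": 0.12, "c": 0.50, "d": 0.52}, 2))
# [['a', 'b'], ['c', 'd']]
```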

Bug in mono check

while (_mono_flags(bad_rates) is False) and (len(bad_rates) > 2):
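The loop condition above keeps merging bins while the bad rates are not monotone and more than two bins remain. A minimal sketch of such a loop follows; it is an illustration, not the library's code. The helpers `is_monotone` and `merge_until_monotone` are hypothetical: they treat bins as ordered event and total counts, accept non-strict monotonicity (an assumption), and merge the adjacent pair with the closest bad rates, which is one common rule among several.

```python
import numpy as np


def is_monotone(rates):
    """Hypothetical check: True if rates never decrease or never increase."""
    diffs = np.diff(rates)
    # bool() converts NumPy's np.bool_ into a plain Python bool
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))


def merge_until_monotone(events, totals):
    """Hypothetical loop: merge adjacent bins until bad rates are monotone."""
    events, totals = list(events), list(totals)
    rates = [e / t for e, t in zip(events, totals)]
    while not is_monotone(rates) and len(rates) > 2:
        # Merge the adjacent pair whose bad rates are closest to each other
        i = int(np.argmin(np.abs(np.diff(rates))))
        events[i:i + 2] = [events[i] + events[i + 1]]
        totals[i:i + 2] = [totals[i] + totals[i + 1]]
        rates = [e / t for e, t in zip(events, totals)]
    return events, totals


print(merge_until_monotone([10, 30, 20, 40], [100, 100, 100, 100]))
# ([10, 50, 40], [100, 200, 100])
```

Each iteration removes one bin, so the loop always terminates; testing with `not` instead of `is False` also avoids an identity comparison against NumPy booleans.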

Bug calc woe

(good + 0.5 / all_good) / (bad + 0.5 / all_bad)
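In Python, `/` binds tighter than `+`, so `good + 0.5 / all_good` evaluates as `good + (0.5 / all_good)`: the 0.5 is divided by the class total instead of being added to the count. Assuming the intended quantity is the smoothed weight of evidence ln(((good + 0.5) / all_good) / ((bad + 0.5) / all_bad)), the sketch below compares the two readings on made-up counts. The function names are hypothetical, and the natural log is an assumption since the snippet shows only the ratio.

```python
import math


def woe_as_written(good, bad, all_good, all_bad):
    # Parsed as good + (0.5 / all_good) and bad + (0.5 / all_bad)
    return math.log((good + 0.5 / all_good) / (bad + 0.5 / all_bad))


def woe_smoothed(good, bad, all_good, all_bad):
    # 0.5 is added to each count first, then divided by the class total
    return math.log(((good + 0.5) / all_good) / ((bad + 0.5) / all_bad))


# A bin holding 10% of all goods and 10% of all bads should have WOE close to 0
print(round(woe_as_written(30, 10, 300, 100), 3))  # 1.098
print(round(woe_smoothed(30, 10, 300, 100), 3))    # -0.032
```

The smoothed form also stays finite when a bin contains no bads, e.g. `woe_smoothed(5, 0, 300, 100)`.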

Bins refactoring

bins.extend(
    np.nanquantile(x, quantile / max_bins, axis=0)
    for quantile in range(1, max_bins)
)

->

_, nbins = pd.qcut(x, q=max_bins, retbins=True)
bins += list(nbins)
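The two snippets should yield the same interior edges: `pd.qcut` excludes missing values and uses linear-interpolated quantiles, as `np.nanquantile` does by default. The differences are that `retbins=True` also returns the minimum and maximum as outer edges, and that `pd.qcut` raises a `ValueError` on repeated edges unless `duplicates="drop"` is passed, whereas `np.nanquantile` silently returns duplicates. The check below uses random data with some NaNs; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
x = rng.normal(size=1_000)
x[::50] = np.nan  # 20 missing values
max_bins = 8

# Before: interior edges only, computed with np.nanquantile
old_edges = [np.nanquantile(x, quantile / max_bins, axis=0) for quantile in range(1, max_bins)]

# After: pd.qcut returns max_bins + 1 edges, including the min and max of x
_, new_edges = pd.qcut(x, q=max_bins, retbins=True, duplicates="drop")

print(len(old_edges), len(new_edges))           # 7 9
print(np.allclose(new_edges[1:-1], old_edges))  # True
```

Because the refactored version appends the outer edges too, code that previously added its own lower and upper bounds to `bins` may need adjusting.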

Fix check_mono func

Num binning -> add to top:
while (_mono_flags(bad_rates) is False) and (len(bad_rates) > 2):

_mono_flags ->
return True in [positive_mono_diff, negative_mono_diff]
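One plausible reason the `return True in [...]` form matters (our reading; the issue does not say): NumPy reductions such as `np.all` return `np.bool_`, not Python's `bool`, so `_mono_flags(bad_rates) is False` is never true when the flag comes straight from NumPy, and the merge loop never runs. The `in` operator always yields a Python `bool`, so the identity check works again. Both functions below are hypothetical stand-ins for `_mono_flags`, using non-strict comparisons as an assumption.

```python
import numpy as np


def mono_flags_numpy(bad_rates):
    diffs = np.diff(bad_rates)
    positive_mono_diff = np.all(diffs >= 0)  # np.bool_
    negative_mono_diff = np.all(diffs <= 0)  # np.bool_
    return positive_mono_diff or negative_mono_diff  # still np.bool_


def mono_flags_fixed(bad_rates):
    diffs = np.diff(bad_rates)
    positive_mono_diff = np.all(diffs >= 0)
    negative_mono_diff = np.all(diffs <= 0)
    return True in [positive_mono_diff, negative_mono_diff]  # Python bool


rates = [0.10, 0.30, 0.20]  # not monotone
print(mono_flags_numpy(rates) is False)  # False: np.False_ is not the False singleton
print(mono_flags_fixed(rates) is False)  # True
```

Writing the loop condition as `not _mono_flags(bad_rates)` would sidestep the problem regardless of the return type.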

Bug with checking Wald threshold

retrain = False
to_drop = []
wald_table = self.model.wald_test_terms().table  # compute the Wald table once
for term, pvalue in wald_table["pvalue"].items():
    if pvalue > 0.05:
        to_drop.append(term)
        retrain = True
if retrain:
    self.feature_names_ = [feature for feature in self.feature_names_ if feature not in to_drop]
    self.model = sm.Logit(y, sm.add_constant(x[self.feature_names_])).fit()

Save reports

    434 if self.save_reports:
    435     try:
    436         with open(

AttributeError: 'CreateModel' object has no attribute 'save_reports'
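The error suggests that `fit` reads `self.save_reports` although the constructor never sets it. scikit-learn's convention is that `__init__` stores every argument, unchanged, under an attribute of the same name, and `BaseEstimator.get_params` relies on this. A minimal sketch with a hypothetical `ReportingModel` (not the library's `CreateModel`):

```python
from sklearn.base import BaseEstimator


class ReportingModel(BaseEstimator):
    """Hypothetical estimator illustrating the scikit-learn parameter convention."""

    def __init__(self, max_vars=5, save_reports=False):
        # Store every constructor argument unchanged, under the same name
        self.max_vars = max_vars
        self.save_reports = save_reports

    def fit(self, X, y=None):
        if self.save_reports:  # always defined, so no AttributeError
            self.report_ = {"n_rows": len(X)}  # placeholder for real report output
        return self


print(ReportingModel().get_params())
print(ReportingModel(save_reports=True).fit([[1], [2]]).report_)
```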
