logicjake / 2020-yizhifu-credit-risk-user-identification-top2

2nd Yizhifu Cup Big Data Modeling Competition: Credit Risk User Identification, Top 2

Home Page: https://www.logicjake.xyz/2020/10/01/%E7%AC%AC%E4%BA%8C%E5%B1%8A%E7%BF%BC%E6%94%AF%E4%BB%98%E6%9D%AF%E5%A4%A7%E6%95%B0%E6%8D%AE%E5%BB%BA%E6%A8%A1%E5%A4%A7%E8%B5%9B-%E4%BF%A1%E7%94%A8%E9%A3%8E%E9%99%A9%E7%94%A8%E6%88%B7%E8%AF%86%E5%88%AB-%E8%B5%9B%E5%90%8E

Jupyter Notebook 100.00%

2020-yizhifu-credit-risk-user-identification-top2's Introduction

🔭 I’m currently working on and learning:

  • Network Embedding
  • Recommender System

🏆 Competition

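  • iFLYTEK Mobile Advertising Anti-Fraud Algorithm Challenge Top2
  • Turing Federation Video Click Prediction Competition Top3 Code
  • KDD Cup 2020 Challenges for Modern E-Commerce Platform: Debiasing Top13 Code
  • 2020 Tencent Advertising Algorithm Competition Top12 Code
  • 2nd Yizhifu Cup Big Data Modeling Competition: Credit Risk User Identification Solo Top2 Code
  • Recommender Systems for Beginners: News Recommendation Top2 Code
  • Kaggle Riiid! Answer Correctness Prediction Top 2% Code
  • Yidian Zixun Technical Programming Competition, CTR Track Top1 Code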

🔧 Project

  • MLCompetitionHub: Machine learning competition information aggregation Code Homepage
  • WebMonitor: Real-time web page monitoring and change notifications Code Homepage

2020-yizhifu-credit-risk-user-identification-top2's People

Contributors

logicjake

2020-yizhifu-credit-risk-user-identification-top2's Issues

Some functions in feature seem to be unused

For example, functions such as select_feature, w2v_emb, and tfidf_emb do not appear to be called anywhere. Could the author explain why?
```python
from sklearn.feature_selection import SelectPercentile, chi2, f_classif


def select_feature(df, select_feature, ycol, p):
    # Fill missing values on a copy so the caller's DataFrame is left untouched
    X = df[select_feature].fillna(0)
    Y = df[ycol]

    # Keep the top p percent of features under each scoring function
    selectChi2 = SelectPercentile(chi2, percentile=p).fit(X, Y)
    selectF_classif = SelectPercentile(f_classif, percentile=p).fit(X, Y)

    chi2_selected = selectChi2.get_support()
    print('Chi2 selected {} features.'.format(chi2_selected.sum()))
    f_classif_selected = selectF_classif.get_support()
    print('F_classif selected {} features.'.format(f_classif_selected.sum()))

    # A feature survives only if both tests select it
    selected = chi2_selected & f_classif_selected
    print('Chi2 & F_classif selected {} features'.format(selected.sum()))
    selected_features = [f for f, s in zip(select_feature, selected) if s]

    # Return the features to drop, sorted for reproducibility
    del_features = list(set(select_feature) - set(selected_features))
    del_features.sort()
    return del_features
```
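A minimal usage sketch for the function quoted above, run on hypothetical toy data. The column names (`signal`, `noise_a`, ...), the label construction, and `p=25` are assumptions made up for illustration and do not come from the repository. The function body is a condensed copy of the one in the issue, with the prints removed. Note that `chi2` requires non-negative feature values, which the toy data respects.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectPercentile, chi2, f_classif


def select_feature(df, select_feature, ycol, p):
    # Condensed copy of the function from the issue (prints removed)
    X = df[select_feature].fillna(0)
    Y = df[ycol]
    chi2_selected = SelectPercentile(chi2, percentile=p).fit(X, Y).get_support()
    f_selected = SelectPercentile(f_classif, percentile=p).fit(X, Y).get_support()
    selected = chi2_selected & f_selected
    selected_features = [f for f, s in zip(select_feature, selected) if s]
    return sorted(set(select_feature) - set(selected_features))


# Hypothetical toy data: one feature tied to the label, three pure-noise features
rng = np.random.default_rng(0)
n = 200
label = np.arange(n) % 2  # alternating 0/1 labels, so both classes are present
df = pd.DataFrame({
    'signal': label * 5 + rng.random(n),  # non-negative and strongly label-dependent
    'noise_a': rng.random(n),
    'noise_b': rng.random(n),
    'noise_c': rng.random(n),
    'label': label,
})

feature_cols = ['signal', 'noise_a', 'noise_b', 'noise_c']
# With 4 features, percentile=25 keeps only the top-scoring one per test
del_features = select_feature(df, feature_cols, 'label', p=25)
print(del_features)
```

The return value is the list of features to drop, so a caller would typically remove those columns from the training frame afterwards.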

Some questions about feature selection

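Hello, I'm a beginner and have a few questions. When aggregating, we can choose from functions such as sum, mean, max, and min, but I noticed that for some aggregations of amount you use only sum, as in the code below: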

for col in ['platform', 'type1', 'type2', 'hour']:
    df_temp = df_trans.pivot_table(
        index='user', columns=col, values='amount', aggfunc='sum').reset_index()
    df_temp.columns = [c if c == 'user' else 'trans_{}_{}_amount_sum'.format(
        col, c) for c in df_temp.columns]
    df_feature = df_feature.merge(df_temp, how='left')

for col in ['type1', 'hour']:
    df_temp = df_trans.pivot_table(
        index='user', columns=col, values='amount', aggfunc='mean').reset_index()
    df_temp.columns = [c if c == 'user' else 'trans_{}_{}_amount_mean'.format(
        col, c) for c in df_temp.columns]
    df_feature = df_feature.merge(df_temp, how='left')

for col in ['type1']:
    df_temp = df_trans.pivot_table(
        index='user', columns=col, values='amount', aggfunc='max').reset_index()
    df_temp.columns = [c if c == 'user' else 'trans_{}_{}_amount_max'.format(
        col, c) for c in df_temp.columns]
    df_feature = df_feature.merge(df_temp, how='left')

for col in ['type1']:
    df_temp = df_trans.pivot_table(
        index='user', columns=col, values='amount', aggfunc='min').reset_index()
    df_temp.columns = [c if c == 'user' else 'trans_{}_{}_amount_min'.format(
        col, c) for c in df_temp.columns]
    df_feature = df_feature.merge(df_temp, how='left')

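Is there a rule behind the choice of these aggregation functions? Or do you generate the features first and then filter them based on the results?
Thanks for open-sourcing this.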
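The four loops quoted in this question share one pattern and differ only in the column list and the aggregation name, so they can be folded into a single helper. This is a minimal sketch, not the author's code: `add_pivot_amount_features`, `agg_plan`, and the toy values in `df_trans` are hypothetical, and only the column names and the feature naming scheme are taken from the quoted snippet.

```python
import pandas as pd


def add_pivot_amount_features(df_feature, df_trans, cols, aggfunc):
    """Pivot `amount` per user by each column in `cols` and merge onto df_feature."""
    for col in cols:
        df_temp = df_trans.pivot_table(
            index='user', columns=col, values='amount', aggfunc=aggfunc).reset_index()
        # Same naming scheme as the quoted code, with the aggregation as suffix
        df_temp.columns = [c if c == 'user' else 'trans_{}_{}_amount_{}'.format(
            col, c, aggfunc) for c in df_temp.columns]
        df_feature = df_feature.merge(df_temp, how='left', on='user')
    return df_feature


# Hypothetical toy transactions; real data would come from the competition files
df_trans = pd.DataFrame({
    'user': ['u1', 'u1', 'u1', 'u2', 'u2'],
    'platform': ['app', 'app', 'web', 'app', 'web'],
    'type1': ['a', 'a', 'b', 'a', 'b'],
    'type2': ['x', 'y', 'x', 'x', 'y'],
    'hour': [9, 9, 20, 9, 20],
    'amount': [10.0, 30.0, 5.0, 7.0, 3.0],
})
df_feature = pd.DataFrame({'user': ['u1', 'u2', 'u3']})

# Mirrors the column lists used in the quoted loops
agg_plan = {
    'sum': ['platform', 'type1', 'type2', 'hour'],
    'mean': ['type1', 'hour'],
    'max': ['type1'],
    'min': ['type1'],
}
for aggfunc, cols in agg_plan.items():
    df_feature = add_pivot_amount_features(df_feature, df_trans, cols, aggfunc)

print(df_feature.columns.tolist())
```

Keeping the plan in one dictionary makes it easy to try additional aggregation and column combinations and then compare them with whatever validation or feature selection step is in use.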

EDA code

I'd like to see how the analysis was carried out. Is there any EDA code available?
