uic-paper / mimn

License: MIT License

Shell 0.52% Python 99.48%

mimn's Introduction

Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction

A TensorFlow implementation of "Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction".

Prerequisites

Python 2.x
TensorFlow 1.4

Data

Getting Started

First, prepare the data.

Amazon Prepare

To download and process the raw Amazon data, run:

sh prepare_amazon.sh

Because downloading and processing the data is time-consuming, we have already processed the Amazon data and uploaded it. To fetch the processed data instead, run:

sh prepare_ready_data.sh

Taobao Prepare

First download the Taobao data to get "UserBehavior.csv.zip", then run the following command.

sh prepare_taobao.sh

Running

usage: train_book.py|train_taobao.py  [-h] [-p TRAIN|TEST] [--random_seed RANDOM_SEED]
                     [--model_type MODEL_TYPE] [--memory_size MEMORY_SIZE]
                     [--mem_induction MEM_INDUCTION]
                     [--util_reg UTIL_REG]
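
The usage text above maps naturally onto an argparse declaration. The following is a minimal sketch, not the repository's actual parser: the flag names come from the usage text, while the types, the defaults, and the build_parser helper name are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the usage text."""
    parser = argparse.ArgumentParser(prog="train_taobao.py")
    # -p selects the phase; "train" and "test" are the values used in the examples.
    parser.add_argument("-p", type=str, default="train", help="train | test")
    parser.add_argument("--random_seed", type=int, default=19)
    parser.add_argument("--model_type", type=str, default="MIMN")
    parser.add_argument("--memory_size", type=int, default=4)
    # Assumed to be 0/1 switches, as in the example commands.
    parser.add_argument("--mem_induction", type=int, default=0)
    parser.add_argument("--util_reg", type=int, default=0)
    return parser


args = build_parser().parse_args(
    ["-p", "train", "--random_seed", "19", "--model_type", "MIMN",
     "--memory_size", "4", "--mem_induction", "1", "--util_reg", "1"]
)
print(args.p, args.model_type, args.memory_size, args.mem_induction, args.util_reg)
```

Parsing the values as integers keeps comparisons such as args.util_reg == 1 straightforward; whether the real scripts do the same is not confirmed here.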

Base Model

Example for DNN:

python script/train_book.py -p train --random_seed 19 --model_type DNN
python script/train_book.py -p test --random_seed 19 --model_type DNN

The following models are supported:

DNN
PNN
DIN
GRU4REC
ARNN
RUM
DIEN
DIEN_with_neg

MIMN

You can train MIMN with different parameter settings:

MIMN Basic

python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 0

MIMN with Memory Utilization Regularization

python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 1

MIMN with Memory Utilization Regularization and Memory Induction Unit

python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 1 --util_reg 1

MIMN with Auxiliary Loss

python script/train_taobao.py -p train --random_seed 19 --model_type MIMN_with_neg --memory_size 4 --mem_induction 0 --util_reg 0

To train on the Amazon data, replace train_taobao.py with train_book.py in the commands above.

mimn's Issues

Is there a problem with the split_by_user.py script?

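import random

# Input is "local_test"; judging by the variable names, each sample is a pair
# of consecutive lines (a non-click line followed by a click line).
fi = open("local_test", "r")
ftrain = open("local_train_splitByUser", "w")
ftest = open("local_test_splitByUser", "w")

while True:
    # Draw once per pair so both lines of a sample land in the same split.
    rand_int = random.randint(1, 10)
    noclk_line = fi.readline().strip()
    clk_line = fi.readline().strip()
    # Stop at end of file or at the first empty line.
    if noclk_line == "" or clk_line == "":
        break
    # Roughly 1 in 10 pairs goes to the test split (Python 2 print syntax).
    if rand_int == 2:
        print >> ftest, noclk_line
        print >> ftest, clk_line
    else:
        print >> ftrain, noclk_line
        print >> ftrain, clk_line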

This script splits the test set into train and test, which looks wrong to me. That said, it seems the earlier step, local_aggretor.py, has already done the split.
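
For comparison, a split that keeps each pair together and is reproducible can be sketched as follows. This is a Python 3 illustration, not the repository's script: it assumes, as the variable names in the quoted script suggest, that the input holds samples as pairs of consecutive lines, and split_pairs, test_ratio, and seed are hypothetical names.

```python
import random


def split_pairs(lines, test_ratio=0.1, seed=19):
    """Split consecutive (noclk, clk) line pairs into train and test lists."""
    rng = random.Random(seed)  # local generator, so the split is repeatable
    train, test = [], []
    # Walk the input two lines at a time; a trailing unpaired line is dropped.
    for i in range(0, len(lines) - 1, 2):
        pair = [lines[i].strip(), lines[i + 1].strip()]
        if "" in pair:
            break  # mirror the quoted loop, which stops at an empty line
        target = test if rng.random() < test_ratio else train
        target.extend(pair)
    return train, test


# Ten synthetic users, each with one non-click line and one click line.
sample = []
for u in range(10):
    sample += ["u%d\tnoclk" % u, "u%d\tclk" % u]

train, test = split_pairs(sample)
print(len(train), len(test))
```

Seeding a local random.Random instance makes the split repeatable across runs, which the quoted script does not guarantee because it uses the unseeded module-level generator.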

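About the behavior type in the training data

Looking at the preprocessing of the Taobao training data, the behavior type does not seem to be used. Is that right? Also, the timestamp appears to be used only for ordering, without taking the time gaps between behaviors into account. Is that correct?

Could you share the training and test set files?

Taobao data preprocessing hangs? Is there a TF2 version?

UserBehavior.csv is 3 GB in total. I ran the preprocessing overnight and it still had not finished; in the morning it was stuck at 3 and not moving: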
D:\Anaconda3\envs\TF2\python.exe F:/python/MIMN-master/preprocess/taobao_prepare.py
4162024 987994 9439 4
feature_size 4162024 5159462
group completed
987994
get user last touch time completed
1
2
3

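It then stayed stuck for the whole day as well. Has anyone managed to get MIMN running? I wanted to port it to TF2, but _Linear turned out to be troublesome, so I gave up.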

"i==1" in mimn.py Line84 should be changed to "i>=self.read_head_num"

I think "i==1" in mimn.py Line84 should be changed to "i>=self.read_head_num".
As I understand it, "i==1" restricts the operation to the write head weight, which only works when self.read_head_num==1 and self.write_head_num==1.
Both parameters can be changed in the code, so "i==1" can cause problems when self.read_head_num!=1 or self.write_head_num!=1.
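
The point can be illustrated with a small stand-alone check. This sketch assumes the layout the issue implies, namely that head weights are indexed with read heads first and write heads after them; is_write_head_original, is_write_head_proposed, and write_heads are hypothetical helpers, not functions from mimn.py.

```python
def is_write_head_original(i, read_head_num):
    # The condition quoted in the issue: only index 1 is treated as a write head.
    return i == 1


def is_write_head_proposed(i, read_head_num):
    # The proposed condition: every index past the read heads is a write head.
    return i >= read_head_num


def write_heads(check, read_head_num, write_head_num):
    """List the head indices that a given condition treats as write heads."""
    total = read_head_num + write_head_num
    return [i for i in range(total) if check(i, read_head_num)]


# With one read head and one write head, both conditions agree.
print(write_heads(is_write_head_original, 1, 1), write_heads(is_write_head_proposed, 1, 1))
# → [1] [1]

# With two read heads and two write heads, they diverge.
print(write_heads(is_write_head_original, 2, 2), write_heads(is_write_head_proposed, 2, 2))
# → [1] [2, 3]
```

Under this assumed layout, the original condition would pick a read head (index 1) and miss both write heads once read_head_num is 2.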
