mimn's Introduction

Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction

An implementation of "Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction" using TensorFlow.

Prerequisites

  • Python 2.x
  • TensorFlow 1.4

Data

Getting Started

First, prepare the data.

Amazon Prepare

  • To get the raw Amazon data and process it yourself, run:
sh prepare_amazon.sh
  • Because getting and processing the data is time-consuming, we have already processed the Amazon data and uploaded it for you. To use it, run:
sh prepare_ready_data.sh

Taobao Prepare

First download the Taobao data to get "UserBehavior.csv.zip", then run the following command.

sh prepare_taobao.sh

Running

usage: train_book.py|train_taobao.py  [-h] [-p TRAIN|TEST] [--random_seed RANDOM_SEED]
                     [--model_type MODEL_TYPE] [--memory_size MEMORY_SIZE]
                     [--mem_induction MEM_INDUCTION]
                     [--util_reg UTIL_REG]
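
As a rough illustration of the usage line above, here is a minimal argparse sketch that accepts the same flags. Only the flag names come from the usage text; the types, defaults, and the `build_parser` helper are assumptions made for this example and are not taken from the repository's scripts.

```python
import argparse


def build_parser():
    # Flag names follow the usage line; types and defaults are assumptions.
    parser = argparse.ArgumentParser(prog="train_taobao.py")
    parser.add_argument("-p", choices=["train", "test"], default="train",
                        help="run mode")
    parser.add_argument("--random_seed", type=int, default=19)
    parser.add_argument("--model_type", type=str, default="MIMN")
    parser.add_argument("--memory_size", type=int, default=4)
    parser.add_argument("--mem_induction", type=int, default=0)
    parser.add_argument("--util_reg", type=int, default=0)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["-p", "train", "--model_type", "MIMN", "--memory_size", "4", "--util_reg", "1"]
)
print(args.model_type, args.memory_size, args.util_reg)  # MIMN 4 1
```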

Base Model

Example for DNN:

python script/train_book.py -p train --random_seed 19 --model_type DNN
python script/train_book.py -p test --random_seed 19 --model_type DNN

The following models are supported:

  • DNN
  • PNN
  • DIN
  • GRU4REC
  • ARNN
  • RUM
  • DIEN
  • DIEN_with_neg

MIMN

You can train MIMN with different parameter settings:

  • MIMN Basic
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 0
  • MIMN with Memory Utilization Regularization
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 0 --util_reg 1
  • MIMN with Memory Utilization Regularization and Memory Induction Unit
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN --memory_size 4 --mem_induction 1 --util_reg 1
  • MIMN with Auxiliary Loss
python script/train_taobao.py -p train --random_seed 19 --model_type MIMN_with_neg --memory_size 4 --mem_induction 0 --util_reg 0

To train on the Amazon data, replace train_taobao.py with train_book.py in the commands above.
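
The four MIMN variants differ only in a few flag values, so the command lines can be generated rather than typed by hand. The sketch below builds them for either dataset. The `mimn_command` helper, the `VARIANTS` table, and the dataset keys "taobao" and "amazon" are hypothetical names introduced for this example; the script paths and flags are the ones listed above.

```python
def mimn_command(dataset, mode="train", model_type="MIMN", memory_size=4,
                 mem_induction=0, util_reg=0, random_seed=19):
    """Return the argument list for one training run (hypothetical helper)."""
    script = {"taobao": "script/train_taobao.py",
              "amazon": "script/train_book.py"}[dataset]
    return ["python", script, "-p", mode,
            "--random_seed", str(random_seed),
            "--model_type", model_type,
            "--memory_size", str(memory_size),
            "--mem_induction", str(mem_induction),
            "--util_reg", str(util_reg)]


# Flag overrides for the four configurations listed above.
VARIANTS = {
    "basic": {},
    "util_reg": {"util_reg": 1},
    "util_reg_induction": {"mem_induction": 1, "util_reg": 1},
    "aux_loss": {"model_type": "MIMN_with_neg"},
}

for name, overrides in VARIANTS.items():
    print(name, " ".join(mimn_command("amazon", **overrides)))
```

Each list can be passed to `subprocess.call` to launch the run.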

mimn's People

Contributors

rocket-launching, uic-paper

mimn's Issues

Is there a problem with the split_by_user.py script?

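import random

# Read the input file and split it into a train file and a test file.
# Lines are consumed two at a time (named noclk_line and clk_line here).
fi = open("local_test", "r")
ftrain = open("local_train_splitByUser", "w")
ftest = open("local_test_splitByUser", "w")

while True:
    rand_int = random.randint(1, 10)
    noclk_line = fi.readline().strip()
    clk_line = fi.readline().strip()
    if noclk_line == "" or clk_line == "":
        break
    # Roughly 1 pair in 10 goes to the test file; the rest go to train.
    target = ftest if rand_int == 2 else ftrain
    target.write(noclk_line + "\n")
    target.write(clk_line + "\n")

# Close the files so that buffered output is flushed to disk.
fi.close()
ftrain.close()
ftest.close()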

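This script splits the test set into train and test; isn't that a mistake? Then again, it looks like the split was already done in the earlier step, local_aggretor.py.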
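
To make the behavior under discussion easier to reason about, here is a minimal sketch of the same pair-wise random split written as a pure function with a fixed seed. The `split_pairs` name, the `test_fraction` parameter, and the seeding are assumptions for illustration and are not part of the repository. Unlike the quoted script, it stops only at the end of the input, not at the first empty line.

```python
import random


def split_pairs(lines, test_fraction=0.1, seed=0):
    """Split consecutive (noclk, clk) line pairs into train and test lists."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    train, test = [], []
    it = iter(lines)
    for noclk_line in it:
        clk_line = next(it, None)
        if clk_line is None:
            break  # odd trailing line without a partner: drop it
        target = test if rng.random() < test_fraction else train
        target.extend([noclk_line, clk_line])  # keep the pair together
    return train, test


rows = ["row%d" % i for i in range(20)]
train, test = split_pairs(rows, test_fraction=0.5, seed=1)
print(len(train), len(test))
```

Keeping both lines of a pair in the same output is what the quoted script does as well; which input file it should read from is the open question in this issue.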

About the behavior type in the training data

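A question: I looked at the preprocessing of the Taobao training data, and the behavior type does not seem to be used. Is that right? Also, is the timestamp used only for sorting, without taking the time gaps between behaviors into account?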
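
The distinction this question draws can be shown with a small sketch: sorting by timestamp keeps only the order of events, while time gaps are a separate feature that would have to be computed explicitly. This does not describe what the repository's preprocessing does; `order_and_gaps` and the sample events are hypothetical.

```python
def order_and_gaps(events):
    """events: list of (item_id, timestamp) tuples for one user."""
    ordered = sorted(events, key=lambda e: e[1])
    # Using only the order: the item sequence, with timestamps discarded.
    items = [item for item, _ in ordered]
    # Using the gaps: seconds elapsed since the previous event (0 for the first).
    gaps = [0] + [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]
    return items, gaps


items, gaps = order_and_gaps([("c", 300), ("a", 100), ("b", 160)])
print(items)  # ['a', 'b', 'c']
print(gaps)   # [0, 60, 140]
```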

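Taobao data preprocessing hangs? Is there a TF2 version?

UserBehavior.csv is 3 GB in total. After running all night the job still had not finished; in the morning it was stuck at 3 and not moving: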
D:\Anaconda3\envs\TF2\python.exe F:/python/MIMN-master/preprocess/taobao_prepare.py
4162024 987994 9439 4
feature_size 4162024 5159462
group completed
987994
get user last touch time completed
1
2
3

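It then stayed stuck for the whole day as well. Has anyone managed to get MIMN running? I had planned to port it to TF2, but _Linear turned out to be troublesome, so I gave up.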
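
One way to tell a slow step from a stalled one is to process the file in a single streaming pass and print a progress counter. The sketch below groups rows by user with the standard csv module. It is not the repository's taobao_prepare.py; the `group_by_user` helper, the `report_every` parameter, and the assumed column order (user, item, category, behavior type, timestamp, with no header row) are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_user(rows, report_every=1000000):
    """Group (user, item, category, behavior, timestamp) rows by user."""
    histories = defaultdict(list)
    for n, (user, item, cat, btype, ts) in enumerate(rows, 1):
        histories[user].append((int(ts), item, cat, btype))
        if n % report_every == 0:
            print("rows read:", n)  # progress output for large files
    for events in histories.values():
        events.sort()  # order each user's events by timestamp
    return histories


# A tiny in-memory sample stands in for the real CSV file.
sample = io.StringIO(u"1,10,100,pv,30\n2,11,100,buy,5\n1,12,101,pv,20\n")
histories = group_by_user(csv.reader(sample))
print(histories["1"])
```

For the real file, pass `csv.reader(open("UserBehavior.csv"))` instead of the in-memory sample.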

"i==1" in mimn.py Line84 should be changed to "i>=self.read_head_num"

I think "i==1" at line 84 of mimn.py should be changed to "i>=self.read_head_num".
As I understand it, "i==1" restricts the operation to the write head weight, which is correct only when self.read_head_num==1 and self.write_head_num==1.
But these two parameters can be changed in the code, so "i==1" can cause problems when self.read_head_num!=1 or self.write_head_num!=1.
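
The difference between the two conditions can be checked with a small sketch. It assumes, as the proposed fix implies, that heads are indexed with the read heads first (0 to read_head_num-1) and the write heads after them; `write_head_indices` is a hypothetical helper and not code from mimn.py.

```python
def write_head_indices(read_head_num, write_head_num, general=True):
    """Return the head indices that each condition treats as write heads."""
    total = read_head_num + write_head_num
    if general:
        # Proposed condition: every index after the read heads.
        return [i for i in range(total) if i >= read_head_num]
    # Current condition: hard-coded index 1.
    return [i for i in range(total) if i == 1]


print(write_head_indices(1, 1, general=False))  # [1]
print(write_head_indices(1, 1))                 # [1]
print(write_head_indices(2, 2, general=False))  # [1]
print(write_head_indices(2, 2))                 # [2, 3]
```

With one read head and one write head the two conditions agree. With two of each, the hard-coded check selects index 1, which under this indexing assumption is a read head, and misses both write heads.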
