mquad / hgru4rec

License: MIT License

hgru4rec's Introduction

HGRU4Rec

Code for our ACM RecSys 2017 paper "Personalizing Session-based Recommendation with Hierarchical Recurrent Neural Networks". See the paper: https://arxiv.org/abs/1706.04148

Setup

This code is based on GRU4Rec (https://github.com/hidasib/GRU4Rec). Like the original code, it is written in Python 3.4 and requires Theano 0.8.0+ to run efficiently on GPU. In addition, this code uses H5Py and PyTables for efficient I/O operations.

We suggest using virtualenv or conda (preferred) together with requirements.txt to set up a virtual environment before running the code.
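As a quick sanity check that the environment matches these requirements, the following minimal Python snippet (an illustration, not part of the repository) verifies the interpreter version and that the required packages import:

import sys
assert sys.version_info >= (3, 4), 'Python 3.4+ is expected'

import theano, h5py, tables  # Theano 0.8.0+, H5Py, PyTables
print('Theano', theano.__version__)
print('h5py', h5py.__version__)
print('PyTables', tables.__version__)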

Experiments on the XING dataset

This repository comes with the code necessary to reproduce the experiments on the XING dataset, which was released to the participants of the 2016 RecSys Challenge.

  1. Download the dataset (the download link is no longer available; see the linked comment for the expected format). You will only need the file interactions.csv.

  2. cd data/xing, then run python build_dataset.py <path_to_interactions> to build the dataset. It will be saved under data/xing/dense/last-session-out/sessions.hdf (a quick way to inspect this file is sketched after this list).

  3. To run HGRU on this dataset, go to the scripts folder, then run sh xing_dense_small.sh to train and evaluate the small HRNN networks, or sh xing_dense_large.sh for the large ones. See the paper for further details (note that we used random seeds in {0..9} in our experiments).
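To check that build_dataset.py produced a sensible file, the resulting HDF store can be inspected with pandas (a minimal sketch, assuming pandas is installed; the key name inside the store is discovered at runtime rather than assumed):

import pandas as pd

with pd.HDFStore('data/xing/dense/last-session-out/sessions.hdf', mode='r') as store:
    print(store.keys())        # list the tables stored in the file
    key = store.keys()[0]      # take the first one for a quick look
    df = store[key]
    print(df.head())
    print('rows:', len(df))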

NOTE: These experiments run quite efficiently on CPU too (the small networks train and evaluate in ~20 minutes on an 8-core Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz).


hgru4rec's Issues

'HGRU4Rec' object has no attribute 'predict_next'

Hi, I am trying to apply your work to my own dataset, but when I use evaluate_sessions() in evaluation.py, the call preds = pr.predict_next(sid, prev_iid, prev_uid, items_to_predict) fails because the HGRU4Rec object has no predict_next attribute. Have you noticed this?

Questions about the training process

Hi,
I'm not familiar with Theano, so I have some questions about the training process.

According to the code in lines 919-936 of hgru4rec.py, the input length seems to be set to 1 in each mini-batch, i.e. each mini-batch consists of data from a single time step only. In this case, how can the error back-propagate through time?
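For readers unfamiliar with this pattern, here is a minimal NumPy sketch (an illustration, not the repository's Theano code) of session-parallel mini-batching: each update consumes a single time step per session, but the hidden state is carried across updates, so consecutive steps remain linked through the state even though gradients are truncated at each step:

import numpy as np

batch_size, hidden_dim = 4, 8            # hypothetical sizes for illustration
H = np.zeros((batch_size, hidden_dim))   # hidden state carried across mini-batches

def gru_step(x, h):
    # stand-in for the real GRU update (placeholder dynamics, not the model)
    return np.tanh(x + h)

for x_t in np.random.randn(10, batch_size, hidden_dim):  # 10 consecutive time steps
    H = gru_step(x_t, H)  # each update sees one time step; H links the steps
    # when a session in the batch ends, its row of H is reset and the slot is
    # refilled with the next session (session-parallel mini-batching)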

The original data set can't be found.

I wanted to get the dataset and created a team as the instructions describe, but no one ever approved my team. Could you kindly provide the dataset if you have it?

Incremental Memory

I added logger.info(memory_use) in the iterate function, which is called by HGRU4Rec.fit, and got the log below. Is this normal?
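(For context, a log line like the ones below can be produced with psutil; a hypothetical sketch of the instrumentation, not the poster's actual code:)

import logging, psutil

logger = logging.getLogger('hgru4rec')

def log_memory_left():
    # available system memory in GB, matching the "(GB) memory left" lines below
    gb_left = psutil.virtual_memory().available / 2**30
    logger.info('%.2f(GB) memory left.', gb_left)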

Other information

I ran this code on CPU.
I ran this code on my own dataset (about 28 million interactions).
I noticed this because the process ran out of memory.

Training log of the first epoch

2018-09-29 19:57:07,118: main: INFO: Training started
2018-09-29 19:57:38,527: hgru4rec: INFO: Epoch Begin!
2018-09-29 19:57:38,530: hgru4rec: INFO: 227.51(GB) memory left.
2018-09-29 19:57:38,600: hgru4rec: INFO: 227.51(GB) memory left.
..........
2018-09-29 20:01:47,200: hgru4rec: INFO: 209.16(GB) memory left.
..........
2018-09-29 20:05:31,439: hgru4rec: INFO: 191.81(GB) memory left.
..........
2018-09-29 20:08:45,621: hgru4rec: INFO: 177.09(GB) memory left.
..........
2018-09-29 20:11:37,108: hgru4rec: INFO: 164.1(GB) memory left.
..........
2018-09-29 20:14:05,733: hgru4rec: INFO: 152.86(GB) memory left.
..........
2018-09-29 20:16:07,679: hgru4rec: INFO: 143.68(GB) memory left.
..........
2018-09-29 20:17:50,016: hgru4rec: INFO: 136.02(GB) memory left.
..........
2018-09-29 20:19:15,243: hgru4rec: INFO: 129.7(GB) memory left.
..........
2018-09-29 20:20:31,730: hgru4rec: INFO: 124.03(GB) memory left.
..........
2018-09-29 20:22:54,691: hgru4rec: INFO: 114.76(GB) memory left.
..........
2018-09-29 20:26:05,653: hgru4rec: INFO: 106.84(GB) memory left.
..........
2018-09-29 20:28:25,798: hgru4rec: INFO: 96.48(GB) memory left.
..........
2018-09-29 20:31:37,315: hgru4rec: INFO: 82.28(GB) memory left.
....
2018-09-29 20:32:39,487: hgru4rec: INFO: Epoch 0 - train cost: 0.8193

Concatenating Additional Features

Hi,

Your paper is really interesting, and thank you for providing your code.
I am currently trying to add additional features to the RNN input. I don't get any errors, but GPU utilisation drops massively and the computations become very slow. All I do is concatenate a batch_size x n_features tensor.matrix to the embedding (SE_item) and provide the input accordingly.

My questions are:
Have you tried something similar?
Am I missing something in the concatenation? Do I need to adapt Sin as well?

Code snippet of concatenation:
SE_item = self.E_item[X]  # sampled item embeddings: (batch_size, embedding_dim)
input_vec = T.concatenate([SE_item, X_additional], axis=1)  # X_additional -> tensor.matrix: (batch_size, n_features)
# note: Ws_in[0] must have shape (embedding_dim + n_features, hidden_dim) for this dot product
vec = T.dot(input_vec, self.Ws_in[0]) + self.Bs_h[0]
Sin = SE_item  # Sin still carries only the bare embedding, hence the question above

Thank you so much for your help

About the data statistics and performance

First, thank you for sharing a good paper and code.

We have inquiries about data and performance.

  1. data statistics
    When the XING data was preprocessed with your code, 58,035 items were obtained.
    The numbers of users and events match those mentioned in your paper, but the number of items differs.
    Could you recheck the number of items?

  2. Performance
    When I ran your scripts, I could not match the performance reported in your paper.
    Running the model for ten epochs, the Recall@5 of HRNN-All (#hidden=100) was 0.1280 and that of HRNN-All (#hidden=500) was 0.1355
    (the paper reports 0.1334 and 0.1482, respectively).
    Increasing the number of epochs did not change the outcome significantly.
    Please check whether the uploaded scripts are the ones that yield the best performance.

Doubt on resetting the session-level init state Hs[0]

In the initialization of the session-level GRU you use a mask to reset the hidden state for some items of the batch, computing h_s = Hs[0] * (1 - Sstart[:, None]) + h_s_init * Sstart[:, None], but you then use the old state Hs[0] to compute the final output state of the cell. Why is Hs[0] used instead of h_s in h_s = (1.0 - z) * Hs[0] + z * h_s_candidate.T? Thanks in advance.
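For clarity, here is a minimal NumPy sketch of the update in question (variable names follow the snippet above; the gate values are taken as given, so this is schematic rather than the repository's code):

import numpy as np

def session_gru_interpolation(Hs0, h_s_init, Sstart, z, h_s_candidate):
    # reset the rows of the previous state where a new session starts
    h_prev = Hs0 * (1 - Sstart[:, None]) + h_s_init * Sstart[:, None]
    # standard GRU interpolation between previous state and candidate;
    # the question above asks why the code interpolates from Hs0 instead of h_prev
    return (1.0 - z) * h_prev + z * h_s_candidate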

Is there any way of downloading the dataset XING?

Hi, I have noticed that your work is very interesting, and I want to reproduce the results. Since the RecSys Challenge has closed, I have no way to download the dataset. I would appreciate any suggestions.

How to understand the loss function in the training process?

Hi,
I am not good at Theano, so I cannot understand this clearly. I am currently using PyTorch to reproduce this paper. About the loss function, I have some questions:
(1) Do you calculate the loss for each hidden state of the GRU, or only for the last state?
(2) Do you treat session-based recommendation as a multi-class classification problem in which the number of classes is the number of items and a BPR loss is applied? If so, does this approach still work when the number of items is very large?
Thank you very much.
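For reference, GRU4Rec-style models typically compute a sampled ranking loss at every time step, using the other items in the mini-batch as negatives instead of scoring the full catalogue. A minimal PyTorch sketch of such a sampled BPR loss (an illustration of the general technique, not the repository's Theano implementation):

import torch

def sampled_bpr_loss(scores):
    # scores: (batch, batch) matrix where scores[i, j] is example i's score for
    # example j's target item; the diagonal holds each example's positive item,
    # and the off-diagonal entries serve as sampled negatives.
    pos = scores.diag().unsqueeze(1)                     # (batch, 1)
    diff = pos - scores                                  # positive minus negatives
    mask = ~torch.eye(scores.size(0), dtype=torch.bool)  # drop positive-vs-itself
    return -torch.log(torch.sigmoid(diff[mask])).mean()

# usage sketch: scores = hidden_states @ item_embeddings[batch_item_ids].T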
