songgc / tf-recomm

Tensorflow-based Recommendation systems

License: Apache License 2.0

Python 97.27% Shell 2.73%

tf-recomm's Introduction

TF-recomm

Tensorflow-based Recommendation systems

Factorization models are very popular in recommendation systems because they can discover latent features underlying the interactions between two different kinds of entities. There are many variations of factorization algorithms (SVD, SVD++, factorization machines, ...). When implementing them or developing new ones, you will probably spend a lot of time on the following areas rather than on modeling:

Derivative calculation
Exploration of SGD variants
Multi-thread acceleration
Vectorization acceleration

Tensorflow is a general computation framework based on data flow graphs, although deep learning is its most important application. With Tensorflow, derivatives are computed by automatic differentiation, which means that you only need to write the inference part. It provides a variety of SGD-based learning algorithms, CPU/GPU acceleration, and distributed training on a computer cluster. Since Tensorflow has embedding modules for word2vec-like applications, it should be a good platform for factorization models as well, even in production. Please note that embedding in deep learning is equivalent to factorization in shallow learning!

Requirements

Tensorflow >= r0.12 (Since Tensorflow is changing high-level APIs frequently, please check its version if errors happen.)
Numpy
Pandas

Data set

The MovieLens 1M data set is used. It looks as follows. The columns are user ID, item ID, rating, and timestamp, separated by "::".

1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
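
As a minimal sketch of how rows in this format could be parsed, the snippet below reads the five sample rows with pandas. The helper name read_ratings and the column names are illustrative assumptions, not names taken from the repository.

```python
import io

import pandas as pd

# The five sample rows shown above, in the "::"-separated format.
SAMPLE = """1::1193::5::978300760
1::661::3::978302109
1::914::3::978301968
1::3408::4::978300275
1::2355::5::978824291
"""

# Column names are an assumption made for this sketch.
COLUMNS = ["user", "item", "rating", "timestamp"]


def read_ratings(source):
    """Parse '::'-separated ratings from a path or file-like object."""
    # A multi-character separator requires pandas' Python parsing engine.
    return pd.read_csv(source, sep="::", header=None, names=COLUMNS, engine="python")


df = read_ratings(io.StringIO(SAMPLE))
print(df)
```

Passing a file path instead of the in-memory sample should work the same way, assuming the file has no header row.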

The problem is to predict the rating given by user u to item i. The metric is typically the RMSE between the true ratings and the predictions.
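
For reference, RMSE can be computed in a few lines of NumPy. This is a generic sketch of the metric, not the evaluation code used by the training script; the function name rmse and the toy values are made up for illustration.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true ratings and predictions."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Toy values: one prediction is off by 1, the other three are exact.
example = rmse([5, 3, 3, 4], [4, 3, 3, 4])
print(example)
```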

SVD implementation

Graph

Given user u and item i, the inference of the classic SVD is

y_pred[u, i] = global_bias + bias_user[u] + bias_item[i] + <embedding_user[u], embedding_item[i]>

The objective is to minimize

\sum_{u, i} |y_pred[u, i] - y_true[u, i]|^2 + \lambda(|embedding_user[u]|^2 + |embedding_item[i]|^2)
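
To make the two formulas concrete, here is a small NumPy sketch that evaluates them on toy parameters. It is not the repository's Tensorflow code: the function names predict and objective, the toy values, and the choice to apply the penalty once per observed (u, i) pair are all assumptions made for illustration.

```python
import numpy as np


def predict(global_bias, bias_user, bias_item, emb_user, emb_item, users, items):
    """y_pred[u, i] = global bias + user bias + item bias + <user emb, item emb>."""
    dot = np.sum(emb_user[users] * emb_item[items], axis=1)
    return global_bias + bias_user[users] + bias_item[items] + dot


def objective(y_pred, y_true, emb_user, emb_item, users, items, lam):
    """Squared error plus an L2 penalty on the embeddings of each observed pair."""
    squared_error = np.sum((y_pred - y_true) ** 2)
    penalty = np.sum(emb_user[users] ** 2) + np.sum(emb_item[items] ** 2)
    return squared_error + lam * penalty


# Toy parameters: 2 users, 2 items, embedding dimension 2.
global_bias = 3.0
bias_user = np.array([0.5, -0.5])
bias_item = np.array([0.1, -0.1])
emb_user = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_item = np.array([[1.0, 1.0], [2.0, 0.0]])

# Two observed ratings: (user 0, item 1) and (user 1, item 0).
users = np.array([0, 1])
items = np.array([1, 0])
y_true = np.array([5.0, 4.0])

y_pred = predict(global_bias, bias_user, bias_item, emb_user, emb_item, users, items)
loss = objective(y_pred, y_true, emb_user, emb_item, users, items, lam=0.1)
print(y_pred, loss)
```

In a framework with automatic differentiation, only these forward computations need to be written; the gradients with respect to the biases and embeddings are derived automatically.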

The above can be written directly as Tensorflow operations. The Adam algorithm is used as the optimizer.

[Figure: TF graph of the SVD model]

Run

./download_data.sh
python svd_train_val.py

The results would be as follows. The validation RMSE is around or below 0.850.

epoch train_error val_error elapsed_time
  0 2.637530 2.587753 0.129696(s)
  1 1.034569 0.908686 4.110165(s)
  2 0.859582 0.887105 4.137014(s)
  3 0.835467 0.878246 4.132146(s)
  ...
 97 0.742144 0.849553 4.114120(s)
 98 0.742159 0.849624 4.120170(s)
 99 0.742281 0.849537 4.133140(s)

Speed tuning

To enable the GPU for math operations, just set DEVICE="/gpu:0". The computation time of one epoch is reduced from 4.1s to 3.4s.

Changing the batch size from 1000 to 10000 reduces the time of one epoch to 1.1s.
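
The batch size determines how many optimizer steps make up one epoch, which is one reason larger batches can shorten epoch time. The sketch below is a generic shuffled mini-batch iterator in NumPy; iter_batches is a hypothetical helper, not the repository's data reader, and the toy arrays stand in for real ratings.

```python
import numpy as np


def iter_batches(users, items, ratings, batch_size, rng):
    """Yield shuffled mini-batches that cover the data exactly once."""
    n = len(ratings)
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield users[idx], items[idx], ratings[idx]


# Toy data: 25 ratings, so a batch size of 10 gives 3 steps per epoch.
rng = np.random.default_rng(0)
n = 25
users = np.arange(n)
items = np.arange(n)
ratings = np.ones(n)

batches = list(iter_batches(users, items, ratings, batch_size=10, rng=rng))
print(len(batches), [len(r) for _, _, r in batches])
```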

If you have a large data set, it is better to use the TF data pipeline.

Others:

SVD++ will be provided soon.

Reference

Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model

tf-recomm's Issues

Do Item Item recommendation

Hi,

Is it possible to do recommendation based on similar items only, without any user-item ranking?

How to prove it?

In the README, there is a statement: "Please note that embedding in deep learning is equivalent to factorization in shallow learning!". But how to prove this is true? Do you have any paper on this?

SVD++

Are you still working on an SVD ++ implementation, and if so, when do you expect that to be released?

How to use predictions after training the model

Hi Guocong Song,

I do not know if this is the right place to ask questions but I did not find a better place.
I'm trying to figure out your code to integrate a recommender system in my application.
I downloaded the code and trained the model.
After training the model, I need to make predictions, that is, get suggestions for each user so that they can be saved in a DB that I access from a Node.js application.

How do I retrieve this list of suggestions for each user when the training is done?
I also asked the same question on Stack Overflow: https://stackoverflow.com/questions/44898080/recommender-system-svd-with-tensorflow

Thanks in advance if you will dedicate some time to help me.

why change optimizer from ftrl to adam?

saving the model after training

Hi,
How can I save the model after training it on a custom dataset and use the saved model for predictions?

Thanks in advance

Error when restarting tf-recomm

This error is shown:

ValueError: Variable bias_global does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

error while training on custom data

Hello,
I am trying to train the model on a custom dataset, but in my case the user_id and item_id values are very long (more than 20 digits),
so I am facing an error that the int is too long to convert to a C long. I tried to convert them to float, but it seems tensorflow.placeholder takes only int32 and int64 as input. Can you please help me in this regard?

Thank You in advance

Is there recommend?

When I checked TF-recomm, I couldn't find a recommend API.
As you know, a movie recommendation engine can recommend movies, as the crab recommender does.

#Recommend items for the user 5 (Toby)
recommender.recommend(5)
[(5, 3.3477895267131013), (1, 2.8572508984333034), (6, 2.4473604699719846)]

Is there any recommend api like above example?

Why is it SVD?

Hi,

I'm trying to learn this project and have a basic question. As said in the project description, the inference is y_pred[u, i] = global_bias + bias_user[u] + bias_item[i] + <embedding_user[u], embedding_item[i]>. Why is it SVD? SVD is the factorization of a matrix, and I don't see how SVD and the algorithm used in this project are the same. I would appreciate it if you could explain.

Thanks
Yang

big number of users

Hello,
What happens when the number of users is very large (e.g. 1 billion)?
The embedding matrix would be gigantic.
This seems infeasible, or am I wrong?

Thanks

I have close to 60 million rows in my dataframe, with around 5 million users and 2k items. How do you suggest I go about my implementation?

AttributeError: 'module' object has no attribute 'get_global_step'

rzai@rzai00:/prj/TF-recomm$ python svd_train_val.py
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
Traceback (most recent call last):
  File "svd_train_val.py", line 97, in <module>
    svd(df_train, df_test)
  File "svd_train_val.py", line 60, in svd
    _, train_op = ops.optimization(infer, regularizer, rate_batch, learning_rate=0.001, reg=0.05, device=DEVICE)
  File "/home/rzai/prj/TF-recomm/ops.py", line 27, in optimization
    global_step = tf.train.get_global_step()
AttributeError: 'module' object has no attribute 'get_global_step'
rzai@rzai00:/prj/TF-recomm$

Issue with data set

./download_data.sh
python svd_train_val.py

I am currently running on a 64-bit Windows 10 system and have an issue with the data set step: it creates the folder I need, but the folder is empty. I am not sure whether this is just a Windows issue.
