RecBole-PJF

RecBole-PJF is a library built upon PyTorch and RecBole for reproducing and developing person-job fit algorithms. Our library includes algorithms covering three major categories:

  • CF-based models make recommendations based on collaborative filtering;
  • Content-based models make recommendations mainly based on text matching;
  • Hybrid models make recommendations based on both interactions and content.

Highlights

  • A unified framework for different methods, including collaborative, content-based and hybrid methods;
  • Evaluation from two perspectives, candidates and employers, which previous frameworks do not support;
  • Easy extension of person-job fit models, with multiple input interfaces for both interaction and text data. The library shares the same API and input format (atomic files) as RecBole.

Requirements

recbole>=1.0.0
pytorch>=1.7.0
python>=3.7.0
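
The version floors above can be checked before a first run. The following is a minimal sketch, not part of RecBole-PJF: the helper names `parse_version` and `check_requirements` are hypothetical, it assumes PyTorch is installed under its PyPI name `torch`, and it relies on `importlib.metadata`, which is in the standard library only from Python 3.8 onward.

```python
import sys
from importlib import metadata  # standard library from Python 3.8

# Minimum versions copied from the requirements list.
# Assumption: PyTorch is looked up under its PyPI distribution name "torch".
MINIMUMS = {"recbole": (1, 0, 0), "torch": (1, 7, 0)}
MIN_PYTHON = (3, 7, 0)


def parse_version(text):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '1.7' to (1, 7, 0)
    return tuple(parts)


def check_requirements():
    """Return a list of problems; an empty list means every floor is met."""
    problems = []
    if sys.version_info[:3] < MIN_PYTHON:
        problems.append("python is older than 3.7.0")
    for name, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed < minimum:
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```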

Implemented Models

We list currently supported models according to category:

CF-based Model (examples only; these models are implemented in RecBole and used here directly):

Content-based Model:

Hybrid Model:

Dataset and Quick-Start

  • zhilian from the TIANCHI data contest.
  • kaggle from the Kaggle Job Recommendation Case Study.

We provide processing scripts in the corresponding folders (e.g. /dataset/zhilian/). To run experiments with these two datasets, first download the source files, then run the processing script to convert them to atomic files. For example, for zhilian:

cd dataset/zhilian
python prepare_zhilian.py

With the source code in place, run the provided script to try the library:

python run_recbole_pjf.py

To change the model or dataset, pass additional command-line parameters:

python run_recbole_pjf.py -m [model] -d [dataset]
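
To run several model/dataset pairs in a row, the command above can be wrapped in a small driver. This is a sketch under assumptions: `build_command` and `run_all` are hypothetical helpers, and the model strings used here are taken from the hyper-tuning table, so the exact names the script accepts may differ. Only the `-m` and `-d` options shown above are used.

```python
import subprocess
import sys


def build_command(model=None, dataset=None, script="run_recbole_pjf.py"):
    """Build the argument list for one run; omitted options keep the script's defaults."""
    command = [sys.executable, script]
    if model is not None:
        command += ["-m", model]
    if dataset is not None:
        command += ["-d", dataset]
    return command


def run_all(models, datasets, dry_run=True):
    """Print (dry_run=True) or execute every model/dataset combination."""
    commands = [build_command(m, d) for d in datasets for m in models]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sweep as soon as one run fails
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Assumed model names; dataset names come from the dataset list.
    run_all(["PJFNN", "APJFNN"], ["zhilian", "kaggle"])
```

With `dry_run=True` the sweep only prints the commands, which makes it easy to check the names before starting any training.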

Atomic Files and Data Formats

| Suffix | Content | Example Format |
|---|---|---|
| .inter | User-job interaction | user_id, job_id, direct, timestamp, label |
| .user | User feature | user_id, age, gender |
| .item | Job feature | job_id, category |
| .udoc | Text description of user | user_id, user_doc |
| .idoc | Text description of job | job_id, job_doc |
| .uvec | Text encoding of user (optional) | Generated by NumPy on the first run |
| .ivec | Text encoding of job (optional) | Generated by NumPy on the first run |

  • .inter / .user / .item: The formats of these three atomic files are the same as those in RecBole.
  • .udoc / .idoc: These two atomic files contain text descriptions of users and jobs. The text must already be split into sentences and words, so you need to preprocess it in advance. Each line holds one sentence of a user (or job), with words separated by a single space (" "), so one user typically spans many rows in these files.
  • .uvec / .ivec: These two atomic files hold a more advanced text encoding: text vectors. The code provides a BERT-based text encoder. If you enable it (with the parameter ADD_BERT), the BERT model is called on the first run to compute the text encodings, which are saved as "*.uvec" / "*.ivec". Later runs read these files directly instead of re-encoding. You can also encode the text in your own way, as long as the resulting format matches ours.
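
The one-sentence-per-row layout of .udoc / .idoc can be illustrated with a short reader that groups sentences by id. This is a sketch, not the library's loader: `read_docs` is a hypothetical helper, and the header row and tab separator are assumptions modelled on RecBole's atomic files.

```python
from collections import defaultdict


def read_docs(lines, field_sep="\t", has_header=True):
    """Group segmented sentences by id.

    Returns {id: [[word, ...], ...]} with one inner list per sentence,
    in the order the rows appear.
    """
    docs = defaultdict(list)
    rows = iter(lines)
    if has_header:
        next(rows, None)  # skip the assumed header row
    for raw in rows:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank rows
        entity_id, _, sentence = line.partition(field_sep)
        docs[entity_id].append(sentence.split())  # words are space-separated
    return dict(docs)


# Made-up rows in the assumed layout: id, separator, one segmented sentence.
sample = [
    "user_id:token\tuser_doc:token_seq\n",
    "u1\tpython developer\n",
    "u1\tthree years experience\n",
    "u2\tsales manager\n",
]
docs = read_docs(sample)
print(docs["u1"])  # [['python', 'developer'], ['three', 'years', 'experience']]
```

For a real file, pass an open file object instead of the `sample` list, for example `read_docs(open(path, encoding="utf-8"))`.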

Hyper-tuning

We tune the hyper-parameters of the implemented models in each category and release the search ranges for reference.

For a fair comparison, we set embedding_size to 128 for all models and tune the other parameters.

  • zhilian
| Model | Best Parameter | Parameter Range |
|---|---|---|
| BPRMF | learning_rate = 1e-3 | learning_rate in [1e-3, 1e-4, 1e-5] |
| NCF | learning_rate = 1e-3, mlp_hidden_size = [64] | learning_rate in [1e-3, 1e-4, 1e-5], mlp_hidden_size in [[64], [64, 32], [64, 32, 16]] |
| LightGCN | learning_rate = 1e-4, n_layers = 3 | learning_rate in [1e-3, 1e-4, 1e-5], n_layers in [2, 3, 4] |
| LFRR | learning_rate = 1e-4 | learning_rate in [1e-3, 1e-4, 1e-5] |
| BERT | learning_rate = 1e-3 | learning_rate in [1e-3, 1e-4, 1e-5] |
| PJFNN | learning_rate = 1e-3, max_sent_num = 20, max_sent_len = 20 | learning_rate in [1e-3, 1e-4, 1e-5], max_sent_num in [10, 20, 30], max_sent_len in [10, 20, 30] |
| BPJFNN | learning_rate = 1e-3, max_sent_num = 20, max_sent_len = 20, hidden_size = 64 | learning_rate in [1e-3, 1e-4, 1e-5], max_sent_num in [10, 20, 30], max_sent_len in [10, 20, 30], hidden_size in [64, 32] |
| APJFNN | learning_rate = 1e-3, num_layers = 1, hidden_size = 32 | learning_rate in [1e-3, 1e-4, 1e-5], num_layers in [1, 2], hidden_size in [32, 64] |
| PJFFF-BERT | learning_rate = 1e-4, hidden_size = 32, history_item_len = 20 | learning_rate in [1e-3, 1e-4, 1e-5], hidden_size in [32, 64], history_item_len in [20, 50] |
| IPJF-BERT | learning_rate = 1e-3, max_sent_num = 20, max_sent_len = 30 | learning_rate in [1e-3, 1e-4, 1e-5], max_sent_num in [10, 20, 30], max_sent_len in [10, 20, 30] |
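
Each "Parameter Range" entry describes a grid, and the size of that grid is the number of runs a full search needs. The sketch below expands the NCF row into concrete configurations with `itertools.product`. `expand_grid` is a hypothetical helper, and this is not how the authors necessarily ran their search.

```python
from itertools import product

# Search space for NCF on zhilian, copied from the table.
NCF_RANGE = {
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "mlp_hidden_size": [[64], [64, 32], [64, 32, 16]],
}
# embedding_size is held fixed for all models, as stated in the text.
FIXED = {"embedding_size": 128}


def expand_grid(search_space, fixed=None):
    """Return one {parameter: value} dict per combination in the search space."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    configs = []
    for values in product(*value_lists):
        config = dict(fixed or {})
        config.update(zip(names, values))
        configs.append(config)
    return configs


configs = expand_grid(NCF_RANGE, FIXED)
print(len(configs))  # 9
print(configs[0])    # {'embedding_size': 128, 'learning_rate': 0.001, 'mlp_hidden_size': [64]}
```

The same function applied to the BPJFNN row would yield 3 × 3 × 3 × 2 = 54 configurations, which shows how quickly the grid grows with each added parameter.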

The Team

RecBole-PJF is developed and maintained by members of RUCAIBox. The main developers are Chen Yang (@flust), Yupeng Hou (@hyp1231) and Shuqing Bian (@fancybian).

Acknowledgement

The implementation is based on the open-source recommendation library RecBole.

Please cite the following paper if you use our code or processed datasets.

@inproceedings{zhao2021recbole,
  title={Recbole: Towards a unified, comprehensive and efficient framework for recommendation algorithms},
  author={Wayne Xin Zhao and Shanlei Mu and Yupeng Hou and Zihan Lin and Kaiyuan Li and Yushuo Chen and Yujie Lu and Hui Wang and Changxin Tian and Xingyu Pan and Yingqian Min and Zhichao Feng and Xinyan Fan and Xu Chen and Pengfei Wang and Wendi Ji and Yaliang Li and Xiaoling Wang and Ji-Rong Wen},
  booktitle={{CIKM}},
  year={2021}
}
