
PyTorch-Lightning Library for Neural News Recommendation

Home Page: https://newsreclib.readthedocs.io/en/latest/

License: MIT License

Topics: content-based-recommendation, deep-learning, evaluation, hydra, machine-learning, news-recommendation, optuna, pytorch, pytorch-lightning, recommendation

newsreclib's Introduction

NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation


NewsRecLib is a library based on PyTorch Lightning and Hydra for the development and evaluation of neural news recommenders (NNR). The framework is highly configurable and modularized, decoupling core model components from one another. It enables running experiments from a single configuration file that drives the entire pipeline, from dataset selection and loading to model evaluation. NewsRecLib provides implementations of several neural news recommenders, training methods, standard evaluation benchmarks, hyperparameter optimization algorithms, extensive logging functionalities, and evaluation metrics (ranging from accuracy-based to beyond-accuracy performance evaluation).

The foremost goals of NewsRecLib are to promote reproducible research and rigorous experimental evaluation.

[Figure: NewsRecLib schema]

Installation

NewsRecLib requires Python version 3.9 or later.

NewsRecLib requires PyTorch, PyTorch Lightning, and TorchMetrics version 2.0 or later. If you want to use NewsRecLib with a GPU, please ensure that CUDA or cudatoolkit version 11.8 is installed.
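For example, a CUDA 11.8 build of PyTorch can be installed via the standard PyTorch index URL (a general PyTorch installation command, not specific to NewsRecLib):

    pip install torch --index-url https://download.pytorch.org/whl/cu118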

Install from source

CONDA

   git clone https://github.com/andreeaiana/newsreclib.git
   cd newsreclib
   conda create --name newsreclib_env python=3.9
   conda activate newsreclib_env
   pip install -e .
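To verify that the editable install succeeded, importing the package is enough:

    python -c "import newsreclib"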

Quick Start

NewsRecLib's entry point is the train function, which accepts a configuration file that drives the entire experiment.
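Since NewsRecLib is built on Hydra, an experiment can also be composed programmatically. The sketch below uses Hydra's compose API; the newsreclib.train.train import path and the configs/ directory location are assumptions based on the typical lightning-hydra-template layout, not a documented API:

    from hydra import compose, initialize

    from newsreclib.train import train  # assumed import path

    # ASSUMPTION: Hydra configs live in a top-level configs/ directory.
    with initialize(version_base="1.3", config_path="configs"):
        cfg = compose(
            config_name="train.yaml",
            overrides=["experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent"],
        )
        train(cfg)  # runs the full pipeline described by the config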

Basic Configuration

The following example shows how to train an NRMS model on the MINDsmall dataset with the original configurations (i.e., a news encoder contextualizing pretrained word embeddings, and a model trained by optimizing cross-entropy loss), using an existing configuration file.

    python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent

In the basic experiment, the experiment configuration specifies only the required hyperparameter values that are not already set in the configurations of the corresponding modules.

    defaults:
        - override /data: mind_rec_bert_sent.yaml
        - override /model: nrms.yaml
        - override /callbacks: default.yaml
        - override /logger: many_loggers.yaml
        - override /trainer: gpu.yaml
    data:
        dataset_size: "small"
    model:
        use_plm: False
        pretrained_embeddings_path: ${paths.data_dir}MINDsmall_train/transformed_word_embeddings.npy
        embed_dim: 300
        num_heads: 15
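Note the ${paths.data_dir} value above: it is an OmegaConf interpolation that is resolved when the configuration is composed. A minimal, self-contained illustration (the /data/ directory is made up):

    from omegaconf import OmegaConf

    # Mimic the interpolation used in the experiment configuration above.
    cfg = OmegaConf.create(
        {
            "paths": {"data_dir": "/data/"},
            "model": {
                "pretrained_embeddings_path": "${paths.data_dir}MINDsmall_train/transformed_word_embeddings.npy"
            },
        }
    )
    print(cfg.model.pretrained_embeddings_path)
    # -> /data/MINDsmall_train/transformed_word_embeddings.npy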

Advanced Configuration

The advanced scenario depicts a more complex experimental setting. Users can override, from the main experiment configuration file, any of the predefined module configurations. The following code snippet shows how to train an NRMS model with a PLM-based news encoder and a supervised contrastive loss objective instead of the default settings.

    python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent

This is achieved by creating an experiment configuration file with the following specifications:

    defaults:
        - override /data: mind_rec_bert_sent.yaml
        - override /model: nrms.yaml
        - override /callbacks: default.yaml
        - override /logger: many_loggers.yaml
        - override /trainer: gpu.yaml
    data:
        dataset_size: "small"
        use_plm: True
        tokenizer_name: "roberta-base"
        tokenizer_use_fast: True
        tokenizer_max_len: 96
    model:
        loss: "sup_con_loss"
        temperature: 0.1
        use_plm: True
        plm_model: "roberta-base"
        frozen_layers: [0, 1, 2, 3, 4, 5, 6, 7]
        pretrained_embeddings_path: null
        embed_dim: 768
        num_heads: 16
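For readers unfamiliar with the sup_con_loss and temperature entries above: the supervised contrastive loss (Khosla et al., 2020) treats same-label samples in a batch as positives and pulls their embeddings together. The following is an illustrative sketch of that objective, not NewsRecLib's exact implementation:

    import torch
    import torch.nn.functional as F

    def sup_con_loss(embeddings: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
        """Supervised contrastive loss over a batch of embeddings (illustrative)."""
        z = F.normalize(embeddings, dim=1)               # unit-norm embeddings
        sim = z @ z.T / temperature                      # pairwise similarities
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
        # Positives: other in-batch samples sharing the same label.
        pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
        log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
        # Mean log-likelihood of positives per anchor, averaged over anchors.
        pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)
        loss = -pos_log_prob.sum(1) / pos_mask.sum(1).clamp(min=1)
        return loss.mean()

In a news recommendation setting, a label could, for instance, distinguish clicked from non-clicked candidate news within a batch.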

Alternatively, configurations can be overridden from the command line, as follows:

    python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent data.batch_size=128
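Relatedly, the frozen_layers: [0, 1, 2, 3, 4, 5, 6, 7] entry in the advanced configuration corresponds to freezing the lower transformer layers of the PLM during fine-tuning. A minimal sketch of the idea with a Hugging Face RoBERTa encoder (illustrative; NewsRecLib's own freezing logic may differ):

    from transformers import AutoModel

    # Freeze the first eight encoder layers of roberta-base (illustrative).
    plm = AutoModel.from_pretrained("roberta-base")
    for idx in [0, 1, 2, 3, 4, 5, 6, 7]:
        for param in plm.encoder.layer[idx].parameters():
            param.requires_grad = False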

Features

Contributing

We welcome all contributions to NewsRecLib! You can get involved by contributing code, improving the documentation, or reporting and investigating bugs and issues.

Resources

This repository was inspired by:

Other useful repositories:

License

NewsRecLib is licensed under the MIT License.

Citation

We did our best to provide the bibliographic information for the methods, models, datasets, and techniques available in NewsRecLib, to credit their authors. Please remember to cite them if you use NewsRecLib in your research.

If you use NewsRecLib, please cite the following publication:

@article{iana2023newsreclib,
      title={NewsRecLib: A PyTorch-Lightning Library for Neural News Recommendation},
      author={Iana, Andreea and Glava{\v{s}}, Goran and Paulheim, Heiko},
      journal={arXiv preprint arXiv:2310.01146},
      year={2023}
}

newsreclib's People

Contributors

andreeaiana, dependabot[bot]


newsreclib's Issues

Why is the performance very different from other papers?

Hi Andreea,

I noticed that the model performance reported in your paper is very different from the performance in the original papers.
For example, MINER (Li et al. 2022) achieved AUC=69.61 on the MIND-small dataset, but your reported performance is only AUC=51.2.
Compared to other work that reproduced the MINER model, this performance is much lower. For example, this paper reported that their reproduced MINER model achieved an AUC of 63.88.
In general, most GeneralRec models in your Table 1 achieve AUC < 52.00, which differs substantially from the performance reported in other papers.
Could you give any comments on this?

Can I run a job with PyTorch distributed training?

Can I run a job with PyTorch distributed training?
If I run this command, does it work?
torchrun --nproc_per_node=$WORLD_SIZE --master_port=1234 newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent
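Not an authoritative answer, but since training runs through PyTorch Lightning, multi-GPU jobs are usually launched by the Lightning Trainer itself rather than via torchrun. Whether the exact trainer keys below are exposed by this repository's trainer config is an assumption:

    python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent trainer.devices=4 trainer.strategy=ddp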
