This repo contains the code for the implementation of Align MacridVAE, a multimodal recommender that suggests items to users based on their preferences. If you want to learn more, you can check the paper presented at ECIR 2024.
The project is implemented in PyTorch and builds a shallow Variational Autoencoder with a pre-training step that aligns the image and textual representations.
Regarding software, you will need the following tools:
Regarding hardware, this project is meant to run on NVIDIA GPUs, such as those found in personal laptops or in datacenters. It can also run on the CPU, but it will be much slower. We tested it on V100 and A100 series GPUs and the RTX 20 series. The model is relatively simple and small, and we don't load larger models like CLIP, BERT or ViT during training or inference: items are preprocessed before running through the model to simplify training.
First, install the dependencies specified in the requirements.txt file:
pip install -r requirements.txt
Next, fetch the datasets. They are hosted on Kaggle here and can be downloaded through the web UI or using the command-line tools. For example, if you have already set up your Kaggle credentials:
# Optional, you can download the dataset through the website
kaggle datasets download ignacioavas/alignmacrid-vae
unzip alignmacrid-vae.zip -d RecomData/
rm alignmacrid-vae.zip
The dataset contains data from subcategories of the Amazon dataset, MovieLens 25M, and Book-Crossing. These datasets were prepared by adding images and filtering out missing items, then passing the textual and visual representations through encoders like BERT, CLIP or ViT. You can learn more by reading the README.md in the dataset root directory. The preprocessing code for building the datasets is available at Align-MacridVAE-data.
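Because the encoders only run during preprocessing, training consumes fixed per-item embedding matrices rather than the encoder weights. A minimal sketch of that idea, with illustrative shapes and random data standing in for the real precomputed features (the actual file layout is described in the dataset's README.md, not here):

```python
import numpy as np

# Illustrative only: one embedding matrix per modality, row-aligned by item id.
# 512 matches CLIP's embedding size; the real dims depend on the chosen encoder.
n_items, text_dim, image_dim = 4, 512, 512

text_emb = np.random.rand(n_items, text_dim).astype(np.float32)
image_emb = np.random.rand(n_items, image_dim).astype(np.float32)

# Training can then load these fixed matrices directly, so CLIP/BERT/ViT
# never need to be in memory during training or inference.
item_features = np.concatenate([text_emb, image_emb], axis=1)
print(item_features.shape)
```

This is why the model stays small at train time: the expensive multimodal encoding is paid once, offline.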
Once you have downloaded the datasets, you can train a model by running the main.py script with the train argument. For example, to train on the Amazon Musical Instruments dataset, encoded with CLIP for both the visual and textual modalities, run the following command:
python main.py train --data Musical_Instruments-clip_clip
The training code will generate a file in the run/ directory with a name depending on the dataset and the model parameters, for example: Musical_Instruments-clip_clip-AlignMacridVAE-50E-100B-0.001L-0.0001W-0.5D-0.2b-7k-200d-0.1t-98765s. The model.pkl file contains the trained model.
Run python main.py --help to see all available parameters.
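The hyperparameter values baked into the run name can be recovered mechanically. A small sketch, assuming each hyperparameter token is a number followed by a single letter code (the letter-to-parameter mapping is not spelled out here, so consult python main.py --help for the authoritative names):

```python
import re

def parse_run_name(name: str) -> dict:
    """Split a run name into its dataset/model prefix and hyperparameter tokens.

    Assumes tokens like "50E" or "0.001L": a number followed by one letter.
    The meaning of each letter code is an assumption; check main.py --help.
    """
    hp_pattern = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z])$")
    params, prefix = {}, []
    for token in name.split("-"):
        m = hp_pattern.match(token)
        if m:
            params[m.group(2)] = float(m.group(1))
        else:
            prefix.append(token)
    return {"prefix": "-".join(prefix), "params": params}

run = ("Musical_Instruments-clip_clip-AlignMacridVAE-"
       "50E-100B-0.001L-0.0001W-0.5D-0.2b-7k-200d-0.1t-98765s")
print(parse_run_name(run))
```

This can be handy for collecting results from many runs into a table keyed by hyperparameters.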
To evaluate a given model, pass the test mode. It will try to load a model from the run directory, provided it was already trained. For example, to evaluate the same model as above, run the following command:
python main.py test --data Musical_Instruments-clip_clip
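If you want to inspect a trained model outside of main.py, the model.pkl can be loaded back with pickle. A minimal sketch, assuming the file sits directly inside the run directory (the pickled object's structure is whatever main.py saved, so treat this as inspection-only):

```python
import pickle
from pathlib import Path

def load_trained_model(run_dir: str):
    """Load model.pkl from a run directory, or return None if it is missing."""
    model_path = Path(run_dir) / "model.pkl"
    if not model_path.exists():
        return None
    with model_path.open("rb") as f:
        return pickle.load(f)

# Run-directory name taken from the training example above.
model = load_trained_model(
    "run/Musical_Instruments-clip_clip-AlignMacridVAE-"
    "50E-100B-0.001L-0.0001W-0.5D-0.2b-7k-200d-0.1t-98765s"
)
print("loaded" if model is not None else "model not found; train first")
```

Returning None instead of raising makes it easy to sweep over many run directories and skip those that were never trained.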