Recommender system algorithms implemented in Python. This repository includes four recommendation algorithms: Matrix Factorization, User-Based Collaborative Filtering, Item-Based Collaborative Filtering, and Dimensionality Reduction Collaborative Filtering.
```shell
conda install numpy pandas scikit-learn scipy
```
- Matrix Factorization
- User-Based Collaborative Filtering
- Item-Based Collaborative Filtering
- Dimensionality Reduction Collaborative Filtering
The Matrix Factorization model is a collaborative filtering technique for recommender systems. It learns latent factors that capture the underlying patterns of user-item interactions.
- Initialization: The model takes as input an $m \times n$ user-item rating matrix $R$, where rows represent users, columns represent items, and the values are user ratings.
- Factorization: The rating matrix $R$ is factorized into a lower-dimensional $m \times k$ user matrix $U$ and an $n \times k$ item matrix $V$, where $k \ll \min(m, n)$. Both matrices are initialized with random values. $$R \approx UV^T$$
- Training: The goal of optimization is to minimize the error $J$: $$J = \frac{1}{2} \|R - UV^T\|^2$$ Since not all entries of $R$ are rated, only the observed ratings are considered. For each observed rating in the training data, the model computes a predicted rating as the dot product of the corresponding user latent vector $u_i$ and item latent vector $v_j$. The predicted rating of user $i$ for item $j$ is denoted $\hat{r}_{ij} = u_i v_j^T$.
  Loss function for a single observed rating: $$e_{ij} = r_{ij} - \hat{r}_{ij}$$
  Calculate the gradient for $u_i$ and $v_j$: $$\frac{\partial J}{\partial u_i} = -e_{ij} v_j, \qquad \frac{\partial J}{\partial v_j} = -e_{ij} u_i$$
  Update both factors by gradient descent with learning rate $\alpha$: $$u_i \leftarrow u_i + \alpha\, e_{ij} v_j, \qquad v_j \leftarrow v_j + \alpha\, e_{ij} u_i$$
- Evaluation: The model's performance is evaluated using metrics such as Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), comparing the predicted ratings $\hat{r}_{ij}$ to the actual ratings $r_{ij}$ over the $n$ observed entries: $$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{(i,j)} (r_{ij} - \hat{r}_{ij})^2}$$ $$\text{MAE} = \frac{1}{n} \sum_{(i,j)} |r_{ij} - \hat{r}_{ij}|$$
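The training and evaluation steps above can be sketched in NumPy. This is a minimal illustration, not the repository's implementation; the toy matrix, hyperparameters, and function name are chosen for this example, and zeros are assumed to mark unobserved ratings.

```python
import numpy as np

def matrix_factorization(R, k=2, alpha=0.01, epochs=1000, seed=0):
    """Factorize R into U (m x k) and V (n x k) by SGD on observed entries.

    Zeros in R are treated as unobserved ratings (an assumption for this sketch).
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, k))     # random initialization
    V = rng.normal(scale=0.1, size=(n, k))
    observed = np.argwhere(R > 0)              # train only on rated entries
    for _ in range(epochs):
        for i, j in observed:
            e = R[i, j] - U[i] @ V[j]          # error e_ij = r_ij - u_i . v_j
            u_old = U[i].copy()
            U[i] += alpha * e * V[j]           # u_i <- u_i + alpha * e_ij * v_j
            V[j] += alpha * e * u_old          # v_j <- v_j + alpha * e_ij * u_i
    return U, V

# Toy ratings matrix (0 = unrated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
U, V = matrix_factorization(R, k=2)
mask = R > 0
rmse = np.sqrt(np.mean((R[mask] - (U @ V.T)[mask]) ** 2))  # RMSE on observed entries
```

The update rule follows the gradient steps above; regularization terms, which many implementations add to $J$, are omitted here for brevity.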
User-Based Collaborative Filtering computes recommendations based on user similarity: it finds users similar to the target user, then recommends items that those similar users have rated highly.
- Calculate the Pearson similarity to find users with tastes similar to the target user, based on the items they have both rated: $$\text{similarity}(u, v) = \frac{\sum_{i}(r_{ui} - \mu_u)(r_{vi} - \mu_v)}{\sqrt{\sum_{i}(r_{ui} - \mu_u)^2} \sqrt{\sum_{i}(r_{vi} - \mu_v)^2}}$$
- The k-nearest neighbors $N(u)$ for a target user $u$ are found based on the Pearson similarity scores.
- Predict the potential ratings. Since different users may use different rating scales, ratings are mean-centered first: $$\hat{r}_{ui} = \mu_u + \frac{\sum_{v \in N(u)} \text{similarity}(u, v) \cdot (r_{vi} - \mu_v)}{\sum_{v \in N(u)} |\text{similarity}(u, v)|}$$
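The prediction formula above can be sketched as follows. This is an illustrative implementation under the assumption that zeros mark unrated entries; the function name and the toy matrix are invented for this example.

```python
import numpy as np

def predict_user_based(R, u, i, k=2):
    """Predict user u's rating of item i (zeros in R mark unrated entries)."""
    mask = R > 0
    # Mean rating per user, over that user's rated items only.
    means = np.array([row[m].mean() if m.any() else 0.0
                      for row, m in zip(R, mask)])
    sims = []
    for v in range(R.shape[0]):
        if v == u or not mask[v, i]:
            continue                            # neighbor must have rated item i
        common = mask[u] & mask[v]              # items rated by both users
        if common.sum() < 2:
            continue
        a = R[u, common] - means[u]
        b = R[v, common] - means[v]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims.append((a @ b / denom, v))     # Pearson similarity(u, v)
    top = sorted(sims, reverse=True)[:k]        # k nearest neighbors N(u)
    num = sum(s * (R[v, i] - means[v]) for s, v in top)
    den = sum(abs(s) for s, v in top)
    return means[u] + num / den if den else means[u]

R = np.array([[4, 0, 3, 5],
              [5, 4, 0, 4],
              [4, 5, 2, 0],
              [2, 4, 0, 3]], dtype=float)
pred = predict_user_based(R, u=0, i=1)          # estimate user 0's rating of item 1
```

Note that the mean-centered weighted average can produce predictions slightly outside the original rating scale; practical systems typically clip the result.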
Item-Based Collaborative Filtering focuses on item similarity. It identifies items similar to those the user has interacted with and recommends items related to the user's past preferences.
- Calculate the Adjusted Cosine Similarity between two items $i$ and $j$, based on the users who have rated both items. Each rating is offset by that user's mean rating $\mu_u$, which corrects for differences in users' rating scales: $$\text{similarity}(i, j) = \frac{\sum_{u}(r_{ui} - \mu_u)(r_{uj} - \mu_u)}{\sqrt{\sum_{u}(r_{ui} - \mu_u)^2} \sqrt{\sum_{u}(r_{uj} - \mu_u)^2}}$$
- The k-nearest neighbors $N(i)$ for a target item $i$ are found based on the Adjusted Cosine Similarity.
- Predict the potential ratings: $$\hat{r}_{ui} = \frac{\sum_{j \in N(i)} \text{similarity}(i, j) \cdot r_{uj}}{\sum_{j \in N(i)} |\text{similarity}(i, j)|}$$
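The adjusted cosine similarity can be sketched in NumPy as below. This is a minimal example, again assuming zeros mark unrated entries; the function name and sample matrix are illustrative.

```python
import numpy as np

def adjusted_cosine(R, i, j):
    """Adjusted cosine similarity between items i and j (zeros = unrated)."""
    rated = np.where(R > 0, R, np.nan)
    user_means = np.nanmean(rated, axis=1)      # mu_u over each user's rated items
    mask = (R[:, i] > 0) & (R[:, j] > 0)        # users who rated both items
    if not mask.any():
        return 0.0
    a = R[mask, i] - user_means[mask]           # user-mean-centered ratings of item i
    b = R[mask, j] - user_means[mask]           # user-mean-centered ratings of item j
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

R = np.array([[5, 3, 4],
              [3, 1, 2],
              [4, 3, 3]], dtype=float)
s01 = adjusted_cosine(R, 0, 1)
```

As with Pearson similarity, the result lies in $[-1, 1]$; ranking the similarities for a target item $i$ then yields its neighborhood $N(i)$.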
Dimensionality reduction techniques such as Singular Value Decomposition (SVD) reduce the dimensionality of the user-item interaction matrix, which can improve recommendation quality and reduce computational complexity.
- Fill missing entries in the ratings matrix $R$ with the mean rating $\mu$. This results in a filled matrix $F$ of the same size as $R$.
- Perform Singular Value Decomposition (SVD) on the filled matrix $F$. SVD decomposes $F$ into three matrices $U$, $S$, and $V^T$: $$F = U \cdot S \cdot V^T$$
- $U$ is an $M \times D$ matrix representing user latent factors.
- $S$ is a diagonal $D \times D$ matrix holding the singular values in decreasing order.
- $V^T$ is a $D \times N$ matrix representing item latent factors.
To reduce the dimensionality to $k$, keep only the first $k$ columns of $U$, the top-left $k \times k$ block of $S$, and the first $k$ rows of $V^T$; their product is a rank-$k$ approximation of $F$.
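The steps above can be sketched with NumPy's SVD routine. This is a minimal illustration; the toy matrix and the choice $k = 2$ are assumptions for the example, with zeros marking missing ratings.

```python
import numpy as np

# Toy ratings matrix; zeros mark missing entries.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]], dtype=float)

# Step 1: fill missing entries with the mean of the observed ratings.
mu = R[R > 0].mean()
F = np.where(R > 0, R, mu)

# Step 2: SVD of the filled matrix, F = U S V^T
# (s holds the singular values in decreasing order).
U, s, Vt = np.linalg.svd(F, full_matrices=False)

# Step 3: keep the top-k singular values/vectors for a rank-k approximation.
k = 2
F_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Entries of `F_k` at originally missing positions can then serve as predicted ratings.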