Advanced Machine Learning 1 | Recommender System

1. Trends in Today's Machine Learning

Smart Compose: autocomplete in Gmail
AlphaGo: Machine Go player
Text Translation
- Pivot Translation: use English as the pivot for translation
- Zero-Shot Translation: Multilingual Neural Engine
Text Generation
Natural Language Processing
Google Lens
Now Playing
Google Duplex
Transfer Learning: improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned (e.g. works for NLP and CV)
Clickthrough Rate (CTR) Predictions
Recommender Systems (e.g. Netflix, Quora, etc.)

2. Recommender System

(1) The Definition of Recommender System

An application that provides users with personalized recommendations about content they may be interested in.

(2) Types of Recommender System

Content-based System
Content-based System uses the similarity between items to recommend items similar to what a user likes.
- Create content features for every item
- Analysis of content attributes of items
Collaborative Filtering
Collaborative filtering uses similarities between users and items simultaneously to provide recommendations.
- Uses past user behaviors
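The content-based idea can be sketched in a few lines of Python. This is a minimal illustration, not a production recommender: the three binary features per item are hypothetical (invented here, for example genre flags), and cosine similarity is just one common choice of similarity measure.

```python
import math

# Hypothetical binary content features per item, invented for illustration.
item_features = {
    "item A": [1, 0, 1],
    "item B": [1, 0, 0],
    "item C": [0, 1, 0],
    "item D": [1, 1, 1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(liked_item, features):
    """Rank all other items by similarity to the item the user liked."""
    target = features[liked_item]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in features.items()
        if other != liked_item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = most_similar("item A", item_features)
print(ranking)
```

With these made-up features, a user who liked item A would be shown item D first, because it shares the most features with item A. Collaborative filtering, by contrast, would ignore the features and rely only on past ratings.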

(3) Types of User Feedback

Explicit Feedback: Direct ratings or reviews
Implicit Feedback: User preference inferred from actions (e.g. purchases, clicks, navigation history)

(4) The Definition of Utility Matrix

The utility matrix is a matrix that shows the degree of preference that a user has for an item.


            item A    item B    item C    item D
user 1        1                             3
user 2                  4         4
user 3                  2                   2
user 4        2                   5

We can also write this matrix in a tabular form,


user 1      item A      1
user 1      item D      3
user 2      item B      4
user 2      item C      4
user 3      item B      2
user 3      item D      2
user 4      item A      2
user 4      item C      5

The downside of the utility matrix is that the data is usually very sparse: in practice, often only about 1% of the entries hold a rating. Also, the number of columns (the number of items) can be very large, so finding matches for less popular items can be difficult.
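To make the two representations and the notion of sparsity concrete, here is a small sketch that rebuilds the dense matrix from the tabular form and measures how many entries are filled. Using `None` for a missing rating is an assumption of this sketch, not something the notes prescribe.

```python
# Ratings in the tabular (user, item, rating) form from the example above.
ratings = [
    ("user 1", "item A", 1),
    ("user 1", "item D", 3),
    ("user 2", "item B", 4),
    ("user 2", "item C", 4),
    ("user 3", "item B", 2),
    ("user 3", "item D", 2),
    ("user 4", "item A", 2),
    ("user 4", "item C", 5),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})

# Rebuild the dense utility matrix; None marks a missing rating.
lookup = {(u, i): r for u, i, r in ratings}
utility = [[lookup.get((u, i)) for i in items] for u in users]

observed = len(ratings)
total = len(users) * len(items)
density = observed / total

for user, row in zip(users, utility):
    print(user, row)
print(f"{observed} of {total} entries observed (density {density:.2f})")
```

The toy example is half full (8 of 16 entries). Real utility matrices are far emptier, which is why the tabular form, which stores only the observed ratings, is usually the more practical representation.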

(5) Two Types of Collaborative Filtering Systems

Memory Based: memorize the utility matrix and make recommendations based on the relationships between users or items, typically found with the k-nearest neighbors (KNN) algorithm.
Model Based: fit a parameterized model to the given utility matrix and then make recommendations based on that model. A common method of fitting this parameterized model is matrix factorization, also called UV decomposition.
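A memory-based prediction can be sketched as follows. This is only one possible variant, under stated assumptions: it is user-based, it uses cosine similarity over full rating rows, and it treats a missing rating as 0. The helper names `cosine` and `knn_predict` are hypothetical.

```python
import numpy as np

# Toy utility matrix from the text; 0 marks a missing rating (a simplifying assumption).
Y = np.array([
    [1, 0, 0, 3],   # user 1
    [0, 4, 4, 0],   # user 2
    [0, 2, 0, 2],   # user 3
    [2, 0, 5, 0],   # user 4
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two rating rows (0.0 if either row is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def knn_predict(Y, user, item, k=2):
    """Similarity-weighted average over the k most similar users who rated `item`."""
    candidates = [
        (cosine(Y[user], Y[other]), float(Y[other, item]))
        for other in range(Y.shape[0])
        if other != user and Y[other, item] > 0
    ]
    # Keep the k neighbours with the highest similarity.
    neighbours = sorted(candidates, reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbours)
    if weight == 0:
        return None  # no similar user has rated this item
    return sum(sim * rating for sim, rating in neighbours) / weight

# Predict user 1's rating for item C (row 0, column 2).
prediction = knn_predict(Y, user=0, item=2)
print(prediction)
```

Here only user 4 overlaps with user 1 (both rated item A), so the prediction for item C is driven entirely by user 4's rating. Nothing is fitted in advance: the whole utility matrix has to be kept in memory and searched at prediction time, which is the main contrast with the model-based approach.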

(6) The Definition of Latent Factors

Latent factors (latent variables) are variables that cannot be observed directly, but that can be inferred from other variables that are observed.

(7) Utility Matrix Factorization

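Suppose $Y$ is the utility matrix and we choose $K$ latent factors. We approximate $Y$ by the product of a user matrix $U$ of size $M \times K$ (where $M$ is the number of users) and the transpose of an item matrix $V$ of size $N \times K$ (where $N$ is the number of items). So we have,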

Y \approx \hat{Y} = UV^T

For example, suppose we have $K = 2$ latent factors with 7 users and 5 items. Then the following matrix factorization follows from our definition.

[Figure: example factorization of a 7 × 5 utility matrix into a 7 × 2 user matrix and a 2 × 5 transposed item matrix]
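The shapes in this example can be checked with a short NumPy sketch. The entries of `U` and `V` are random placeholders, not fitted values, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
M, N, K = 7, 5, 2                # users, items, latent factors

U = rng.normal(size=(M, K))      # one K-dimensional row per user
V = rng.normal(size=(N, K))      # one K-dimensional row per item

Y_hat = U @ V.T                  # predicted utility matrix, shape (M, N)

# Each entry is the dot product of a user row and an item row.
i, j = 3, 1
assert np.isclose(Y_hat[i, j], U[i] @ V[j])
print(Y_hat.shape)               # (7, 5)
```

The model stores only $(M + N) \times K$ numbers, here 24, yet produces a value for every one of the 35 user-item pairs, including pairs that were never rated.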

(8) Matrix Factorization Optimization

To optimize the matrix factorization, we can construct the loss function as,

Loss = \text{training error} = \frac{1}{N} \sum_{(i, j): r_{i j}=1}\left(y_{i j}-u_{i} v_{j}\right)^{2}

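where $r_{ij} = 1$ if user $i$ has rated item $j$, and $r_{ij} = 0$ otherwise. Note that $N$ in this formula is best read as the number of observed ratings (so that the loss is a mean), not the number of items.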
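As a sanity check, the loss can be computed directly with NumPy. This is a sketch under two assumptions: missing ratings are stored as 0 in `Y`, and the normalizer is the number of observed ratings. The function name `masked_mse` is hypothetical.

```python
import numpy as np

def masked_mse(Y, R, U, V):
    """Mean squared error over observed entries only (those with R == 1)."""
    residual = (Y - U @ V.T) * R      # zero out unobserved entries
    n_observed = R.sum()              # normalizer: number of observed ratings
    return float((residual ** 2).sum() / n_observed)

# Toy utility matrix from the text; 0 marks a missing rating.
Y = np.array([
    [1, 0, 0, 3],
    [0, 4, 4, 0],
    [0, 2, 0, 2],
    [2, 0, 5, 0],
], dtype=float)
R = (Y > 0).astype(float)             # indicator matrix r_ij

# With all-ones factors and K = 2, every prediction is 1*1 + 1*1 = 2.
U = np.ones((4, 2))
V = np.ones((4, 2))

loss_value = masked_mse(Y, R, U, V)
print(loss_value)                     # 2.375
```

The squared errors of the eight observed ratings against a constant prediction of 2 sum to 19, and 19 / 8 = 2.375. The eight empty cells contribute nothing, which is exactly what the indicator $r_{ij}$ is for.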

Therefore, we have to minimize the loss for optimization.

\min _{u_{i}, v_{j}} Loss

Here, the method we are going to use is the gradient descent, which is generally,

w \leftarrow w - \eta\nabla Loss

We have discussed different gradient methods in this article; please review them if needed. Notice that momentum gradient descent is a simple but efficient method that we can use for this optimization. The moving average of the current gradient and the past gradients is defined as,

v \leftarrow \gamma v+(1-\gamma) \nabla Loss

Then,

w \leftarrow w - \eta v
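The momentum update can be tried on a one-dimensional toy problem before applying it to the factorization. In this sketch the function name `momentum_descent`, the toy loss $(w - 3)^2$, and the hyperparameter values are all assumptions chosen for illustration. The parameter moves against the averaged gradient, since that is the direction in which the loss decreases.

```python
def momentum_descent(grad, w0, eta=0.1, gamma=0.9, steps=500):
    """Minimize a 1-D function given its gradient, using a moving average of gradients."""
    w, v = w0, 0.0
    for _ in range(steps):
        g = grad(w)
        v = gamma * v + (1 - gamma) * g   # moving average of current and past gradients
        w = w - eta * v                   # step against the averaged gradient
    return w

# Toy loss (w - 3)^2 with gradient 2 (w - 3); its minimum is at w = 3.
w_final = momentum_descent(lambda w: 2 * (w - 3), w0=0.0)
print(round(w_final, 4))
```

Flipping the sign of the update to `w + eta * v` would make the iterate move away from the minimum on this toy loss, which is a quick way to see why the descent direction matters.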

The gradients of the loss function above, written here as $E$, are,

\begin{aligned} &\frac{\partial E}{\partial u_{i k}}=-\frac{2}{N} \sum_{j: r_{i j}=1}\left(y_{i j}-u_{i} \cdot v_{j}\right) v_{j k} \\ &\frac{\partial E}{\partial v_{j k}}=-\frac{2}{N} \sum_{i: r_{i j}=1}\left(y_{i j}-u_{i} \cdot v_{j}\right) u_{i k} \end{aligned}

These can also be written in matrix form as

\begin{aligned} &\frac{\partial E}{\partial U}=-\frac{2}{N} \Delta \cdot V \\ &\frac{\partial E}{\partial V}=-\frac{2}{N} \Delta^{T} \cdot U \end{aligned}

$\Delta$ is defined as,

\Delta=\left(Y-U \cdot V^{T}\right) \otimes R

Here $R$ is the indicator matrix with entries $r_{ij}$, and $\otimes$ means element-wise multiplication.
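The matrix-form gradients translate almost literally into NumPy, and a finite-difference check is a cheap way to confirm the signs and the factor $2/N$. This is a sketch under the assumption that the normalizer is the number of observed ratings; the function names and the random starting factors are hypothetical.

```python
import numpy as np

def loss(Y, R, U, V):
    """Masked mean squared error, normalized by the number of observed ratings."""
    return float((((Y - U @ V.T) * R) ** 2).sum() / R.sum())

def gradients(Y, R, U, V):
    """Matrix-form gradients of the masked squared error with respect to U and V."""
    n = R.sum()
    delta = (Y - U @ V.T) * R              # residuals, zero where no rating exists
    grad_U = (-2.0 / n) * (delta @ V)      # shape (M, K)
    grad_V = (-2.0 / n) * (delta.T @ U)    # shape (N, K)
    return grad_U, grad_V

Y = np.array([
    [1, 0, 0, 3],
    [0, 4, 4, 0],
    [0, 2, 0, 2],
    [2, 0, 5, 0],
], dtype=float)
R = (Y > 0).astype(float)

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(4, 2))
grad_U, grad_V = gradients(Y, R, U, V)

# Central finite difference on a single entry of U.
eps = 1e-6
U_plus, U_minus = U.copy(), U.copy()
U_plus[0, 1] += eps
U_minus[0, 1] -= eps
numeric = (loss(Y, R, U_plus, V) - loss(Y, R, U_minus, V)) / (2 * eps)

matches = bool(np.isclose(numeric, grad_U[0, 1], atol=1e-5))
print(matches)   # True
```

The mask is applied once, when forming `delta`; after that the two gradients are plain matrix products, which is what makes the matrix form convenient in practice.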

(9) Predictions and Validations

We can then predict the rating of user $i$ for item $j$ based on the model we have fitted,

\text{prediction}_{ij} = \hat{y}_{ij} = u_{i} v_{j}

From the loss function above, we know that the training error is,

\text{training error} = \frac{1}{N} \sum_{(i, j):\ r_{i j}=1}\left(y_{i j}-u_{i} v_{j}\right)^{2}

For validation, we hold out some of the known ratings and hide them during training (i.e. set their $r_{ij} = 0$). Then with this new validation set, we can calculate the validation error as,

\text{validation error} = \frac{1}{N} \sum_{(i, j):\ val_{i j}=1}\left(y_{i j}-u_{i} v_{j}\right)^{2}

where $val_{ij} = 1$ if the rating of user $i$ for item $j$ is in the validation set, with $N$ here counting the validation ratings.
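Putting the pieces together, the sketch below hides two known ratings from training, fits $U$ and $V$ with momentum gradient descent, and then reports both errors. Everything specific here is an assumption made for illustration: which ratings are held out, the seed, $K = 2$, and the values of $\eta$, $\gamma$ and the step count. Each error is normalized by the number of ratings in its own set.

```python
import numpy as np

Y = np.array([
    [1, 0, 0, 3],
    [0, 4, 4, 0],
    [0, 2, 0, 2],
    [2, 0, 5, 0],
], dtype=float)
R = (Y > 0).astype(float)

# Hold out two known ratings for validation and hide them from training.
val = np.zeros_like(R)
val[0, 3] = 1.0        # user 1, item D
val[3, 0] = 1.0        # user 4, item A
R_train = R - val

def mse(Y, mask, U, V):
    """Mean squared error over the entries selected by `mask`."""
    return float((((Y - U @ V.T) * mask) ** 2).sum() / mask.sum())

rng = np.random.default_rng(0)
K, eta, gamma = 2, 0.05, 0.9
U = rng.normal(scale=0.5, size=(Y.shape[0], K))
V = rng.normal(scale=0.5, size=(Y.shape[1], K))
vU, vV = np.zeros_like(U), np.zeros_like(V)

initial_train_error = mse(Y, R_train, U, V)
n_train = R_train.sum()
for _ in range(2000):
    delta = (Y - U @ V.T) * R_train            # training residuals only
    grad_U = (-2.0 / n_train) * (delta @ V)
    grad_V = (-2.0 / n_train) * (delta.T @ U)
    vU = gamma * vU + (1 - gamma) * grad_U     # moving averages of the gradients
    vV = gamma * vV + (1 - gamma) * grad_V
    U = U - eta * vU                           # step against the averaged gradients
    V = V - eta * vV

train_error = mse(Y, R_train, U, V)
val_error = mse(Y, val, U, V)
print(f"train error: {train_error:.4f}, validation error: {val_error:.4f}")
```

On a matrix this small the model has more parameters than training ratings, so a low training error says little by itself. A validation error that stays well above the training error is the usual sign of overfitting, and a reason to consider fewer latent factors or more data.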