Smart Compose: autocomplete in Gmail
AlphaGo: Machine Go player
Text Translation
Text Generation
Natural Language Processing
Google Lens
Now Playing
Google Duplex
Transfer Learning: improvement of learning in a new task through the transfer of knowledge from a related task that has already been learned (e.g. works for NLP and CV)
Clickthrough Rate (CTR) Predictions
Recommender Systems (e.g. Netflix, Quora, etc.)
(1) The Definition of Recommender System
An application that provides users with personalized recommendations about content they may be interested in.
(2) Types of Recommender System
Content-based System
A content-based system uses the similarity between items to recommend items similar to what a user likes.
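As a rough illustration of the content-based idea, the sketch below scores items by the cosine similarity of their feature vectors and recommends the closest one to an item the user liked. The feature matrix, the item labels, and the helper names `cosine_similarity_matrix` and `most_similar` are made up for this example; cosine similarity is one common choice, not the only one.

```python
import numpy as np

# Hypothetical item features (rows: items, columns: e.g. genre indicators).
item_features = np.array([
    [1.0, 0.0, 1.0],  # item A
    [1.0, 0.0, 0.9],  # item B
    [0.0, 1.0, 0.0],  # item C
])

def cosine_similarity_matrix(features):
    """Return the item-by-item cosine similarity matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / norms
    return unit @ unit.T

def most_similar(item_index, features):
    """Index of the item most similar to item_index, excluding itself."""
    sims = cosine_similarity_matrix(features)[item_index].copy()
    sims[item_index] = -np.inf  # never recommend the item itself
    return int(np.argmax(sims))

liked = 0  # assume the user liked item A
recommended = most_similar(liked, item_features)
```

Here item B is recommended because its features are nearly parallel to those of item A, while item C shares no features with it.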
Collaborative Filtering
Collaborative filtering uses similarities between users and items simultaneously to provide recommendations.
(3) Types of User Feedback
(4) The Definition of Utility Matrix
The utility matrix is a matrix that shows the degree of preference that a user has for an item.
         item A   item B   item C   item D
user 1     1        -        -        3
user 2     -        4        4        -
user 3     -        2        -        2
user 4     2        -        5        -
(- marks an item the user has not rated)
We can also write this matrix in a tabular form,
user 1   item A   1
user 1   item D   3
user 2   item B   4
user 2   item C   4
user 3   item B   2
user 3   item D   2
user 4   item A   2
user 4   item C   5
The downside of the utility matrix is that the data is usually very sparse; often only about 1% of the entries hold an actual rating. Also, the number of columns (that is, the number of items) can be very large, so finding matches for less popular items can be difficult.
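To make the two representations concrete, here is a minimal sketch that builds the utility matrix from the (user, item, rating) rows of the example and measures how many entries are filled. The variable names (`ratings`, `Y`, `R`, `density`) and the use of NaN for missing ratings are choices made for this sketch.

```python
import numpy as np

# (user, item, rating) triples, i.e. the tabular form of the example.
ratings = [
    ("user 1", "item A", 1), ("user 1", "item D", 3),
    ("user 2", "item B", 4), ("user 2", "item C", 4),
    ("user 3", "item B", 2), ("user 3", "item D", 2),
    ("user 4", "item A", 2), ("user 4", "item C", 5),
]

users = sorted({u for u, _, _ in ratings})
items = sorted({i for _, i, _ in ratings})
user_index = {u: k for k, u in enumerate(users)}
item_index = {i: k for k, i in enumerate(items)}

# NaN marks "not rated"; a zero could be mistaken for a real rating.
Y = np.full((len(users), len(items)), np.nan)
for u, i, r in ratings:
    Y[user_index[u], item_index[i]] = r

R = ~np.isnan(Y)    # True where a rating was observed
density = R.mean()  # fraction of observed entries
```

In this toy example half of the entries are filled. In a real system the dense matrix is rarely materialized, since storing only the observed triples is far cheaper when almost all entries are missing.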
(5) Two Types of Collaborative Filtering Systems
(6) The Definition of Latent Factors
Latent variables are variables that cannot be observed directly, but can be inferred from other variables that are observed.
(7) Utility Matrix Factorization
In order to predict ratings, we decompose the utility matrix $Y$. Suppose we have $K$ latent factors, the user matrix $U$ is an $n_u \times K$ matrix (where $n_u$ is the number of users), and the item matrix $V$ is an $n_m \times K$ matrix (where $n_m$ is the number of items). So we have,
$$Y \approx \hat{Y} = U V^T$$
For example, suppose we have $K$ latent factors with 7 users and 5 items. Then, by our definition, the matrix factorization is,
$$\underbrace{Y}_{7 \times 5} \approx \underbrace{U}_{7 \times K} \, \underbrace{V^T}_{K \times 5}$$
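The shapes in this example can be checked with a short sketch. It writes `U` for the user matrix and `V` for the item matrix (names assumed here), fills them with small random values, and picks `K = 3` latent factors purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, K = 7, 5, 3  # K = 3 is an arbitrary choice for this sketch

U = rng.normal(scale=0.1, size=(n_users, K))  # one row of latent factors per user
V = rng.normal(scale=0.1, size=(n_items, K))  # one row of latent factors per item

Y_hat = U @ V.T  # predicted ratings, one entry per (user, item) pair
```

The product has 7 x 5 = 35 entries but is described by only (7 + 5) x K parameters, which is what lets the model fill in ratings that were never observed.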
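(8) Matrix Factorization Optimization
To optimize the matrix factorization, we can construct the loss function as,
$$E(U, V) = \frac{1}{N} \sum_{(i,j):\, r_{ij} = 1} \left( y_{ij} - u_i \cdot v_j \right)^2$$
Where $r_{ij} = 1$ if user $i$ rated item $j$. Otherwise, $r_{ij} = 0$. Here $y_{ij}$ is the observed rating, $u_i$ and $v_j$ are the latent factor vectors of user $i$ and item $j$ (rows of the user matrix $U$ and the item matrix $V$), and $N$ is the number of observed ratings.
Therefore, the optimization problem is to minimize this loss over $U$ and $V$.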
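A loss of this kind can be sketched in a few lines of NumPy. The version below assumes a mean squared error over the observed entries only, with `Y` holding zeros where no rating exists and `R` being the 0/1 indicator matrix; the function name `mse_loss` and the averaging over the number of observed ratings are assumptions of this sketch.

```python
import numpy as np

def mse_loss(Y, R, U, V):
    """Mean squared error over observed entries only (where R == 1)."""
    residual = (Y - U @ V.T) * R  # unobserved entries contribute zero
    return np.sum(residual ** 2) / np.sum(R)

# The 4 x 4 example matrix, with 0 standing in for "not rated".
Y = np.array([
    [1, 0, 0, 3],
    [0, 4, 4, 0],
    [0, 2, 0, 2],
    [2, 0, 5, 0],
], dtype=float)
R = (Y > 0).astype(float)

# With all-zero factors every prediction is 0, so the loss is the mean squared rating.
baseline = mse_loss(Y, R, np.zeros((4, 2)), np.zeros((4, 2)))
```

Multiplying by `R` is what keeps the missing entries from being treated as ratings of zero.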
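Here, the method we are going to use is gradient descent, which in general is,
$$\theta \leftarrow \theta - \eta \, \nabla_\theta E(\theta)$$
where $\theta$ stands for the parameters and $\eta$ is the learning rate.
We have discussed different gradient methods in this article; please review them if needed. Notice that momentum gradient descent is a simple but efficient method that we can use for this optimization. The moving average of the current gradient and the past gradients is defined as,
$$m_t = \beta \, m_{t-1} + (1 - \beta) \, \nabla_\theta E(\theta_{t-1})$$
where $0 \le \beta < 1$ is the momentum coefficient.
Then,
$$\theta_t = \theta_{t-1} - \eta \, m_t$$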
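The momentum update can be sketched generically before applying it to the factorization. The code below assumes the exponential-moving-average form of momentum (other formulations exist), and the function name `momentum_gd`, the hyperparameter values, and the quadratic test function are all chosen just for this illustration.

```python
import numpy as np

def momentum_gd(grad, theta0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent where each step follows a moving average of gradients."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)  # moving average of gradients, starts at zero
    for _ in range(steps):
        m = beta * m + (1 - beta) * grad(theta)  # blend history with current gradient
        theta = theta - lr * m                   # step along the averaged direction
    return theta

# Toy problem: minimize f(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
theta_star = momentum_gd(lambda t: 2.0 * (t - 3.0), theta0=[0.0, 10.0])
```

Both coordinates end up very close to 3, the minimizer of the toy function.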
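The gradients of the loss function above are,
$$\frac{\partial E}{\partial u_{ik}} = -\frac{2}{N} \sum_{j:\, r_{ij} = 1} \left( y_{ij} - u_i \cdot v_j \right) v_{jk}$$
$$\frac{\partial E}{\partial v_{jk}} = -\frac{2}{N} \sum_{i:\, r_{ij} = 1} \left( y_{ij} - u_i \cdot v_j \right) u_{ik}$$
Which can also be written in matrix form as,
$$\frac{\partial E}{\partial U} = -\frac{2}{N} \, \Delta V, \qquad \frac{\partial E}{\partial V} = -\frac{2}{N} \, \Delta^T U$$
Where $R$ is the matrix of indicators $r_{ij}$, $N$ is the number of observed ratings, and $\Delta$ is defined as,
$$\Delta = (Y - U V^T) \circ R$$
And $\circ$ means element-wise multiplication.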
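The matrix form of the gradients is easy to get wrong by a sign or a transpose, so a numerical check is useful. The sketch below assumes a mean squared error loss over observed entries and compares one analytic gradient entry with a finite-difference estimate; the function names, the random factors, and the step size `eps` are choices made for this example.

```python
import numpy as np

def loss(Y, R, U, V):
    """Mean squared error over observed entries (R == 1)."""
    return np.sum(((Y - U @ V.T) * R) ** 2) / np.sum(R)

def gradients(Y, R, U, V):
    """Gradients of the loss with respect to U and V in matrix form."""
    N = np.sum(R)
    delta = (Y - U @ V.T) * R  # masked residuals, element-wise product
    grad_U = (-2.0 / N) * (delta @ V)
    grad_V = (-2.0 / N) * (delta.T @ U)
    return grad_U, grad_V

rng = np.random.default_rng(0)
Y = np.array([[1, 0, 0, 3], [0, 4, 4, 0], [0, 2, 0, 2], [2, 0, 5, 0]], dtype=float)
R = (Y > 0).astype(float)
U = rng.normal(size=(4, 2))
V = rng.normal(size=(4, 2))
grad_U, grad_V = gradients(Y, R, U, V)

# Finite-difference estimate of the derivative with respect to U[0, 1].
eps = 1e-6
U_shift = U.copy()
U_shift[0, 1] += eps
numeric = (loss(Y, R, U_shift, V) - loss(Y, R, U, V)) / eps
```

If the analytic and numeric values disagree by more than a small tolerance, the derivation or the code has a mistake.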
(9) Predictions and Validations
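To make a prediction about the rating of user $i$ for item $j$ based on the model we have fitted,
$$\hat{y}_{ij} = u_i \cdot v_j = \sum_{k=1}^{K} u_{ik} v_{jk}$$
where $u_i$ and $v_j$ are the latent factor vectors of user $i$ and item $j$, and $K$ is the number of latent factors.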
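In code, a single prediction is just the dot product of one user row and one item row. The small matrices below are invented values standing in for fitted factors, and `predict` is a hypothetical helper name.

```python
import numpy as np

def predict(U, V, i, j):
    """Predicted rating of user i for item j: dot product of their latent vectors."""
    return float(U[i] @ V[j])

# Made-up "fitted" factors: 2 users, 3 items, 2 latent factors.
U = np.array([[1.0, 0.5],
              [0.2, 2.0]])
V = np.array([[2.0, 1.0],
              [0.0, 1.5],
              [1.0, 1.0]])

rating_0_2 = predict(U, V, 0, 2)  # 1.0 * 1.0 + 0.5 * 1.0
rating_1_1 = predict(U, V, 1, 1)  # 0.2 * 0.0 + 2.0 * 1.5
all_ratings = U @ V.T             # every user-item prediction at once
```

Computing `U @ V.T` gives all predictions in one step, which is convenient for ranking the items a user has not rated yet.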
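Based on the loss function above, we know that the training error is,
$$E_{\text{train}} = \frac{1}{N} \sum_{(i,j):\, r_{ij} = 1} \left( y_{ij} - \hat{y}_{ij} \right)^2$$
where $N$ is the number of ratings in the training set.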
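Then, suppose we are given a validation set with some user ratings that do not exist in the training set (meaning $r_{ij} = 0$). With this validation set, we can calculate the validation error as,
$$E_{\text{val}} = \frac{1}{N_{\text{val}}} \sum_{(i,j):\, r'_{ij} = 1} \left( y'_{ij} - \hat{y}_{ij} \right)^2$$
Where $r'_{ij} = 1$ means that the data point of user $i$ rating item $j$ is in the validation set, $y'_{ij}$ is that rating, and $N_{\text{val}}$ is the number of ratings in the validation set.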
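Putting the pieces together, the following is a minimal end-to-end sketch: fit the factors on training ratings with momentum gradient descent, then compare training and validation error. It assumes a mean squared error loss, synthetic rank-2 ratings, and hyperparameters (`K`, `lr`, `beta`, `steps`) picked for this toy problem only; the function names are hypothetical.

```python
import numpy as np

def masked_mse(Y, R, U, V):
    """Mean squared error over the entries selected by the 0/1 mask R."""
    return np.sum(((Y - U @ V.T) * R) ** 2) / np.sum(R)

def fit(Y, R, K=2, lr=0.1, beta=0.9, steps=2000, seed=0):
    """Fit U and V on the entries where R == 1 with momentum gradient descent."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(Y.shape[0], K))
    V = rng.normal(scale=0.1, size=(Y.shape[1], K))
    m_U, m_V = np.zeros_like(U), np.zeros_like(V)
    N = np.sum(R)
    for _ in range(steps):
        delta = (Y - U @ V.T) * R             # masked residuals
        g_U = (-2.0 / N) * (delta @ V)        # gradient with respect to U
        g_V = (-2.0 / N) * (delta.T @ U)      # gradient with respect to V
        m_U = beta * m_U + (1 - beta) * g_U   # moving averages of the gradients
        m_V = beta * m_V + (1 - beta) * g_V
        U = U - lr * m_U
        V = V - lr * m_V
    return U, V

# Synthetic rank-2 ratings for 7 users and 5 items (made-up data).
rng = np.random.default_rng(1)
Y = rng.uniform(0.5, 1.5, size=(7, 2)) @ rng.uniform(0.5, 1.5, size=(5, 2)).T

# Hold out 7 of the 35 ratings for validation; train on the rest.
held_out = rng.permutation(Y.size)[:7]
R_val = np.zeros(Y.shape)
R_val.flat[held_out] = 1.0
R_train = 1.0 - R_val

U, V = fit(Y, R_train)
train_error = masked_mse(Y, R_train, U, V)
val_error = masked_mse(Y, R_val, U, V)
```

A validation error that is much larger than the training error would suggest overfitting, in which case fewer latent factors or a regularization term (not covered here) may help.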