Latent Factor Based Methods in Collaborative Filtering

Rabin Poudyal
4 min read · Jun 27, 2018

The goal of this method is to identify the hidden factors that affect a user's preferences. It is the most widely used collaborative filtering approach.

In the content based filtering approach, we feed in a user's history along with explicitly defined factors, and map all products and users into that factor space. In the latent factor based method, we only feed in the user's history; we do not need to define descriptors or factors ourselves. The algorithm finds the hidden factors that influence the user's preferences, such as the brand of a product, its price, and so on. No product description is required; the user's history alone is enough.

We have a ratings matrix R that contains the users' item ratings. This matrix R is decomposed, or factorized, into two smaller matrices P and Q.

In the first matrix P (the user factor matrix), each row describes a user's interest in the hidden factors F1, F2 and F3. In the product factor matrix Q, each column describes a product in terms of those same hidden factors.

R ≈ P * Q
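As a concrete illustration (a minimal sketch, not from the original article; the matrix sizes and random values are made-up assumptions), here is what the shapes look like in numpy for a small ratings matrix with 3 hidden factors:

```python
import numpy as np

n_users, n_items, n_factors = 4, 5, 3  # toy sizes, chosen arbitrarily

# P: each row is a user described by the 3 hidden factors
P = np.random.rand(n_users, n_factors)
# Q: each column is a product described by the same 3 hidden factors
Q = np.random.rand(n_factors, n_items)

# Reconstructing the ratings matrix from the two smaller matrices
R_approx = P @ Q          # shape (4, 5): one predicted rating per user-item pair
print(R_approx.shape)

# A single predicted rating is the dot product of a user row and an item column
print(P[0] @ Q[:, 2])     # predicted rating of user 0 for item 2
```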

This type of method, in which a large matrix is decomposed into smaller matrices, is called Matrix Factorization, and it was popularized by the Netflix Prize winners. It boosted the performance of recommendation systems over older methods, which were mostly neighbourhood based. The decomposition is similar to what happens in Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) in linear algebra. The objective of this kind of decomposition is to:

  1. Find latent factors
  2. Reduce the dimensions

Imagine you have a huge user-item rating matrix R and you reduce its dimensionality down to the number of hidden factors. You could use PCA or Singular Value Decomposition to do this. The problem with doing this on our user-item rating matrix is that we don't know the ratings of every user for every product: the matrix R is very sparse, and PCA only works if R has no missing values. Those missing values are exactly what we want to predict.

One way to work around this is to first fill in the missing entries with some expected values and then perform the matrix factorization.
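As a rough sketch of that idea (the mean-imputation choice, the truncated SVD and the toy ratings are my own assumptions; the article does not prescribe a specific method), one could fill the gaps with each user's mean rating and then factorize:

```python
import numpy as np

# Toy ratings matrix; np.nan marks a missing rating
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 4.0, 4.0],
])

# Fill each missing value with that user's mean rating (one simple choice)
user_means = np.nanmean(R, axis=1, keepdims=True)
R_filled = np.where(np.isnan(R), user_means, R)

# Truncated SVD: keep only k hidden factors
k = 2
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
P = U[:, :k] * s[:k]      # user factor matrix
Q = Vt[:k, :]             # product factor matrix

R_approx = P @ Q          # dense matrix with a predicted rating in every cell
print(np.round(R_approx, 2))
```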

Say r_ui is the rating of user u for item i. Then this can be factorized as

r_ui = p_u · q_i

Here, p_u is the u-th row of the user factor matrix and q_i is the i-th column of the product factor matrix. We can write this equation for every known rating in the matrix R. That gives us a set of equations, and solving them gives us the matrices P and Q. With P and Q in hand, we can fill in any missing rating in R: the resulting factors can be used to predict the rating of any product by any user.
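Expanded over the individual hidden factors (my notation, matching the p_u · q_i form above), that dot product is:

```latex
r_{ui} = p_u \cdot q_i = \sum_{f=1}^{F} p_{u,f} \, q_{f,i}
```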

How do we solve this set of equations?

r_11 = p_1 · q_1

r_21 = p_2 · q_1

…

r_ui = p_u · q_i

…

r_nn = p_n · q_n

These equations can be framed as an optimization problem. We have some ratings available in the matrix R, so we solve the equations only for the ratings that exist (the training set) to find the vectors p and q. That is, we find a factor vector p_u for each user u and a factor vector q_i for each product i, chosen to minimize the error on the training set.

Here, we find the p and q vectors that minimize the error, i.e. the difference between the rating in the training set and the predicted rating. The predicted rating is the dot product of the (transposed) user factor vector p_u and the item factor vector q_i, and we loop through the entire training set to accumulate this error.
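Written out, the objective being described is the standard squared-error loss over the observed ratings (my own write-up; the article does not show the formula explicitly):

```latex
\min_{P,\,Q} \sum_{(u,i)\,\in\,\text{training set}} \left( r_{ui} - p_u \cdot q_i \right)^2
```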

What if we overfit the model by using too many hidden factors? To guard against that, we can penalize the model with a regularization term.

If the vectors p_u and q_i contain a large number of factors F1, F2, F3, …, Fn, the regularization term penalizes large factor values and keeps the model from overfitting, reducing the error on ratings it has not seen.
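With the penalty added, the regularized objective takes the standard form below (again my own write-up, with λ controlling the strength of the regularization):

```latex
\min_{P,\,Q} \sum_{(u,i)\,\in\,\text{training set}} \left( r_{ui} - p_u \cdot q_i \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
```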

To solve these optimization problems, we have two techniques:

  1. Stochastic Gradient Descent
  2. Alternating Least Squares
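As a small taste of the first technique (a minimal sketch under my own assumptions about the learning rate, regularization strength and number of epochs; the article itself gives no implementation), stochastic gradient descent updates p_u and q_i one observed rating at a time:

```python
import numpy as np

def sgd_factorize(ratings, n_users, n_items, n_factors=3,
                  lr=0.01, reg=0.1, n_epochs=100):
    """ratings: list of (user, item, rating) tuples for the observed entries only."""
    P = np.random.normal(scale=0.1, size=(n_users, n_factors))  # user factor matrix
    Q = np.random.normal(scale=0.1, size=(n_items, n_factors))  # item factor matrix
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pu, qi = P[u].copy(), Q[i].copy()
            err = r - pu @ qi                   # error on this single observed rating
            P[u] += lr * (err * qi - reg * pu)  # gradient step for the user factors
            Q[i] += lr * (err * pu - reg * qi)  # gradient step for the item factors
    return P, Q

# Toy usage: 3 users, 3 items, a handful of observed ratings
P, Q = sgd_factorize([(0, 0, 5), (0, 1, 3), (1, 1, 4), (2, 2, 1)],
                     n_users=3, n_items=3)
print(P @ Q.T)  # predicted rating for every user-item pair
```

Alternating Least Squares takes a different route: it fixes Q and solves exactly for P, then fixes P and solves for Q, alternating until the error stops improving.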

I will discuss these optimization techniques and recommendation systems in more detail in my next articles. If you like this post, don't forget to clap for it and follow me on Medium and on Twitter.

