Collaborative Filtering Quick Start

Collaborative filtering (CF) is one of the most widely used techniques for building recommendation engines. There are two main types of CF: 1) user-based, and 2) item-based. In this kind of problem you normally start with an n*m matrix, where n is the number of users and m is the number of items. Each value in the matrix is one user's rating of a specific item, and many entries are typically missing because users rate only some of the items.

The key idea of CF is to build a similarity matrix, either between users or between items. To predict the rating a given user would give to a given item, we first choose the top-N most similar users (or items). In user-based CF, we predict the rating from the ratings that similar users have already given to this item. In item-based CF, we predict the rating from the ratings this user has already given to similar items. The similarity matrix is normally built with Pearson correlation, cosine similarity, or other measures. More details on these methods can be found at: files.grouplens.org/papers/FnT%20CF%20Recsys%20Survey.pdf
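To make the prediction step concrete, here is a minimal base R sketch of the idea, not how any particular library implements it. The toy ratings, the function names cosine_sim and predict_rating, and the choice of cosine similarity over co-rated entries with a similarity-weighted average are all assumptions made for illustration.

```r
# Toy ratings (hypothetical values): rows are users, columns are items, NA = not rated.
ratings <- matrix(
  c(5, 4, NA, 1,
    4, 5,  2, 1,
    1, 2,  5, 4,
    5, 5,  1, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(user = paste0("u", 1:4), item = paste0("i", 1:4))
)

# Cosine similarity between two rating vectors, using only entries both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (sum(both) < 2) return(NA_real_)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Predict the value of row `target` in column `col` as the similarity-weighted
# average over the k most similar rows that already have a value in `col`.
predict_rating <- function(R, target, col, k = 2) {
  candidates <- setdiff(rownames(R)[!is.na(R[, col])], target)
  sims <- vapply(candidates,
                 function(other) cosine_sim(R[target, ], R[other, ]),
                 numeric(1))
  top <- head(sort(sims[!is.na(sims)], decreasing = TRUE), k)
  if (length(top) == 0) return(NA_real_)
  sum(top * R[names(top), col]) / sum(abs(top))
}

# User-based: rows are users, so neighbours are similar users who rated i3.
ubcf <- predict_rating(ratings, "u1", "i3", k = 2)

# Item-based: transpose so rows are items; neighbours are similar items rated by u1.
ibcf <- predict_rating(t(ratings), "i3", "u1", k = 2)

print(c(user_based = ubcf, item_based = ibcf))
```

Running the same function on the transposed matrix is just a compact way to show that the two variants differ only in whether similarity is computed between rows (users) or columns (items).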

A couple of tools for building CF in Python are graphlab, crab, and pysuggest, among others. In R, recommenderlab is one of the tools that can be used. Here is a quick example using recommenderlab:


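library("recommenderlab")

# Build a random 5 x 10 toy rating matrix (users x items);
# ratings are 0-5 and roughly 60% of the entries are NA (not rated)
m <- matrix(sample(c(as.numeric(0:5), NA), 50, replace = TRUE, prob = c(rep(.4/6, 6), .6)),
            ncol = 10,
            dimnames = list(user = paste("u", 1:5, sep = ''), item = paste("i", 1:10, sep = '')))

# Convert to recommenderlab's rating matrix class and inspect it
r <- as(m, "realRatingMatrix")
getRatingMatrix(r)

# Normalize the ratings and inspect the result
# (r_m is for inspection only; the recommenders below are trained on r)
r_m <- normalize(r)
getRatingMatrix(r_m)

# User-based CF
r_ubcf <- Recommender(r, method = "UBCF")
recom <- predict(r_ubcf, r, n = 5, type = "ratingMatrix")
getRatingMatrix(recom)

# Item-based CF
r_ibcf <- Recommender(r, method = "IBCF")
recom2 <- predict(r_ibcf, r, n = 5, type = "ratingMatrix")
getRatingMatrix(recom2)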
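The normalize(r) call in the example adjusts the ratings before they are inspected. As far as I understand, its default is to center each user's ratings around that user's mean. The base R sketch below shows that idea on a hypothetical toy matrix; it illustrates row centering only and is not recommenderlab's own code.

```r
# Toy matrix (hypothetical values): rows are users, columns are items, NA = not rated.
m <- matrix(c(5,  3, NA,
              1, NA,  2,
              4,  4,  4),
            nrow = 3, byrow = TRUE,
            dimnames = list(user = paste0("u", 1:3), item = paste0("i", 1:3)))

# Each user's mean rating, computed over rated items only.
user_means <- rowMeans(m, na.rm = TRUE)

# Subtract the user's mean from every rating in that user's row; NA stays NA.
centered <- sweep(m, 1, user_means, FUN = "-")

print(centered)
```

Centering removes the effect of users who rate everything high or everything low, so that similarity reflects preferences rather than rating habits.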
