Click here to Skip to main content
15,884,388 members
Articles / Programming Languages / C#
Article

Collaborative Filters - How to build a recommendation system

Rate me:
Please Sign up or sign in to vote.
4.78/5 (12 votes)
6 Jul 2011CPOL3 min read 58.5K   2.7K   38   10
An overview on how to implement a recommendation system with C# using the Pearson Correlation Coefficient.

Introduction

Ever browsed Amazon and noticed the "Customers That Bought This Item Also Bought These Items" list, or after you've selected a movie on Netflix, recommendations for similar flicks? The ability to find trends in the browsing habits and choices of users has become a must for many customer facing websites. Collaborative Filters are the magic that makes these features possible. In this article, we'll explore one popular implementation of a recommendation system, how it works, and how to incorporate it into your project.

Background

The algorithm we're going to be using to find the trends in our data is called the Pearson Correlation Score. The reason for this choice is simple - it effectively can handle and balance un-normalized data. For example, movie reviews tend to be more neutral when done by a typical user than a critic. The code samples will be written in .NET 3.5 SP1, and we will be using LINQ to traverse parts of the data set. These examples will be written as a console application in order to keep the code snippets simple and to make the overall concepts easier to grasp.

Using the code

First things first, we'll need data structures to hold the recommendations of each product:

C#
public class Recommendation     
{
    public string Name { get; set; }
    public double Rating { get; set; }
}

For simplicity, we'll keep our products in a dictionary with the product name as the key and a List<Recommendation> as the value.

C#
Dictionary<string, List<Recommendation>> productRecommendations = 
           new Dictionary<string, List<Recommendation>>();

In a real system, data would exist in a database, but for this example, we'll hard code our data into the dictionary:

C#
List<Recommendation> list = new List<Recommendation>();
list.Add(new Recommendation() { Name = "Wile E Coyote", Rating = 4.5 });
list.Add(new Recommendation() { Name = "Bugs Bunny", Rating = 2.5 });
list.Add(new Recommendation() { Name = "Elmer Fudd", Rating = 5.0 });
list.Add(new Recommendation() { Name = "Foghorn Leghorn", Rating = 2.0 });
productRecommendations.Add("ACME Industrial Rocket Pack", list);

list = new List<Recommendation>();
list.Add(new Recommendation() { Name = "Wile E Coyote", Rating = 5.0 });
list.Add(new Recommendation() { Name = "Bugs Bunny", Rating = 3.5 });
list.Add(new Recommendation() { Name = "Elmer Fudd", Rating = 1.0 });
list.Add(new Recommendation() { Name = "Foghorn Leghorn", Rating = 3.5 });
list.Add(new Recommendation() { Name = "Daffy Duck", Rating = 1.0 });
productRecommendations.Add("ACME Super Sling Shot", list);

list = new List<Recommendation>();
list.Add(new Recommendation() { Name = "Wile E Coyote", Rating = 1.0 });
list.Add(new Recommendation() { Name = "Bugs Bunny", Rating = 3.5 });
list.Add(new Recommendation() { Name = "Elmer Fudd", Rating = 5.0 });
list.Add(new Recommendation() { Name = "Foghorn Leghorn", Rating = 4.0 });
list.Add(new Recommendation() { Name = "Daffy Duck", Rating = 4.0 });
productRecommendations.Add("ACME X-Large Catapult", list);

list = new List<Recommendation>();
list.Add(new Recommendation() { Name = "Bugs Bunny", Rating = 3.5 });
list.Add(new Recommendation() { Name = "Elmer Fudd", Rating = 4.0 });
list.Add(new Recommendation() { Name = "Foghorn Leghorn", Rating = 5.0 });
list.Add(new Recommendation() { Name = "Daffy Duck", Rating = 2.5 });
productRecommendations.Add("ACME Super Glue", list);

list = new List<Recommendation>();
list.Add(new Recommendation() { Name = "Wile E Coyote", Rating = 4.5 });
list.Add(new Recommendation() { Name = "Bugs Bunny", Rating = 5.0 });
list.Add(new Recommendation() { Name = "Foghorn Leghorn", Rating = 3.0 });
productRecommendations.Add("ACME Jet Powered Roller Skates", list);

Here, we have a collection of products, and each product has a collection of reviews. Each review has a value from 1 to 5 describing how well they liked the product. This value can be based on what you are trying to profile. For example, you could use a value of 1 to represent a user that bought a particular product (with a 0 showing users who did not purchase a product). As long as we have a way to convert the data to a numerical value, the data can be analyzed.

The ability to recommend products is accomplished by being able to find similarities between them. To do that, we use the Pearson Correlation Score. The Correlation Score is a measure of how well two sets of data fit on a straight line. One interesting aspect of the Pearson Score is that it corrects for grade inflation. That is, if one product has consistently higher scores than another, there can still be a perfect correlation — if the difference between the ratings is consistent.

The code for this algorithm:

  1. first finds the reviewers that reviewed both products. It then
  2. computes the sums and the squared sums of the ratings for the two products, and
  3. computes the sum of the reviews of the products. Finally,
  4. it uses these results to compute the Pearson Correlation Score, shown in the code below:
C#
List<Recommendation> shared_items = new List<Recommendation>();

// collect a list of products have have reviews in common
foreach (var item in productRecommendations[product1])
{
    if (productRecommendations[product2].Where(x => x.Name == item.Name).Count() != 0)
    {
        shared_items.Add(item);
    }
}

if (shared_items.Count == 0)
{
    // they have nothing in common exit with a zero
    return 0;
}

// sum up all the preferences
double product1_review_sum = 0.00f;
foreach (Recommendation item in shared_items)
{
    product1_review_sum += productRecommendations[product1].Where(
             x => x.Name == item.Name).FirstOrDefault().Rating;
}

double product2_review_sum = 0.00f;
foreach (Recommendation item in shared_items)
{
    product2_review_sum += productRecommendations[product2].Where(
             x => x.Name == item.Name).FirstOrDefault().Rating;
}

// sum up the squares
double product1_rating = 0f;
double product2_rating = 0f;
foreach (Recommendation item in shared_items)
{
    product1_rating += Math.Pow(productRecommendations[product1].Where(
             x => x.Name == item.Name).FirstOrDefault().Rating, 2);

    product2_rating += Math.Pow(productRecommendations[product2].Where(
             x => x.Name == item.Name).FirstOrDefault().Rating, 2);
}

//sum up the products
double critics_sum = 0f;
foreach (Recommendation item in shared_items)
{
    critics_sum += productRecommendations[product1].Where(
            x => x.Name == item.Name).FirstOrDefault().Rating *
    productRecommendations[product2].Where(
           x => x.Name == item.Name).FirstOrDefault().Rating;
}

//calculate pearson score
double num = critics_sum - (product1_review_sum * 
  product2_review_sum / shared_items.Count);

double density = (double)Math.Sqrt((product1_rating - 
  Math.Pow(product1_review_sum, 2) / shared_items.Count) * 
  ((product2_rating - Math.Pow(product2_review_sum, 2) / shared_items.Count)));

if (density == 0)
    return 0;

return num / density;

This algorithm will return a value between -1 and 1, where 1 means the two products have exactly the same ratings. One thing to note is that this algorithm only compares two products, so to calculate scores for our entire catalog, we have to loop through each product, calculating the score:

C#
// grab of list of products that *excludes* the item we're searching for
var sortedList = productRecommendations.Where(x => x.Key != name);

sortedList.OrderByDescending(x => x.Key);

List<Recommendation> recommendations = new List<Recommendation>();

// go through the list and calculate the Pearson score for each product
foreach (var entry in sortedList)
{
    recommendations.Add(new Recommendation() { Name = entry.Key, 
                        Rating = CalculatePearsonCorrelation(name, entry.Key) });
}

return recommendations;

As we can see from the given scores, users who were looking at the Industrial Rocket Pack would also be interested in a pair of Jet Powered Roller Skates.

Image 1

History

  • v1.0: Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
President Forge Ventures
United States United States
A technologist and entrepreneur, Joel is the president and founder of Forge Ventures. Founded in 2009, Forge Ventures started with a simple goal: to improve the way people run their business. Many businesses are started with very simple tools and processes...sometimes as simple as a whiteboard or spreadsheet to manage mission critical aspects of their company. But as your business grows, the very tools that got you off the ground could actually be holding you back, preventing you from reaching your full potential. We believe that just because something isn't broken doesn't mean it can't be improved.

Comments and Discussions

 
QuestionUser based similiarity using Pearson Correlation Pin
Marco Antonio Enciso Salinas30-May-16 8:39
Marco Antonio Enciso Salinas30-May-16 8:39 
SuggestionWriting CalculatePearsonCorrelation in Linq Pin
arjanhulshoff20-May-14 2:17
professionalarjanhulshoff20-May-14 2:17 
GeneralRe: Writing CalculatePearsonCorrelation in Linq Pin
Joel Cogley20-May-14 5:36
Joel Cogley20-May-14 5:36 
GeneralRe: Writing CalculatePearsonCorrelation in Linq Pin
arjanhulshoff20-May-14 23:19
professionalarjanhulshoff20-May-14 23:19 
Suggestion- Pin
davynaessens8-Jul-13 1:20
davynaessens8-Jul-13 1:20 
GeneralRe: Nice Job.. a similar project is available @ wesiem.be Pin
Marco Antonio Enciso Salinas30-May-16 12:24
Marco Antonio Enciso Salinas30-May-16 12:24 
GeneralRe: Nice Job.. a similar project is available @ wesiem.be Pin
Nguyễn Thành Thịnh12-Jan-17 8:48
Nguyễn Thành Thịnh12-Jan-17 8:48 
QuestionThanks! Pin
alexandrelima.neo26-May-12 10:15
alexandrelima.neo26-May-12 10:15 
QuestionList productRecommendations Pin
pian_christian9-Nov-11 20:39
pian_christian9-Nov-11 20:39 
Hi Joel,

I still dont understand List productRecommendations. Why any object item (in this case movie title) separated in productRecommendations ? What the meaning of productRecommendations ?

Can we say productRecommendations like movie genre ?

Tks.
GeneralMy vote of 5 Pin
Filip D'haene6-Jul-11 12:54
Filip D'haene6-Jul-11 12:54 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.