Download source files - 29.5 Kb
Do you find the "customers that bought this item also bought these items" feature of sites such as Amazon.com useful? Would you like to add this capability to your e-commerce web site in order to increase cross-sell opportunities? Perhaps you are just interested in recommending pages to visitors browsing your site. Regardless of the specific type of item you want to recommend or cross-sell, the technique can be the same. There are many algorithms that could be leveraged to implement a recommender system. In this article, I will describe one such implementation based on a straight-forward graph structure and a simple algorithm.
Armed with the content of this article and the downloadable source code, you will be able infuse a simple recommender algorithm into your web site.
This article is organized as follows:
Recommending systems are the bread and butter of large e-commerce web sites. Potential customers are presented with highly relevant product recommendations based on their demonstrated likes; thus increasing the probability of purchase. Customers purchasing one or more items are presented with like-items increasing the likelihood of cross-selling. And returning customers are retained by the quality of recommendations based on their previous purchases. In general, the goal of any recommendation system is to present users with a highly relevant set of items.
There are two general types of recommendation algorithms. One is based on the similarity between the currently active user and previous users. The similarity is usually computed using information about the items each user demonstrated interest (or purchased). The other type is based on the similarity of items selected by the current user to other items. Usually described as user-based and item-based collaborative filtering, respectively, these algorithms differ in the data collected and the associations formed between the data items.
Both types of systems can be further classified as either memory- or model-based collaborative filters. The memory-based approach relies on the entire database of observations being in memory when a recommendation is requested. The request is then satisfied by searching for neighbors who exhibit similar behavior and then combining the preferences of these neighbors into a set of ranked items. The model-based systems rely on building a model offline (usually by means of a machine learning algorithm) and then using the learned model to classify the user and recommend relevant items.
The Collaborative Network Library described in this article is a simplistic item-centric, memory-based, collaborative filtering algorithm. It is not a probabilistic approach and is solely based on observed associations between pairs of items; thus, it does not handle unobserved items or features. This means that if the user selects an item that has not been a selection of any previous users, then the system will not be able to produce a recommendation. For an item to be part of a recommendation, it must have been associated with other items by one or more previous users.
There are two major aspects to a memory-based collaborative filter. The first involves updating the in-memory representation of user interaction with your products or site pages. The second is to obtain a recommendation. This is no different in the case of the Collaborative Network Library described in this article. Both steps are achieved via a single class. The primary interaction between your code and the library will be via the
ItemGraphSingleton class in the
Platypus.Collaborator namespace. The action of updating the item-based graph is accomplished by invoking the
void AddLink( string strSource, string strTarget) method, where
strSource is the previously viewed product and
strTarget is the newly requested/viewed/selected item.
In the case of the Collaborative Network Library, the in-memory data collected is based on the sequence in which items are viewed or purchased. In order to capture any meaningful association, a user must view or purchase more than one item. However, in order to mitigate this requirement you could always provide a dummy start point for all associations. For example, if a user visits your site and views or purchases a single item name 'foo product', you could create a link between a dummy item (say, 'home') and the product purchased. This allows the library to recommend items from the beginning of any user session.
Obtaining a recommendation is done via the
Item NextItems( string strItem, int nTopN ) method. It returns an array of, at most
nTopN Item objects in descending ranking order. These are instance methods, so they must be accessed via the static
ItemGraphSingleton.GetInstance() method. These are the two primary operations on the library.
In addition to the primary tasks of updating the graph and retrieving recommendations, one must also persist the graph to a durable media since all updates are in-memory. This can be achieved in many different ways. For example, we could persist the graph to an SQL Server database or to the disk. The current implementation (Version 1.0) supports persistence to disk using the .NET binary serialization mechanism. For this purpose the utility class
ItemGraphSerializer is provided. This class has two overloaded serialization and deserialization methods. More details on this class are provided in the next section, but suffice it to say here that, these methods can be invoked at regular time intervals in order to persist the latest state of the graph to a file on disk.
How to use the library
Now that we have covered the general, let's get into the specifics. In this section, I will show you how to incorporate the Collaborative Network Library into your web site in a minimally intrusive manner.
Let's start by the initialization code. In the context of a web site, you might want to load a previously serialized version of the graph in the application startup code. This can be done in the
Global.Application_Start method as such:
fileName = Server.MapPath(@"\collab_graph.graph");
if( File.Exists( fileName ) )
sr = Platypus.Collaborator.ItemGraphSerializer.Deserialize(fileName);
if( !sr.Succeeded )
Correspondingly, one could serialize the graph to disk on the application end event:
protected void Application_End(Object sender, EventArgs e)
fileName = Server.MapPath(@"\collab_graph.graph");
Platypus.Collaborator.ItemGraphSerializer.Serialize( fileName );
With this code in place, the graph is serialized to disk every time the application shuts down and deserialized on start up. Of course, you could decide on a different serialization schedule; for example, when each session terminates.
With serialization out of the way, we can move to the interesting part of the library - obtaining recommendations and updating the graph. In the Library Overview section I briefly described the two methods involved in this process. Using these two methods in your application is simple. The
AddLink(string strSource, string strTarget) method takes two parameters, a source and a target item. The source is the item previously selected by the current user. The target is the item the user has just requested. By calling this method with these two parameters we are instructing the graph library to update the directed association from item
strSource to item
strTarget, or to create it if the association has not been observed before. This implies that you are maintaining session information about the current user, a common practice in e-commerce sites. However you are maintaining the state of the current user, you will need to identify the sequence in which items are selected.
In a typical scenario, you will respond to a user's choice to select an item via some server-side event handler. The action of updating the graph and obtaining a recommendation can be combined in such a handler. Let's assume that you are using a
DataGrid to display recommended items and that the user's previous selection is stored in session. The following code demonstrates how to use the
AddLink() method and obtain a set of recommendations:
protected void UpdateGraphAndRecommendNextItems(string strPrev,string strNext )
ItemGraphSingleton.GetInstance().AddLink( strPrev, strNext );
Item items = ItemGraphSingleton.GetInstance().NextItems(strNext , 5 );
dgRecommendedItems.DataSource = items;
And that's all there is to it.
So far I have given you a broad overview of collaborative filtering and a bird's eye view of a simple implementation. In this section I will go into inner workings of the library and the assumptions I made about how it would be used.
First, let us look at the static structure of the library. The following UML diagram depicts the relationship between the classes in the library.
The first thing to notice is that there is an
ItemGraph and an
ItemGraphSingleton class. The singleton is a simple wrapper around the
ItemGraph class. It enforces a single instance of the
ItemGraph to exist in your application domain. You could, if you chose to, instantiate an
ItemGraph object and manage its lifetime yourself. The singleton is provided for convenience.
Once the concept of generics is incorporated in the .NET framework (version 2.0 due out this year) this design could be improved by using a generic singleton template whose type is
One important aspect of the
ItemGraph class is that, since its instance is shared across multiple sessions, its operations must be thread-safe. To this end, you will see that the
void AddLink() method synchronizes the process of updating the internal structure of the graph. Internally, the graph is represented as an adjacency list . The
ItemGraph object holds a collection of
Item objects added via the
AddLink() method. As new items are encountered, they are added to this collection. The collection is a strongly-typed and is based on the
System.Collections.CollectionBase class. So how exactly is this a graph? Each
Item object has a collection of
DirectedEdge objects. A
DirectedEdge object has a weight and contains a reference to another
Item. As the name implies each
DirectedEdge object represents a directional link between the containing
Item and the
Item referenced by the
DirectedEdge (property of type
Weight property represents the number of times an association between these two items has been added. In the context of an e-commerce site this could represent the number of times customers have shown interest in the source item followed by the target item. The following figure demonstrates graphically a hypothetical instance of the graph.
In this hypothetical graph, the first item in the
ItemGraph collection of items would be
Item A. This
Item object would have a collection of three
DirectedEdge objects pointing to Items
E, with weights 3, 2, 3 respectively.
The current version has a very simplistic approach for recommending items. It finds the requested
Item in the Graph and then enumerates its outward links, up to the requested number. A potentially better approach, especially early in the evolution of the graph, would be to implement a depth traversal algorithm to find the 'heaviest' path with the number of requested items. For example, if the requested item has only 2 outward links and the request is for 4, the algorithm would inspect the children of the 2 outward links and find the heaviest of them. This approach can include a traversal cost that reduces the weight of outbound links based on the number of steps away from the requested node. This adjusted weight would represent the indirect association between the target in the link and the requested node based on their separation.
The remaining graph-related aspects of the library are fairly simple. A cursory inspection of the source code would yield a much better understanding than the written word.
A brief description of the serialization code is in order. The class
ItemGraphSerializer encapsulates the hydration and dehydration of the graph to and from a file stream. This process leverages the .NET binary serialization mechanism. The
ItemGraphSerializer.Serialize() method takes a file name and a reference to an
ItemGraph object. If you are interacting with the graph via the
ItemGraphSingleton class, then it provides a public property
Graph to access its private
ItemGraph instance. When this method is invoked, a file stream (
System.IO.FileStream) is opened with
FileAccess.Write. Then a
BinaryFormatter is used to serialize the
ItemGraph object to this stream. Deserialization is similarly performed using a
BinaryFormatter via a
ItemGraphSerializer.Deserialize() supports this operation. This method takes a reference to an
ItemGraph variable to which it deserializes the content of the file. A
SerializationResult structure is returned which specifies the results of this process. If a failure is encountered, the structure's
Succeeded boolean member will be set to
In this article, I have introduced a simple collaborative filtering recommending system using a graph-centric design. It is simple to use and does not require any significant effort to integrate into a .NET web site. It offers a file-based persistence model which can be easily modified to be SQL Server based.
This library can be used to recommend products in an e-commerce site or to help users browse relevant pages or content on a site. It generically evolves weighted relationships between pairs of items forming a cyclical associative graph.
A simple recommendation algorithm is included which simply returns the items associated or referenced from the selected item in the graph.
Future improvements to the recommending algorithm were alluded to in the previous section. A subsequent release of this product will include code to write out graph visualization data. To learn more about this future enhancement visit our site.
- More Effective C++: 35 New Ways to Improve Your Programs and Designs, Scott Meyers.
- Item-based Collaborative Filtering Recommendation Algorithms; Sarwar et al.
- Introduction To Algorithms; Cormen, Leiserson and Rivest.