Unraveling the Netflix API - Part II - Accessing the Netflix Title Catalog

NetDave

Sep 4, 2009

CPOL

Demonstrates how to access the Netflix Catalog resources

Introduction
Background
Title Search Results Structure
Search Results Parser
- Catalog Title Parser
Title Details Retrieval
Example Application
Conclusion
History

Introduction

In the first article in this series, I showed you the basics of constructing a signed request to search the Netflix catalog for titles that matched a given search term. In this second article I will demonstrate how to parse the search results, and how to request and parse the details for a specific title.

Background

This article will introduce you to:

Accessing the Netflix Catalog Resource using Signed requests
Understanding and parsing Netflix data documents

Title Search Results Structure

To describe the Netflix catalog, I will start by analyzing the results of a title search. In the previous article, I showed you how to perform a title search, and the result of that request was an XML document. The top-level document structure looks like this, as viewed in XML Notepad:

Right away it is obvious that several discrete variables and a couple of objects will be needed to store the results of parsing this information. To make a long story short, the following figure shows an example of each object type contained in the search results, expanded to show its contents. A parser translates these into local data storage structures, which the client application then uses to hold and display the information.

The following data structures correspond to the objects in the preceding illustration:

Object	Description
link	Contains basic information about a data resource, with additional information available through a hyperlink.
catalog_title	Contains all the information about a title.
title	Contains a short and long form of a title's name.
box_art	Contains hyperlinks to three versions of the title's cover image.
category	Contains information about a title's MPAA rating or genre.

Rather than describe the mundane implementation of these data storage objects here, I will refer you to the contents of the NetflixParser.cs code module.

Search Results Parser

The parser is a rather simple implementation that uses the Microsoft XmlDocument class to do a lot of the work. The parser actually does double duty, parsing both the search results and the title details results. I'll describe the search results parsing first and describe the title details parsing when I cover the title information retrieval processing.

The search response parser takes its input directly from the HTTP response stream and loads it into an XmlDocument object. As shown in the following code, the parser then:

Skips past the preliminary information at the top of the document (described in the Netflix API documentation) until it reaches the first <catalog_title> node.
Processes each <catalog_title> node and inserts the resulting CatalogTitle object into the parser's title list (_titleList), which holds the results once parsing completes.

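// Requires: using System.Collections.Generic; using System.IO; using System.Xml;

// Titles collected by the most recent call to ParseSearchResults.
private List<CatalogTitle> _titleList = new List<CatalogTitle>();

// Parses a catalog search response. Returns false if the document
// is not a <catalog_titles> result set.
public bool ParseSearchResults(Stream str)
{
    XmlDocument xDoc = new XmlDocument();
    xDoc.Load(str);

    int rank = 0;           // zero-based position of each title in the results
    _titleList.Clear();

    XmlNode xNode = xDoc.DocumentElement;
    if (xNode.Name != "catalog_titles" || !xNode.HasChildNodes)
        return false;       // not a catalog search result document

    // Preliminary elements are skipped; only <catalog_title> nodes are parsed.
    foreach (XmlNode subNode in xNode)
    {
        if (subNode.Name == "catalog_title" && subNode.HasChildNodes)
        {
            CatalogTitle title = ParseSingleCatalogTitle(subNode);
            if (title != null)
            {
                title.rank = rank++;
                _titleList.Add(title);
            }
        }
    }
    return true;
}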

Catalog Title Parser

The catalog title parser method, ParseSingleCatalogTitle, walks the <catalog_title> node extracting data into a CatalogTitle object. The extraction is quite simple; a switch statement identifies the data from an element or node and populates the CatalogTitle object. Note that there is a special case for Link objects that handles "title expansion", which I'll be describing in the next section.
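To make the switch-based extraction concrete, here is a minimal sketch of what such a method might look like. It is not the code from NetflixParser.cs: the trimmed-down Link and CatalogTitle classes, the CatalogTitleSketch wrapper, and the Attr helper are hypothetical, and the element and attribute names (id, release_year, short, regular, large, label) are assumptions about the catalog_title layout that you should check against an actual response.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical, trimmed-down storage classes used only by this sketch.
public class Link { public string Href, Rel, Title; }

public class CatalogTitle
{
    public int rank;
    public string Id, ShortTitle, RegularTitle, ReleaseYear, BoxArtLarge;
    public List<string> Categories = new List<string>();
    public List<Link> Links = new List<Link>();
}

public static class CatalogTitleSketch
{
    // Reads an attribute value, or returns null when the attribute is absent.
    private static string Attr(XmlNode node, string name)
    {
        XmlAttribute a = node.Attributes == null ? null : node.Attributes[name];
        return a == null ? null : a.Value;
    }

    // Walks the children of a <catalog_title> node and copies the values
    // it recognizes into a CatalogTitle object.
    public static CatalogTitle ParseSingleCatalogTitle(XmlNode titleNode)
    {
        CatalogTitle result = new CatalogTitle();
        foreach (XmlNode child in titleNode.ChildNodes)
        {
            switch (child.Name)
            {
                case "id":            // assumed element name
                    result.Id = child.InnerText;
                    break;
                case "title":         // assumed attributes: short, regular
                    result.ShortTitle = Attr(child, "short");
                    result.RegularTitle = Attr(child, "regular");
                    break;
                case "box_art":       // assumed attribute: large
                    result.BoxArtLarge = Attr(child, "large");
                    break;
                case "release_year":  // assumed element name
                    result.ReleaseYear = child.InnerText;
                    break;
                case "category":      // assumed attribute: label
                    result.Categories.Add(Attr(child, "label"));
                    break;
                case "link":
                    result.Links.Add(new Link
                    {
                        Href = Attr(child, "href"),
                        Rel = Attr(child, "rel"),
                        Title = Attr(child, "title")
                    });
                    break;
                // Elements this sketch does not recognize are ignored.
            }
        }
        return result;
    }
}
```

A switch keyed on the element name keeps each case independent, so supporting another element means adding one more case rather than restructuring the loop.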

Title Details Retrieval

At this point in the application's development, there are three possible ways to retrieve details about a title. Before I compare them, I want to explain the two mechanisms they are built on: title expansion and link detail retrieval.

The Link objects returned in the catalog search represent items that "link" to additional details and they contain three attributes:

href - A hyperlink that can be used as a request URL for directly obtaining the details associated with the Link object.
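rel - A URI that identifies the class of detail (the relationship) that the Link object represents.
title - A name for the Link object, such as "synopsis" or "cast"; for person links, it is the person's name.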

The purpose and usage of these attributes are described in the following sections.

Title Expansion Requests

A "title expansion" request is a request that has an additional query string parameter named "expand" specified. The value of the parameter is one or more of the Link object title attributes. For example, a search request might produce a response that contains the following Link objects:

<link href="http://api.netflix.com/catalog/titles/movies/60030118/synopsis"
    rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis" />

<link href="http://api.netflix.com/catalog/titles/movies/60030118/cast" 
    rel="http://schemas.netflix.com/catalog/people.cast" title="cast" />

<link href="http://api.netflix.com/catalog/titles/movies/60030118/directors" 
    rel="http://schemas.netflix.com/catalog/people.directors" title="directors" />

To request "expansion" of these links in a subsequent request, you can include the following additional query string parameter in your request. Note that the values are a) comma delimited, and b) correspond to the "title" attributes of the <link> elements in the preceding example.

expand=synopsis,cast,directors

You will see this implemented in the code for the lvResults_DoubleClick handler as:

request.AddQueryParameter("expand", "synopsis,cast,directors");
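If you would rather build the parameter value from the links you have already parsed than hard-code it, a small helper is enough. The following is only a sketch: ExpandSketch and BuildExpandValue are hypothetical names, not part of the example application, and the helper does nothing more than join the title attributes with commas.

```csharp
using System.Collections.Generic;

public static class ExpandSketch
{
    // Joins link titles into the comma-delimited value of the "expand"
    // parameter, skipping empty entries and duplicates.
    public static string BuildExpandValue(IEnumerable<string> linkTitles)
    {
        List<string> parts = new List<string>();
        foreach (string title in linkTitles)
        {
            if (!string.IsNullOrEmpty(title) && !parts.Contains(title))
                parts.Add(title);
        }
        return string.Join(",", parts.ToArray());
    }
}
```

The result can then be passed to the request in the same way as the literal string, for example request.AddQueryParameter("expand", ExpandSketch.BuildExpandValue(titles)), where titles is whatever collection of link titles you have chosen to expand.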

The result of including this title expansion specification in the request is that the links in the response now include the details to which the links refer.

<link href="http://api.netflix.com/catalog/titles/movies/60030118/synopsis" 
    rel="http://schemas.netflix.com/catalog/titles/synopsis" title="synopsis">
  <synopsis><![CDATA[Title synopsis here...]]></synopsis>
</link>

<link href="http://api.netflix.com/catalog/titles/movies/60030118/cast" 
    rel="http://schemas.netflix.com/catalog/people.cast" title="cast">
  <people>

    <link href="http://api.netflix.com/catalog/people/77805" 
    rel="http://schemas.netflix.com/catalog/person" title="Christina Ricci"></link>
    <link href="http://api.netflix.com/catalog/people/75365" 
    rel="http://schemas.netflix.com/catalog/person" title="Bill Pullman"></link>
    <link href="http://api.netflix.com/catalog/people/65690" 
    rel="http://schemas.netflix.com/catalog/person" title="Cathy Moriarty"></link>
    <link href="http://api.netflix.com/catalog/people/44495" 
    rel="http://schemas.netflix.com/catalog/person" title="Eric Idle"></link>
    <link href="http://api.netflix.com/catalog/people/20048954" 
    rel="http://schemas.netflix.com/catalog/person" title="Malachi Pearson"></link>

    <link href="http://api.netflix.com/catalog/people/20013712" 
    rel="http://schemas.netflix.com/catalog/person" title="Ben Stein"></link>
    <link href="http://api.netflix.com/catalog/people/20022666" 
    rel="http://schemas.netflix.com/catalog/person" title="Don Novello"></link>
    <link href="http://api.netflix.com/catalog/people/20048955" 
    rel="http://schemas.netflix.com/catalog/person" title="Joe Nipote"></link>
    <link href="http://api.netflix.com/catalog/people/20015820" 
    rel="http://schemas.netflix.com/catalog/person" title="Joe Alaskey"></link>
    <link href="http://api.netflix.com/catalog/people/20015245" 
    rel="http://schemas.netflix.com/catalog/person" title="Brad Garrett"></link>

    <link href="http://api.netflix.com/catalog/people/20008532" 
    rel="http://schemas.netflix.com/catalog/person"
    title="Chauncey Leopardi"></link>
  </people>
</link>

<link href="http://api.netflix.com/catalog/titles/movies/60030118/directors" 
    rel="http://schemas.netflix.com/catalog/people.directors" title="directors">
  <people>
    <link href="http://api.netflix.com/catalog/people/140587" 
    rel="http://schemas.netflix.com/catalog/person" title="Brad Silberling"></link>

  </people>
</link>

There are several features to note in the preceding example:

The synopsis detail is returned in a <![CDATA[ ]]> section because the returned data contains embedded hyperlinks that reference cast members, directors, and so on. Fortunately, the XmlDocument class unwraps the CDATA section for us when it loads the XML document. However, be aware that the synopsis is not plain text; you may wish to strip the embedded HTML before using it, as I do in the example code for this article.
Both the cast and directors links contain people objects. When parsing the expanded data, you need to track the context in which an object is returned; here, the enclosing link determines whether the people are cast members or directors.
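Both points can be illustrated with a short sketch. This is not the ParseExpansion function from the example code; the TitleDetails class, the ExpansionSketch wrapper, and the regular expression used to strip markup are assumptions made for illustration. The regular expression is deliberately crude: it removes anything that looks like a tag and does not decode HTML entities.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical container for the expanded details of one title.
public class TitleDetails
{
    public string Synopsis;
    public List<string> Cast = new List<string>();
    public List<string> Directors = new List<string>();
}

public static class ExpansionSketch
{
    // Removes embedded markup such as <a href="...">...</a> from the synopsis.
    public static string StripHtml(string text)
    {
        return Regex.Replace(text, "<[^>]+>", string.Empty);
    }

    // Examines one <link> element and copies any expanded content into details.
    public static void ParseExpansion(XmlNode linkNode, TitleDetails details)
    {
        if (linkNode.Attributes == null) return;   // not an element node
        XmlAttribute titleAttr = linkNode.Attributes["title"];
        string kind = titleAttr == null ? "" : titleAttr.Value;

        if (kind == "synopsis")
        {
            // InnerText returns the CDATA content without the CDATA wrapper.
            XmlNode synopsis = linkNode.SelectSingleNode("synopsis");
            if (synopsis != null)
                details.Synopsis = StripHtml(synopsis.InnerText);
            return;
        }

        // The enclosing link's title decides what the <people> list means.
        List<string> target = kind == "cast" ? details.Cast
                            : kind == "directors" ? details.Directors
                            : null;
        if (target == null) return;

        foreach (XmlNode person in linkNode.SelectNodes("people/link"))
        {
            XmlAttribute name = person.Attributes["title"];
            if (name != null) target.Add(name.Value);
        }
    }
}
```

Calling ParseExpansion once for each top-level <link> element of a catalog_title fills in whichever details were expanded and leaves the others untouched.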

Link Details Retrieval

The other option for detail retrieval is simple to understand now that I have described how title expansion works. As noted earlier, the Link object contains a fully qualified URL in its href attribute. Sending a separate request to that URL returns the same detail information that title expansion would have included in the original response.
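As a rough sketch of the first half of that process, the following helper looks up the request URL for one kind of detail in an unexpanded catalog_title node. LinkLookupSketch and FindDetailUrl are hypothetical names that do not appear in the example application, and the follow-up request to the returned URL would still have to be issued as a signed request, as described in the first article.

```csharp
using System.Xml;

public static class LinkLookupSketch
{
    // Returns the href of the <link> child whose title attribute matches
    // linkTitle (for example "synopsis"), or null if there is no such link.
    public static string FindDetailUrl(XmlNode catalogTitle, string linkTitle)
    {
        foreach (XmlNode child in catalogTitle.ChildNodes)
        {
            if (child.Name != "link" || child.Attributes == null)
                continue;

            XmlAttribute title = child.Attributes["title"];
            XmlAttribute href = child.Attributes["href"];
            if (title != null && href != null && title.Value == linkTitle)
                return href.Value;
        }
        return null;
    }
}
```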

Detail Retrieval Options

Now that I've explained title expansion and link detail retrieval, I'll return to the three potential options for retrieving title details, which option I chose for the example code, and why.

We could have specified title expansion in the original search request to obtain the details of interest for every title returned.
We can use the information in the Link objects returned in the catalog search to fill in the details for each of the parts of the particular title of interest.
We can submit a new request for information about the one title we're interested in, specifying title expansion for the details we need.

The pros and cons of each of these methods are summarized in the following table:

Method	Pros	Cons
Search with title expansion	Retrieves detailed title information in a single request.	Returns additional unneeded data if only one title is of interest.
Link detail retrieval	Retrieves only the additional information needed.	Requires caching the catalog search results in order to obtain the links for the title of interest.
Title request with title expansion	Does not require caching the search results other than the title retrieval information.	Returns information that was already returned by the search request.

Example Application

Program Design

The example application consists of two activities:

The main application form is used to submit a title search request and to show the results of the search.
When a title is double-clicked in the search results on the main form, a separate dialog window shows the details for that title.

NetflixRequest Class

The Netflix service requests are handled a little more elegantly than in the previous article in this series. The features of the NetflixRequest class are:

The class is derived from the OAuth.OAuthBase class, so the additional step of instantiating an OAuthBase class is not required. Because the public OAuthBase class members are inherited, they are also publicly available from the NetflixRequest class.
There are two general purpose methods in the NetflixRequest class, one for Non-Authenticated requests and one for Signed requests. See the previous article in this series for further information about these two types of service requests.
The service request results are streamed into an XmlDocument, which is the input parameter format for the parser.
You will notice there is no exception handling in the NetflixRequest class. Exceptions are intentionally unhandled so that the application (or the caller) can handle them appropriately.
The HTTP request is still synchronous, meaning that the service request does not return until the response has been received and loaded into the XmlDocument for return.

NetflixParser Class

The parser class is designed to receive an XmlDocument object that is loaded with the results of a service request. There are two public methods, ParseSearchResults and ParseTitleInfo, which the application calls to parse the data returned from a catalog search or a title information request, respectively. Because both deal with catalog_title XML objects, they funnel into a common set of private functions starting with ParseCatalogTitle.

ParseCatalogTitle walks through the catalog_title object and extracts the information for each node, and in the cases where the node is an object, constructs an equivalent data object for the node contents. A special case is the Link object which may contain "expanded" title information. If this is the case, the title parser hands off to the ParseExpansion function. Note that the ParseExpansion function is only partially implemented in this example with just a few of the link expansions for the purpose of demonstrating the technique. I didn't want the code to be overly complicated and confusing, so I leave the full title expansion parsing implementation up to you.

Program Operation

For this example I chose to use separate requests for the catalog search and the title details (remember the three detail retrieval options I was just talking about?). For the catalog search I chose to not use any title expansion, but rather request the minimum amount of information for the results. However, you can add a title expansion request parameter to see how title expansion works in a search request.

Title expansion is specified on the title details request in the lvResults_DoubleClick handler. To handle the title expansion results, the basic parser for the catalog search results was extended to read the additional data contained in expanded Link objects. As previously mentioned, the parser in this code example does not support every title expansion, only the few link types discussed in this article, for the purpose of demonstration.

Running the Example Application

The main form for the example application is based on the code from the previous article in this series as an application that searches for Netflix titles. It requires three inputs: your consumer key, your consumer secret, and the term for which to search. Optionally, you can specify the maximum number of results to return (up to the Netflix-imposed limit of 100), or choose zero to return the default limit of 25 items.
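The handling of the optional maximum can be sketched as follows. The names MaxResultsSketch and MaxResultsValue are hypothetical, and the assumption that the value is sent in a query parameter named max_results should be checked against the Netflix API documentation; the limits of 100 and 25 are the ones stated above.

```csharp
using System.Globalization;

public static class MaxResultsSketch
{
    public const int NetflixLimit = 100;   // Netflix-imposed maximum

    // Returns the text to send as the maximum number of results, or null
    // when the parameter should be omitted so that the service applies
    // its default of 25 items.
    public static string MaxResultsValue(int requested)
    {
        if (requested <= 0)
            return null;

        int clamped = requested > NetflixLimit ? NetflixLimit : requested;
        return clamped.ToString(CultureInfo.InvariantCulture);
    }
}
```

With this helper, the caller adds the parameter only when the returned value is not null, for example request.AddQueryParameter("max_results", value).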

The results are returned in a ListView, showing a few key elements of each catalog item that was returned: the relevance (Rank), Netflix's title identity (ID), the title name, and the year the title was released. Double-clicking on a title launches the TitleInfo dialog which then requests the title information, this time asking for expanded information for the synopsis and cast. Extended details for the title are then displayed on the dialog form.

Note: Any resemblance between my example color scheme and that used by Netflix is purely coincidental.

Conclusion

Now I have to admit that the brute-force approach to parsing I took in this article may not be very elegant, but it was sufficient to illustrate how the Netflix catalog data is structured. There are many other ways to process the XML results, of course, so I will leave that to you as an exercise now that I've explained the fundamentals.

This article has described and demonstrated how to submit search and title detail requests using the Netflix API, and how to parse and use the results. It also explains some of the concepts, and the options, for performing these tasks. In the next article in this series, I'll show you how to access a Netflix subscriber's account information using Protected Requests.

History

September 3, 2009 - Original submission
September 8, 2009 - Updated source code