No doubt you have seen many web pages in which the results of a keyword search highlight the keyword in yellow, making it easy for the reader to find the keyword in context. There are of course many ways to approach this task.
This article discusses:
- Implementation of the (mostly) undocumented HttpResponse.Filter property
- Implementation of a simple search box to highlight a word or phrase on a page
- Use of Regex.Replace with a MatchEvaluator
This week when I approached the implementation of keyword highlighting, I considered a few possible ways:
- Search and replace on the text to which I have programmatic access
- An ASP.NET HTTP Module or HTTP Handler, compiled as a standalone assembly and installed in Web.config
- Manipulating the output stream, similar to output buffering in PHP
It was the last method that I decided to pursue, because it had the potential to operate independently of the page's code (unlike search and replace on text to which I have programmatic access), wouldn't require processor-intensive client-side scripting, and wouldn't require any server-side configuration (unlike an HTTP module or handler).
The example site consists of a web page that displays the text from Charles Dickens' Great Expectations. In the upper-right corner of the page floats a search box into which you can enter a word or phrase. It also presents some options, such as case-sensitive searching, whole-word searching, and searching using regular expressions instead of literal text.
When a word or phrase is entered into the search box and the button clicked, the page is shown again with the search term highlighted throughout the document.
For the sake of clarity, I'll refer to the search term or keywords as the needle. Likewise, I'll refer to the text that is being searched as the haystack. This nomenclature is also used throughout the code for consistency.
Using the Code
Earlier in the article, I promised to add highlighting to a page with one line of code. Here is the code in context:
    protected void Page_Load(object sender, EventArgs e)
    {
        Content.Text = Properties.Resources.Great_Expectations__by_Charles_Dickens;
        Response.Filter = new HighlightFilter(Response, Needle.Text)
        {
            IsHtml5 = false,
            MatchCase = MatchCase.Checked,
            MatchWholeWords = MatchWholeWords.Checked,
            UseRegex = UseRegularExpressions.Checked
        };
        Needle.Text = string.Empty;
    }
As you can see, when the Web Form is posted back, the needle is retrieved from Needle.Text. In the code-behind, we construct a HighlightFilter, passing it the HttpResponse object and the needle.
I have also set some of the properties of HighlightFilter using an object initializer. Most of the properties, such as MatchCase and MatchWholeWords, should be self-explanatory. When it is true, the IsHtml5 property wraps instances of the needle in the <mark> element, which was intended for this purpose. If it is false, a div with its class set to "highlight" is used instead. For greater control, one can explicitly set the values of the opening-tag and CloseTag properties. For ultimate control, you can subscribe to the Highlighting event and modify the supplied Haystack using the supplied Needle, or even subclass HighlightFilter.
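The choice of wrapping markup described above can be sketched in a few lines. This is only an illustration of the described behavior, not the article's HighlightFilter: the class name HighlightMarkupSketch and the OpenTag property name are hypothetical, and the assumption is that explicitly set tags take precedence over the IsHtml5 default.

```csharp
using System;

// Hypothetical stand-in for the tag-selection part of a highlighting filter.
public class HighlightMarkupSketch
{
    public bool IsHtml5 { get; set; }

    // Assumed names: explicit tags, when set, override the IsHtml5 default.
    public string OpenTag { get; set; }
    public string CloseTag { get; set; }

    // Wraps one matched piece of text in the chosen opening and closing tags.
    public string Wrap(string match)
    {
        string open = OpenTag ?? (IsHtml5 ? "<mark>" : "<div class=\"highlight\">");
        string close = CloseTag ?? (IsHtml5 ? "</mark>" : "</div>");
        return open + match + close;
    }
}
```

With IsHtml5 set to true, Wrap("Pip") produces the mark element; with it set to false, the same call produces the div with the "highlight" class.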
Of course, the usefulness of post-processing in this manner need not be limited to highlighting. Using the Filter class, one could subscribe to the Filtering event to modify the output stream, or subclass Filter and override the protected OnFilter method. There are numerous applications including:
- altering the output of sealed classes
- translation (e.g. RSS → HTML)
- insertion of common code (e.g. reverse master page)
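As a rough illustration of the last item, here is the kind of string transformation a Filtering handler or an OnFilter override might apply to insert common markup. The class and method names are hypothetical, and the sketch assumes the closing body tag arrives whole within a single buffer.

```csharp
using System;

// Hypothetical helper showing one possible "insertion of common code" transformation.
public static class CommonCodeSketch
{
    // Inserts a snippet immediately before the last closing </body> tag, if one is present.
    public static string InsertBeforeBodyClose(string html, string snippet)
    {
        int index = html.LastIndexOf("</body>", StringComparison.OrdinalIgnoreCase);
        // Buffers that do not contain the tag pass through unchanged.
        return index < 0 ? html : html.Insert(index, snippet);
    }
}
```

If the response arrives in several writes, the tag could be split across two buffers, in which case this sketch would leave the output unchanged.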
If you find other uses, please share with a comment.
How It Works
I would need to somehow intercept the output stream. A bit of searching led me to the Filter property of the HttpResponse class. The documentation for the property leaves quite a bit to the imagination. The property is assigned a Stream that filters writes, and the example refers to a magical (i.e. undocumented) UpperCaseFilterStream that takes the property itself as a parameter to the constructor, and ta da! Hmm… (Had I bothered to find and unpack Samples.AspNet.CS.Controls, maybe I would have solved this one.)
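For readers who have not seen a response filter, the following is a guess at what a stream like the documentation's UpperCaseFilterStream could look like. It is not the Microsoft sample: the class name UpperCaseStreamSketch is hypothetical, and the encoding is passed in explicitly so the sketch can run outside ASP.NET.

```csharp
using System;
using System.IO;
using System.Text;

// A write-only stream that upper-cases text before passing it to the stream it wraps.
public class UpperCaseStreamSketch : Stream
{
    private readonly Stream _inner;
    private readonly Encoding _encoding;

    public UpperCaseStreamSketch(Stream inner, Encoding encoding)
    {
        _inner = inner;
        _encoding = encoding;
    }

    // Decode the incoming bytes, transform the text, and forward the re-encoded result.
    public override void Write(byte[] buffer, int offset, int count)
    {
        string text = _encoding.GetString(buffer, offset, count);
        byte[] upper = _encoding.GetBytes(text.ToUpperInvariant());
        _inner.Write(upper, 0, upper.Length);
    }

    public override void Flush() { _inner.Flush(); }

    // The remaining members are required by the abstract Stream class.
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return _inner.CanWrite; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

In a page, such a stream would typically be installed by wrapping the existing filter, along the lines of Response.Filter = new UpperCaseStreamSketch(Response.Filter, Response.ContentEncoding).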
I created the Filter class, which takes the HttpResponse object as a parameter to the constructor. The class itself inherits Stream, but the implementation of the abstract class simply invokes methods and properties of the OutputStream stream, with the exception of Write(byte[] buffer, int offset, int count). The overridden Write method decodes the buffer to a string using the response's ContentEncoding, applies a filter, and then re-encodes the result and writes it to the output stream.
The Filter class by itself doesn't do anything useful, but it can host almost any text transformation. To make it filter something, one needs to subclass it and override OnFilter, or instantiate it and subscribe to the Filtering event, which passes a FilterEventArgs object containing the buffered string to be manipulated.
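The design described above might be sketched as follows. This is an approximation under stated assumptions, not the article's source: FilterSketch takes a Stream and an Encoding rather than an HttpResponse so that it can run outside ASP.NET, and the Text property on the event arguments is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical event arguments carrying the decoded text of one buffer.
public class FilterEventArgsSketch : EventArgs
{
    public string Text { get; set; }
}

// Pass-through stream whose Write decodes, filters, and re-encodes each buffer.
public class FilterSketch : Stream
{
    private readonly Stream _output;
    private readonly Encoding _encoding;

    public event EventHandler<FilterEventArgsSketch> Filtering;

    public FilterSketch(Stream output, Encoding encoding)
    {
        _output = output;
        _encoding = encoding;
    }

    // Subclasses can override this; by default it lets event subscribers edit the text.
    protected virtual string OnFilter(string text)
    {
        EventHandler<FilterEventArgsSketch> handler = Filtering;
        if (handler == null) return text;
        var args = new FilterEventArgsSketch { Text = text };
        handler(this, args);
        return args.Text;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        string filtered = OnFilter(_encoding.GetString(buffer, offset, count));
        byte[] bytes = _encoding.GetBytes(filtered);
        _output.Write(bytes, 0, bytes.Length);
    }

    // Everything else is delegated to the wrapped stream.
    public override void Flush() { _output.Flush(); }
    public override bool CanRead { get { return _output.CanRead; } }
    public override bool CanSeek { get { return _output.CanSeek; } }
    public override bool CanWrite { get { return _output.CanWrite; } }
    public override long Length { get { return _output.Length; } }
    public override long Position
    {
        get { return _output.Position; }
        set { _output.Position = value; }
    }
    public override int Read(byte[] buffer, int offset, int count) { return _output.Read(buffer, offset, count); }
    public override long Seek(long offset, SeekOrigin origin) { return _output.Seek(offset, origin); }
    public override void SetLength(long value) { _output.SetLength(value); }
}
```

A subscriber can then rewrite the text with a handler such as filter.Filtering += (s, e) => e.Text = e.Text.Replace("Pip", "<mark>Pip</mark>"), while a subclass would override OnFilter instead.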
For example, to implement needle highlighting, HighlightFilter subclasses Filter, overriding OnFilter and adding some properties and the Highlighting event. The OnFilter method uses Regex.Replace to replace instances of the needle in the haystack. It does this using the overload that takes a MatchEvaluator, a delegate that is called for each match that is found. This is perfect for this use because, if MatchWholeWords is true, the characters that bound the needle will be replaced in kind, and the case of the match will not be altered (whereas using String.Replace would replace the casing of all matches with that of the needle). If UseRegex is false, the needle is simply escaped with Regex.Escape instead of using an alternate means of searching and replacing.
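A minimal sketch of this replacement step is shown below. It is not the article's OnFilter: the helper name is hypothetical, the mark element is hard-coded, and whole-word matching uses \b word-boundary anchors as a stand-in, which may differ from how the article handles the characters that bound the needle.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HighlightSketch
{
    // Hypothetical helper that wraps every occurrence of the needle in a <mark> element.
    public static string Highlight(string haystack, string needle,
        bool matchCase, bool wholeWords, bool useRegex)
    {
        if (string.IsNullOrEmpty(needle)) return haystack;

        // Literal needles are escaped so that characters like '.' or '(' match themselves.
        string pattern = useRegex ? needle : Regex.Escape(needle);

        // Assumption: word-boundary anchors approximate whole-word matching.
        if (wholeWords) pattern = @"\b(?:" + pattern + @")\b";

        RegexOptions options = matchCase ? RegexOptions.None : RegexOptions.IgnoreCase;

        // The evaluator runs once per match and re-emits the matched text itself,
        // so "Pip" stays "Pip" even when the needle was typed as "pip".
        MatchEvaluator wrap = m => "<mark>" + m.Value + "</mark>";
        return Regex.Replace(haystack, pattern, wrap, options);
    }
}
```

For instance, a case-insensitive whole-word search for "pip" in "Pip saw pip and pippin." wraps "Pip" and "pip" with their original casing and leaves "pippin" alone.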
I was initially concerned that using Regex for replacement with a MatchEvaluator would be prohibitively slow, but replacement of common words in Great Expectations (just over one megabyte) takes a few milliseconds on my Core i7-2600K and hopefully not too much more on a typical web server. Interestingly, enabling "Match Whole Word" increases this to several seconds.
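To check timings like these on your own hardware, something along the following lines could be used. It is only a measurement sketch: the class name, the synthetic haystack, and the patterns in the usage note are assumptions, and the results will depend on the machine, the text, and how whole-word matching is expressed.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class HighlightTimingSketch
{
    private const string Sentence = "Pip met the convict on the marshes. ";

    // Builds a synthetic haystack by repeating one sentence; real text will behave differently.
    public static string BuildHaystack(int repetitions)
    {
        var builder = new StringBuilder(Sentence.Length * repetitions);
        for (int i = 0; i < repetitions; i++) builder.Append(Sentence);
        return builder.ToString();
    }

    // Measures one case-insensitive Regex.Replace pass that uses a MatchEvaluator.
    public static TimeSpan Time(string haystack, string pattern)
    {
        MatchEvaluator wrap = m => "<mark>" + m.Value + "</mark>";
        Stopwatch watch = Stopwatch.StartNew();
        Regex.Replace(haystack, pattern, wrap, RegexOptions.IgnoreCase);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Comparing Time(haystack, "the") with Time(haystack, @"\bthe\b") over a haystack of about one megabyte gives a rough sense of what a whole-word pattern costs. The first call for each pattern also includes the time needed to construct the regular expression.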
Points of Interest
In my first attempt, I derived a new class from MemoryStream and assigned it to the Filter property. I overrode the Write method and manipulated the buffer by wrapping instances of the keyword in a new element to which a CSS style could be assigned. Inspection of the contents of the stream demonstrated that it worked quite nicely, and the class called base.Write to complete the task, but this resulted in zero bytes sent to the client. The sample application suggests maybe one needs to write out the bytes individually. Instead, I used my class to wrap the output stream.
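A likely explanation for the zero bytes is that base.Write on a MemoryStream only stores data in that stream's own in-memory buffer, so nothing is ever forwarded to the response. The sketch below illustrates this outside ASP.NET; the class names are hypothetical and a second MemoryStream stands in for the response output.

```csharp
using System;
using System.IO;

// A MemoryStream-derived filter: base.Write keeps the bytes in this object's own buffer.
public class BufferOnlyFilterSketch : MemoryStream
{
    public override void Write(byte[] buffer, int offset, int count)
    {
        base.Write(buffer, offset, count); // stored in memory, never passed on
    }
}

public static class ZeroBytesDemo
{
    // Returns { bytes held by the filter, bytes that reached the sink }.
    public static long[] Run()
    {
        var sink = new MemoryStream(); // stands in for the response output
        using (var filter = new BufferOnlyFilterSketch())
        {
            byte[] data = { 1, 2, 3 };
            filter.Write(data, 0, data.Length);
            return new long[] { filter.Length, sink.Length };
        }
    }
}
```

Here the filter ends up holding the three bytes while the sink stays empty, which is why a filter needs a reference to the stream it wraps and must write to it explicitly.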
Thank you to Project Gutenberg for the free distribution of Great Expectations and over 36,000 other works; and of course to Charles Dickens (1812-1870) himself.
History
- October 31, 2011: Version 1.0.0.x
- January 3, 2013: Modified title to better describe the nature of the topic