Download source code - 7.07 KB

Introduction

As I mentioned in Part 1[^], I have been working on a web project with many pages done up to look like an encyclopedia. Part of generating that look is the use of references: superscript links within the text that point to a list of footnotes at the bottom of the page, which in turn have links back to where the reader had been. If you have ever used the Wikipedia, you know what I mean. Trying to manually manage references in a developing page would have been nightmare, which is why I started looking for a way to generate them automatically.

References are inserted into Wikipedia using two pseudo-tags: <ref> for the reference itself, and <references /> to place the list of references. The inner HTML of <ref> becomes the text of the reference. An optional name attribute allows an editor to reuse the reference later on, without having to copy it all out again. Reusing a reference also means that the reference's entry in the full list has multiple back-links, one for each time the reference is used. This was exactly what I wanted.

This article covers how to implement this kind of filter. If you are looking for information about Response.Filter, Part 1 has some material that you might find useful.

Dealing with the HTML

Because my needs were simple, I implemented a basic subset of the Wikipedia <ref> tag's functionality. In its most basic form, the tag is "anonymous" and useable only once:

<ref>This is an anonymous tag.</ref>

If you look at the image above, references 2 and 3 are both anonymous. If you wanted to repeat the text, you would need to add another tag, which would have its own index number and its own entry in the reference list.

If you want to reuse a reference, use the name attribute. The name needs to be unique and parsable by XML. A named reference would look like this:

<ref name="RefName">This is a named tag.</ref>

In the sample image, reference 1 is named. Notice that its entry in the references list has two back-links, a and b, which will return the viewer to the first or second use of the named reference.

The text inside the first instance of a named reference will be the text for all subsequent uses. This means you can shorten a reused tag to be self-closing:

<ref name="RefName" />

To place the list of references that gets generated by the filter, use the pseudo-tag <references />. If this tag is missing, the filter will skip processing the page and move to the next filter in the chain. Keep in mind that the filter does not support multiple uses of <references />, although with a little bit of work it would be possible.

Also, note that the tags are case-insensitive, so <REF name="rEFnAME" /> is functionally identical to <ref name="RefName" />, and <RefERencES /> will display the list of references just as well as <references />.

One thing to remember is that these are pseudo-tags, meaning that they are NOT part of the X/HTML standard. They will be replaced by the time everything is rendered, and should not cause any problems when validating the finished page, but they will generate warnings if you are using Visual Studio or some other parser during development. If this really bothers you, you can rewrite the filter to use square brackets or some other delimiter; [ref name="RefName" /] will work fine with some adjustment. The advantage to using X/HTML-like tags is that you can use the Framework's XML handler objects and, if anything goes wrong, the browser will automatically hide any pseudo-tags that manage to slip through.

Auxiliary Class ReferenceClass

This helper class holds the information about a single reference.

Private Class ReferenceClass
    Private pBackLinks As List(Of String)
    Private pInnerXml As String

    Public ReadOnly Property BackLinks() As List(Of String)
        Get
            If pBackLinks Is Nothing Then pBackLinks = New List(Of String)
            Return pBackLinks
        End Get
    End Property

    Public ReadOnly Property InnerXml() As String
        Get
            Return pInnerXml
        End Get
    End Property

    Public Sub New(ByVal InnerXml As String)
        pBackLinks = New List(Of String)
        pInnerXml = InnerXml
    End Sub
End Class

Additional Methods

In addition to the StringContains function used in Part 1, the filter has two other helper methods, one that extracts the value of an attribute, and one that retrieves the inner XML of the element.

'Allow us to find an attribute using a case insensitive method
'Returns the value, or an empty string if the value is not found
Private Function GetAttributeValue(ByVal Elem As XElement, _
ByVal AttributeName As String) As String
    For Each a As XAttribute In Elem.Attributes
        If String.Compare(a.Name.ToString, AttributeName, ignoreCase:=True) = 0 Then
            Return a.Value
        End If
    Next

    Return ""
End Function

'If we just look at the value of an XElement, markup gets removed.
'This allows us to retain any markup within a reference.
Private Function GetInnerXml(ByVal Elem As XElement) As String
    Dim Reader As XmlReader = Elem.CreateReader
    Reader.MoveToContent()
    Return Reader.ReadInnerXml
End Function

The References

After the filter does its work, all of the <ref> tags will look something like this:

<sup id="REF_1"><a href="#reference_Lorem">[1]</a></sup>

The pseudo-tag is replaced with a superscript tag having a unique id. The tag contains a link to the entry in the reference list, which will be wherever you put the <references /> tag. The index number is generated automatically, in the order the references appear in the text; reusing a named reference will reuse the original reference's index. This is the primary advantage to using this kind of filter: you can rearrange the text, move references around, etc., and everything will still be correctly indexed and referenced without any additional work on your part.

The reference list is implemented as an unordered list to control the index number that appears, rather than letting the browser decide. In theory, it should not matter, but explicit control gives a bit more flexibility if you want to add new features in the future. It also has the advantage of putting the reference index in the page source, which can make debugging things a bit easier.

<ul class="References">

    <li>1. ^ <sup><a href="#REF_1">a</a> <a href="#REF_3">b</a></sup>
        <a id="reference_Lorem"></a>The traditional phrasing for blocks of irrelevant text.
    </li>

    <li>2. <a href="#REF_2">^</a> <a id="reference_REF_2"></a>
        The <em>Lorem Ipsum</em> text is based on a <strong>Latin</strong> text by Cicero.
    </li>

    <li>3. <a href="#REF_4">^</a> <a id="reference_REF_4"></a>Something to do with 
    "wisdom", I think.</li>

</ul>

The destination anchor -- the empty <a> tag -- has an id derived from the reference's name, either the one given in the name attribute or the one that was automatically generated. With single entry references, like 2 and 3, the back-link is placed on the carat; this will be the case whether the reference is anonymous or is named and used only once. If a named references is reused, the back-links are letters of the alphabet, with each letter linking back to a different use of the reference. Right now, the filter is not designed to have more than 26 back-links, but that can be changed if needed.

Overrides Sub Write

Now we can look at what the filter actually does. The first thing to do is make sure that the <references /> tag is present. If it is not, then why waste cycles creating a list that will not be rendered?

Public Overrides Sub Write(ByVal buffer() As Byte, _
ByVal offset As Integer, ByVal count As Integer)
    Dim BufferStr As String = UTF8Encoding.UTF8.GetString(buffer)

    HTML.Append(BufferStr)
    If EOP.IsMatch(BufferStr) Then
        Dim PageContent As String = HTML.ToString

        'Perform this loop only if there is a place to put the list
        If StringContains(PageContent, "<references") Then
            Dim i As Integer = 0
            Dim FrontStr As String = ""
            Dim Token As StringBuilder = Nothing
            Dim name As String = ""
            Dim value As String = ""
            Dim x As Integer = 0
            Dim RefCount As Integer = 0
            Dim RefStr As StringBuilder = Nothing
            Dim References As New Dictionary(Of String, ReferenceClass) _
                (StringComparer.CurrentCultureIgnoreCase)

            'Stuff happens

        End If

        'Finally, re-encode PageContent and hand it back to ASP.Net
        Output.Write(UTF8Encoding.UTF8.GetBytes(PageContent), offset, _
            UTF8Encoding.UTF8.GetByteCount(PageContent))
    End If
End Sub

The class constructor made a note of the old Response.Filter, using a System.IO.Stream object named Output. We assume that the page is encoded using UTF8 (if not, substitute the correct encoding method), and accumulate the page into a StringBuilder object named HTML. EOP is a Regex set to do a case-insensitive check for </html>; once it is found, we have the full page and can process it. After that, the buffer is converted back into an array of bytes and passed on to the next filter in the chain via Output. The Dictionary object, which will hold the references, is initialized to use a case-insensitive comparer.

The first thing to do is scan the page for occurrences of <ref, excluding the situation where that is actually <references. After getting the entire tag and making sure it is valid, it gets converted into a XElement and the name (if any) and value are extracted. If there is no name attribute, one is generated; then the reference is added to the dictionary if it's not already there, using its name as the dictionary key. The replacement <sup> link is generated and inserted, and the reference's id is noted so we can create the back-link later. Then we splice everything together and find the next reference, looping until none are found.

i = PageContent.IndexOf("<ref", StringComparison.CurrentCultureIgnoreCase)
Do While i > 0
    'Extract the <ref> pseudo-tag
    FrontStr = PageContent.Substring(0, i)
    Token = New StringBuilder
    Token.Append(PageContent.Substring(i, 4)) 'Get the tag
    i += 3 'And move past it

    'Then read the rest of the string
    Do While Not Token.ToString.EndsWith(">")
       i += 1
       Token.Append(PageContent(i))
    Loop

    'If the tag is not self-closing
    If Not Token.ToString.EndsWith("/>") Then
        'Keep reading until we have the closing tag
        Do While Not StringContains(Token.ToString, "</ref>")
            i += 1
            Token.Append(PageContent(i))
        Loop
    End If
    i += 1 'i now points to the character after the complete tag

    'If this is the <references /> tag
    If StringContains(Token.ToString, "<references") Then
        'Ignore it, find the next <ref> after it, and continue the loop
        i = PageContent.IndexOf("<ref", i, StringComparison.CurrentCultureIgnoreCase)
        Continue Do
    End If

    RefCount += 1

    'Attempt to convert Token into an XElement
    Try
        Dim Ref As XElement = XElement.Parse(Token.ToString.Trim)
        name = GetAttributeValue(Ref, "name").Trim
        value = GetInnerXml(Ref)
    Catch ex As Exception
        'Ignore errors
    End Try

    'We need a name, whether it has one or not
    If name = "" Then name = String.Format("REF_{0}", RefCount)

    'Save the reference to the References list, if not already there
    If Not References.Keys.Contains(name) Then
        References.Add(name, New ReferenceClass(value))
    End If

    'Begin construction of the pseudo-tag's replacement
    RefStr = New StringBuilder

    'Find the index we need
    For x = 0 To References.Keys.Count - 1
        If String.Compare(References.Keys(x), name, ignoreCase:=True) = 0 Then Exit For
    Next

    'Create the superscript indicating a reference.
    'The id attribute allows us to link back from the reference list.
    'The id is the same as the name generated for an anonymous reference;
    'this will not cause any problems.
    RefStr.AppendFormat("<sup id=""REF_{0}"">", RefCount)
    RefStr.AppendFormat("<a href=""#reference_{0}"">[{1}]</a>", _
        References.Keys(x).Replace(" ", "_"), x + 1)
    RefStr.Append("</sup>")

    'Then we add the back reference to the superscript
    References.Values(x).BackLinks.Add(String.Format("REF_{0}", RefCount))

    'This replaces PageContent with the spliced-in superscript
    PageContent= FrontStr + RefStr.ToString + PageContent.Substring(i)

    'And we look for the next reference
    i = PageContent.IndexOf("<ref", StringComparison.CurrentCultureIgnoreCase)
Loop

It may not be obvious from looking at the code, but the variable RefCount keeps an absolute count of the references, which guarantees that a unique name can be generated for each one if necessary. The number used to render the reference is the index of References, held in the variable x.

Now, all of the <ref> tags have been replaced with a linked superscript. The next thing to do is to generate the reference list. This is done by looping through the dictionary and adding one list item for every entry. All references will have at least one back-link; if it has more than one, then the links are applied to letters rather than the carat. Note that the list is given a class of References: this will make it easy to style the list to your liking.

RefStr = New StringBuilder
RefStr.AppendLine("<ul class=""References"">")
x = 0

'Loop through every reference
For Each kvp As KeyValuePair(Of String, ReferenceClass) In References
    RefStr.Append("<li>")
    RefStr.AppendFormat("{0}. ", x + 1)

    If kvp.Value.BackLinks.Count = 1 Then
        'If there is only one backlink, use the carat
        RefStr.AppendFormat("<a href=""#{0}"">^</a> ", kvp.Value.BackLinks(0))
    ElseIf kvp.Value.BackLinks.Count > 0 Then
        'If there is more than one backlink, use lower case alpha characters
        'for each reference use, so the user can return to the text.
        RefStr.Append("^ <sup>")
        For i = 0 To kvp.Value.BackLinks.Count - 1
            RefStr.AppendFormat("<a href=""#{0}"">{1}</a> ", kvp.Value.BackLinks(i), _
                Chr(97 + i)) 'Start with lower case a
        Next
        RefStr.Append("</sup> ")
    End If

    'The id attribute here is the forward link given by the inline references
    RefStr.AppendFormat("<a id=""reference_{1}""></a>{0}", kvp.Value.InnerXml, _
        References.Keys(x).Replace(" ", "_"))
    RefStr.AppendLine("</li>")
    x += 1
Next
RefStr.AppendLine("</ul>")

Finally, we replace the <references /> tag with the reference list.

i = PageContent.IndexOf("<references", StringComparison.CurrentCultureIgnoreCase)
If i > -1 Then
    FrontStr = PageContent.Substring(0, i)
    Token = New StringBuilder
    Token.Append(PageContent.Substring(i, 11)) 'Get the tag
    i += 10 'And move past it

    'Then read the rest of the string
    Do While Not Token.ToString.EndsWith(">")
        i += 1
        Token.Append(PageContent(i))
    Loop

    'If the tag is not self-closing
    If Not Token.ToString.EndsWith("/>") Then
        'Keep reading until we have the closing tag
        Do While Not StringContains(Token.ToString, "</ref>")
            i += 1
            Token.Append(PageContent(i))
        Loop
    End If
    i += 1 'i now points to the character after the complete tag
    PageContent = FrontStr + RefStr.ToString + PageContent.Substring(i)
End If

The last step is to write PageContent back out, and we're done.

A Matter of Style

The <sup> tag is a pain. In its default styling, the tag adds padding to the line height, causing lines to appear uneven, with some having more space between them than others. To even things out, I had to add to make a few changes.

The most important change was to <sup>. I styled the tag to have a relative position 0.5em above the baseline top. The tag is typically rendered as small (which seems to be 0.8em); I left that alone.

sup {
    vertical-align:top; 
    position:relative;
    top:-0.5em;
}

To prevent the superscripts from overlapping the line above, I had to increase the line height.

p {
    line-height:1.5em;
}

The other bit of styling I used was for the reference list. I removed the list style type and all of the padding and margins on the list itself, and padded the list items to make them easier to navigate.

ul.References {
    list-style-type:none;
    margin:0;
    padding:0;
}
        
ul.References li {
    padding-bottom:0.5em;
}

Moving On

If you are knowledgeable with Wikipedia references, there are a number of ways this filter can be expanded. One would be to add a group attribute to <ref> and <references />, which would let you get several different reference lists, such as footnotes and a bibliography. You could also implement Wikipedia-style {{cite}} templates, which would mean your coders would not have to worry about how things are to be styled, or the differences between citing a newspaper article and a book. If you do anything interesting in this regard, I would like to hear about it; or better yet, write an article so others can learn from your work. And as always, if you found this article useful, please vote it up.

History

Version 1 2011-03-22 Initial release
Version 2 2011-03-29 Rewrote the filter to handle the situation where the page content does not come in all at once