
Quoting the Quotes

Henrik Stuart, 26 Nov 2003

Introduction

This article describes a client-side JScript workaround for a missing feature in Microsoft Internet Explorer: adding language-dependent quotation marks around <q> elements in particular, and optionally around <blockquote> elements.

Motivation

As a webmaster you frequently need to cite an external source, write spoken dialogue, or do something similar. HTML 4.01, and by extension XHTML 1.0, defines the elements <q>, <blockquote>, and <cite> for these tasks.

HTML 4.01 defines <q> and <blockquote> as follows.

Visual user agents generally render BLOCKQUOTE as an indented block.

Visual user agents must ensure that the content of the Q element is rendered with delimiting quotation marks. Authors should not put quotation marks at the beginning and end of the content of a Q element.

User agents should render quotation marks in a language-sensitive manner (see the lang attribute). Many languages adopt different quotation styles for outer and inner (nested) quotations, which should be respected by user-agents.

So, according to the specification, the user agent (in our case Microsoft Internet Explorer 6) should automatically add quotation marks before and after the text, and should even choose them according to the language. Alas, life isn't always as rosy as the specifications picture it: Microsoft Internet Explorer 6 doesn't support this behaviour at all. Not only does it ignore the language dependence, it doesn't add quotation marks in the first place.

There are, of course, ways to remedy this shortcoming, but they all require some work, and they may not always be equally useful, and some may even cause inconsistencies with other user agents, e.g. Mozilla and Opera. The list below sums up a few different approaches.

  1. Manually add quotation marks at the beginning and end of the quote.
    This interferes with user agents that adhere to the specification: they will display two sets of quotation marks, the ones the specification tells them to place around such elements and the ones that have been placed manually.
  2. Perform the addition of quotes server-side based on user agent info submitted to the server.
    This requires that the hosting provider of your site supports a server-side scripting language of some kind, for instance PHP, ASP.NET, or JSP. However, there are still numerous hosting providers that do not provide such a solution, so while it may be useful for those who have the means, it is not for everyone.
  3. Perform the addition of quotes client-side using a script.
    This is the solution I have opted for in this article. It does not work for users of Microsoft Internet Explorer who have turned off JScript support, but it works for everyone else. More importantly, JScript support is activated by default, so it works for everyone who hasn't actively changed their Internet Explorer settings.
  4. Force Microsoft to remedy the oversight.
    This is hardly a viable solution, even for a larger group of people to do.

Locating the Hunting Grounds

Before we can start building the script we need to be aware of a few things from the specification:

  • By default <q> should have quotation marks added.
  • By default <blockquote> should not have quotation marks added, as it has historically been used to indent text. That use has been deprecated.
  • The specification states that it should be possible to add quotation marks to <blockquote> elements using Cascading Style Sheets. Conveniently, Microsoft Internet Explorer doesn't support this either.
  • The quotation marks added should be language-dependent. This means that we need to pay attention to the lang and xml:lang attributes of elements, and to process them in order of precedence. According to the XHTML 1.0 specification, §C.7, xml:lang always takes precedence over lang.
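The precedence rule in the last point can be sketched as a small helper. This is a minimal sketch and not taken from the article's script: the names resolve_language and fake_element are hypothetical, the element is assumed to expose the standard DOM getAttribute method, and whether Internet Explorer 6 returns xml:lang through getAttribute is an assumption that would need testing.

```javascript
// Hypothetical helper: returns the language declared on an element, or
// null if the element declares none. xml:lang wins over lang when both
// are present and use_xml_lang is enabled.
function resolve_language(element, use_xml_lang) {
    var lang = null;
    if (use_xml_lang) {
        lang = element.getAttribute('xml:lang');
    }
    if (!lang) {
        lang = element.getAttribute('lang');
    }
    // Language codes are case-insensitive, so normalise to lower case
    // (an addition of this sketch).
    return lang ? lang.toLowerCase() : null;
}

// Stand-in for a DOM element so the sketch runs outside a browser.
function fake_element(attributes) {
    return {
        getAttribute: function (name) {
            return attributes.hasOwnProperty(name) ? attributes[name] : null;
        }
    };
}

var sample = fake_element({ 'lang': 'en', 'xml:lang': 'da' });
var with_xml = resolve_language(sample, true);      // xml:lang is used
var without_xml = resolve_language(sample, false);  // only lang is consulted
```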

As supporting content generation with Cascading Style Sheets would require writing a script that accurately parses CSS and applies formatting and content generation to the document elements, we will settle for a simpler solution: a setting in the script that specifies whether to add quotation marks to blockquotes. This should pose no great problem for the webmaster, at the cost of a bit of flexibility.

Investigating the Document Structure

With what we've summarised so far, we should be able to figure out what the script should do in relation to <q> and <blockquote> elements, but we also need to get to the elements somewhere, somehow. If we, for a moment, presume that our webpage is well-formed XML then we might have a structure much like this:

The more programming inclined of us will invariably recognize this as a tree, and what better way is there to traverse a tree than to use recursive functions? In particular, I will be using a preorder traversal of the tree.
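A preorder traversal visits a node first and then its children from first to last. The following is a minimal sketch of that idea; the function name preorder is hypothetical, and the tree is invented for illustration using plain objects rather than real DOM nodes (it is not the article's diagram).

```javascript
// Visits a node, then each of its children from first to last (preorder).
function preorder(node, visit) {
    visit(node);
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        preorder(children[i], visit);
    }
}

// A made-up document tree built from plain objects.
var tree = {
    nodeName: 'BODY',
    childNodes: [
        { nodeName: 'P', childNodes: [{ nodeName: 'Q', childNodes: [] }] },
        { nodeName: 'BLOCKQUOTE', childNodes: [{ nodeName: 'IMG', childNodes: [] }] }
    ]
};

var order = [];
preorder(tree, function (node) { order.push(node.nodeName); });
// order is now BODY, P, Q, BLOCKQUOTE, IMG
```

Because a parent is always visited before its children, state such as the current language or nesting depth can simply be passed down as function arguments.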

The diagram above doesn't entirely depict the internal document tree that Microsoft Internet Explorer generates from the page source, as that tree also has text-nodes for text in elements (as far as I can tell, text-nodes are only generated for block elements; they should never be generated for <q>, <a>, etc.). These text-nodes are characterized by having their nodeName property set to #text.

As we can see from the diagram, an image could be the first element of a blockquote. An image element cannot contain HTML, so in this case we need to take an alternate course of action: inserting an extra text-node before the image. Fortunately this can be achieved easily using methods on the blockquote element. The same applies, mirrored, if the image element is the last child of the blockquote element.

Languages/Sprache/Sprog

The next big deal to cover before we go overboard and code happily through the night, is the tiny little phrase in the specification: User agents should render quotation marks in a language-sensitive manner.

What language dependence is there to this? Quotation marks are just "..." and '...', are they not?

It would be much too simple if all languages used the same quotation marks — life just doesn't work like that! It is, of course, easier for those of us who speak more than one language to notice this difference in behaviour between languages.

For instance in Denmark text is quoted like this: At være eller ikke at være., or using one of the alternative forms: »At være eller ikke at være.« In French, guillemets are used to quote text: « Le roi est mort, vive le roi! » Other languages use yet other quotation marks. I have not attempted to construct an exhaustive list of quotation marks for the various languages, nor have I made the script support languages that are written right-to-left.[1]

It is also possible to have quotations inside quotations. In general this means using a single-sign version of the outer quotation mark (except in English). To simplify matters I have chosen just to alternate the quotation marks as quotes are nested, and not to support any of the alternate quotation styles for the various languages. For instance, Danish and Norwegian both have two commonly used alternatives to the one presented in the table below.
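Alternating the marks by nesting depth comes down to checking whether the depth is even or odd. The sketch below assumes the four-entry array layout described in the Adding Languages section (begin outer, begin inner, end outer, end inner); the function name marks_for_level is hypothetical and is not part of the article's script.

```javascript
// quotes layout: [0] begin outer, [1] begin inner,
//                [2] end outer,   [3] end inner.
function marks_for_level(quotes, level) {
    var inner = level % 2;   // 0 selects the outer pair, 1 the inner pair
    return { begin: quotes[inner], end: quotes[2 + inner] };
}

// The English marks shown in the Adding Languages section.
var english = ['\u2018', '\u201c', '\u2019', '\u201d'];

var outer = marks_for_level(english, 0);   // single quotation marks
var inner = marks_for_level(english, 1);   // double quotation marks
var third = marks_for_level(english, 2);   // single quotation marks again
```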

Language Begin outer End outer Begin inner End inner
American (en-us)
Dansk (da)
Deutsch (de)
English (en)
Français (fr) « »
Norsk (no)
Svenska (se)

The table above has been constructed from the following references: English/American, Norwegian, German/French, Swedish. Only the Norwegian reference is an official one; most language councils do not publish their language's grammar and usage online (at least not anywhere I was able to locate). The Danish quotation marks have been taken from the official publication by the Danish Language Council. If you want to make corrections, give references to further languages, etc., feel free to contact me.

Harvesting the Fruits

Now that we have come all this way, from reading the specification to linguistic analysis, we are finally able to construct the script. A few things should remain optional, so the script is configured through four global variables at its top: use_xml_lang, whether to use xml:lang (so the script can be used with HTML 4.01 as well); modify_elements, which elements to modify (whether to add quotation marks to both <q> and <blockquote>); default_language, what the default language should be; and reset_level_on_new_lang, whether to reset the quotation depth when the language of an element's contents changes further down the document tree.

Apart from the configuration, the script isn't much more than three functions. get_quotes returns the quotation mark characters for a language string. parse_element is the work-horse of the program and takes care of everything else; I will cover it in greater detail in a few moments. Finally there is q_fix, the entry-point function, which sets up the initial language and begins the tree descent.

get_quotes

get_quotes is by and large uninteresting: it merely builds an array with the begin and end quotes for both nesting levels and returns it.
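In outline, such a function might look like the sketch below. Only the English marks are taken from the article (see the Adding Languages section); falling back to English for an unknown language is an assumption of this sketch and may not match the real script.

```javascript
// Sketch of a get_quotes-style lookup. Returned layout:
// [0] begin outer, [1] begin inner, [2] end outer, [3] end inner.
function get_quotes(language) {
    var quotes = new Array(4);
    switch (language) {
        // Further languages would get their own case here.
        case 'en':
        default:   // assumption: unknown languages use the English marks
            quotes[0] = '\u2018'; quotes[1] = '\u201c';
            quotes[2] = '\u2019'; quotes[3] = '\u201d';
            break;
    }
    return quotes;
}

var english_quotes = get_quotes('en');
```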

q_fix

As I have only had the time to test the script with Microsoft Internet Explorer 6, the function restricts the script to that browser. It should be fairly straightforward to extend it to other versions, provided they support the same methods and properties.

Next, it checks whether the <html> element has the xml:lang (if used) or lang attribute set and uses them in order of precedence. It then examines the <body> element for the same attributes. Lastly it passes the <body> element to parse_element.

parse_element

This is probably the most interesting part of the script, as it resolves all elements, places all quotation marks, and well... you get the picture.

The first part examines the language of the passed element. If it is different from the language of the parent the new language will be used (xml:lang or lang, in order of precedence).

The second part examines whether the current element is one of the elements listed in the modify_elements variable at the top of the script. If it is, we roll out the core logic. If it is a <q> element we just add the begin and end quotation marks to its innerHTML property. The benefit of <q> is that its contents are severely limited by the DTD (I am presuming that we are using a strict document model; I haven't tested how well it holds up to more relaxed DTDs).

<blockquote>, on the other hand, is a great deal trickier: it is a block element and can therefore contain many kinds of elements, including elements that cannot contain HTML or text themselves, e.g. <img>. The problems with placing the first quotation mark are mirrored in placing the last one, so I will settle for explaining the first. If the element has no children, the beginning quotation mark is added to its innerHTML property; else if the first child is a text-node, the quotation mark is added to that node; else if the first child element can contain HTML, the quotation mark is added to that element. As a last resort we add a text-node as the first child of the <blockquote> element.
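The four cases for the opening quotation mark can be sketched as follows. This is an illustration under assumptions, not the article's code: the name add_begin_quote and the EMPTY_ELEMENTS list are hypothetical (the list is deliberately partial), while firstChild, nodeName, nodeValue, innerHTML, insertBefore and createTextNode are the usual DOM members the description relies on.

```javascript
// Elements that cannot hold text or markup; an assumed, partial list.
var EMPTY_ELEMENTS = { IMG: 1, BR: 1, HR: 1, INPUT: 1 };

// Places the opening quotation mark inside a <blockquote>-like element.
function add_begin_quote(doc, element, mark) {
    var first = element.firstChild;
    if (!first) {
        // 1. No children: prepend to the element's own markup.
        element.innerHTML = mark + element.innerHTML;
    } else if (first.nodeName === '#text') {
        // 2. Leading text-node: prepend to its text.
        first.nodeValue = mark + first.nodeValue;
    } else if (!EMPTY_ELEMENTS[first.nodeName.toUpperCase()]) {
        // 3. First child can contain HTML (e.g. a <p>): prepend inside it.
        first.innerHTML = mark + first.innerHTML;
    } else {
        // 4. Last resort (e.g. an <img>): insert a new text-node before it.
        element.insertBefore(doc.createTextNode(mark), first);
    }
}
```

In a browser, doc would be the global document object. The closing mark mirrors this logic with lastChild and appendChild.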

Lastly, the quotation level is increased if the element was in modify_elements. Either way, we continue with the child elements of the current element, using the current language and quotation level.
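How the language and the quotation level travel down the tree can be sketched without touching a real page. The sketch below only records which language and level each quotation element would receive; plan_quotes and is_modified are hypothetical names, the nodes are plain objects with a lang property, and resetting the level on the element that declares the new language is one possible reading of reset_level_on_new_lang.

```javascript
var reset_level_on_new_lang = 1;
var modify_elements = ['q', 'blockquote'];

function is_modified(name) {
    for (var i = 0; i < modify_elements.length; i++) {
        if (modify_elements[i] === name.toLowerCase()) {
            return true;
        }
    }
    return false;
}

// Walks the tree in preorder and records, for each quotation element,
// the language and nesting level that would select its marks.
function plan_quotes(node, language, level, plan) {
    if (node.lang && node.lang !== language) {
        language = node.lang;
        if (reset_level_on_new_lang) {
            level = 0;   // assumption: the reset applies to this element too
        }
    }
    if (is_modified(node.nodeName)) {
        plan.push(node.nodeName + ':' + language + ':' + level);
        level = level + 1;   // children are nested one level deeper
    }
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        plan_quotes(children[i], language, level, plan);
    }
    return plan;
}

// Made-up page: an English quote containing an English and a Danish quote.
var page = {
    nodeName: 'body', childNodes: [
        { nodeName: 'q', childNodes: [
            { nodeName: 'q', childNodes: [] },
            { nodeName: 'q', lang: 'da', childNodes: [] }
        ] }
    ]
};
var plan = plan_quotes(page, 'en', 0, []);
// plan is now q:en:0, q:en:1, q:da:0
```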

That is all there is to it, really.

Integrating the Script

Integrating the script into your own pages is fairly painless: all it takes is an extra line in your <head> section and a call to q_fix in the onload event of <body>. The following excerpt of an HTML file shows this:

<html>
  <head>
    <title>My Page Title</title>
    <script type="text/javascript" src="q_fix.js"></script>
  </head>
  
  <body onload="q_fix();">
    ...
  </body>
</html>

That should be doable even for the most JScript-phobic webmasters out there (I hope).

Customizing the script

If you do not wish to reset the quotation nesting if you change language somewhere down through the document, then find the variable reset_level_on_new_lang and replace the 1 with a 0.

If you are only using HTML 4.01 and thus don't want to support the xml:lang attribute then find use_xml_lang and change 1 to 0.

If you do not wish to have quotation marks added to <blockquote> then find the line modify_elements = new Array('q', 'blockquote'); and change it to modify_elements = new Array('q');

Lastly, if you write your pages in a language other than English and don't want to place lang attributes everywhere manually, find default_language and change en to the language code of your choice.
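Taken together, the four settings might look like this at the top of the script. The values are the defaults implied by the paragraphs above; the exact declarations in the downloadable script may differ, so treat this as a sketch.

```javascript
var reset_level_on_new_lang = 1;   // 1: restart nesting when the language changes, 0: keep counting
var use_xml_lang = 1;              // 1: honour xml:lang (XHTML), 0: use lang only (HTML 4.01)
var modify_elements = new Array('q', 'blockquote');   // elements that receive quotation marks
var default_language = 'en';       // used when no lang attribute applies
```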

Adding Languages

If the need arises you can manually add language definitions to the script, or change existing ones. If you navigate to the get_quotes function you should be able to see something like this:

case 'en': quotes[0] = '\u2018'; quotes[1] = '\u201c';  // begin outer, begin inner
           quotes[2] = '\u2019'; quotes[3] = '\u201d';  // end outer, end inner
           break;

First off you will want to copy this to a new block and change 'en' to the language code of the language you wish to add, for instance es for Spanish. quotes[0] defines the beginning outer quotation mark, quotes[1] the beginning inner quotation mark, quotes[2] the finishing outer quotation mark, and quotes[3] the finishing inner quotation mark.

The '\uXXXX' notation refers to a Unicode character by its hexadecimal code point. The Unicode site contains charts which list the various characters and their numbers. If you have a new language to add, find the characters in the Unicode charts and then copy their numbers over the existing numbers.

The break; statement must remain there. It tells the script not to overwrite your settings for that language with the settings of the next language.
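The effect of a missing break can be seen in the following sketch, which adds a Spanish block in front of the English one. The function names are hypothetical, and the Spanish marks (guillemets outside, double quotation marks inside) are illustrative values that should be checked against a reference before use.

```javascript
// Correct: every case ends with break.
function with_break(language) {
    var quotes = new Array(4);
    switch (language) {
        case 'es': quotes[0] = '\u00ab'; quotes[1] = '\u201c';
                   quotes[2] = '\u00bb'; quotes[3] = '\u201d';
                   break;
        case 'en': quotes[0] = '\u2018'; quotes[1] = '\u201c';
                   quotes[2] = '\u2019'; quotes[3] = '\u201d';
                   break;
    }
    return quotes;
}

// Broken: the 'es' case lacks break, so execution falls through into the
// 'en' case and the Spanish marks are overwritten by the English ones.
function without_break(language) {
    var quotes = new Array(4);
    switch (language) {
        case 'es': quotes[0] = '\u00ab'; quotes[1] = '\u201c';
                   quotes[2] = '\u00bb'; quotes[3] = '\u201d';
        case 'en': quotes[0] = '\u2018'; quotes[1] = '\u201c';
                   quotes[2] = '\u2019'; quotes[3] = '\u201d';
                   break;
    }
    return quotes;
}

var good = with_break('es');     // begins with the left guillemet
var bad = without_break('es');   // begins with the English single quotation mark
```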

Future Pursuits

There are, of course, always things to improve, always things to add, always things to do, and never really enough time to do them in (ah, the joys of having a job). I rarely work with JScript, so I can only guess at the efficiency of the script, but as far as I can tell it touches each element only once, so it should be fairly efficient (we do need to touch every element down the tree to see whether the language changes). On a page with only a few quotes this full walk may be wasteful; there it might be more efficient to find just the quotation elements and walk up the document tree from each to determine its language.

The next step would be to support the alternate quotation marks of various languages automatically, and to expand the list of languages covered. The number of supported languages is still fairly limited, but with a bit of luck it can increase steadily. If you want to contribute knowledge of quotation marks for some language, please include a book and/or web reference so that I can validate your claims.
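The alternative of walking up from each quotation element could be sketched like this. It is not part of the article's script: language_of is a hypothetical name, the nodes are plain objects standing in for DOM elements, and only the lang attribute is consulted to keep the example short. In a browser the starting elements could be collected with document.getElementsByTagName('q').

```javascript
// Walks from a quotation element towards the root and returns the first
// declared language, or fallback when no ancestor declares one.
function language_of(node, fallback) {
    for (var current = node; current; current = current.parentNode) {
        if (current.lang) {
            return current.lang;
        }
    }
    return fallback;
}

// Made-up ancestry: <html lang="da"> contains <body> contains <q>.
var html = { nodeName: 'html', lang: 'da', parentNode: null };
var body = { nodeName: 'body', parentNode: html };
var quote = { nodeName: 'q', parentNode: body };

var found = language_of(quote, 'en');   // 'da', inherited from <html>
```

The nesting level would have to be found the same way, by counting quotation elements among the ancestors, which is the price of skipping the full tree walk.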

Of course, the big pursuit would be to write a custom CSS parser in JScript that will override the computations by IE so that we can support the content generation capabilities of CSS2, in particular the :before and :after pseudo elements. This is, however, a large endeavour to take on and not one that I am prepared to spend a lot of time on.

Notes and Acknowledgements

  1. Technically speaking, we can circumvent this by specifying the end quotes as the begin quotes in the script, and the begin quotes as the end quotes. Depending on your point of view this might be a slight hack, but as far as I can see it should work.

Thanks to Sean Kent for reviewing the article prior to submission.

References

Specifications

Language-related pages

Not all of the references above are formal, and some even contain errors, but in general they are informative and have been, in some form or another, useful.

Development-related pages

History

  • 25th Nov. 2003: Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

Henrik Stuart

Denmark

Last Updated 27 Nov 2003
Article Copyright 2003 by Henrik Stuart