Introduction
This article describes a client-side JScript workaround for a feature missing
from Microsoft Internet Explorer: adding language-dependent quotation marks
around <q> elements in particular, but also around <blockquote> HTML
elements.
Motivation
As a webmaster you frequently need to cite an external source, write spoken
dialogue, or do something similar. HTML 4.01 and, because of it, XHTML 1.0
define the elements <q>, <blockquote>, and <cite> to accomplish these tasks.
HTML 4.01 defines the rendering of <q> and <blockquote> as follows.
Visual user agents generally render BLOCKQUOTE as an indented block.
Visual user agents must ensure that the content of the Q element is rendered
with delimiting quotation marks. Authors should not put quotation marks at the
beginning and end of the content of a Q element.
User agents should render quotation marks in a language-sensitive manner (see
the lang attribute). Many languages adopt different quotation styles for outer
and inner (nested) quotations, which should be respected by
user-agents.
So, according to the specification, the user agent (in our case Microsoft
Internet Explorer 6) should automatically add quotation marks before and after
the text, and even make them language-dependent. Alas, life isn't always as
rosy as the specifications picture it: Microsoft Internet Explorer 6 doesn't
support this behaviour at all. Not only does it ignore the language, it doesn't
add any quotation marks in the first place.
There are, of course, ways to remedy this shortcoming, but they all require
some work, and they may not always be equally useful, and some may even cause
inconsistencies with other user agents, e.g. Mozilla and Opera. The list below
sums up a few different approaches.
- Manually add quotation marks at the beginning and end of the
quote.
This would interfere with user agents adhering to the specification, as they
would display two sets of quotation marks: the ones they place around such
elements as the specification requires, and the ones that have been placed
manually.
- Perform the addition of quotes server-side based
on user agent info submitted to the server.
This requires that the
hosting provider of your site supports a server-side scripting language of some
kind, for instance PHP, ASP.NET, or JSP. However, there are still
numerous hosting providers that do not provide such a solution, so while it may
be useful for those who have the means, it is not for everyone.
- Perform the addition of quotes client-side using
a script.
This is the solution I have opted for in this article. It does not work for
every single user of Microsoft Internet Explorer (it fails if the user has
turned off JScript support), but it works for everyone else. More importantly,
it works for everyone who hasn't actively changed their Internet Explorer
settings, as JScript support is enabled by default.
- Force Microsoft to remedy the
oversight.
This is hardly a viable solution, even for a larger group of people.
Locating the Hunting Grounds
Before we can start building the script we need to be aware of a few things
from the specification:
- By default <q> should have quotation marks added.
- By default <blockquote> should not have quotation marks added, as it has
previously been used to indent text. That usage has been deprecated.
- The specification states that it should be possible
to add quotation marks to blockquote elements using Cascading Style Sheets, however,
conveniently, Microsoft Internet Explorer doesn't support this either.
- The quotation marks added should be language-dependent. This means that we
need to pay attention to the lang and xml:lang attributes of elements, and to
process these in order of precedence. According to the XHTML 1.0 specification
§C.7, xml:lang always takes precedence over lang.
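The precedence rule in the last item can be sketched as a small helper. This is only an illustration: the name element_language is hypothetical, and lower-casing the result is my own assumption to make later comparisons simpler.

```javascript
// Hypothetical helper: return the language an element declares itself,
// or null when it declares none. xml:lang wins over lang when enabled.
function element_language(el, use_xml_lang) {
    var value = null;
    if (use_xml_lang) {
        value = el.getAttribute('xml:lang');
    }
    if (!value) {
        value = el.getAttribute('lang');
    }
    // Assumption: normalise to lower case so 'EN' and 'en' compare equal.
    return value ? String(value).toLowerCase() : null;
}
```

Passing 0 for use_xml_lang makes the helper look at lang only, which is what a pure HTML 4.01 page would want.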
As supporting content generation with Cascading Style Sheets would require
writing a script which accurately parses CSS and applies formatting and
content generation to the document elements, we will settle for a simpler
solution: specifying in the script itself whether to add quotation marks to
blockquotes. This should pose no great problem for the webmaster, but you lose
a bit of flexibility.
Investigating the Document Structure
With what we've summarised so far, we should be able to figure out what the
script should do in relation to <q> and <blockquote> elements, but
we also need to get to the elements somewhere, somehow. If we, for a moment,
presume that our webpage is well-formed XML then we might have a structure much
like this:
The more programming inclined of us will invariably recognize this as a tree,
and what better way is there to traverse a tree than to use recursive functions?
In particular, I will be using a preorder traversal of the tree.
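As a rough sketch of what a preorder traversal looks like in JScript (the function name walk_preorder and the visit callback are hypothetical, not taken from the script itself):

```javascript
// Preorder: handle the node itself first, then each of its subtrees
// from left to right.
function walk_preorder(node, visit) {
    visit(node);
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        walk_preorder(children[i], visit);
    }
}
```

Because a parent is always visited before its children, anything decided at the parent (such as its language) is known by the time the children are reached.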
The diagram above doesn't entirely depict the internal document tree that
Microsoft Internet Explorer generates from the page source, as that tree also
has text-nodes for text in elements (as far as I can tell, text-nodes are only
generated for block elements, e.g. they should never be generated for <q>, <a>,
etc.). These text-nodes are characterized by having their nodeName property set
to #text.
As we can see from the diagram, we could be in the situation where an image is
the first element of a blockquote. An image element cannot contain HTML, so we
need to take an alternate course of action in this case: inserting an extra
text-node before the image. Fortunately this can be achieved easily using
methods on the blockquote element. The same applies if the image element is the
last child of the blockquote element.
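A minimal sketch of that fallback, using the standard DOM methods createTextNode, insertBefore, and appendChild. The helper names add_leading_mark and add_trailing_mark are hypothetical.

```javascript
// Put a text-node holding the opening mark in front of whatever the
// blockquote currently starts with (for instance an <img>).
function add_leading_mark(doc, blockquote, mark) {
    var text = doc.createTextNode(mark);
    blockquote.insertBefore(text, blockquote.firstChild);
}

// Mirror case: put a text-node holding the closing mark after the last child.
function add_trailing_mark(doc, blockquote, mark) {
    blockquote.appendChild(doc.createTextNode(mark));
}
```

In a page this would be called as add_leading_mark(document, element, quote), where element is the blockquote and quote the opening mark.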
Languages/Sprache/Sprog
The next big thing to cover, before we go overboard and code happily through
the night, is the tiny little phrase in the specification: "User agents should
render quotation marks in a language-sensitive manner."
What language dependence is there to this? Quotation marks are just "..." and
'...', are they not?
It would be much too simple if all languages used the same quotation marks —
life just doesn't work like that! It is, of course, easier for those of us who
speak more than one language to notice this difference in behaviour between
languages.
For instance, in Denmark text is quoted like this: „At være eller ikke at
være.”, or using one of the alternative forms: »At være eller ikke at
være.« In French, guillemets are used to quote text: «Le roi est mort,
vive le roi!» Progressing to other languages, the quotation marks keep
changing. I haven't had the inclination to construct an exhaustive list of
quotation marks for various languages, nor have I made the script support
languages that are written right-to-left.[1]
It is also possible to have quotations inside quotations. In general this
means using a single-sign version of the outer quotation (except in English). To
simplify matters I have chosen just to alternate the quotation marks as quotes
are nested, and not to support any of the alternate quotation styles for the
various languages. For instance, Danish and Norwegian both have two commonly
used alternatives to the one presented in the table below.
Language | Begin outer | End outer | Begin inner | End inner |
American (en-us) | “ | ” | ‘ | ’ |
Dansk (da) | „ | ” | ‚ | ’ |
Deutsch (de) | „ | “ | ‚ | ‘ |
English (en) | ‘ | ’ | “ | ” |
Français (fr) | « | » | ‹ | › |
Norsk (no) | „ | ” | ‚ | ’ |
Svenska (se) | ” | ” | ’ | ’ |
The table above has been constructed from the following references: English/American,
Norwegian, German/French, Swedish. Only the Norwegian reference is an official
one; most language councils do not publish the language's grammar and usage
online (at least not anywhere I was able to locate). The Danish quotation marks
have been taken from the official publication by the Danish Language Council.
If you want to make corrections, give references to further languages, etc.,
feel free to contact me.
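The alternation between outer and inner marks described above boils down to checking whether the nesting level is even or odd. The sketch below assumes the marks are held in a four-slot array (begin outer, begin inner, end outer, end inner); the helper name marks_for_level is hypothetical.

```javascript
// Danish marks from the table: begin outer, begin inner, end outer, end inner.
var danish = ['\u201e', '\u201a', '\u201d', '\u2019'];

// Even levels (0, 2, ...) get the outer pair, odd levels the inner pair.
function marks_for_level(quotes, level) {
    var inner = level % 2;
    return { begin: quotes[inner], end: quotes[2 + inner] };
}
```

A quote nested three deep (level 2) therefore gets the outer marks again.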
Harvesting the Fruits
Now that we have come all this way, from reading the specification to
linguistic analysis, we are finally able to construct the script. There are a
few things that we would like to keep optional, so the script is configured
through a few global variables placed at its top. These cover whether to use
xml:lang (so that the script can be used with HTML 4.01 as well), which
elements to modify (whether to add quotation marks to both <q> and
<blockquote>), what the default language should be, and finally whether to
reset the quotation depth when the language changes somewhere down the document
tree. These four settings are kept in the variables reset_level_on_new_lang,
use_xml_lang, modify_elements, and default_language.
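Going by the defaults mentioned in the customization section further down, the configuration block at the top of the script presumably looks something like this sketch (the comments are mine):

```javascript
// 1 = start again at the outer marks whenever the language changes
var reset_level_on_new_lang = 1;
// 1 = honour xml:lang (XHTML); 0 = look at lang only (HTML 4.01)
var use_xml_lang = 1;
// elements that receive quotation marks
var modify_elements = new Array('q', 'blockquote');
// language assumed when the document declares none
var default_language = 'en';
```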
Apart from the configurability, the script isn't much more than a few
functions. get_quotes gets the quotation mark characters based on a language
string. parse_element is the work-horse of the program: it takes care of
everything, and I will cover it in greater detail in a few moments. Finally
there is q_fix, which is the entry-point function. It sets up the initial
language and begins the tree descent.
get_quotes
get_quotes is by and large fairly uninteresting, as it merely builds an array
with begin/end quotes for both nesting levels and returns it.
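A sketch of such a function, filled in from the table of quotation marks above. The slot order follows the one the article gives for quotes[0] to quotes[3]; falling back to the English marks for unknown language codes, lower-casing the code, and sharing one block between 'da' and 'no' are my own assumptions.

```javascript
function get_quotes(lang) {
    // Slots: [0] begin outer, [1] begin inner, [2] end outer, [3] end inner.
    var quotes = new Array(4);
    switch (String(lang).toLowerCase()) {
        case 'en-us':
            quotes[0] = '\u201c'; quotes[1] = '\u2018';
            quotes[2] = '\u201d'; quotes[3] = '\u2019';
            break;
        case 'da':
        case 'no':   // Norwegian uses the same marks as Danish in the table
            quotes[0] = '\u201e'; quotes[1] = '\u201a';
            quotes[2] = '\u201d'; quotes[3] = '\u2019';
            break;
        case 'de':
            quotes[0] = '\u201e'; quotes[1] = '\u201a';
            quotes[2] = '\u201c'; quotes[3] = '\u2018';
            break;
        case 'fr':
            quotes[0] = '\u00ab'; quotes[1] = '\u2039';
            quotes[2] = '\u00bb'; quotes[3] = '\u203a';
            break;
        case 'se':
            quotes[0] = '\u201d'; quotes[1] = '\u2019';
            quotes[2] = '\u201d'; quotes[3] = '\u2019';
            break;
        default:     // 'en' and, as an assumption, any unknown code
            quotes[0] = '\u2018'; quotes[1] = '\u201c';
            quotes[2] = '\u2019'; quotes[3] = '\u201d';
            break;
    }
    return quotes;
}
```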
q_fix
As I have only had the time to test the script with Microsoft Internet
Explorer 6 the function will limit the script to work with this. It should be
fairly straightforward to extend it to other versions if they support the full
range of methods and properties as well.
Next, it checks whether the <html> element has the xml:lang (if used) or lang
attribute set and uses them in order of precedence. Then it proceeds to examine
the <body> tag for the same. Lastly it passes the <body> element to
parse_element.
parse_element
This is probably the most interesting part of the script, as this is the
function that resolves all elements, places all quotation marks, and well...
you get the picture.
The first part examines the language of the passed element. If it is
different from the language of the parent, the new language will be used
(xml:lang or lang, in order of precedence).
The second part examines whether the current element is one of the elements
listed in the modify_elements variable at the top of the script. If it is, we
roll out the core logic. If it is a <q> element, we just add the begin and end
quotation marks to its innerHTML property. The benefit of <q> is that its
contents are severely limited by the DTD (I am presuming that we are using a
strict document model; I haven't tested how well it holds up with more relaxed
DTDs).
<blockquote>, on the other hand, is a great deal trickier, as it is a block
element and as such can contain a lot of elements, including elements that
cannot contain HTML/text themselves, e.g. <img>. The problems with placing the
first quotation mark are mirrored in placing the last quotation mark within a
<blockquote> element, so I will settle for explaining the first. If the element
has no children, the beginning quotation mark is added to its innerHTML
property; else if the first child is a text-node, the quotation mark is added
to it; else if the first child element can contain HTML, the quotation mark is
added to that element. As a last resort we add a text-node as the first child
of the <blockquote> element.
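The decision chain for the opening mark can be sketched as a function that only reports which case applies. The function name, the returned labels, and the list of elements that cannot hold text are all hypothetical; the real script may well test for this differently.

```javascript
// Assumption: a small, non-exhaustive list of elements that cannot hold text.
var EMPTY_ELEMENTS = { img: 1, br: 1, hr: 1, input: 1 };

function leading_mark_strategy(blockquote) {
    var first = blockquote.firstChild;
    if (!first) {
        return 'innerHTML';          // no children: add the mark to innerHTML
    }
    if (first.nodeName === '#text') {
        return 'first-text-node';    // add the mark to the existing text-node
    }
    var name = String(first.nodeName).toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(EMPTY_ELEMENTS, name)) {
        return 'first-child';        // the first child can hold the mark itself
    }
    return 'new-text-node';          // last resort: insert a text-node before it
}
```

The closing mark would follow the same chain with lastChild in place of firstChild.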
Lastly, the quotation level will be increased if the element was in
modify_elements. Regardless, we continue with the child elements of the current
element, using the newest language and quotation level.
That is all there is to it, really.
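To make the flow of language and nesting level easier to follow, here is a simplified sketch that walks a tree and merely records which elements would receive marks, in which language and at which level, instead of modifying the document. The name plan_quotes, the config object, and the use of a plain lang property as a stand-in for the attribute lookup are all assumptions made for the illustration.

```javascript
function plan_quotes(node, lang, level, config, out) {
    // Part one: pick up a language change on this element.
    var own = node.lang || null;     // stand-in for the xml:lang / lang lookup
    if (own && own !== lang) {
        lang = own;
        if (config.reset_level_on_new_lang) {
            level = 0;
        }
    }
    // Part two: is this one of the elements to modify?
    var name = String(node.nodeName).toLowerCase();
    for (var i = 0; i < config.modify_elements.length; i++) {
        if (config.modify_elements[i] === name) {
            out.push({ element: name, lang: lang, level: level });
            level++;                 // children are nested one level deeper
            break;
        }
    }
    // Regardless, descend with the newest language and level.
    var children = node.childNodes || [];
    for (var j = 0; j < children.length; j++) {
        plan_quotes(children[j], lang, level, config, out);
    }
    return out;
}
```

Since lang and level are passed by value, a change made in one subtree never leaks into its siblings.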
Integrating the Script
Integrating the script into your own pages is fairly painless: all it takes
is an extra line added to your <head> section and a call to q_fix in the onload
event of <body>. The following excerpt of an HTML file shows this:
<html>
<head>
<title>My Page Title</title>
<script type="text/javascript" src="q_fix.js"></script>
</head>
<body onload="q_fix();">
...
</body>
</html>
That should be doable even for the most JScript-phobic webmasters out there
(I hope).
Customizing the Script
If you do not wish to reset the quotation nesting when the language changes
somewhere down through the document, find the variable reset_level_on_new_lang
and replace the 1 with a 0.
If you are only using HTML 4.01 and thus don't want to support the xml:lang
attribute, find use_xml_lang and change 1 to 0.
If you do not wish to have quotation marks added to <blockquote>, find the
line modify_elements = new Array('q', 'blockquote'); and change it to
modify_elements = new Array('q');
Lastly, if you write your pages in a language other than English and don't
want to place manual lang attributes everywhere, you can find default_language
and change en to the language code of your choice.
Adding Languages
If the need arises you can manually add language definitions to the script,
or change existing ones. If you navigate to the get_quotes function you should
be able to see something like this:
case 'en': quotes[0] = '\u2018'; quotes[1] = '\u201c';
quotes[2] = '\u2019'; quotes[3] = '\u201d';
break;
First off you will want to copy this to a new block and change 'en' to the
language code of the language you wish to add, for instance es for Spanish.
quotes[0] defines the beginning outer quotation mark, quotes[1] the beginning
inner quotation mark, quotes[2] the finishing outer quotation mark, and
quotes[3] the finishing inner quotation mark.
The '\uXXXX' refers to a Unicode character definition. The Unicode site
contains charts which list the various characters and their numbers. If you
have a new language to add, find the characters in the Unicode charts and then
copy their numbers over the existing numbers.
The break; statement must remain there. It tells the script not to overwrite
your settings for that language with the settings of the next language.
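As an illustration, a trimmed-down stand-in for get_quotes with a Spanish block added might look like the sketch below. I am assuming here that Spanish uses guillemets for the outer quotation and double curved marks for the inner one; please check a reference for the language you add before relying on it.

```javascript
function get_quotes(lang) {
    var quotes = new Array(4);
    switch (lang) {
        // New block, copied from 'en' and adjusted (assumed Spanish marks).
        case 'es': quotes[0] = '\u00ab'; quotes[1] = '\u201c';
                   quotes[2] = '\u00bb'; quotes[3] = '\u201d';
                   break;   // without this, the 'en' values would overwrite 'es'
        case 'en': quotes[0] = '\u2018'; quotes[1] = '\u201c';
                   quotes[2] = '\u2019'; quotes[3] = '\u201d';
                   break;
    }
    return quotes;
}
```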
Future Pursuits
There are, of course, always things to improve, always things to add, always
things to do, and never really enough time to do it in — ah, the joys of having
a job. I rarely work with JScript, so I can only presume what the efficiency of
the script will be, but as far as I can reckon it should only touch each element
once, so it should be fairly efficient (we do need to touch every element down
the tree to see whether the language changes). This might still be extremely
inefficient if you only have a few quotes on a page. In that case it might be
more efficient just to find the quotation elements and walk up the document
tree to determine the language.
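A sketch of that alternative, walking from an element up through parentNode until a language is found. The function name language_of is hypothetical, and for brevity this version always prefers xml:lang over lang.

```javascript
// Walk up from el towards the root; the nearest declared language wins.
function language_of(el, default_language) {
    for (var node = el; node; node = node.parentNode) {
        // The document node at the top has no getAttribute, so guard for it.
        if (node.getAttribute) {
            var value = node.getAttribute('xml:lang') || node.getAttribute('lang');
            if (value) {
                return value;
            }
        }
    }
    return default_language;
}
```

In a page, the quotation elements themselves could be collected with document.getElementsByTagName('q') and each passed to this function.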
The next step would be to automatically support alternate quotation marks
for various languages, and also to expand the list of quotation marks for
languages. The current number of languages is still fairly limited, but with a
bit of luck it can increase steadily. If you want to contribute knowledge of
quotation marks for some language, please include a book and/or web reference so
that I can validate your claims.
Of course, the big pursuit would be to write a custom CSS parser in JScript
that will override the computations by IE so that we can support the content
generation capabilities of CSS2, in particular the :before and :after pseudo
elements. This is, however, a large endeavour to take on and not one that I am
prepared to spend a lot of time on.
Notes and Acknowledgements
[1] Technically speaking, we can circumvent this by specifying the end quotes
as the begin quotes in the script, and the begin quotes as the end quotes. This
might, depending on your point of view, be a slight hack, but as far as I can
see, it should work.
Thanks to Sean Kent for reviewing the article prior to submission.
References
Specifications
Language-related pages
Not all of the references above are formal, and some even contain errors, but
in general they are informative and have been, in some form or another,
useful.
Development-related pages
History
- 25th Nov. 2003: Initial release.