
General Purpose Colorizer

Declan Brennan, 7 Oct 2007, LGPL3
A rule driven engine for colorizing HTML, CSS, JavaScript, etc.

Introduction

The General Purpose Colorizer (GPC) colorizes an input file so the syntax is easier to read. Here's a simple example. It's also used extensively by AJAX-LWB. The overall project will be housed at SourceForge, so check this link for the latest version. GPC is released under the GNU Library or Lesser General Public License (LGPL).

In Brief

There are a lot of code colorizers out there. To help you choose, these are the most significant features of GPC:

  • The appearance of syntax pieces is entirely customizable using a stylesheet.
  • The parsing is entirely customizable using rules stored in an XML file.
  • The parser takes any file as input and produces HTML as output.
  • In addition to syntax coloring, pieces can be highlighted so they can be referred to in documentation.
  • The parser itself should not need to change when support for another language is added; only the rules do.

Rule Definition

To give you a better idea how it operates, here is the set of rules that govern the colorizing of CSS (stylesheets):

<rule from="css" pattern="\{" to="cssAttribList" />
<rule from="cssAttribList" pattern=":" transient="cssAttribList" to="cssAttribValue" />
<rule from="cssAttribList" pattern="[_A-Za-z0-9\-]+" transient="cssAttribName" />
<rule from="cssAttribValue" pattern="[^;}]+" 
    transient="cssAttribValue" to="cssAttribList" />
<rule from="cssAttribList" pattern="\}" transient="cssAttribList" to="css" />

The colorizer itself is a state machine. These rules govern how transitions occur between the various states, as can be seen in the diagram below. A rule can be applied when its "From State" matches the current state and its pattern (a regular expression) matches the text at the current position.
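The matching loop described above can be sketched in a few lines of Python. This is an illustration and not GPC's C# engine: the names Rule and tokenize are hypothetical, and three behaviours are assumptions the article does not confirm (rules are tried in file order, text that no rule matches is emitted one character at a time in the current state, and a rule with no transient state labels its match with the current state).

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    from_state: str
    pattern: str
    transient: Optional[str] = None
    to: Optional[str] = None


# The five CSS rules from the article, transcribed as data.
CSS_RULES = [
    Rule("css", r"\{", to="cssAttribList"),
    Rule("cssAttribList", r":", transient="cssAttribList", to="cssAttribValue"),
    Rule("cssAttribList", r"[_A-Za-z0-9\-]+", transient="cssAttribName"),
    Rule("cssAttribValue", r"[^;}]+", transient="cssAttribValue", to="cssAttribList"),
    Rule("cssAttribList", r"\}", transient="cssAttribList", to="css"),
]


def tokenize(text: str, rules: List[Rule], state: str) -> List[Tuple[str, str]]:
    """Split text into (state_name, piece) pairs by running the rules."""
    pieces = []
    pos = 0
    while pos < len(text):
        for rule in rules:
            if rule.from_state != state:
                continue
            # Pattern.match(text, pos) only succeeds if the match starts at pos.
            m = re.compile(rule.pattern).match(text, pos)
            if m and m.end() > pos:
                # Assumption: without a transient state, label with the current state.
                pieces.append((rule.transient or state, m.group()))
                pos = m.end()
                state = rule.to or state
                break
        else:
            # Assumption: unmatched text is emitted as-is in the current state.
            pieces.append((state, text[pos]))
            pos += 1
    return pieces
```

Calling tokenize("a{b:c}", CSS_RULES, "css") labels b as cssAttribName and c as cssAttribValue, with the braces and colon falling to css and cssAttribList.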

The name of the current state is also used as a stylesheet class name in the output from the colorizer:

.css { color:Maroon }
.cssAttribName { color:Red }
.cssAttribList { color:Black }
.cssAttribValue { color:Blue }
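To show how state names can line up with the stylesheet classes above, here is a minimal Python sketch that wraps each syntax piece in an element whose class is the state name. The article does not show GPC's actual output markup, so the use of span elements and the function name to_html are assumptions.

```python
from html import escape
from typing import Iterable, Tuple


def to_html(pieces: Iterable[Tuple[str, str]]) -> str:
    """Render (state_name, text) pairs as spans classed by state name."""
    return "".join(
        # Escape both the class name and the text so the output is valid HTML.
        f'<span class="{escape(state)}">{escape(text)}</span>'
        for state, text in pieces
    )
```

With the stylesheet above, a piece emitted in the cssAttribName state would then be drawn in red and one in cssAttribValue in blue.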

Rule Attributes

In the example above, the attributes from, pattern, to and transient made an appearance. The complete list is as follows:

Name       Type                Description                                                      Required?
Pattern    Regular Expression  The pattern that must be found for this rule to execute          Yes
From       State               The initial state to which this rule applies                     Yes
To         State               The final state when this rule completes its transition          No
Transient  State               The state that exists during this rule's transition              No
Push       State               Push the specified state on the stack                            No
Pop        Flag                Pop a state from the stack                                       No
Add        State               Add the specified state to the set of current states             No
Remove     Flag                Remove this rule's initial state from the set of current states  No
Debug      Flag                Invoke the debugger before this rule is executed                 No
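As an illustration of how these attributes might be loaded, the Python sketch below reads rule elements into a small record type using the standard xml.etree.ElementTree module. It rests on several assumptions: the enclosing rules element, the lowercase attribute name debug, and the convention that flags are written as "true" (seen in the article only for pop and remove). The Rule class and parse_rules function are hypothetical names and not part of GPC.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    from_state: str                   # required
    pattern: str                      # required
    to: Optional[str] = None
    transient: Optional[str] = None
    push: Optional[str] = None
    pop: bool = False
    add: Optional[str] = None
    remove: bool = False
    debug: bool = False


def parse_rules(xml_text: str) -> List[Rule]:
    """Parse every <rule> element found under the document root."""
    rules = []
    for el in ET.fromstring(xml_text).iter("rule"):
        get = el.attrib.get
        rules.append(Rule(
            # Indexing raises KeyError if a required attribute is missing.
            from_state=el.attrib["from"],
            pattern=el.attrib["pattern"],
            to=get("to"),
            transient=get("transient"),
            push=get("push"),
            pop=get("pop") == "true",
            add=get("add"),
            remove=get("remove") == "true",
            debug=get("debug") == "true",
        ))
    return rules
```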

More Advanced Features

If you just want to use the colorizer, you can skip this section, but if curiosity gets the better of you, read on...

The rule engine can manage multiple concurrent states. An additional state is used to highlight marked sections of code. The two rules below operate as a pair. The first adds jsHilite1 to the set of current states when /*[hilite1]*/ is encountered. The second removes this state when /*[/hilite1]*/ is encountered. (noemit is a special state that produces no output for the syntax piece.) Similar pairs of rules cover four different types of highlighting for each of the three languages, which bulks out the rule file a bit: 24 of the 45 rules, just over half, are for highlighting. As these rules are all very similar, a smarter definition could reduce this in the future.

<rule from="js" pattern="/\*\[hilite1\]\*/" add="jsHilite1" transient="noemit"/>
<rule from="jsHilite1" pattern="/\*\[/hilite1\]\*/" transient="noemit" remove="true" />
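The effect of this pair of rules can be sketched in Python as follows. This is a simplified stand-in and not the GPC engine: it hard-codes the two patterns, keeps the current states in a list, and joins the state names with spaces to form a class string. That joining convention and the function name mark_highlights are assumptions.

```python
import re
from typing import List, Tuple

OPEN = re.compile(r"/\*\[hilite1\]\*/")
CLOSE = re.compile(r"/\*\[/hilite1\]\*/")


def mark_highlights(text: str, base: str = "js",
                    extra: str = "jsHilite1") -> List[Tuple[str, str]]:
    """Return (class_string, text) runs; the markers themselves emit nothing."""
    states = [base]
    pieces: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        active = extra in states
        m = (CLOSE if active else OPEN).match(text, pos)
        if m:
            # add="jsHilite1" on the opening marker, remove="true" on the closing one.
            if active:
                states.remove(extra)
            else:
                states.append(extra)
            pos = m.end()  # transient="noemit": the marker is swallowed
            continue
        cls = " ".join(states)
        if pieces and pieces[-1][0] == cls:
            # Extend the previous run when the set of states is unchanged.
            pieces[-1] = (cls, pieces[-1][1] + text[pos])
        else:
            pieces.append((cls, text[pos]))
        pos += 1
    return pieces
```

Because the highlight state sits alongside the base state, text between the markers still carries the base class, so a stylesheet could layer a background colour on top of the normal syntax colours.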

It is possible to embed two other languages within HTML, namely: CSS (stylesheets) and JavaScript. These are triggered by the tags: <style> and <script> respectively. When these tags are encountered, HTML mode must continue in order to colorize any attributes properly. The switch to CSS or JavaScript should only take place when the tag is closed. This behaviour is managed by the following pair of rules. The first rule pushes a state corresponding to the embedded language that is about to be encountered. The second rule pops this state and makes it current when the tag is closed.

<rule from="htmlOpenTagName" pattern=" |>" 
    to="htmlTag" push="=style:css,script:js,html"/>
<rule from="htmlTag" pattern="/?>" transient="htmlEndTag" pop="true" />
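The article does not spell out the syntax of the push value. One plausible reading is that a leading = introduces a lookup from the tag name to the state to push, with a final bare entry acting as the default. The Python sketch below follows that reading only; resolve_push and state_after_tag are hypothetical helpers, not GPC functions.

```python
from typing import Optional

PUSH_SPEC = "=style:css,script:js,html"


def resolve_push(spec: str, tag_name: str) -> Optional[str]:
    """Pick the state to push for a tag, under the assumed lookup syntax."""
    if not spec.startswith("="):
        return spec  # assumed: a plain value is pushed as-is
    default = None
    for entry in spec[1:].split(","):
        key, sep, state = entry.partition(":")
        if not sep:
            default = key   # bare entry: assumed fallback state
        elif key == tag_name:
            return state
    return default


def state_after_tag(tag_name: str) -> str:
    """Walk the two rules for one opening tag and return the resulting state."""
    stack = []
    # First rule: on leaving the tag name, push the state for the embedded language.
    stack.append(resolve_push(PUSH_SPEC, tag_name))
    state = "htmlTag"  # attributes are still colorized as HTML here
    # Second rule: when the tag closes, pop the saved state and make it current.
    state = stack.pop()
    return state
```

Under this reading, state_after_tag("script") yields js, state_after_tag("style") yields css, and any other tag returns to html.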

Help Wanted

Currently, the rule file supports HTML, CSS and JavaScript (although XML can also be colorized with the HTML rules). I'll be extending it to address my own needs from time to time. However, if you add an additional language yourself, please email me the rules so I can maintain a master copy. Similarly, please let me know about any bugs or suggested features in the Tracker section located here.

The rule execution engine itself is currently C# only. However, it's a fairly concise piece of code (<300 lines excluding comments) that is crying out to be ported to other environments such as Java and JavaScript.

Of course, if you just want to use the colorizer as is, that's fine too.

History

  • 7th October, 2007: Initial post

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)

About the Author

Declan Brennan
Web Developer
Ireland
Declan Brennan is Chief Architect for Sekos Technology (www.sekos.com). Over the years he has worked in a huge range of technologies and environments and still gets a great thrill from the magic of computers.

Article Copyright 2007 by Declan Brennan