Multiple Language Syntax Highlighting, Part 2: C# Control






Mar 11, 2003
Fast and furious colorizing library for source code (C, C++, VBScript, JScript, xml, etc.)
Introduction
This article is an upgrade of the code submitted in Multiple Language Syntax Highlighting, Part 1: JScript, where a syntax highlighting scheme was proposed.
The technique and ideas for parsing have not changed and, therefore, I will not explain the parsing/rendering process in this article. Readers who need more details can refer to the article cited above. I must also point out that this article is intended to entirely replace the JavaScript code in the (near?) future.
As the previous article was an exercise to learn JScript, XSL and regular expressions, I used this one to get a first contact with C#.
In the rest of the article, I will refer to the JavaScript version as v1.0 and the C# version as v2.0.
Moving to C#
As a C++ developer, I can tell you I was glad to quit JavaScript and get started with C#, which has a much more C++-like flavour.
Wrapping the JScript methods in a single C# class was quite straightforward and does not deserve much comment.
CodeColorizer Class
This class is the kernel of the parser. You can colorize code using CodeColorizer.ProcessAndHighlightCode( string ).
With that job done and the ported code running after a fairly short time, it was time to use the power of C# to improve things.
New features
Avoiding Regular Expression Object Construction
In v1.0, regular expression objects were created each time the parser changed context, although the regular expression string remained the same. This led to a great number of allocations and compilations of Regex objects (although I have a question about object pooling, see Open Question below).
A first improvement of the library was to store the Regex objects in a Hashtable when parsing the syntax. The class implementing this dictionary is Collections.RegexDictionary.
Hence, when parsing, regular expression objects do not need to be rebuilt and can be retrieved in constant time from the table.
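The caching idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the code of Collections.RegexDictionary: the class name RegexCacheSketch and its Get method are hypothetical, and a generic Dictionary is used in place of the Hashtable mentioned in the article.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sketch: NOT the library's Collections.RegexDictionary.
// Each distinct (pattern, options) pair is compiled once and then reused.
public class RegexCacheSketch
{
    private readonly Dictionary<string, Regex> cache =
        new Dictionary<string, Regex>();

    // Number of distinct Regex objects built so far.
    public int Count
    {
        get { return cache.Count; }
    }

    public Regex Get(string pattern, RegexOptions options)
    {
        // The options are part of the key, so the same pattern with
        // different options yields a different Regex object.
        string key = ((int)options).ToString() + ":" + pattern;

        Regex regex;
        if (!cache.TryGetValue(key, out regex))
        {
            // First request for this key: build the Regex and remember it.
            regex = new Regex(pattern, options);
            cache.Add(key, regex);
        }
        return regex;
    }
}
```

Asking twice for the same pattern and options returns the same Regex instance, so the cost of constructing it is paid only once. The sketch is not thread-safe; a shared cache would need locking.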
Open Question: does .NET cache regular expression strings in a pool?
Handling the Case
The case sensitivity of a language can be specified using the attribute not-case-sensitive={"yes" or "no" (default)} on the language node.
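One plausible way to turn this attribute into a regular expression setting is sketched below. It is an assumption about how such a flag could be consumed, not the library's actual code: the class CaseOptionSketch, its methods, and the name attribute used in the usage note are all hypothetical. Only the not-case-sensitive attribute and the language node come from the article.

```csharp
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical sketch: maps the not-case-sensitive attribute of a
// language node to the RegexOptions used when building its patterns.
public static class CaseOptionSketch
{
    public static RegexOptions OptionsFor(XmlElement languageNode)
    {
        // GetAttribute returns an empty string when the attribute is
        // absent, which falls through to the default ("no").
        string flag = languageNode.GetAttribute("not-case-sensitive");
        return flag == "yes" ? RegexOptions.IgnoreCase : RegexOptions.None;
    }

    // Convenience overload that parses a single language node from text.
    public static RegexOptions OptionsFromXml(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return OptionsFor(doc.DocumentElement);
    }
}
```

For example, OptionsFromXml("<language name=\"vb\" not-case-sensitive=\"yes\"/>") yields RegexOptions.IgnoreCase, while a node without the attribute yields RegexOptions.None.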
Benchmarking
The parser contains a timer/counter (see [1] for details) to benchmark the transformation. At the end of the article, some benchmarking results are presented.
The benchmarking quantities are:
- CodeColorizer.BenchmarkPerChar, which returns the number of seconds needed to parse one character,
- CodeColorizer.BenchmarkAvgSec, the average parsing time,
- CodeColorizer.BenchmarkSec, the parsing time of the last job.
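The three quantities listed above could be derived from a timer roughly as follows. This is a minimal sketch and not the library's implementation: the class ParseBenchmarkSketch and its members are hypothetical, and it uses System.Diagnostics.Stopwatch (available in later .NET versions) instead of the high-performance timer from [1].

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch: times a text transformation and keeps the
// last duration, the per-character duration and the running average.
public class ParseBenchmarkSketch
{
    private double totalSeconds;
    private int jobs;

    // Duration of the most recent job, in seconds.
    public double LastSeconds { get; private set; }

    // Seconds spent per input character in the most recent job.
    public double SecondsPerChar { get; private set; }

    // Mean duration over all jobs measured so far.
    public double AverageSeconds
    {
        get { return jobs == 0 ? 0.0 : totalSeconds / jobs; }
    }

    public string Measure(string source, Func<string, string> transform)
    {
        Stopwatch watch = Stopwatch.StartNew();
        string result = transform(source);
        watch.Stop();

        LastSeconds = watch.Elapsed.TotalSeconds;
        // Guard against division by zero for empty input.
        SecondsPerChar = source.Length == 0 ? 0.0 : LastSeconds / source.Length;
        totalSeconds += LastSeconds;
        jobs++;
        return result;
    }
}
```

Wrapping the colorizing call in Measure leaves the result unchanged and updates the three counters as a side effect.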
Easier Integration
The library comes with a custom web control that colorizes text.
The Project:
The project shows the usage of the custom colorizer control. For further details, NDoc documentation has been generated.
You must modify web.config to specify where the XML and XSL files are located. See the ColorizerLibrary section.
TODO List
- The next big todo is to handle "multiple language" files such as ASP pages.
- As pointed out in the post http://www.codeproject.com/jscript/highlight.asp#xx441163xx, case sensitivity can be set for a subset of a regular expression. Hence, case sensitivity handling could be refined.
References
[1] High Performance timer in C#
[2] Multiple Language Syntax Highlighting, Part 1: JScript