Posted 26 Nov 2007


Making the "Syntax highlighting textbox written in C#" component work

The component by Uri Guy almost worked; now it does.

Screenshot - All_Is_Good.gif
A preview of the new and improved tester-application.
Note the number/parameter recognition.

While looking for an easy way to do some syntax highlighting, I came upon the following article by Uri Guy:
Syntax highlighting textbox written in C#.
It was a good start, but it had a few flaws. Since the article has not been updated for some time, I thought I would present my corrections to the public in this manner.

This is an intermediate component, not a KnowItAll-monster.
It will fill your basic needs the quick (and now dirty) way:

  • Customizable word separators. (chars)
  • Parsing of large wordlists. (quick part)
  • Rudimentary start/stop token search. (no escape chars, I think)
  • RegEx evaluation to do some tricky stuff. (dirty part)
    RegEx is slow(ish), so use it carefully; the reason is explained below.
    Do not highlight every whitespace with a regex.


The key reason I picked up this source was that I (thought I) was looking for something simple to use. Contrary to some other highlighters on this site, it looked simple enough, and some people were actually using it. This article is for those who did.

The source code submitted is a continuation of the Unicode download made available by petrusek. All bugs and fixes mentioned in the message board are implemented. This still left enough improvements to make; these are listed below.

Remember, this is just a continuation of someone else's project. I am not responsible for the (lack of) general design and/or documentation. Neither do I plan to be (just fixin' the thing.)
Nor did I plan to make all the modifications I did, but sometimes I just can't help myself.

Most of the corrections made have a comment stating the reason for the correction. A little refactoring has been done to:
a) Move duplicate code into a function (GetSelectedWordBounds for example).
b) Just get the thing readable.

Using the code

This article will not explain how to use the original component; we have the original article for that. However, the following code demonstrates the features I have added to the component:

This code is from the tester-application demonstrated above.
(modified for small viewing)

//AddHighlightDescriptor is called on the textbox component (instance name omitted).
//New feature ToEndOfWord
AddHighlightDescriptor(
       DescriptorRecognition.StartsWith, "@", DescriptorType.ToEOW, 
       Color.Firebrick, null, true); 

//strings, almost readable.
AddHighlightDescriptor(
       DescriptorRecognition.StartsWith, "\"", 
       DescriptorType.ToCloseToken, "\"", 
       Color.Red, null, true);

// RegEx to do almost exactly the same thing. Only highlights if a closing " 
// is found. Also allows for escaping the ", which 
// DescriptorType.ToCloseToken does not do.
string regBase = "b[^ex]*(?:x.[^ex]*)*[e]"; //Generic StartsStopToken expr.

//Fill in the blanks; each Replace continues from the previous result.
string sEx = regBase.Replace("e", "\\\""); //End
       sEx = sEx.Replace("b", "\\\""); //Begin
       sEx = sEx.Replace("x", "\\\\"); //Escape
//The testapplication actually does not use this one, as I want to select all
//text if the string is not terminated. The DescriptorType is ignored,
//but I thought another overload would make things less clear.
AddHighlightDescriptor(
       DescriptorRecognition.RegEx, sEx, DescriptorType.Word, 
       Color.Red, tmp, false);

tmp = new Font(Font, FontStyle.Bold);
//highlight numbers
AddHighlightDescriptor(
       DescriptorRecognition.RegEx, "\\b(?:[0-9]*\\.)?[0-9]+\\b",
       DescriptorType.Word, Color.Magenta, tmp, false);
//NOTE: this is not exactly right, 
//it incorrectly highlights a string like "0..0", open to suggestions.
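
To try the two RegEx rules outside the component, a minimal sketch along the following lines may help. The class HighlightRegexSketch, the helper FindAll and the StricterNumber pattern are assumptions made for illustration and are not part of the component. StricterNumber is only one possible answer to the "0..0" problem, using lookarounds instead of \b.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustration only: none of these names exist in the component.
public static class HighlightRegexSketch
{
    // Same template as in the listing: b = begin, e = end, x = escape.
    private const string RegBase = "b[^ex]*(?:x.[^ex]*)*[e]";

    // The number pattern from the listing.
    public const string OriginalNumber = @"\b(?:[0-9]*\.)?[0-9]+\b";

    // Hypothetical alternative: no word character or dot directly before the
    // number, and no word character, ".." or ".digit" directly after it.
    public const string StricterNumber =
        @"(?<![\w.])(?:[0-9]*\.)?[0-9]+(?!\w|\.[.0-9])";

    // Fills in the template; every Replace works on the previous result.
    public static string BuildStringPattern()
    {
        string sEx = RegBase.Replace("e", "\\\""); // End
        sEx = sEx.Replace("b", "\\\"");            // Begin
        sEx = sEx.Replace("x", "\\\\");            // Escape
        return sEx;
    }

    // Collects the text of every match of pattern in text.
    public static List<string> FindAll(string pattern, string text)
    {
        var found = new List<string>();
        for (Match m = Regex.Match(text, pattern); m.Success; m = m.NextMatch())
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

On the input 0..0 the original pattern matches the two zeros separately, while the stricter pattern matches nothing. The price is a more expensive expression, so whether it is worth it depends on the text being highlighted.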

Points of Interest

I discovered that highlighting is not easy. A simple wordlist might be, but when several 'rules' come into play, order is important. Paying attention to the order in which you define (and therefore execute) your rules will help a lot, but it is likely you will still end up with conflicting rules.

There are other highlighting controls out there (also on this website) which are designed for, or will better handle, these more complex situations.
Complex being the key word. As said before, this is not a KnowItAll.

Some structures are generated with each call to HighLight; these might be improved to be generated only when the corresponding property changes. Think of the separator-char list, the RegEx list, etc.
However, these optimizations currently fall outside the scope of my interests.

I did not know that I was missing RegEx evaluation until I wanted to highlight numbers, so I threw it in. (just like that.)

The internal workings of the component are now like this:

  • When highlight is needed (called), clear all previous lists.
  • Create the RTF header (colors, start of a FontTable).
  • Split the current Text on \n.
  • Loop through the array, locating the first of any defined separator char.
  • When found, start matching the remainder of the text (between two separators) to any defined rule except RegEx.
  • The ToEOW option will accept any next SeperatorChar as a close token. (new)
  • When a rule applies, add formatting to the RTF body.
  • Add font to header if needed.
  • Add recognized text to RTF body.
  • Close this rule's formatting and return to default.
  • Loop until all text is processed.
  • Merge header and body into the RTF property. (*)
  • Loop through RegEx rules.
  • For all matched words, set the selection and apply the rule's formatting.
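
The single-pass part of these steps can be sketched roughly as below. This is not the component's code: the class SinglePassSketch, its method names and the single hard-coded blue keyword color are assumptions for illustration, and fonts, start/stop tokens and Unicode handling are left out entirely.

```csharp
using System.Collections.Generic;
using System.Text;

// Illustration only: a stripped-down version of the "parse once, emit RTF" idea.
public static class SinglePassSketch
{
    // Builds one RTF document in a single pass over the text.
    // Words found in keywords get color 1 from the color table (blue here).
    public static string ToRtf(string text, ICollection<string> keywords, char[] separators)
    {
        var body = new StringBuilder();
        foreach (string line in text.Split('\n'))
        {
            int pos = 0;
            while (pos < line.Length)
            {
                // Locate the next separator; the word runs up to it.
                int sep = line.IndexOfAny(separators, pos);
                int end = sep < 0 ? line.Length : sep;
                string word = line.Substring(pos, end - pos);

                if (keywords.Contains(word))
                {
                    // Open formatting, add the text, return to the default color.
                    body.Append(@"\cf1 ").Append(Escape(word)).Append(@"\cf0 ");
                }
                else
                {
                    body.Append(Escape(word));
                }

                // The separator itself is copied through unformatted.
                if (sep >= 0) body.Append(Escape(line[sep].ToString()));
                pos = end + 1;
            }
            body.Append(@"\par").Append('\n');
        }
        // Merge header (color table only) and body.
        return @"{\rtf1\ansi{\colortbl;\red0\green0\blue255;}" + body + "}";
    }

    // Escapes the three characters that have a special meaning in RTF.
    private static string Escape(string s)
    {
        return s.Replace(@"\", @"\\").Replace("{", @"\{").Replace("}", @"\}");
    }
}
```

Passing a HashSet<string> created with StringComparer.OrdinalIgnoreCase as keywords gives case-insensitive matching, which is handy for an SQL wordlist.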

Up to the (*), this is how the original engine works, and although it is fast
(the entire text is parsed only once), it does have limitations:

  • Cannot use a SeperatorChar as part of a Start/StopToken/Word.
    Sounds fair enough, until you want to use both minus and the single-line comment --.
  • No escapes.
  • Cannot detect numbers.

The RegEx extension added is meant to compensate for these shortcomings.
The drawback of the method used (manipulating the selection) is that it's SLOW; however, even thinking about mixing the original recognition engine and RegEx gave me a headache. (How to integrate RegEx and straight-line parsing with the RTF generation; article, anyone?)

The reason RegEx is slow is the calls that manipulate the selection.
These send window messages (including SetText, WM_PAINT and a few others) to manipulate the underlying RTF for each highlighted selection, instead of creating the entire RTF once as in the first phase. (very open for suggestions to improve on this.)

Since it's not likely I will be using this component on large files/texts, and you have been warned about the limitations, I can live with that.


I've modified the RegEx pass to only update the visible portion of the text. Calculating exactly what that was proved to be a major pain, as the following code shows:

//So far the only way to correctly detect the last visible char of the last
//line is to calculate it yourself. Get the last character position. 
int LowerRight = GetCharIndexFromPosition(new Point(Width, Height)); 
//The box is happy to report it has found the last character; testing
//tells me that this is not so. Get the line-index for that position.
int iNextLine = GetLineFromCharIndex(LowerRight); 
//The corrected last character would be the first character of that line 
//with the length of that line added. I really don't like using the Lines 
//property in this step, but hey, it works. 
LowerRight = GetFirstCharIndexFromLine(iNextLine) + Lines[iNextLine].Length;

Now all we have to do is match the RegEx to these bounds:

Point m_UpperLeftCorner = new Point(0, 0); 
int Upperleft = GetCharIndexFromPosition(m_UpperLeftCorner) - 1; 
int RegExUpperleft = regMatch.Index; 
//Only format text that starts in the visible part 
if ((RegExUpperleft > Upperleft) && (RegExUpperleft < LowerRight)) 
{
  //set selection etc.
}
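
The bounds check itself does not depend on the textbox, so it can be tried in isolation. The following is a minimal sketch under that assumption. The class VisibleRangeSketch and the method MatchesInView are made-up names, and the two bounds are simply passed in instead of being read from the control.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustration only: the visible-range filter without the textbox around it.
public static class VisibleRangeSketch
{
    // Returns the matches whose first character lies strictly between
    // upperLeft and lowerRight, the same comparison as in the snippet above.
    public static List<Match> MatchesInView(Regex rule, string text, int upperLeft, int lowerRight)
    {
        var visible = new List<Match>();
        for (Match m = rule.Match(text); m.Success; m = m.NextMatch())
        {
            // A left-to-right Regex reports matches in ascending order,
            // so nothing after the lower bound can be visible.
            if (m.Index >= lowerRight) break;
            if (m.Index > upperLeft) visible.Add(m);
        }
        return visible;
    }
}
```

Stopping at the first match past the lower bound also avoids scanning the rest of a long text, which fits the goal of only paying for what is on screen.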


Improvements still on the to-do list:

  • Add Search function.
    Highlights all words that appear in the search string.
  • Create option for a 'Transparent' font.
    Color is already transparent, but what if you want your regex to change Color and not font? Specifying null now defaults to the Font property. Could be implemented as a flag which tells how to handle a null Font.
  • Creating the separator-char list can be cumbersome.
    Add function to add all characters that are not in [a-z][A-Z][0-9] and do not appear in defined start/stop tokens (except regex). (or something like that)
  • Allow EscapeChars between Start- and Stop-token.
    Note: we have a workaround (RegEx); however, the 'linear search' is theoretically faster than anything you might want to do with a regex.
    Now that RegEx only updates the visible part, I am not so sure.
  • Check generated RTF against the RTF specifications (link needed).
    At the moment it's just reverse engineered from Wordpad. Note that if you compare the RTF property (after setting it) with the generated RTF, they do not match.
    It LOOKS ok though.
  • Some initializations could be moved from the inner loop to when the corresponding property changes, or only when a rule is added.
  • Add timer to cache keystrokes, only call highlight when user stops typing.
    Improves responsiveness.
    Will be released later, but it's easy as pie.
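
For the separator-char item, one possible shape of such a function is sketched below. It is not part of the component. The names SeparatorSketch and BuildSeparators are hypothetical, and the choice to consider only printable ASCII (codes 32 to 126) is an assumption.

```csharp
using System.Collections.Generic;

// Illustration only: derives a separator list from the defined tokens.
public static class SeparatorSketch
{
    // Returns every printable ASCII character that is not a letter or digit
    // and does not occur in any of the given start/stop tokens.
    public static char[] BuildSeparators(IEnumerable<string> tokens)
    {
        // Characters used by tokens must not become separators.
        var reserved = new HashSet<char>();
        foreach (string token in tokens)
        {
            foreach (char c in token)
            {
                reserved.Add(c);
            }
        }

        var separators = new List<char>();
        for (char c = (char)32; c <= (char)126; c++)
        {
            bool alphanumeric = (c >= 'a' && c <= 'z')
                             || (c >= 'A' && c <= 'Z')
                             || (c >= '0' && c <= '9');
            if (!alphanumeric && !reserved.Contains(c))
            {
                separators.Add(c);
            }
        }
        return separators.ToArray();
    }
}
```

With the tokens -- and " defined, the result still contains the space and + but leaves out - and ", which sidesteps the minus versus single-line comment conflict at the cost of no longer splitting words on a minus.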

The Code

Version 2.1, to be released later:

  • Caught the TextChanged event with a timer to make the control (much) more responsive.
  • Modified RegEx engine to only format the visible portion of its results, another major speed boost.

I made the following changes/improvements, let's call it version 2.

  • Fixed bug in fonttable which placed an "{" at the wrong place.
  • Added Defaultvalues to the properties.
  • The test form now has a PlainText and an RTF edit box. (makes debugging easier, as screwing up the RTF generation will also screw up your working data.)
    Now you always have a raw copy.
    Only the PlainText textbox has an event handler (by design).
  • Stole and added the SQL-Wordlist from QueryCommander (very old one), to test long(er) wordlists.
  • Added a new DescriptorType-member: DescriptorType.ToEOW, the result can be previewed in the screenshot. You are looking for @SomeInt.
  • Added a new DescriptorRecognition-member: DescriptorRecognition.RegEx.
  • Added support for the following font styles: Bold, Italic, Underline, Strikeout.
    Happy to add any missing ones I could not make out using Wordpad.
  • Tweaked RTF-FontTable generation to include only fonts that are needed for current text. Improves performance if every keyword uses a different font, and you only have a small text to highlight.
  • Calls to mSeperators.GetAsCharArray(); reduced to once for each call to HighLight.
  • Removed duplicate code from inner switch (hd.DescriptorType), Text to be formatted will now be added after the switch with one call to: AddUnicode(sbBody, sSubText);
    This makes it easier to determine where a 'block' should be opened and closed.
  • Added some #regions to code to improve readability of large loops.
    I did/do not understand the code enough to refactor and come up with a proper name.
  • Key.Down with CompleteForm and more than 8 items did not scroll the item into view.
  • Used (a little) refactoring and GhostDoc to make AutoComplete 'clearer'.
  • AutoComplete form shows up on the correct monitor (credits go to: <i forgot, sorry>). NOT TESTED, seems ok though.
  • Added (overloaded) functions AddHighlightDescriptor(<params>) to the component, which remove the need to create the descriptor object yourself. The parameter order reads more like an English sentence: see the first code sample.
  • RTF made more readable for debugging.
  • Color.Transparent now leaves the color of the recognized text alone.
    Use this to let a RegEx modify the font, but not the color.
  • Removed (flawed) duplicate code from autocomplete. This solved the following:
  • AutoComplete was broken by an off-by-one error.
  • When typing with CompleteForm open, a better match was not found. (same error)
  • AcceptAutoComplete deleted the (wrong) word. (same error)
    Uri, do not copy-paste routines. If you recognize that you are about to copy-paste a routine, or you feel like you are writing it for the second time, refactor the existing code into a function like GetSelectedWordByCharIndex().
    This will improve general readability and design.
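
To show what such a shared routine could look like, here is a minimal sketch. It is not the component's GetSelectedWordBounds or GetSelectedWordByCharIndex. The class WordBoundsSketch, the method GetWordBounds and its signature are assumptions. The point is that the character before the caret sits at index caret - 1, which is exactly where an off-by-one tends to creep in when the logic is copied around.

```csharp
using System;

// Illustration only: one place to compute the word around a caret position.
public static class WordBoundsSketch
{
    // Finds the word that touches the caret. caret is an insertion point,
    // so it may range from 0 up to and including text.Length.
    public static void GetWordBounds(string text, int caret, char[] separators,
                                     out int start, out int length)
    {
        if (caret < 0 || caret > text.Length)
        {
            throw new ArgumentOutOfRangeException("caret");
        }

        // Walk left while the character before the position is not a separator.
        start = caret;
        while (start > 0 && Array.IndexOf(separators, text[start - 1]) < 0)
        {
            start--;
        }

        // Walk right while the character at the position is not a separator.
        int end = caret;
        while (end < text.Length && Array.IndexOf(separators, text[end]) < 0)
        {
            end++;
        }

        length = end - start;
    }
}
```

Autocomplete lookup, accepting a suggestion and replacing the typed word can then all call the same function, so a boundary fix only has to be made once.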

I hope that with all these changes I did not mess up someone's production code; however, everything the component claimed to do in the first article, it now actually does.

Still, I would like to thank Uri Guy for the Component.
I was using a component that only used the "ManipulateSelection" and RegEx method, which proved too slow to work with an SQL wordlist, and I was too afraid to start RTF parsing myself.

Now I have a working component which is fast and easy enough for my purpose, and I have a basic understanding of RTF.

This is the unofficial sequel to:

Syntax highlighting textbox written in C#.

Regular expressions used are:

Stolen from the internet, proves he doesn't know everything either.

The following code is not my own, but included in the project.
(reference needed)

for (regMatch = regKeywords.Match(sCurrentText); 
     regMatch.Success; regMatch = regMatch.NextMatch()) { 
  //set selection etc.
}

Because what is visible can change due to scrolling, we now make good use of the Timer introduced in V2.1 to speed up typing. A scroll action will now reset the timer, so the newly visible text is eventually updated.


Written By
Netherlands
I've been programming ever since the C64 was high-tech.

I'm ShiftLock+RunStop if you are.

