
Making the Syntax highlighting textbox written in C# component work

26 Nov 2007
The component by Uri Guy almost worked; now it does.

Screenshot (All_Is_Good.gif): a preview of the new and improved tester application.
Note the number/parameter recognition.

Download highlighter2.zip - 37.3 KB

Introduction

While looking for an easy way to do syntax highlighting, I came upon the following article by Uri Guy:
Syntax highlighting textbox written in C#.
It was a good start, but it had a few flaws. Since the article has not been updated for some time, I thought I would present my corrections to the public in this manner.

This is an intermediate component, not a know-it-all monster.
It will fill your basic needs the quick (and now dirty) way:

  • Customizable word separators (characters).
  • Parsing of large word lists (the quick part).
  • Rudimentary start/stop token search (no escape characters, I think).
  • RegEx evaluation to do some tricky stuff (the dirty part).
    RegEx is slow(ish), so use it carefully; the reason is explained below.
    Do not highlight every whitespace with a regex.

Background

The key reason I picked up this source was that I (thought I) was looking for something simple to use. Unlike some other highlighters on this site, it looked simple enough, and some people were actually using it. This article is for those who did.

The submitted source code is a continuation of the Unicode download made available by petrusek. All bugs and fixes mentioned on the message board are implemented. That still left enough improvements to make; these are listed below.

Remember, this is just a continuation of someone else's project. I am not responsible for the (lack of) general design and/or documentation. Neither do I plan to be (just fixin' the thing).
Nor did I plan to make all the modifications I did, but sometimes I just can't help myself.

Most of the corrections have a comment stating the reason for the correction. A little refactoring has been done to:
a) Move duplicate code into a function (GetSelectedWordBounds for example).
b) Just get the thing readable.

Using the code

This article will not explain how to use the original component; we have the original article for that. However, the following code demonstrates the features I have added to the component:

This code is from the tester application shown above (reformatted to fit a narrow page).

//New feature ToEndOfWord
shtb.AddHighlightDescriptor(
       DescriptorRecognition.StartsWith, "@", DescriptorType.ToEOW, 
       Color.Firebrick, null, true); 

//strings, almost readable.
shtb.AddHighlightDescriptor(
        DescriptorRecognition.StartsWith, "\"", 
        DescriptorType.ToCloseToken, "\"", 
        Color.Red, null, true);

// RegEx to do the same exact thing... almost. Only highlights if a closing " 
// is found. also allows for escaping the ", which 
// DescriptorType.ToCloseToken does not do.
string regBase = "b[^ex]*(?:x.[^ex]*)*[e]"; //Generic StartsStopToken expr.

//Fill in the blanks. Each Replace must work on the previous result (sEx);
//calling regBase.Replace every time would keep only the last substitution.
string sEx = regBase.Replace("e", "\\\""); //End
       sEx = sEx.Replace("b", "\\\"");     //Begin
       sEx = sEx.Replace("x", "\\\\");     //Escape
//The test application actually does not use this one, as I want to select all
//text if the string is not terminated. The DescriptorType is ignored,
//but I thought another overload would make things less clear.
shtb.AddHighlightDescriptor(
       DescriptorRecognition.RegEx, sEx, DescriptorType.Word, 
       Color.Red, tmp, false);

tmp = new Font(Font, FontStyle.Bold);
//highlight numbers
shtb.AddHighlightDescriptor(
       DescriptorRecognition.RegEx, "\\b(?:[0-9]*\\.)?[0-9]+\\b",
       DescriptorType.Word, Color.Magenta, tmp, false);
//NOTE: this is not exactly right;
//it still highlights the digits of an invalid string like "0..0". Open to suggestions.
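The "fill in the blanks" step can be checked outside the control. The sketch below is a hypothetical helper (StringTokenPattern.Build is not part of the component) that applies the three substitutions to regBase, each one to the result of the previous, and assumes the fragments passed in are already regex-escaped and contain none of the placeholder letters b, e or x.

```csharp
// Hypothetical helper (not part of the component): fills in the generic
// start/stop-token template from the tester application.
public static class StringTokenPattern
{
    // b = begin token, e = end token, x = escape character.
    private const string RegBase = "b[^ex]*(?:x.[^ex]*)*[e]";

    // begin/end/escape are regex fragments that are already escaped and do not
    // contain the placeholder letters b, e or x themselves.
    public static string Build(string begin, string end, string escape)
    {
        string sEx = RegBase.Replace("e", end); // End
        sEx = sEx.Replace("b", begin);          // Begin (applied to sEx, not RegBase)
        sEx = sEx.Replace("x", escape);         // Escape
        return sEx;
    }
}
```

With \" as begin and end and \\ as escape, the result is the pattern \"[^\"\\]*(?:\\.[^\"\\]*)*[\"]. It matches "a\"b" as a single string and finds nothing in an unterminated "abc, which is the "only highlights if a closing quote is found" behaviour described in the comment.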

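One possible answer to the "0..0" note, offered as a suggestion and not as the component's behaviour: replace the \b anchors with lookarounds so that a number may not touch a word character or a dot on either side. NumberPattern is a hypothetical name. The trade-off is that a number followed directly by a dot, such as the 5 in "5.", is no longer highlighted.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical comparison of the tester application's number pattern with a stricter variant.
public static class NumberPattern
{
    // Pattern used in the tester application: finds "0" twice in "0..0".
    public const string Original = @"\b(?:[0-9]*\.)?[0-9]+\b";

    // Suggested variant: the negative lookbehind/lookahead reject a number that is
    // directly preceded or followed by a word character or a '.'.
    public const string Strict = @"(?<![\w.])(?:[0-9]*\.)?[0-9]+(?![\w.])";

    public static string[] Find(string pattern, string text) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For example, Find(NumberPattern.Strict, "x = 3.14 + .5 - 1.2.3") returns 3.14 and .5 and leaves 1.2.3 alone, while the strict pattern finds nothing at all in "0..0".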
Points of Interest

I discovered that highlighting is not easy. A simple word list might be, but when several 'rules' come into play, order is important. Paying attention to the order in which you define (and therefore execute) your rules will help a lot, but you are still likely to end up with conflicting rules.

There are other highlighting controls out there (also on this website) that are designed to handle these more complex situations better.
"Complex" being the key word: as said before, this is not a know-it-all.

Some structures are generated on every call to HighLight; they could be rebuilt only when the corresponding property changes (think of the separator-character list, the RegEx list, etc.).
However, these optimizations are currently outside the scope of my interests.

I did not know I was missing RegEx evaluation until I wanted to highlight numbers, so I threw it in (just like that).

The internal workings of the component are now like this:

  • When highlighting is needed (HighLight is called), clear all previous lists.
  • Create the RTF header (colors, start of a font table).
  • Split the current Text on \n.
  • Loop through the array, locating the first occurrence of any defined separator character.
  • When found, start matching the remainder of the text (between two separators) against every defined rule except RegEx.
  • The ToEOW option accepts the next separator character as its close token (new).
  • When a rule applies, add formatting to the RTF body.
  • Add the font to the header if needed.
  • Add the recognized text to the RTF body.
  • Close this rule's formatting and return to the default.
  • Loop until all text is processed.
  • Merge header and body into the RTF property.
    (*)
  • Loop through RegEx rules.
  • For all matched words, set:
    SelectionStart.
    SelectionLength.
    SelectionFont.
    SelectionColor.
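The first phase of the list above (everything before the (*)) can be illustrated with a much-reduced sketch. It is not the component's code: RtfSketch, its parameters and the (R, G, B) tuples are assumptions chosen to avoid WinForms and System.Drawing, and the font table is left out. It builds the color table once, walks the text a single time, colors whole words found in a keyword dictionary, colors a word that starts with a ToEOW-style prefix (such as "@") up to the next separator, and escapes RTF control characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, much-reduced model of the first (single-pass) phase; not the component's code.
// Colors are (R, G, B) tuples so the sketch does not need System.Drawing; the font table is omitted.
public static class RtfSketch
{
    // keywords:    whole word -> 1-based color index (RTF color index 0 is the default color).
    // eowPrefixes: prefix -> color index; a word starting with it is colored up to the next separator (like ToEOW).
    public static string Build(string text, char[] separators,
        IDictionary<string, int> keywords, IDictionary<string, int> eowPrefixes,
        IList<(int R, int G, int B)> colors)
    {
        // Header: color table built once; the lone ';' reserves index 0 for the default color.
        var header = new StringBuilder(@"{\rtf1\ansi{\colortbl ;");
        foreach (var (r, g, b) in colors)
            header.Append($@"\red{r}\green{g}\blue{b};");
        header.Append('}');

        // Body: one pass over the text, flushing the current word at every separator.
        var body = new StringBuilder();
        var word = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\n' || Array.IndexOf(separators, c) >= 0)
            {
                AppendWord(body, word.ToString(), keywords, eowPrefixes);
                word.Clear();
                AppendEscaped(body, c.ToString());
            }
            else
            {
                word.Append(c);
            }
        }
        AppendWord(body, word.ToString(), keywords, eowPrefixes);

        // Merge header and body into one RTF string.
        return header.Append(body).Append('}').ToString();
    }

    private static void AppendWord(StringBuilder body, string w,
        IDictionary<string, int> keywords, IDictionary<string, int> eowPrefixes)
    {
        if (w.Length == 0) return;
        if (!keywords.TryGetValue(w, out int color))
        {
            color = 0;
            foreach (var p in eowPrefixes)
                if (w.StartsWith(p.Key, StringComparison.Ordinal)) { color = p.Value; break; }
        }
        if (color == 0) { AppendEscaped(body, w); return; }

        // Open a group, switch the color, write the text, close the group (back to default).
        body.Append(@"{\cf").Append(color).Append(' ');
        AppendEscaped(body, w);
        body.Append('}');
    }

    // Escapes RTF control characters; non-ASCII characters become \uN? (signed 16-bit N).
    private static void AppendEscaped(StringBuilder sb, string s)
    {
        foreach (char c in s)
        {
            if (c == '\\' || c == '{' || c == '}') sb.Append('\\').Append(c);
            else if (c == '\n') sb.Append(@"\par ");
            else if (c > 127) sb.Append(@"\u").Append((int)(short)c).Append('?');
            else sb.Append(c);
        }
    }
}
```

For "select @id from t" with select/from on color 1 and the prefix "@" on color 2, the sketch produces {\rtf1\ansi{\colortbl ;\red0\green0\blue255;\red178\green34\blue34;}{\cf1 select} {\cf2 @id} {\cf1 from} t}. Each highlighted word is wrapped in its own group, so closing the group is all it takes to return to the default formatting.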

Up to the (*), this is how the original engine works. Although it is fast (the entire text is parsed only once),
it does have limitations:

  • Cannot use a separator character as part of a start/stop token or word.
    Sounds fair enough, until you want to use both minus and the single-line comment --.
  • No escapes.
  • Cannot detect numbers.

The added RegEx extension is meant to compensate for these shortcomings.
The drawback of the method used (manipulating the selection) is that it is SLOW. However, even thinking about mixing the original recognition engine with RegEx gave me a headache. (How do you integrate RegEx and straight-line parsing with the RTF generation? Article, anyone?)

The reason RegEx is slow is the calls to:
SelectionStart.
SelectionLength.
SelectionFont.
SelectionColor.
These send window messages (including SetText, WM_PAINT and a few others) to manipulate the underlying RTF for each highlighted selection, instead of creating the entire RTF once as in the first phase. (Very open to suggestions for improving this.)

Since it's not likely I will be using this component on large files/texts, and you have been warned about the limitations, I can live with that.
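As one possible direction for the RegEx/RTF integration question raised above (a sketch under assumptions, not something tried in the control): run every regex rule over the plain text first, keep only non-overlapping spans with earlier rules winning, and let the single RTF pass consume those spans instead of calling SelectionStart, SelectionLength, SelectionFont and SelectionColor once per match. StyledSpan and RegexSpans are hypothetical names; making "earlier rule wins" explicit also pins down the rule-order problem mentioned earlier.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical: a formatted range found by one of the regex rules.
public readonly struct StyledSpan
{
    public StyledSpan(int start, int length, int rule) { Start = start; Length = length; Rule = rule; }
    public int Start { get; }
    public int Length { get; }
    public int Rule { get; }          // index of the rule that produced the span
    public int End => Start + Length; // exclusive
}

public static class RegexSpans
{
    // Applies the rules in the order given; a match overlapping an already accepted
    // span is dropped, so earlier rules win. The overlap test is a linear scan,
    // which is fine for a sketch but quadratic for many matches.
    public static List<StyledSpan> Collect(string text, IList<Regex> rules)
    {
        var accepted = new List<StyledSpan>();
        for (int rule = 0; rule < rules.Count; rule++)
        {
            foreach (Match m in rules[rule].Matches(text))
            {
                if (m.Length == 0) continue; // empty matches carry nothing to format
                var s = new StyledSpan(m.Index, m.Length, rule);
                if (!accepted.Any(a => s.Start < a.End && a.Start < s.End))
                    accepted.Add(s);
            }
        }
        // Sorted by position, ready for a single left-to-right RTF pass.
        return accepted.OrderBy(a => a.Start).ToList();
    }
}
```

With a string rule first and a number rule second, the text x = "a 1" + 2 yields the string span and the trailing 2; the 1 inside the string is dropped because the string rule already claimed it.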

Update:

I've modified the RegEx pass to update only the visible portion of the text. Calculating exactly what that portion was proved to be a major pain, as the following code shows:

//So far the only way to detect the last visible char, for the last line 
//correctly is to calculate it yourself. Get the last character position. 
int LowerRight = GetCharIndexFromPosition(new Point(Width, Height)); 
//The box is happy to report I have found the last character, testing
//tells me that this is not so: Get the line-index for that position.
int iNextLine = GetLineFromCharIndex(LowerRight); 
//The corrected last character would be the first character of that line
//with the length of that line added. I really don't like using the Lines
//property in this step, but hey, it works.
LowerRight = GetFirstCharIndexFromLine(iNextLine) + Lines[iNextLine].Length;
Now all we have to do is match the RegEx to these bounds:
Point m_UpperLeftCorner = new Point(0, 0); 
int Upperleft = GetCharIndexFromPosition(m_UpperLeftCorner) - 1; 
int RegExUpperleft = regMatch.Index; 
//Only format text that starts in the visible part
if ((RegExUpperleft > Upperleft) && (RegExUpperleft < LowerRight))
{
    //set SelectionStart, SelectionLength, SelectionFont and SelectionColor for regMatch
}
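The bounds check at the end of that snippet can also be written as a search that begins at the first visible character and stops at the first match starting past the visible range, so matches further down a long text are never enumerated. The helper below is a hypothetical sketch that keeps the snippet's rule (only matches that start in the visible part are returned); first corresponds to Upperleft + 1 and lastExclusive to LowerRight.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the visible-range check expressed as a bounded search.
public static class VisibleMatches
{
    // Yields matches whose Index lies in [first, lastExclusive), the same condition as
    // (Index > Upperleft) && (Index < LowerRight) with first = Upperleft + 1.
    public static IEnumerable<Match> StartingIn(Regex regex, string text, int first, int lastExclusive)
    {
        // Match(text, startat) rejects a start outside 0..text.Length, so clamp it.
        first = Math.Max(0, Math.Min(first, text.Length));
        // Match(text, startat) starts scanning at 'first' without cutting the string,
        // so lookbehinds can still see the text before it.
        for (Match m = regex.Match(text, first); m.Success && m.Index < lastExclusive; m = m.NextMatch())
            yield return m;
    }
}
```

In the control, each yielded match would then go through the same SelectionStart/SelectionLength/SelectionFont/SelectionColor calls as before; only the enumeration is bounded.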

Todo

  • Add a search function.
    Highlights all words that appear in the search string.
  • Create an option for a 'transparent' font.
    Color can already be transparent, but what if you want your regex to change the color and not the font? Specifying null currently defaults to the Font property. This could be implemented as a flag that tells how to handle a null font.
  • Creating the separator-character list can be cumbersome.
    Add a function that adds all characters that are not in [a-z][A-Z][0-9] and do not appear in the defined start/stop tokens (except regex). (Or something like that.)
  • Allow escape characters between start and stop tokens.
    Note: we have a workaround (RegEx); however, the 'linear search' is theoretically faster than anything you might want to do with a regex.
    Now that RegEx only updates the visible part, I am not so sure.
  • Check the generated RTF against the RTF specification (link needed).
    At the moment it's just reverse-engineered from WordPad. Note that if you compare the Rtf property (after setting it) with the generated RTF, they do not match.
    It LOOKS OK though.
  • Some initializations could be moved from the inner loop to when the corresponding property changes, or to when a rule is added.
  • Add a timer to buffer keystrokes and only call highlight when the user stops typing.
    Improves responsiveness.
    Will be released later, but it's easy as pie.
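For the separator-list item, one reading of the rule as a helper (hypothetical, not in the component): every printable ASCII character outside [a-zA-Z0-9] that does not occur in any non-RegEx start/stop token, plus tab and newline, which are assumed to always separate words.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the "separator list is cumbersome" item; not part of the component.
public static class SeparatorList
{
    // tokens: the start/stop tokens of all non-RegEx descriptors.
    // Returns tab, newline and every printable ASCII character outside [a-zA-Z0-9]
    // that does not occur in any token.
    public static char[] Build(IEnumerable<string> tokens)
    {
        var tokenChars = new HashSet<char>(tokens.SelectMany(t => t));
        var result = new List<char> { '\t', '\n' };
        for (char c = ' '; c <= '~'; c++)
        {
            bool alphanumeric = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alphanumeric && !tokenChars.Contains(c))
                result.Add(c);
        }
        return result.ToArray();
    }
}
```

Passing the tokens "@" and "--" keeps '@' and '-' out of the list, which also reproduces the minus versus -- limitation described earlier.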

The Code

Download highlighter2.zip - 37.3 KB

History

Version 2.1, to be released later:

  • Caught the TextChanged event with a timer to make the control (much) more responsive.
  • Modified the RegEx engine to format only the visible portion of its results, another major speed boost.
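The timer from version 2.1 (and the later scroll reset) amounts to debouncing: restart a countdown on every keystroke and highlight only once the input has been quiet for a while. In the control this would usually be a System.Windows.Forms.Timer that the TextChanged handler stops and starts again, and whose Tick handler stops it and calls HighLight. The sketch below models only the decision logic with caller-supplied timestamps so it can be tested without a form; HighlightDebouncer and the 300 ms used in the example are assumptions, not values from the download.

```csharp
// Hypothetical debounce logic for the highlight timer; not the component's code.
// Timestamps are milliseconds supplied by the caller, which keeps the logic testable.
public sealed class HighlightDebouncer
{
    private readonly long _delayMs;
    private long _lastTouchMs;
    private bool _pending;

    public HighlightDebouncer(long delayMs) => _delayMs = delayMs;

    // Call from TextChanged (and from scroll events): starts or restarts the quiet period.
    public void Touch(long nowMs)
    {
        _lastTouchMs = nowMs;
        _pending = true;
    }

    // Call from a periodic tick: returns true once per burst of input,
    // when at least _delayMs has passed since the last Touch.
    public bool Tick(long nowMs)
    {
        if (!_pending || nowMs - _lastTouchMs < _delayMs) return false;
        _pending = false;
        return true;
    }
}
```

With a 300 ms delay, keystrokes at 0 and 100 ms cause no highlight at 350 ms, one highlight at 400 ms, and nothing further until the next keystroke.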

I made the following changes/improvements; let's call it version 2.

  • Fixed a bug in the font table that placed a "{" in the wrong place.
  • Added default values to the properties.
  • The test form now has a PlainText and an RTF edit box. (This makes debugging easier, as screwing up the RTF generation will also screw up your working data.)
    Now you always have a raw copy.
    Only the PlainText textbox has an event handler (by design).
  • Stole and added the SQL word list from QueryCommander (a very old one) to test long(er) word lists.
  • Added a new DescriptorType-member: DescriptorType.ToEOW, the result can be previewed in the screenshot. You are looking for @SomeInt.
  • Added a new DescriptorRecognition-member: DescriptorRecognition.RegEx.
  • Added support for the following font styles: Bold, Italic, Underline, Strikeout.
    Happy to add any missing ones I could not make out using WordPad.
  • Tweaked RTF font table generation to include only fonts that are needed for the current text. This improves performance if every keyword uses a different font and you only have a small text to highlight.
  • Calls to mSeperators.GetAsCharArray(); reduced to once per call to HighLight.
  • Removed duplicate code from the inner switch (hd.DescriptorType); the text to be formatted is now added after the switch with one call to: AddUnicode(sbBody, sSubText);
    This makes it easier to determine where a 'block' should be opened and closed.
  • Added some #regions to code to improve readability of large loops.
    I did/do not understand the code enough to refactor and come up with a proper name.
  • Key.Down with CompleteForm and more than 8 items did not scroll the item into view.
  • Used (a little) refactoring and GhostDoc to make AutoComplete 'clearer'.
  • The AutoComplete form shows up on the correct monitor (credits go to: I forgot, sorry). NOT TESTED, seems OK though.
  • Added (overloaded) functions AddHighlightDescriptor(<params>) to the component, which remove the need to create the descriptor object yourself. The parameter order reads more like an English sentence: see the first code sample.
  • RTF made more readable for debugging.
  • Color.Transparent now leaves the color of the recognized text alone.
    Use this to let a RegEx modify the font, but not the color.
  • Removed (flawed) duplicate code from AutoComplete. This solved the following:
  • AutoComplete was broken by an off-by-one error.
  • When typing with CompleteForm open, a better match was not found. (Same error.)
  • AcceptAutoComplete deleted the (wrong) word. (Same error.)
    Uri, do not copy-paste routines. If you recognize that you are about to copy-paste a routine, or feel like you are writing it for the second time, refactor the existing code into a function like GetSelectedWordByCharIndex().
    This will improve general readability and design.
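The three AutoComplete errors above share one root: computing the word under the caret in more than one place. A minimal sketch of a single helper, assuming the caret index sits between characters (caret i is just before text[i]) so that a caret directly after a word still selects that word; WordBounds.Around is a hypothetical name and not the component's GetSelectedWordBounds.

```csharp
using System;

// Hypothetical single source of truth for "the word at the caret".
public static class WordBounds
{
    // Returns (start, length) of the word touching the caret;
    // length is 0 when the caret sits between two separators.
    public static (int Start, int Length) Around(string text, int caret, char[] separators)
    {
        if (caret < 0 || caret > text.Length) throw new ArgumentOutOfRangeException(nameof(caret));
        int start = caret;
        while (start > 0 && Array.IndexOf(separators, text[start - 1]) < 0) start--;
        int end = caret;
        while (end < text.Length && Array.IndexOf(separators, text[end]) < 0) end++;
        return (start, end - start);
    }
}
```

For "select * from tbl" with space and '*' as separators, a caret at 3 or at 6 both give (0, 6), and a caret at the very end gives (14, 3) for "tbl".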

I hope I did not mess up anyone's production code with all these changes; however, everything the component claimed to do in the first article, it now actually does.

Still, I would like to thank Uri Guy for the component.
I was previously using a component that relied only on the "ManipulateSelection" and RegEx method, which proved too slow to work with an SQL word list, and I was afraid to start RTF parsing myself.

Now I have a working component which is fast and easy enough for my purpose, and I have a basic understanding of RTF.

This is the unofficial sequel to:

Syntax highlighting textbox written in C#.

Regular expressions used are:

Stolen from the internet, proves he doesn't know everything either.

The following code is not my own, but included in the project.
(reference needed)

for (regMatch = regKeywords.Match(sCurrentText);
     regMatch.Success;
     regMatch = regMatch.NextMatch())
{
  //set selection etc.
}
 
Kabwla
Because what is visible can change due to scrolling, we now make good use of the timer introduced in V2.1 to speed up typing: a scroll action now resets the timer, so the newly visible text is eventually updated.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
Netherlands
I've been programming ever since the C64 was high-tech.

I'm ShiftLock+RunStop if you are.

Comments and Discussions

 
How To Hide Descriptors (_CloudyOne_, 14-May-08)
So I was trying to make an IRC client, and this looked like a great way to highlight certain parts of text in color.

But since the text that triggers the color in that protocol isn't pretty looking and is usually hidden, I wanted to hide it here as well.

So in SyntaxHighlightingTextBox.cs, right before

AddUnicode(sbBody, sSubText);
//Close the "block" since the text we wanted to format is now in the body.
sbBody.Append('}');


I added

if (hd.DescriptorRecognition == DescriptorRecognition.StartsWith)
{
    sSubText = sSubText.Replace(compareStr, "").Replace(hd.CloseToken, "");
}


Which takes out the opening Descriptor, and then the closing descriptor.

Now of course this will cause problems with items that you want to keep the triggering descriptors, but there are many creative ways to make the code choose which kinds to keep and which kinds to replace.


This worked great for me, except... now that I removed the descriptor that triggers the text change, it no longer triggered. Crap!

So in order to fix this, I went into SyntaxHighlightingTextBox.cs, and right after

public class SyntaxHighlightingTextBox : System.Windows.Forms.RichTextBox
{


I added

public string text
{
    get
    {
        return _text;
    }
    set
    {
        _text = value;
        this.Text = _text;
    }
}
public string _text;


Then, anytime I wanted to update the text of the custom control, I would use

Object.text

instead of

Object.Text


This would keep my changes as well as allow me to keep my output looking all nice and pretty for the end user :-D

Hope this helps!

modified on Wednesday, May 14, 2008 5:35 PM
