Regular Expressions Syntax Highlighting

7 Sep 2008 | CPOL | 3 min read
Easy to use syntax highlighting class
[Image: SynHighlight demo screenshot]

Introduction

The classes described in this article provide a basic mechanism for colorizing source code. Highlighting rules are defined by regular expressions, and the colorizing functionality is easy to add to your application. Mixed-language text (for example, JavaScript and HTML in the same block) is not supported, but simple languages such as SQL can be added easily.

Background

I was writing an application for database stress testing and felt it would be natural to have some syntax highlighting for the queries written by a user. I was looking for something similar to SynEdit (an advanced edit control for Borland Delphi and Kylix), but could not find a suitable solution in a reasonable time, so I decided to write one myself. Maybe I did not look hard enough, but it's always more fun to create something than to use something.

Using the Code

There are four classes in this project:

  • SyntaxHighlightSegment - A simple class similar to System.Text.RegularExpressions.Capture. It represents a substring of a block of text, identified by its position.
  • SyntaxHighlightSegmentList - A list of SyntaxHighlightSegment objects with methods to remove overlapping segments from itself or from another list.
  • SyntaxHighlightItem - Defines a language entity (literal string, keyword, comment, operator, numerical value, etc.) that you wish to highlight.
  • SyntaxHighlightDictionary - A list of SyntaxHighlightItem objects that handles several items together.
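
To make the idea of removing overlapping segments concrete, here is a minimal sketch. It does not reproduce the actual SyntaxHighlightSegmentList: the Segment struct and the RemoveOverlapping helper are hypothetical names, and the rule used (a candidate is dropped if it intersects any higher-priority segment) is an assumption about how such a list might behave.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SyntaxHighlightSegment: a substring given by position.
public readonly struct Segment
{
    public int Start { get; }
    public int Length { get; }
    public int End => Start + Length; // exclusive end index

    public Segment(int start, int length)
    {
        Start = start;
        Length = length;
    }

    // Two half-open ranges [Start, End) intersect when each starts before the other ends.
    public bool Overlaps(Segment other) => Start < other.End && other.Start < End;
}

public static class SegmentOps
{
    // Returns the candidates that do not intersect any higher-priority segment.
    public static List<Segment> RemoveOverlapping(
        IEnumerable<Segment> candidates, IReadOnlyCollection<Segment> priority)
    {
        return candidates.Where(c => !priority.Any(p => p.Overlaps(c))).ToList();
    }
}
```

With a comment segment as the priority list, any string segment that falls inside the comment is discarded, while segments elsewhere in the text are kept.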

The main class for defining your highlighting rules is SyntaxHighlightItem. You'll need to understand regular expressions to do this, or you can simply look one up. For example, to highlight comments that start with a double dash and run to the end of the line, in a gray italic font on a transparent background, you would do the following:

C#
SyntaxHighlightItem commentItem =
    new SyntaxHighlightItem("comments", new string[] { "--.*" },
        FontStyle.Italic, Color.Gray, Color.Transparent);
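
Before wiring a pattern such as "--.*" into a SyntaxHighlightItem, it can help to check what it actually matches. The sketch below uses only the standard System.Text.RegularExpressions.Regex class; the PatternProbe class and FindSpans method are hypothetical names and are not part of the article's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PatternProbe
{
    // Returns (index, length) for every match of the pattern in the text.
    public static List<(int Index, int Length)> FindSpans(string text, string pattern)
    {
        var spans = new List<(int Index, int Length)>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            spans.Add((m.Index, m.Length));
        }
        return spans;
    }
}
```

For example, PatternProbe.FindSpans("SELECT 1 -- first\nSELECT 2", "--.*") yields a single span starting at index 9 with length 8, because "." does not match "\n". Note that in .NET "." does match "\r", so with "\r\n" line endings the span would include the trailing "\r"; a pattern such as "--[^\r\n]*" avoids that.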

Once you've created all your definitions, add them to a SyntaxHighlightDictionary instance. It handles several definitions together; for example, it ignores a "literal string" item that falls inside a "comments" item:

C#
SyntaxHighlightDictionary dic = new SyntaxHighlightDictionary();

// Comments: "--" to the end of the line, gray italic.
SyntaxHighlightItem commentItem =
    new SyntaxHighlightItem("comments", new string[] { "--.*" },
        FontStyle.Italic, Color.Gray, Color.Transparent);

// Literal strings: single-quoted, not spanning lines, blue.
SyntaxHighlightItem literalStringItem =
    new SyntaxHighlightItem("strings", new string[] { "'[^'\r\n]*'" },
        FontStyle.Regular, Color.Blue, Color.Transparent);

dic.Add(commentItem);
dic.Add(literalStringItem);

// The quoted text lies inside the comment, so it is not reported as a string.
dic.CreateSnapshot("unmarked text--'string' inside comments");

The same functionality could probably be achieved by specifying the regular expressions more carefully, but for those who (like myself) are not fluent in regular expressions, it's much easier this way.
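
As one illustration of the "more careful regular expression" route, the two patterns can be combined into a single alternation with named groups. The regex engine scans left to right and reports non-overlapping matches, so a quote that appears after "--" is consumed by the comment match and never reported as a string. This is a sketch of an alternative technique, not what SyntaxHighlightDictionary does internally; CombinedScanner and Classify are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CombinedScanner
{
    // One pattern, two named alternatives: a "--" comment or a single-quoted string.
    private static readonly Regex Pattern = new Regex(
        @"(?<comment>--[^\r\n]*)|(?<str>'[^'\r\n]*')");

    // Returns (kind, index, length) for each token found, in text order.
    public static List<(string Kind, int Index, int Length)> Classify(string text)
    {
        var result = new List<(string Kind, int Index, int Length)>();
        foreach (Match m in Pattern.Matches(text))
        {
            string kind = m.Groups["comment"].Success ? "comment" : "string";
            result.Add((kind, m.Index, m.Length));
        }
        return result;
    }
}
```

The same mechanism works in the other direction: a "--" inside a quoted string is consumed by the string match, so it does not start a comment. The cost is that every new language entity makes the single pattern harder to read, which is the trade-off the dictionary approach avoids.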

The CreateSnapshot method in the example above analyzes the text and stores the positions of the items it finds in each item's list of segments. So, to highlight the contents of a RichTextBox named textBox, you would do something like this:

C#
// Dictionary is a SyntaxHighlightDictionary instance (like dic above).
Dictionary.CreateSnapshot(textBox.Text);

// Apply each item's style to every segment found for it.
foreach (SyntaxHighlightItem synItem in Dictionary) {
    foreach (SyntaxHighlightSegment segment in synItem.AllSegments) {
        textBox.Select(segment.OrderedStart, segment.Length);
        textBox.SelectionFont = new Font(Dictionary.Font, synItem.FontStyle);
        textBox.SelectionColor = synItem.ForegroundColor;
        textBox.SelectionBackColor = synItem.BackgroundColor;
    }
}

You will find the full code for a class that colorizes a RichTextBox in the demo project.

Known Issues

The regular expressions in the demo project are far from perfect and, if used as is, could produce unexpected results. Performance is fine for my needs (up to 500 lines of code), but can certainly be improved.

Points of Interest

As you will see from the comments in the code, I tried to work around flickering when highlighting a RichTextBox. There are no BeginUpdate()/EndUpdate() methods, so they are emulated with the WM_SETREDRAW message. The difficulty is determining the original redraw state of the RichTextBox, so that redrawing is not re-enabled if it was disabled by a call external to your code. You might want to search for AdvRichTextBox, which extends RichTextBox to address this problem.
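
A minimal sketch of the WM_SETREDRAW emulation follows. SendMessage and the message value 0x000B are standard Win32; the RedrawSuspender class is a hypothetical name and is not taken from the demo project. Its nesting counter is an assumption that only tracks calls made through this class, so it does not solve the external-caller problem described above. The P/Invoke call works on Windows only.

```csharp
using System;
using System.Runtime.InteropServices;

public sealed class RedrawSuspender
{
    public const int WM_SETREDRAW = 0x000B;

    [DllImport("user32.dll")]
    private static extern IntPtr SendMessage(IntPtr hWnd, int msg, IntPtr wParam, IntPtr lParam);

    private readonly IntPtr _handle;
    private int _depth; // BeginUpdate calls not yet matched by EndUpdate

    public RedrawSuspender(IntPtr handle) { _handle = handle; }

    public int Depth => _depth;

    // Disables redrawing on the first (outermost) call only.
    public void BeginUpdate()
    {
        if (_depth++ == 0)
            SendMessage(_handle, WM_SETREDRAW, IntPtr.Zero, IntPtr.Zero);
    }

    // Re-enables redrawing when the outermost BeginUpdate is matched.
    public void EndUpdate()
    {
        if (_depth == 0) return; // unbalanced call: nothing to undo
        if (--_depth == 0)
            SendMessage(_handle, WM_SETREDRAW, new IntPtr(1), IntPtr.Zero);
    }
}
```

In a Windows Forms application this would be constructed with textBox.Handle, with BeginUpdate() called before the highlighting loop and EndUpdate() in a finally block, followed by textBox.Invalidate() so the control repaints once with the final formatting.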

History

  • 7th September, 2008 - Initial release

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
United Kingdom

Comments and Discussions

 
new SyntaxHighlightingItem
AdidasSG2, 11-Nov-08 4:56

Re: new SyntaxHighlightingItem
Stanislav Kniazev, 12-Nov-08 4:23
Your regular expressions will depend on the syntax: do you want to highlight "$PROGRAMFILES" only when it's at the beginning of a line, is it case-sensitive, and so on. Note that "$" is a special character in regular expressions, so you'll need to prefix it with a backslash. Roughly, you would write something like this:

dic.Add(new SyntaxHighlightItem("programfiles",
    new string[] { @"\B\$PROGRAMFILES\b" },
    FontStyle.Bold, Color.Black, Color.Transparent));
dic.Add(new SyntaxHighlightItem("comments",
    new string[] { "#.*", ";.*"},
    FontStyle.Italic, Color.Gray, Color.Transparent));

Just what I've been looking for
Don Kackman, 7-Sep-08 17:01
