Introduction

Many weblogs I read are highly technical. That doesn't come as a surprise, though, as I am a software developer and often write about technical issues myself. Some observations, tricks or techniques work better with source code to demonstrate them. In a way, it's similar to the old "a picture is worth a thousand words", except that this time it's code that's worth a thousand words :-).

Modern development environments have spoiled me to the point that I cannot work productively in Notepad, for example, and the features I miss the most are IntelliSense and color coding of the source. This "bad" habit has extended to the Web, and it has become very hard for me to read source code that is poorly formatted and not color-coded.

I have seen many attempts at dealing with this issue. Most of them are based on regular expressions and color-code simple things like keywords and strings, maybe comments. But these tools do not understand the structure of the code you're posting and will never be able to properly color-code most, if not all, of the constructs, nor can they format the code. Enter the Colorizer.

Parsing is hard

Why don't existing tools provide better formatting and color coding? Because parsing is hard. I thought I knew most of the C# language constructs well before I started working on the Colorizer. Boy, was I wrong! Over the course of this project, I ran into several constructs I had never seen before. I have also learned to appreciate the work of the people building the C# compiler more. If it takes a team of people at Microsoft to deal with this issue properly, how could I have done it alone and in my spare time? Sir Isaac Newton said “If I have seen farther than others, it is because I was standing on the shoulders of giants”, and in this case, I was standing on the shoulders of Coco-R.

What is Coco-R

It's a compiler compiler (I guess that's where coco comes from).
You have probably heard of tools like lex and YACC - Coco-R is a modern version of these tools. What's really great about it is that there are ports to several languages, including C#. This is very convenient because any additional processing you want to do during parsing can be written in the same language the tool itself is written in - C#.

How does it work? Let's start with an example. Suppose you want to write a compiler/parser for Pascal. This language has its own grammar (just like spoken languages, except that this grammar is simpler). There is a well-known notation for expressing a grammar called (Extended) Backus-Naur Form. The input for Coco-R is called an attributed grammar; it is modeled after the EBNF notation and looks something like this:

Block = "begin" (. Console.Write("Inside a block!"); .) {Statement} "end" .
VarDeclaration = Ident {',' Ident} ':' Type ';'.

This describes a block as the keyword "begin", followed by zero or more statements, followed by the keyword "end"; the C# code between (. and .) runs when the parser reaches that point. A variable declaration is one or more comma-separated identifiers, a colon, a type and a semicolon.

The authors of Coco-R have produced the grammar file for C#, so it is trivial to produce a parser for it. Since you can embed your own code in the parser, that's exactly what I did - as each construct is recognized, based on the information I keep internally, I add formatting and color information to the code (I just wrap the code in markup that carries the CSS classes listed further down).
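To make the idea of an attributed grammar more tangible, here is a minimal hand-written sketch of what a recursive-descent parser for the Block production above might look like. It is not Coco-R output: the ToyBlockParser class, its whitespace tokenizer and the simplified Statement rule (an identifier followed by ";") are assumptions made purely for illustration, and the code targets a current C# compiler rather than C# 1.1. The only point is to show where the semantic action from the grammar ends up in the parsing method.

```csharp
using System;
using System.Collections.Generic;

// Hand-written sketch (NOT generated by Coco-R) for the toy grammar
//   Block     = "begin" (. action .) {Statement} "end" .
//   Statement = ident ";" .          // hypothetical, simplified rule
public class ToyBlockParser
{
    private readonly string[] _tokens;
    private int _pos;

    // Semantic actions record what they saw here instead of printing.
    public readonly List<string> Log = new List<string>();

    public ToyBlockParser(string source)
    {
        // Naive tokenizer: tokens must be separated by whitespace.
        _tokens = source.Split(new[] { ' ', '\t', '\r', '\n' },
                               StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns the current token, or null at the end of the input.
    private string Peek()
    {
        return _pos < _tokens.Length ? _tokens[_pos] : null;
    }

    // Consumes the expected terminal or reports a syntax error.
    private void Expect(string expected)
    {
        if (Peek() != expected)
            throw new FormatException("Expected '" + expected + "' but found '" +
                                      (Peek() ?? "<end of input>") + "'");
        _pos++;
    }

    // One method per production, mirroring the grammar rule.
    public void ParseBlock()
    {
        Expect("begin");
        Log.Add("Inside a block!");            // the (. ... .) semantic action
        while (Peek() != null && Peek() != "end")
            ParseStatement();                  // {Statement}: zero or more
        Expect("end");
    }

    private void ParseStatement()
    {
        string ident = Peek();
        _pos++;
        Expect(";");
        Log.Add("statement: " + ident);
    }
}
```

Calling new ToyBlockParser("begin x ; y ; end").ParseBlock() leaves three entries in Log: the message from the semantic action, followed by one entry per statement. A generator such as Coco-R produces this kind of code from the grammar, which is why embedding colorizing logic as semantic actions is so convenient.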
Note that when you have only a small snippet of code, it is impossible to parse it 100% correctly, so in some places I had to guess based on all the information available. In any case, my code is separated from the rest of the Coco-R code and is marked with "My auxiliary methods" in the file CSharp.atg (the attributed grammar for C# 1.1, provided by Coco-R, that I modified), so you can examine it in more detail. The pipeline looks like this:
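Coco-R produces a parser that consists of three important classes - Scanner, Parser and Errors - all of which are used by the core parsing routine:

using System;
using System.Configuration;
using System.IO;
using System.Text;

internal class Helper
{
  // Scanner and Parser are used through static members, so parsing is serialized.
  private static Object _lock = new Object();

  // Path to the JavaScript file (region handling), read from the config file.
  public static String CodePath
  {
    get { return ConfigurationSettings.AppSettings["CodePath"]; }
  }

  // Path to the CSS style sheet (colors), read from the config file.
  public static String StylePath
  {
    get { return ConfigurationSettings.AppSettings["StylePath"]; }
  }

  // Colorizes a whole C# source file.
  public static String FromFile(String path)
  {
    lock (_lock)
    {
      Scanner.Init(path);
      return Colorize();
    }
  }

  // Runs the parser over whatever the Scanner was initialized with.
  private static String Colorize()
  {
    Parser.Reset();
    Parser.Parse();
    if(0 == Errors.count)
      return Parser.Colorized;
    else
      return String.Format(@"Parse complete -- {0} error(s) detected",
        Errors.count);
  }

  // Colorizes a snippet of C# code held in a string.
  public static String FromString(String code)
  {
    lock (_lock)
    {
      using(MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(code)))
      {
        Scanner.Init(ms);
        return Colorize();
      }
    }
  }
}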
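You always need to initialize the Scanner first (from a file path or from a stream) and then reset and run the Parser; if no errors were reported, the colorized result is available from the Parser.

The colors themselves are defined in a CSS style sheet, with one class per kind of token:

pre.code .key /* keyword: this, for, if, while... */
{
  color: Blue;
}
pre.code .typ /* type: any FCL type or your type */
{
  color: Navy;
}
pre.code .met /* method */
{
  color: Maroon;
}
pre.code .var /* local variable, or parameter */
{
  color: Gray;
}
pre.code .str /* hard-coded string (not variables of type string!) */
{
  color: Olive;
}
pre.code .num /* hard-coded number (not variables of numeric types!) */
{
  color: Olive;
}
pre.code .val /* enumeration values */
{
  color: Purple;
}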
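As an illustration of how these class names could be applied, the following sketch wraps a single token in an element carrying one of the classes above. The SpanSketch class and the choice of a span element are assumptions; the article does not show the exact markup the Colorizer emits. The sketch also uses WebUtility.HtmlEncode, which is available in current .NET versions but not in the 1.1 framework the article targets.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the Colorizer sources.
public static class SpanSketch
{
    // Wraps one token in a <span> with the given CSS class.
    // The token text is HTML-encoded so that characters such as '<'
    // in source code cannot break the surrounding markup.
    public static string Wrap(string cssClass, string token)
    {
        return "<span class=\"" + cssClass + "\">" +
               WebUtility.HtmlEncode(token) +
               "</span>";
    }
}
```

With the style sheet above, SpanSketch.Wrap("key", "while") placed inside a pre element with class "code" would be rendered in blue. The difference from regex-based tools is not in this wrapping step but in who decides the class: here the parser knows whether an identifier is a type, a method or a local variable.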
Current limitations and extras

Some symbols are not recognized properly because they are defined later. For example, you use an ...

Formatting rules are not customizable. There is no way at the moment to specify whether you like your opening curly braces on the same line, whether you put a space between a function call and the opening bracket, etc. This is very easy to change if you'd like to play with the code provided. The current defaults are to use as little whitespace as possible while preserving readability, to always put curly braces on a new line, and to indent with two spaces.

There is no support for C# 2.0 (generics, iterators, partial classes, etc.) at the moment. The Coco-R authors have recently produced a grammar for C# 2.0, which makes this job a lot easier. Basically, all you'd need to do is take more or less the same code I wrote for C# 1.1 and integrate it into that grammar. I might do this in one of the later revisions of this code if there's enough interest.

In case you have a relatively large C# code snippet, fear not - I have added support for regions. With just a bit of JavaScript (which therefore must be enabled on the client), you can collapse/expand regions just like in Visual Studio (the script is in the code.js file)! Plus, the code works in both Internet Explorer and Mozilla Firefox (versions 6.0 and 1.0 tested, respectively).

Using the parser

Now that we have an easy way to do what we want (format and colorize C# source code), how do we use it with as little hassle as possible? Well, it turns out there are four basic ways to use this code:

- Static parsing
- ASP.NET handler
- ASP.NET Web control
- ASP.NET module
Let's examine each of these solutions.

Static parsing

This is the simplest way - I have provided a trivial console application that accepts a path to the C# source file and produces an HTML file with the desired name. The code consists of the core parsing routine (see above) and a bit of command line option handling - a dozen or so lines of code. The resulting HTML contains a reference to the JavaScript code (region handling) and a reference to the CSS style sheet, so these two files must be kept in the same directory as the resulting HTML files (both files are provided in the source download archive). While it is the least flexible, this approach offers the best performance - all files are processed before they are posted to the Web.

ASP.NET Handler

Having a handler is slightly more flexible than pre-processing a C# source file, but it does incur a performance penalty - now your code is parsed each time a user accesses a C# source file. If your visitors browse the site frequently, you might want to employ some caching to amortize this performance hit. Why would you want to expose C# source code directly? Well, maybe you have a Web view of your source code repository that is public (or just for you), where files change all the time and you don't want to (re)process them whenever they change.

Note that by default ASP.NET explicitly prohibits clients from accessing *.cs files directly by URL, presumably so that you don't accidentally reveal your web site's source code to visitors. Take a look at machine.config; it should be in <windows_folder>\Microsoft.NET\Framework\<framework_version>\config\machine.config. Do not modify this file! The ASP.NET architecture allows you to set up configuration at a very fine-grained level, down to the last subfolder in the folder hierarchy of your web site. The machine.config file supplies reasonable defaults that you can always override. The setting we are about to override is in the following section:

<httpHandlers>
  <!-- ... -->
  <add verb="*" path="*.config" type="System.Web.HttpForbiddenHandler"/>
  <add verb="*" path="*.cs" type="System.Web.HttpForbiddenHandler"/>
  <add verb="*" path="*.csproj" type="System.Web.HttpForbiddenHandler"/>
  <add verb="*" path="*.vb" type="System.Web.HttpForbiddenHandler"/>
  <add verb="*" path="*.vbproj" type="System.Web.HttpForbiddenHandler"/>
  <!-- ... -->
</httpHandlers>
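As you can see, many of the source code file types are associated with System.Web.HttpForbiddenHandler. To serve *.cs files through the colorizer instead, override that mapping in the web.config of your site:

<httpHandlers>
  <add verb="GET" path="*.cs"
       type="NanoBriq.Colorizer.Web.Handler, NanoBriq.Colorizer.Web"/>
</httpHandlers>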
Now, any URL matching *.cs will be handled by the NanoBriq.Colorizer.Web.Handler class:

using System;
using System.Web;

namespace NanoBriq.Colorizer.Web
{
  public class Handler : IHttpHandler
  {
    public Boolean IsReusable
    {
      get { return false; }
    }

    // Must be public to implement IHttpHandler.ProcessRequest.
    public void ProcessRequest(HttpContext context)
    {
      // Map the requested URL to the physical *.cs file.
      String path = context.Server.MapPath(context.Request.FilePath);

      // Inline the region script and the style sheet, then the colorized code.
      context.Response.Write("<html><head><script>");
      context.Response.WriteFile(Helper.CodePath);
      context.Response.Write("</script><style>");
      context.Response.WriteFile(Helper.StylePath);
      context.Response.Write("</style></head><body>");
      context.Response.Write(Helper.FromFile(path));
      context.Response.Write("</body></html>");
    }
  }
}
It boils down (again) to using the core parsing routines from above and not much else. We inline the JavaScript and CSS in the response itself.

ASP.NET Web control

Colorizing whole C# source files is fine and works great, but sometimes you need just a bit more flexibility. Maybe you frequently write articles, have a blog, or even have a CodeProject-like site where others contribute tips and tricks. If so, you have a lot of text with embedded snippets of code that you still want to format and colorize. For all those cases where the Web server is under your control so that you can build your own *.aspx pages, this solution fits nicely. Here's what you'd do:

<%@ Page language="c#" %>
<%@ Register TagPrefix="nbc"
    Namespace="NanoBriq.Colorizer.Web" Assembly="NanoBriq.Colorizer.Web" %>
<html>
<head></head>
<body>
<form id="frm1" runat="server">
<p>Some great programming technique...</p>
<nbc:WebUIControl id="ctlr1" runat="server" SourcePath="Demo.cs"/>
<p>More of the same...</p>
<nbc:WebUIControl id="ctlr2" runat="server">// Some inline code
String fileName;
Boolean itIs = Path.IsPathRooted(fileName);
// ...
</nbc:WebUIControl>
<p>Closing thoughts...</p>
</form>
</body>
</html>
There are two ways you can use this control - by pointing to a file on the disk with the SourcePath attribute, or by placing the code inline between the opening and closing tags of the control. Here is the control itself:

using System;
using System.IO;
using System.Web;
using System.Web.UI;

namespace NanoBriq.Colorizer.Web
{
  public class WebUIControl : Control
  {
    private String _path;

    // Path to a C# source file; relative paths are mapped against the site.
    public String SourcePath
    {
      set { _path = value; }
    }

    // Registers the region script and the style sheet once per page.
    protected override void OnInit(EventArgs e)
    {
      base.OnInit(e);
      String code, style;
      using (TextReader tr = new StreamReader(Helper.CodePath))
        code = tr.ReadToEnd();
      Page.RegisterClientScriptBlock("CodeClientBlock",
        "<script>" + code + "</script>");
      using (TextReader tr = new StreamReader(Helper.StylePath))
        style = tr.ReadToEnd();
      Page.RegisterClientScriptBlock("StyleClientBlock",
        "<style>" + style + "</style>");
    }

    protected override void Render(HtmlTextWriter writer)
    {
      String toOpen = _path;
      if(null != _path && "" != _path)
      {
        // File mode: colorize the referenced source file.
        if(!Path.IsPathRooted(_path))
          toOpen = Context.Server.MapPath(_path);
        writer.Write(Helper.FromFile(toOpen));
      }
      else if(1 == Controls.Count && Controls[0] is LiteralControl)
      {
        // Inline mode: colorize the literal text between the control's tags.
        toOpen = HttpUtility.HtmlDecode(((LiteralControl)Controls[0]).Text);
        writer.Write(Helper.FromString(toOpen));
      }
    }
  }
}
As a minimum, you should implement ...

ASP.NET Module

Finally, if you want the most flexible solution, this is the route to take. The problem with the previous approach is that you can't always make sure that your code snippets are embedded in your control (or referenced from it). For example, you may have a blog that you edit via an internal control that offers a WYSIWYG mode or an HTML mode, but neither assumes you'll add code to your aspx pages - it's all just content. The best thing you can do here is to mark your code snippets with a special tag, for example <pre class="csharp_source">, and let a module process the output:

using System;
using System.Web;

namespace NanoBriq.Colorizer.Web
{
  public class Module : IHttpModule
  {
    public void Init(HttpApplication context)
    {
      context.BeginRequest +=
        new EventHandler(OnBeginRequest);
    }

    // Wraps the response stream of every request in our filter.
    private void OnBeginRequest(Object sender, EventArgs args)
    {
      HttpApplication context = sender as HttpApplication;
      context.Response.Filter = new Filter(context.Response.Filter);
    }

    public void Dispose()
    {
    }
  }
}
This is a classic approach to output filtering - ASP.NET has built-in support for that. First, you need to subscribe to the BeginRequest event and, in its handler, wrap the response's Filter stream in a stream of your own:

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

internal class Filter : Stream
{
  private Stream _inner;
  private StringBuilder _toParse = new StringBuilder(1024);
  private Int32 _colorized = 0;

  internal Filter(Stream inner)
  {
    _inner = inner;
  }

  // Re-emits the <head> element with the region script and style sheet inlined.
  private String AddScriptStyle(Match match)
  {
    String code, style;
    StringBuilder whole = new StringBuilder();
    whole.Append("<head>").Append(match.Groups["head"].Value);
    using (TextReader tr = new StreamReader(Helper.CodePath))
      code = tr.ReadToEnd();
    whole.Append("<script>").Append(code).Append("</script>");
    using (TextReader tr = new StreamReader(Helper.StylePath))
      style = tr.ReadToEnd();
    whole.Append("<style>").Append(style).Append("</style></head>");
    return whole.ToString();
  }

  // Replaces one marked snippet with its colorized version.
  private String ColorizeCodeSegment(Match match)
  {
    _colorized++;
    return
      Helper.FromString(HttpUtility.HtmlDecode(match.Groups["toParse"].Value));
  }

  public override void Write(byte[] buffer, int offset, int count)
  {
    // Buffer everything until the end of the page has been written.
    // Note: this assumes "</html>" arrives within a single chunk.
    String piece = Encoding.UTF8.GetString(buffer, offset, count);
    _toParse.Append(piece);
    if(!Regex.IsMatch(piece, "</html>", RegexOptions.IgnoreCase))
      return;

    String result = Regex.Replace(_toParse.ToString(),
      @"<pre\s+class\s*=\s*['""]csharp_source[""']\s*" +
      @">(?<toParse>[\w\s\W\S]*?)</pre>",
      new MatchEvaluator(ColorizeCodeSegment), RegexOptions.IgnoreCase);

    // Only touch <head> if at least one snippet was colorized.
    if(_colorized > 0)
      result = Regex.Replace(result,
        @"<head>(?<head>[\w\s\W\S]*?)</head>",
        new MatchEvaluator(AddScriptStyle), RegexOptions.IgnoreCase);

    Byte[] all = Encoding.UTF8.GetBytes(result);
    _inner.Write(all, 0, all.Length);
  }

  // ... more methods ...
}
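Instead of implementing a complex state machine tracking whether we are at the beginning, inside or outside of our custom <pre> block, the filter simply buffers the output until it sees the closing </html> tag and then does all the replacements with regular expressions. To enable the module, register it in web.config:

<httpModules>
  <add name="ColorizerModule"
       type="NanoBriq.Colorizer.Web.Module, NanoBriq.Colorizer.Web"/>
</httpModules>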
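The regex-based replacement at the heart of the filter can be tried in isolation, without ASP.NET. The sketch below is only an approximation of that step: the SnippetRewriter class and the colorize delegate (a stand-in for Helper.FromString) are hypothetical names, and the code uses lambdas and WebUtility.HtmlDecode from current .NET versions rather than the 1.1-era APIs shown in the listings.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical stand-alone version of the filter's snippet replacement.
public static class SnippetRewriter
{
    // Matches <pre class="csharp_source">...</pre>, capturing the body lazily
    // so that several snippets on one page are handled separately.
    private static readonly Regex Pre = new Regex(
        @"<pre\s+class\s*=\s*['""]csharp_source['""]\s*>(?<toParse>[\s\S]*?)</pre>",
        RegexOptions.IgnoreCase);

    // Replaces every marked snippet with the result of 'colorize'.
    // The snippet body is HTML-decoded first, because editors usually
    // store '<' in code as '&lt;'.
    public static string Rewrite(string html, Func<string, string> colorize)
    {
        return Pre.Replace(html,
            m => colorize(WebUtility.HtmlDecode(m.Groups["toParse"].Value)));
    }
}
```

For example, SnippetRewriter.Rewrite(page, s => "[" + s + "]") leaves everything outside the marked pre elements untouched and replaces each marked element with its decoded body in square brackets. Swapping the lambda for the real colorizer gives the behaviour the filter implements on the buffered page.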
One last thing - I have assumed that all your files and pages are UTF-8 encoded. If that is not the case, either use ...

What's in the package

The source code download contains everything you need to build and use the colorizer. Because of the complex build requirements - we need to build Coco-R first, then generate the parser sources from the attributed grammar, then build the core parsing code, and finally the Web components - I have used NAnt for building. The version used was 0.85 RC3; there shouldn't be any significant differences between this and the final version, but keep that in mind.

I have also provided a test folder with some .aspx files that exercise the code. All you need to do is create a virtual directory that points to this test folder and configure the paths (in web.config) to the JavaScript region code and the CSS style sheet for colors. That's it! I hope you enjoy using this code as much as I enjoyed writing it.

History