This article will attempt to cover everything you need to know to work with the Common Gateway Interface in C#, including reading the input of POSTs and GETs. Although this sounds like a lofty goal, CGI is actually quite easy to get into and gains most of its power from its simplicity. Despite that simplicity, I've labeled this as "Intermediate" because, in order to take full advantage of CGI, I had to slip in some intermediate things, such as threading. Threading is presented in the most straightforward terms, however, and only briefly used, so don't fret. Even if you have never worked with threading in C# before, I believe you will not have trouble with that part or any other section that earned the "Intermediate" status.
The Common Gateway Interface, or CGI, is a long-standing W3C standard for communicating between a web page and an application available on the web server. Applications on the server communicating in CGI have all the functionality of any other native application, such as database access or reading input files. Nevertheless, in today's world, with ASP.NET, PHP, Perl and many other scripting languages providing essentially the same functionality, you may wonder why you would bother creating your own CGI application with C#. There are a few reasons I can think of that may apply.
First, the interpreters for ASP, PHP, Perl, etc. are essentially applications taking advantage of CGI. In many cases, they are built directly into the web service (such as with ISAPI for IIS), but all of them are also available in their CGI skins (ASP available as CGI via Novell's Mono project). To better understand the capabilities and limitations of your favorite server side scripting language, you may be interested in how it works at a lower level.
Another reason is you may like to write your own mini-scripting interpreter. Languages such as ASP are very general purpose, and although they can do anything, some things could be better accomplished with a more specific purpose script.
For instance, this describes the reason I first delved into the realm of CGI and C#. My company has a very high demand for reporting. Despite a wide range of tools available to the employees, the website remains a top resource for their customized reports due to lots of travel. All of the reports have a variety of very picky graphical requirements, from formatting, to charts, to providing a dynamically generated Excel workbook. The ASP code was starting to look scary so we created a custom C# application to interpret a reporting language so specific to our business needs that even non-programming power users can help out in creating and modifying the reporting needs. (A topic for another article?)
Another benefit to CGI is that it is web server agnostic. The same executable will work for IIS as it does for Apache, as it does for any other W3C compliant web server. And unlike some scripting languages, the executable is compiled so execution is fast and the source is not visible.
C# is an excellent language to use with CGI because of the power of the .NET framework. It is also possible to test a C# application utilizing CGI, even without a web server, because it simply runs as a console application. More on that later.
Finally, there are web service providers who allow you to use custom CGI applications that may not allow you to use Apache modules or ISAPI IIS DLLs. The reason is that an integrated module in a web server can potentially do a lot more damage to the web server than a separate, controllable executable. Now that we've seen some of the reasons to use CGI with C#, let's look at some reasons not to.
One reason you might not is that a CGI application is executed as a separate process every time a web page requests it. This is unlike some forms of Apache modules or ISAPI IIS DLLs where the CGI application is loaded once and stays in memory. Nevertheless, this isn't always a concern. There are plenty of high-end web servers providing Perl as a CGI module that do just fine. Your CGI applications are very unlikely to be nearly as resource consuming as perl.exe. Still, if you find your web server's resources are starting to get low, you may consider switching to a built-in module. Fortunately, almost all of your code will be the same, just substituting different inputs and outputs and adding a little bit more threading.
Another reason you may avoid using CGI is that the burden for security is higher with CGI than it is with a server-side scripting language. There are limitations in scripting languages that are not present in your own application. Because of that, extra caution needs to be applied, especially in how you handle input coming from (you hope) your users. Check out the section on security to learn about what methods are used to make your CGI app tight.
Finally, even if you've decided on using CGI, keep in mind that C# isn't always the best choice. Although C# compiles down to a small executable size, it is usually used in conjunction with the .NET framework, which means the .NET runtime will have to be reloaded now and then, depending upon how often the application is requested. (Yes, I realize it is possible to compile C# as a standalone as well.) In an environment where speed is paramount, you may consider using C or C++, or re-examining the benefits of built-in modules.
Despite what you may assume from a W3C standard, CGI is not a language or a protocol. To borrow some text from the standard's description, it "is an agreement between HTTP server implementers about how to integrate" with your application. In other words, it defines how a web server talks to your program when a web page requests it. Because web servers can span many platforms and many operating system environments, there are three forms of communication the standards committee felt they could rely on. They are:
- Standard Input,
- Standard Output, and
- Environment Variables
If you've worked with a C# console application before, then you may recognize standard input is usually entered through the keyboard. Standard output is what is displayed on the console screen by the program. Environment variables fall into the domain of a function of the operating system most of the time. Environment variables you may recognize are things like the PATH. From a command prompt (Start -> Run -> cmd), type the command SET to see a full listing of the environment variables on your computer. You will see there are quite a lot. (If using Windows 2000 or higher, you can also right click on My Computer, click Properties, then the Advanced tab, and finally, Environment Variables, to get a graphical look at your current environment.) In the figure below, the "User variables" refers to the environment set up specifically for the current user and the "System variables" refers to the base environment that defaults for everyone.
You can also set an environment variable temporarily at a command prompt using the SET command. Just type SET followed by the name of the variable you'd like to create, an equals sign, and the value you'd like to set. Take the following example where we create a variable named ZNewVariable, then use 'set' again to display it:
c:\>set ZNewVariable=I am setting an Environment Variable.
c:\>set ZNewVariable
ZNewVariable=I am setting an Environment Variable.
In C#, environment variables are accessed through the System.Environment class. We'll cover that in depth a little later on.
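As a quick preview, here is a minimal sketch of reading and setting an environment variable from C#. The class name EnvDemo and the helper GetOrDefault are made up for this illustration; the only framework calls used are Environment.GetEnvironmentVariable() and Environment.SetEnvironmentVariable().

```csharp
using System;

class EnvDemo   // hypothetical name, for illustration only
{
    // Returns the variable's value, or the fallback when the variable is not set.
    public static string GetOrDefault(string name, string fallback)
    {
        string value = Environment.GetEnvironmentVariable(name);
        return value == null ? fallback : value;
    }

    static void Main()
    {
        // Set a variable for this process only (mirrors the SET example above).
        Environment.SetEnvironmentVariable("ZNewVariable",
            "I am setting an Environment Variable.");
        Console.WriteLine(GetOrDefault("ZNewVariable", "(not set)"));
        Console.WriteLine(GetOrDefault("ZMissingVariable", "(not set)"));
    }
}
```

A variable that does not exist comes back as null rather than an empty string, which is why the helper checks for null before using the value.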
One more thing to note with environment variables is that the available length of the variable varies from platform to platform. The basic rule of thumb is that an environment variable should be less than 512 characters. As we will see, this is one of the inherent limiting factors to the GET method of a web form.
So, the CGI standard says a web server will communicate with your application through standard input, standard output, and environment variables. To tie that into a practical view, think of standard input as a way to send large amounts of data from a web page to your app. In the web form, it is referred to as a POST. POSTs do not appear in the URL of a request and they are usually sent to the server to accommodate very large, dynamic, or sensitive information.
Environment variables are a convenient way to send a small amount of information to your app. In the web form, this is a GET (even though it's really still sending information and not "getting" it anymore than a POST does). GETs do appear in the URL of a request so they can be bookmarked for later use. This is useful for things like forums or searches that are somewhat dynamic but are generally meant to "get" information from the CGI app rather than send useful information to be stored by the CGI app, hence the name. An example URL might look like this:
Standard output is what your application sends to the web browser as a response to either a POST or GET. This is usually in the form of HTML formatted text.
When using CGI, standard input (hereafter called stdin) and standard output (hereafter referred to as stdout) are "piped" into the web server so that when a page makes a request, all of this goes on behind the scenes without actually launching a console screen on the server. You don't have to code any of this as it is all taken care of by the server.
I find the best way to understand something is to play with it. Although I alluded to the fact that you can test your CGI application with just an open console and using the set command yourself, I think it is more interesting to play with the real thing - a web browser. In this section, I'm going to concentrate my attention on configuring IIS, since I know most of the people reading this probably have IIS already working. The concepts apply to any web server, however, and if you are in a real pinch, you can pretend you are a web server by setting the environment variables yourself at the command line before executing your CGI app.
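Pretending to be a web server can also be done from code. The following is a rough sketch, not part of the article's project: a tiny launcher that sets a few CGI variables, starts a CGI executable, and prints whatever it writes to stdout. The class name FakeWebServer and the method BuildGetRequest are hypothetical, and the variable values are sample data.

```csharp
using System;
using System.Diagnostics;

class FakeWebServer   // hypothetical helper, for illustration only
{
    // Builds the launch settings a web server might use for a GET request.
    public static ProcessStartInfo BuildGetRequest(string cgiPath, string queryString)
    {
        ProcessStartInfo info = new ProcessStartInfo(cgiPath);
        info.UseShellExecute = false;        // needed to edit the environment and redirect streams
        info.RedirectStandardOutput = true;  // capture what the CGI app writes to stdout
        info.EnvironmentVariables["GATEWAY_INTERFACE"] = "CGI/1.1";
        info.EnvironmentVariables["REQUEST_METHOD"] = "GET";
        info.EnvironmentVariables["QUERY_STRING"] = queryString;
        info.EnvironmentVariables["REMOTE_ADDR"] = "127.0.0.1";
        return info;
    }

    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("usage: FakeWebServer <path-to-cgi-executable> [query-string]");
            return;
        }
        string query = args.Length > 1 ? args[1] : "";
        using (Process cgi = Process.Start(BuildGetRequest(args[0], query)))
        {
            // Everything the CGI app prints (header and document) comes back here.
            Console.Write(cgi.StandardOutput.ReadToEnd());
            cgi.WaitForExit();
        }
    }
}
```

Once you have a CGI executable, running something like FakeWebServer cgi_csharp.exe play=Testing+1+2+3 should print the raw header and HTML, which is handy for seeing exactly what the browser would receive.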
Before we can set up our server though, we need to have a CGI application. Open up your favorite C# editor, create a console project, and type in the following small program:
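using System;

class CgiEnvironment   // the class name is arbitrary
{
    static void Main(string[] args)
    {
        // The HTTP header must come first, followed by a blank line.
        Console.Write("Content-Type: text/html\n\n");
        Console.Write("<html><head><title>CGI" +
            " in C#</title></head><body>" +
            "CGI Environment:<br />");
        Console.Write("<table border = \"1\"><tbody><tr><td>The" +
            " Common Gateway " +
            "Interface revision on the server:</td><td>" +
            System.Environment.GetEnvironmentVariable("GATEWAY_INTERFACE") + "</td></tr>");
        Console.Write("<tr><td>The server's hostname or IP address:</td><td>" +
            System.Environment.GetEnvironmentVariable("SERVER_NAME") + "</td></tr>");
        Console.Write("<tr><td>The name and" +
            " version of the server software that" +
            " is answering the client request:</td><td>" +
            System.Environment.GetEnvironmentVariable("SERVER_SOFTWARE") + "</td></tr>");
        Console.Write("<tr><td>The name and revision of the information " +
            "protocol the request came in with:</td><td>" +
            System.Environment.GetEnvironmentVariable("SERVER_PROTOCOL") + "</td></tr>");
        Console.Write("<tr><td>The method with which the information request " +
            "was issued:</td><td>" +
            System.Environment.GetEnvironmentVariable("REQUEST_METHOD") + "</td></tr>");
        Console.Write("<tr><td>Extra path information passed to a CGI" +
            " program:</td><td>" +
            System.Environment.GetEnvironmentVariable("PATH_INFO") + "</td></tr>");
        Console.Write("<tr><td>The translated version of the path given " +
            "by the variable PATH_INFO:</td><td>" +
            System.Environment.GetEnvironmentVariable("PATH_TRANSLATED") + "</td></tr>");
        Console.Write("<tr><td>The GET information passed to the program. " +
            "It is appended to the URL with a \"?\":</td><td>" +
            System.Environment.GetEnvironmentVariable("QUERY_STRING") + "</td></tr>");
        Console.Write("<tr><td>The remote IP address of the user making " +
            "the request:</td><td>" +
            System.Environment.GetEnvironmentVariable("REMOTE_ADDR") + "</td></tr>");
        Console.Write("</tbody></table></body></html>");
    }
}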
That's plenty for us to get started with. Going through the code, we see that the only namespace we need to add with the using directive is System. Since we know that we will communicate back to the web browser using stdout, we can simply write the output we want the web browser to see with a series of Console.Write() commands. The first one:
Console.Write("Content-Type: text/html\n\n");
is not HTML but actually the HTTP header. It tells the web browser what kind of document it should expect back. This is useful if you would like to send something other than HTML back to the web browser to work with, such as an image or animation. It is also useful for sending other metadata back to the web client, such as setting cookies. The HTTP protocol is pretty straightforward and there are lots of good tutorials around, but the official W3C documents are very difficult to get through. I find one good way to learn about HTTP header information is through example. Go to a few web sites and look at the header information being returned. (You can do that with a web browser such as Firefox and the Web Developer Tools extension.) To help you get started, the following is the header information I received viewing this site. Note that each HTTP header field must reside on its own line.
Date: Fri, 28 Jan 2005 19:04:10 GMT
Set-Cookie: cat=1; expires=Sat, 28-Jan-2006 05:00:00 GMT; path=/
The only required piece is the Content-Type: line. The HTTP header must appear as the first thing your CGI application communicates back, followed by two new lines (hence the \n\n). The extra new line tells the browser we are ready for the document content. Later, you may want to comment out the header line from our CGI application and try running it with different web browsers to see how each one copes with the error.
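To make the "one field per line, then a blank line" rule concrete, here is a small sketch of a helper that assembles the header block. The class CgiHeader and its Build method are hypothetical names for this illustration, and the cookie value is sample data taken from the header listing above.

```csharp
using System;
using System.Text;

class CgiHeader   // hypothetical helper, for illustration only
{
    // Builds the header block: one field per line, then the blank line that ends the header.
    public static string Build(string contentType, string cookie)
    {
        StringBuilder header = new StringBuilder();
        header.Append("Content-Type: " + contentType + "\n");
        if (cookie != null)
            header.Append("Set-Cookie: " + cookie + "\n");
        header.Append("\n");   // the empty line that separates header from document
        return header.ToString();
    }

    static void Main()
    {
        Console.Write(Build("text/html", "cat=1; path=/"));
        Console.Write("<html><body>Hello from CGI</body></html>");
    }
}
```

Passing null for the cookie produces exactly the header used in our program, "Content-Type: text/html" followed by two new lines.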
The rest of the Console.Write() lines start displaying some interesting environment variables set by the web server. They use the System.Environment.GetEnvironmentVariable() method built into the .NET framework. For a full list of variables a standards compliant web server provides, check out the official CGI 1.1 specification data sheet. We'll explore the more interesting variables in more detail once we have a web page up and using this CGI application.
That's it for the source code for now. We have a very basic framework so far that looks at interesting environment variables to show we are getting input, and uses stdout Console.Write()s to send output. We haven't dealt with stdin yet, or security, but I'm anxious to see something happen, so let's go on to checking our IIS web server setup.
In this example, we are going to use the Default Web Site in IIS. If you already have some websites going, go ahead and create a new web site or virtual directory to use. I'm going to show you two ways to use our CGI application. The first way requires the executable to be in an IIS browsable directory. The second way is more secure and doesn't require the executable to move from the folder where you compile it, but it is a little more complex to set up.
To get going with the first method, open the Internet Information Services manager. Locate the web site you want to use and right click on it (in our example, "Default Web Site"). Now go to Properties.
Click on the Home Directory tab. Make sure that Execute Permissions is set to Scripts and Executables. If it hasn't been set before, it will warn you this is a security risk. We'll end up setting it back later, so go ahead and confirm the change here for now. Press Apply, and then OK.
You are ready to copy the executable you compiled into your website directory. Once you've copied it there, open up a web browser such as Internet Explorer and browse directly to the executable through IIS by starting with http://localhost. If you've followed the instructions here, then you can type into the address bar: http://localhost/cgi_csharp.exe.
For a side bit of fun, you could also rename your executable to end in .com to seem more web like, since both .com and .exe are considered executable by Windows. That makes the link look more like this: http://localhost/cgi_csharp.com.
Your web browser output should look something like this:
The Common Gateway Interface revision on the server: CGI/1.1
The server's hostname or IP address: localhost
The name and version of the server software
that is answering the client request: Microsoft-IIS/5.1
The name and revision of the information protocol
the request came in with: HTTP/1.1
The method with which the information request was issued: GET
Extra path information passed to a CGI program:
The translated version of the path given by the variable PATH_INFO:
The GET information passed to the program. It is appended to the URL with a "?":
The remote IP address of the user making the request: 127.0.0.1
Don't worry if some of the variables are blank. Some of them don't apply yet and some others may not be provided by your web server. Part of this experiment is to understand what kind of information we get back. See the source and demo attachments at the beginning of this article to get a more thorough listing of items.
Now let's add a web page form so that we can see information pass to the CGI program dynamically. Create a new text file in the same web folder you've copied your CGI program into by right clicking and going to New -> Text Document. Name it index.txt. Double click the document to open it in Notepad and type in the following HTML form:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
<title>CGI in C#</title>
</head>
<body>
<h2>Form for Testing CGI Application</h2>
<form action="cgi_csharp.com" method="GET">
Enter your own input here: <INPUT TYPE="text" NAME="play" SIZE="30" /><br />
<input type="submit" value="Submit Report with GET" />
</form>
</body>
</html>
Once you finish typing that in, save the document and exit Notepad (or whatever you used). Now rename the file to index.html. If Windows doesn't prompt you to ask if you are sure you want to rename the extension, then it might be hiding file extensions from you and only pretending to be naming it correctly. For an example how to fix your view in XP, go to Tools -> Folder Options -> View tab and uncheck "Hide extensions for known file types".
I'll assume most of this HTML you already understand or can look up on your own. The important bit is the <form action="cgi_csharp.com" method="GET">. This code makes what you type in the input box be sent to the CGI application cgi_csharp.com. If you named your CGI application something else (like cgi_csharp.exe), just substitute your name in the action attribute. The "method" attribute sets how the CGI application receives the information. Note that we specified that this will be a GET request.
When you hit the Submit button, look up at the top of the URL and you will see whatever you typed in the text box appear. This is how the GET method allows you to bookmark a request for later use. Instead of using the HTML web form, you can alter the URL and submit the information directly. This is very useful for allowing a richer, more powerful user experience. However, it is also a common way for people to try to abuse your CGI application, by ignoring any filtering done in the HTML client and sending dangerous inputs to the server directly.
Once you hit Submit, you can see that our application has indeed picked up the GET data as an environment variable:
The GET information passed to the program.
It is appended to the URL with a "?": Testing+1+2+3
Any spaces were converted to + signs because you can't pass spaces in a GET. Go ahead and play with this a little bit and see what kind of output you can produce from your input.
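To turn that encoded text back into what the user typed, the + signs and any %XX sequences have to be decoded. Here is a minimal sketch of one way to do it, assuming the usual name=value&name=value layout of a form submission. The class QueryDecoder and its methods are hypothetical names; Uri.UnescapeDataString() is the framework call doing the %XX work.

```csharp
using System;
using System.Collections.Generic;

class QueryDecoder   // hypothetical helper, for illustration only
{
    // Turns "play=Testing+1+2+3&x=a%26b" into decoded name/value pairs.
    public static Dictionary<string, string> Parse(string queryString)
    {
        Dictionary<string, string> fields = new Dictionary<string, string>();
        if (string.IsNullOrEmpty(queryString)) return fields;
        foreach (string pair in queryString.Split('&'))
        {
            if (pair.Length == 0) continue;
            int eq = pair.IndexOf('=');
            string name = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            fields[Decode(name)] = Decode(value);
        }
        return fields;
    }

    // "+" stands for a space; %XX sequences are hex-encoded characters.
    public static string Decode(string text)
    {
        return Uri.UnescapeDataString(text.Replace('+', ' '));
    }

    static void Main()
    {
        string query = Environment.GetEnvironmentVariable("QUERY_STRING");
        foreach (KeyValuePair<string, string> field in Parse(query))
            Console.WriteLine(field.Key + " = " + field.Value);
    }
}
```

The + signs are replaced before the %XX decoding on purpose: a literal plus typed by the user arrives as %2B, so it survives the replacement and is restored afterwards.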
Now this is where most tutorials end, which isn't necessarily bad since we can already do a lot with CGI. However, GET doesn't meet every need. If you want to send large amounts of data, or data that doesn't end up in the URL of a web browser, what you need is the POST method. Many tutorials end with GET because getting POST information is a trickier problem than it may seem. The problem is actually with the 1.x versions of the .NET framework. Microsoft says they will fix the issue in version 2, but as of the writing of this article, that hasn't come about yet. The limitation is with the poor implementation for Console applications in .NET. What we will need is a more powerful
The issue stems from the fact that although Console.Read() only returns one character at a time like we want, the Console defaults to a mode in which input is buffered until the user presses the Enter key. That means you won't have any POST information unless the last character in your HTML form is an ASCII 13 (enter). Unfortunately, I'm not sure how cross-platform the fix for buffering for an Enter key is. If you plan on using the Mono or DotGNU runtimes, you will have to use trial and error, or just deal with adding an Enter to all web form POSTs.
Although that may be a limitation you are willing to accept, there is a more serious problem facing our POST. Console.Read() is blocking. That means that once your application requests a Console.Read(), it will not be able to process anything else until it reads a character. If it is expecting POST information and there isn't any, then your CGI app will hang and the browser will eventually just time out. (A security issue and annoyance.)
Fortunately, both of these problems are easy to overcome in just a few lines of code.
In order to change the Console mode so that it doesn't require an Enter, we need to call a Win32 method. In order to do that, we have to tell the C# program where that method exists in the operating system. First, add a new namespace to the top of the CS file:
using System.Runtime.InteropServices;
The method we are after is, appropriately, SetConsoleMode(). It lives in the Windows kernel. To tell C# where to find it, just after defining the class, enter a DllImport directive like so:
[DllImport("kernel32", SetLastError=true)]
static extern int SetConsoleMode (int hConsoleHandle, int dwMode);
Then all you have to do is call the method. To do that, add this one liner near the top of your Main() method:
SetConsoleMode(3, 0);
Why are the parameters 3 and 0? Well, the basic answer is I peeked at a C header file for SetConsoleMode and found that the first parameter expects stdin to be equal to 3. Because this was a hard coded define in the header file, I assume that 3 is the correct answer for all Windows platforms. If someone has a more elegant solution, please let me know and I will make the correction. The second parameter, 0, tells SetConsoleMode what to do with the first parameter, stdin. We want to clear everything and utilize a raw console, so we set it to 0. Again, this is based off of examining a C header file.
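For comparison, the Win32 console documentation describes getting the input handle from GetStdHandle() with the constant STD_INPUT_HANDLE (-10) rather than passing a fixed number. The following is a sketch along those lines, not the code used in this article's download. The class RawConsole and its method names are hypothetical, and it clears only the line-input and echo flags instead of setting the whole mode to 0.

```csharp
using System;
using System.Runtime.InteropServices;

class RawConsole   // hypothetical helper, for illustration only
{
    public const int STD_INPUT_HANDLE = -10;      // documented argument for GetStdHandle
    public const uint ENABLE_LINE_INPUT = 0x0002; // wait for Enter before returning input
    public const uint ENABLE_ECHO_INPUT = 0x0004; // echo typed characters

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetStdHandle(int nStdHandle);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetConsoleMode(IntPtr hConsoleHandle, out uint lpMode);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetConsoleMode(IntPtr hConsoleHandle, uint dwMode);

    // Clears the line-input and echo bits from a console mode value.
    public static uint WithoutLineBuffering(uint mode)
    {
        return mode & ~(ENABLE_LINE_INPUT | ENABLE_ECHO_INPUT);
    }

    // Returns true only if stdin is a real console and its mode was changed.
    public static bool TryDisableLineBuffering()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT) return false;
        IntPtr stdin = GetStdHandle(STD_INPUT_HANDLE);
        uint mode;
        if (!GetConsoleMode(stdin, out mode)) return false; // stdin is a pipe or a file
        return SetConsoleMode(stdin, WithoutLineBuffering(mode));
    }

    static void Main()
    {
        Console.WriteLine("Line buffering disabled: " + TryDisableLineBuffering());
    }
}
```

When stdin is redirected, as it is when a web server pipes data to the program, GetConsoleMode() fails because the handle is not a console, so the sketch simply reports false in that case.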
Now that we can read in characters all day long, we need to watch out for input blocking on stdin or else our CGI application could be vulnerable to locking up if the POST data is empty or otherwise problematic. Just like the last POST hurdle, this will require some intermediate level programming, but it will also be a very quick fix.
Since .NET doesn't support non-blocking Console input, we're just going to have to invent it ourselves. To do that, we're going to create a method just for getting the stdin input, then process that method separately from Main(). Main() will track the progress of our custom method and if it doesn't return the information we expect after a predetermined time-out period, we will gracefully report the error and exit. Of course, that sounds like doing two things at once, right? I warned you we would touch on threading.
First, we need to add one last namespace:
using System.Threading;
Then, we need to set up our method for reading stdin.
public static void GatherPostThread()
{
    if(PostLength > 2048) PostLength = 2048;
    for(; PostLength > 0; PostLength--)
        PostData += Convert.ToChar(Console.Read()).ToString();
}
Note that in this example we limited the POST length in characters to 2048. This was an arbitrary number and probably pretty low. But if someone is trying to use their web browser as a weapon, you probably don't want to let them send unlimited ammunition into your program, so choose a value that is appropriate for your purposes.
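As an alternative sketch, the same capped read can be done against the raw stdin stream from Console.OpenStandardInput() instead of Console.Read(). This is not the approach used in the rest of this article. The class PostReader and the method ReadBody are hypothetical names, and the sketch assumes the POST body is ASCII-compatible, which is the case for ordinary URL-encoded form data.

```csharp
using System;
using System.IO;
using System.Text;

class PostReader   // hypothetical helper, for illustration only
{
    // Reads up to contentLength bytes (capped at maxLength), stopping early at end of stream.
    public static string ReadBody(Stream input, int contentLength, int maxLength)
    {
        int remaining = Math.Min(Math.Max(contentLength, 0), maxLength);
        byte[] buffer = new byte[remaining];
        int total = 0;
        while (total < remaining)
        {
            int read = input.Read(buffer, total, remaining - total);
            if (read <= 0) break;   // the stream ended before the promised length
            total += read;
        }
        // Assumption: URL-encoded form data, so ASCII decoding is sufficient.
        return Encoding.ASCII.GetString(buffer, 0, total);
    }

    static void Main()
    {
        int length;
        int.TryParse(Environment.GetEnvironmentVariable("CONTENT_LENGTH"), out length);
        using (Stream stdin = Console.OpenStandardInput())
            Console.WriteLine(ReadBody(stdin, length, 2048));
    }
}
```

Because ReadBody takes any Stream, it can be exercised with a MemoryStream without a web server. Note that a read on a pipe that stays open with no data still blocks, so this does not remove the need for a time-out.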
Hey, where did those PostData and PostLength variables come from? Well, we're going to use those variables to communicate between the two threads. I normally try to keep static members down to a minimum (probably because of my C++ background), but here's how we defined these for a quick and easy solution:
[DllImport("kernel32", SetLastError=true)]
static extern int SetConsoleMode (int hConsoleHandle, int dwMode);
private static string PostData;
private static int PostLength;
...
The PostData string will be used to store our incoming POST stream. The PostLength is actually something that should come in as an environment variable (CONTENT_LENGTH) and gives us a clue as to how much data might have been POSTed to us. We load that variable with this line from within Main():
PostLength = Convert.ToInt32(System.Environment.GetEnvironmentVariable("CONTENT_LENGTH"));
We can now set up a new thread to run our GatherPostThread() method, also in Main():
ThreadStart ThreadDelegate = new ThreadStart(GatherPostThread);
Thread PostThread = new Thread(ThreadDelegate);
if(PostLength > 0) PostThread.Start();
int LengthCompare = PostLength;
The conditional statement checks if the web server is telling us there is data. If not, then there's no point in wasting our resources on starting the thread. I've also added a local variable to Main() called LengthCompare. We are about to use this at the end of our Main() method to check and make sure our GatherPostThread() is still doing work and not being lazy on the job.
while(PostLength > 0)
{
    Thread.Sleep(100);
    if(PostLength < LengthCompare)
        LengthCompare = PostLength;
    else
    { PostData += "Error with POST data or connection problem."; break; }
}
Now recall that in our threaded method, we are counting down the PostLength every time we read in a character. Here, we check to make sure that it is indeed decreasing in value. Every 100 milliseconds, we check it against its prior value. If it is the same, then we know there is a problem because, although 100 milliseconds goes by fast for us, it is more than enough time for the POST thread to read in another single byte of data. (With a small POST, I'd be surprised if the data hasn't already finished reading before we reach that loop.) Now let's add another table row that shows off our new POST capabilities:
Console.Write("<tr><td>The POST data passed" +
    " to the program through standard input:" +
    "</td><td>" + PostData +
    "</td></tr>");
There's only one more piece to the POST threading puzzle, and that is what happens if the web server lies to us. If the value of PostLength is not the real length of data waiting for us, then even though our main thread may close, the GatherPostThread method will still be spinning its wheels. To make sure we avoid any unnecessary trouble there, we simply add:
to the end of Main().
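One way to keep a stalled reader from outliving Main() is sketched here, under the assumption that a single overall time-out is acceptable: mark the reader as a background thread, so the process can exit even while it is still blocked, and wait for it with Thread.Join() and a time limit. The class ReaderWatchdog and the method RunWithTimeout are hypothetical names, and this is a simpler check than the progress-based loop shown earlier.

```csharp
using System;
using System.Threading;

class ReaderWatchdog   // hypothetical helper, for illustration only
{
    // Runs work on a background thread; returns false if it did not finish in time.
    public static bool RunWithTimeout(ThreadStart work, int timeoutMilliseconds)
    {
        Thread worker = new Thread(work);
        worker.IsBackground = true;   // a background thread cannot keep the process alive
        worker.Start();
        return worker.Join(timeoutMilliseconds);
    }

    static void Main()
    {
        // Try to read one character from stdin, but give up after one second.
        bool finished = RunWithTimeout(delegate { Console.In.Read(); }, 1000);
        Console.WriteLine(finished ? "Read a character." : "Timed out waiting for input.");
    }
}
```

The trade-off is that a fixed time-out cannot tell a slow but steady upload from a dead one, which the countdown comparison in Main() can.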
You may be thinking that this (and indeed the entire example) seems very procedural. After all, what's the point of using a powerful object oriented language like C# if you don't take advantage of its design capabilities? In the end, that is a question you will have to answer for yourself, but here is my perspective.
First of all, for design, we can't really afford for this application to be event driven. That doesn't even make sense, really, since there is no human interaction and the user's web experience is going to be largely based upon how fast the application can do its thing and exit. Although we are not taking advantage of those aspects of C#, we can still take advantage of an object oriented design for a complex enough project. This small CGI app does not qualify, but if you are out to make the next Perl, you probably will use OOP, and quite a lot. Also, C# to me, is the most beautiful language available that takes advantage of the versatile .NET framework. Just because you don't use everything C# or .NET has to offer, doesn't mean that the things you do use with it don't add a lot of value. OK, enough of the side tangent. Let's move on.
Rebuild the program and copy it to the web directory you're using. Edit the index.html page with a text editor and change the method="GET" to method="POST". Reload the page and you should now see the data appear under the POST section. Hurray! We've made it through the tricky part!
This section could actually also be called, "how to make the extension of the CGI look like anything you want", or "how to leave the executable where you compile it so you don't have to keep copying it over every time you want to test a change". All of these things are about to be accomplished at once with one important change. All we have to do is associate our executable as the CGI app of choice for an arbitrary script extension. In IIS, this is called "Application Mappings" and it is similar with other web servers.
The reason I put it off until now is that I wanted it to be clear that a CGI application is independent of any scripting language. Indeed, even what we are about to do is totally independent of any scripting, but because it can be used to make your application a scripting interpreter, I wanted to hold off on using this method until the distinction was more clear.
Application Mappings' real purpose is to associate a script file with its interpreter. By creating our own extension to map to our CGI application, we are no longer bound by where our executable resides or what name the website shows when launching it. The custom scripting file itself, as we'll soon see, can be completely empty for all we care. But if you do care, this section will show you how you can go about optionally reading in its contents as well.
Open up the IIS manager again and right click on your website. In our example, it is called Default Web Site again.
Go back to the Home Directory tab. Change Execute Permissions back to Scripts and click "Configuration..." on the side.
Now, click the Browse button next to the Executable field. Find the directory where your executable is generated and select it. Under Extension, type anything you like as long as it doesn't conflict with other common extensions. I used .csx to mimic aspx but for C#. You can also use .test or .your_initials if you like. I'll assume .csx was used from here on out, but understand that is arbitrary.
The Verbs section allows you to limit what kind of information goes to your CGI app. Specifically, you can limit yourself to just GET, or just POST here, for instance. You may see other extensions listing things like HEAD, PUT, or DELETE. These are HTTP variations to GET and POST that help clarify the intention of the requesting page. Because they are rarely used (for good purposes), many web servers block or transform these requests into plain old GETs and POSTs, so don't worry much about them unless you are a system administrator for a public web site. To your CGI apps, HEAD will usually look just like GET and everything else will usually look like POSTs. You can just leave the Verbs section here in IIS as it stands.
The Script engine check box should be left checked if you want to execute your application from where you compile it. If at some point you wish for it to only execute from a folder specifically designated for CGI applications, you can uncheck this option. For now, however, leave it checked so you don't have to go through extra troubleshooting should something go wrong.
"Check that File Exists" means that the web server will make sure the requested .csx file exists. The down side is that that takes extra time and resources for something you should be doing in your application anyway, if you are even creating a scripting interpreter. I usually uncheck this, but it won't make a difference for this discussion one way or the other.
Now hit Apply and OK, and we are ready to go. All we need is an empty file in our web site with an extension of .csx. You can create that by right clicking in the web folder, going to New -> Text File, then renaming the file to something like csharp.csx. Again, don't forget to check if Windows is hiding file extensions from you if it doesn't work at first.
Now point your web browser to your file. In our example, you'd type: http://localhost/csharp.csx into the address bar. With any luck, you should have gotten the CGI output. Now edit your index.html and change the FORM action="cgi_csharp.com" into action="csharp.csx". Now when you re-load your index.html (you should probably refresh it to be sure) and type information into the text field, it will launch what looks like csharp.csx in your web browser but gives the output of your C# application, POST or GET data included.
Now that we have told IIS that a file with a .csx extension is a script file for our CGI application, you may actually be interested in the contents of the .csx file that launches your app. Again, you don't have to do anything with it at all, but if you want to, you may have trouble at first figuring out which .csx file called your program.
I struggled with this at first because I assumed IIS would pass the file name to the application as a command line argument, similar to launching a Perl script with "perl.exe script.pl". This is not the case, however. Instead, we must look again at the environment variables, specifically the one named PATH_TRANSLATED. Note that we are already checking the value of PATH_TRANSLATED in our CGI application. Up until we set up the association with .csx files, however, this field has been empty. Now, it shows the full path and name of the .csx file that requested our program. All you have to do is set up a text reader or whatever you like and process the file in whatever way your heart desires. (Don't forget to do a check to make sure the file really does exist though!)
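As a rough sketch of that idea, the following reads the file named by PATH_TRANSLATED after checking that it exists. The names ScriptLoader and ReadScript are hypothetical and not part of the example source, and File.ReadAllText assumes .NET 2.0 or later; on an older framework a StreamReader would do the same job.

```csharp
using System;
using System.IO;

public static class ScriptLoader
{
    // Hypothetical helper: returns the script text, or null when the
    // path is missing or does not point to an existing file.
    public static string ReadScript(string path)
    {
        if (string.IsNullOrEmpty(path) || !File.Exists(path)) return null;
        return File.ReadAllText(path);
    }

    public static void Main()
    {
        // Set by the web server once the .csx mapping is in place.
        string scriptPath = Environment.GetEnvironmentVariable("PATH_TRANSLATED");
        string script = ReadScript(scriptPath);

        Console.Write("Content-Type: text/plain\r\n\r\n");
        if (script == null)
            Console.Write("No script file was found for this request.");
        else
            Console.Write("Script is " + script.Length + " characters long.");
    }
}
```

A real interpreter would also want to catch the exceptions File.ReadAllText can throw, for example when the web server account lacks permission to read the file.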
In the official definition of the CGI 1.1 specification, there is a brief look at some security issues[^]. I would like to cover the issue from a slightly different perspective here. The biggest issue with security is making sure that you make safe all input coming into your program from "out there". Never trust the client. It is not difficult at all to send bad information. The most common issue is when your CGI application provides an interface from the outside world into a local system, such as a database. If you just blindly send the input from the web page into the database, you have created a back door to the database, nullifying its security model in a way that can and will be exploited. The same goes if your application lets a web user send e-mail, or if it executes a shell command on behalf of a dynamic request.
In order to provide security, I recommend you get to know exactly what input your application is meant to accept, then filter the user input with only the worst in mind. Even if you can't think of how the input could be abused, make sure you allow only the things you can think of that are legitimate and deny all else!
Also, and this is my opinion, try not to undermine the security of another system. If, for instance, a database normally requires authentication to use, then make the user authenticate instead of hard-coding the user name and password into your application.
Finally, if the content that is sent to your CGI application can be used in some way back in a browser, think of the security and stability of your users as well. By weeding out things like HTML from a forum post, or only allowing certain types of formatting, you can protect your other users from potentially malevolent intentions. Since our example application redisplays in HTML the POST and GET information it receives, this is an excellent opportunity to give a quick example.
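To make "allow only what is legitimate and deny all else" concrete, here is a minimal sketch of an allow-list check for a single field. The field (a user name), the rule (1 to 32 letters, digits or underscores) and the names InputValidator and IsValidUserName are all assumptions made up for this illustration; your own application would define its own rules.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValidator
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    private static readonly Regex UserNamePattern =
        new Regex(@"\A[A-Za-z0-9_]{1,32}\z");

    // Hypothetical rule: accept only what is known to be legitimate.
    public static bool IsValidUserName(string input)
    {
        return input != null && UserNamePattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValidUserName("bob_42"));
        Console.WriteLine(IsValidUserName("bob'; DROP TABLE users"));
    }
}
```

The point of the design is that the pattern describes what is permitted rather than trying to list everything that is dangerous, so input nobody anticipated is rejected by default.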
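Let's create a new method called Sanitize():
public static string Sanitize(string Raw)
{
    string Clean = "";
    if (Raw == null) return Clean;

    // Translate URL and HTML encodings back into plain characters first.
    Raw = Raw.Replace("%22", "\""); Raw = Raw.Replace("&lt;", "<"); // ....

    char[] ByCharacter = Raw.ToCharArray();
    for (int Walk = 0; Walk < Raw.Length; Walk++)
    {
        // Quotes are written back out as HTML entities.
        if (ByCharacter[Walk] == '\'') Clean += "&#39;";
        else if (ByCharacter[Walk] == '"') Clean += "&quot;";
        else if (ByCharacter[Walk] == ' ') Clean += " ";
        // '&' separates form fields, so show each field on its own line.
        else if (ByCharacter[Walk] == '&') Clean += "<br />";
        // Upper and lower case are tested separately, because the range
        // 'A' to 'z' would also admit characters such as '[' and '\'.
        else if (ByCharacter[Walk] >= 'A' && ByCharacter[Walk] <= 'Z' ||
                 ByCharacter[Walk] >= 'a' && ByCharacter[Walk] <= 'z' ||
                 ByCharacter[Walk] >= '0' && ByCharacter[Walk] <= '9' ||
                 ByCharacter[Walk] == '=' || ByCharacter[Walk] == ',' ||
                 ByCharacter[Walk] == '.' || ByCharacter[Walk] == '@' ||
                 ByCharacter[Walk] == '#')
            Clean += ByCharacter[Walk].ToString();
        else Clean += "^";
    }
    return Clean;
} // End of Sanitize() method.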
The Sanitize() method takes a questionable string as a parameter, tries to translate any GET or HTML formatting, then walks through the string character by character, admitting or denying each one based on, in this example, a limited set of acceptable characters. It replaces characters it doesn't accept with the '^' character. There are more elegant ways to do this, and in fact, I would recommend looking into regular expressions, but this seemed like a straightforward way to help get the point across. I left out a lot of the Raw=xxx code that translates HTML and GET requests due to the size of this code, but just download the example source for the full filter. Keep in mind that I had English in mind when I created it, so you may want to expand its functionality and maybe try a few creative ways to play with filtering on your own.
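For comparison, here is a minimal sketch of how the character walk might look with a regular expression. It is not the filter from the example source: the names RegexFilter and SanitizeWithRegex are made up for this illustration, and it only covers the admit-or-replace step, leaving out the translation of encoded characters and the special handling of quotes and '&'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexFilter
{
    // Matches any single character that is NOT a letter, digit, space
    // or one of = , . @ #
    private static readonly Regex Disallowed = new Regex(@"[^A-Za-z0-9 =,.@#]");

    // Hypothetical helper: replaces every disallowed character with '^'.
    public static string SanitizeWithRegex(string raw)
    {
        if (raw == null) return "";
        return Disallowed.Replace(raw, "^");
    }

    public static void Main()
    {
        // Prints: name=Bob^script^
        Console.WriteLine(SanitizeWithRegex("name=Bob<script>"));
    }
}
```

The negated character class keeps the allow-list in one place, which makes it easier to see at a glance exactly which characters are admitted.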
To use our Sanitize() method, simply wrap the QUERY_STRING and POST data strings in it:
Console.Write("<tr><td>The GET information passed to the program. " +
    "It is appended to the URL with a \"?\":</td><td>" +
    Sanitize(System.Environment.GetEnvironmentVariable("QUERY_STRING")) +
    "</td></tr>");
Console.Write("<tr><td>The POST data passed" +
    " to the program through standard input:" +
    "</td><td>" + Sanitize(PostData) + "</td></tr>");
Alas, our time is up. I hope this has been an insightful look at the ins and outs of CGI with C#. The one thing I feel might be missing is a nice example of working with binary input and output. That is more a C# discussion and less a CGI one though, so perhaps that too is fodder for an article or code snippet for another day. In the meantime, the best way to learn is to change some things, recompile, and see what happens. Have fun trying to hack your CGI, and happy coding.
- Rev 1.00: January 27, 2005
First uploaded to Code Project.
- Rev 1.01: January 27, 2005, slightly later
As this is my first posted article, I missed some of the standard stuff like adding [^] to links.
- Rev 1.02: January 28, 2005
Added a few more HTTP comments that seemed relevant.