Reduce your output size by 7% with one class and one line of code.

Introduction

This article demonstrates how to use HttpResponse.Filter to easily reduce the output size of your website.

Background

Recently, we redesigned the web site for Layton City. Because the redesign made it much easier for citizens to find what they were looking for, our hits per day nearly tripled overnight. Unfortunately, so did our bandwidth. We're currently serving almost 60Mb a day of just HTML. That doesn't include images or Adobe® Reader® documents. So priority #1 became reducing our bandwidth without reducing usability or having to rewrite the majority of our pages.

One downside of using some of the ASP.NET controls is that they insert lots of whitespace characters so that developers can easily see where problems are. While that is desirable during debugging, there is no means of turning that functionality off when you have released your site.

After finding an article on HttpResponse.Filter in the Longhorn SDK, we decided to use it to intercept our outgoing HTML and squish it.

Using the code

Add the WhitespaceFilter class to your project, and add the following line of code into the Application_BeginRequest function in your Global.asax file:

Sub Application_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
    Response.Filter = New WhitespaceFilter(Response.Filter)
End Sub

The above code causes the compressor to be added to every single page in your application. Alternatively, if you only want to compress individual pages, you can add the line to the Page_Load event.

Whitespace.vb

Comments are inline. Some of the weird lines are in to help compress specific portions of the website. (Updated 1/23/2004)

Imports System.IO
Imports System.Text.RegularExpressions 

' This filter gets rid of all unnecessary whitespace in the output.


Public Class WhitespaceFilter
    Inherits Stream

    Private _sink As Stream
    Private _position As Long

    Public Sub New(ByVal sink As Stream)
        _sink = sink
    End Sub 'New


#Region " Code that will most likely never change from filter to filter. "
    ' The following members of Stream must be overridden.

    Public Overrides ReadOnly Property CanRead() As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property CanSeek() As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property CanWrite() As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property Length() As Long
        Get
            Return 0
        End Get
    End Property

    Public Overrides Property Position() As Long
        Get
            Return _position
        End Get
        Set(ByVal Value As Long)
            _position = Value
        End Set
    End Property

    Public Overrides Function Seek(ByVal offset As Long, _ 
           ByVal direction As System.IO.SeekOrigin) As Long
        Return _sink.Seek(offset, direction)
    End Function 'Seek


    Public Overrides Sub SetLength(ByVal length As Long)
        _sink.SetLength(length)
    End Sub 'SetLength


    Public Overrides Sub Close()
        _sink.Close()
    End Sub 'Close


    Public Overrides Sub Flush()
        _sink.Flush()
    End Sub 'Flush


    Public Overrides Function Read(ByVal MyBuffer() As Byte, _
      ByVal offset As Integer, ByVal count As Integer) As Integer
        ' Pass the sink's byte count back to the caller.
        Return _sink.Read(MyBuffer, offset, count)
    End Function

#End Region

    ' Write is the method that actually does the filtering.
    Public Overrides Sub Write(ByVal MyBuffer() As Byte, _
             ByVal offset As Integer, ByVal count As Integer)
        ' VB array bounds are inclusive, so (count - 1) yields exactly count bytes.
        Dim data(count - 1) As Byte
        Buffer.BlockCopy(MyBuffer, offset, data, 0, count)

        ' Don't use ASCII encoding here.  The .NET IDE replaces
        ' some characters, such as &reg;, with a UTF-8 entity.
        ' If you use ASCII encoding, you'll get B. instead of the
        ' registered trademark symbol.
        Dim s As String = System.Text.Encoding.UTF8.GetString(data)

        ' Replace control characters with either spaces or nothing.
        ' The funky semi-colon handling is there because
        ' of a JavaScript comment in a component.
        ' This way, we keep the carriage returns that actually matter.

        s = s.Replace(ControlChars.Cr, _ 
              Chr(255)).Replace(ControlChars.Lf, _ 
              "").Replace(ControlChars.Tab, "")
        s = s.Replace(";" & Chr(255), ";" & ControlChars.Cr)
        s = s.Replace(Chr(255), " ")

        ' Eliminate excess whitespace.

        Do
            s = s.Replace("  ", " ")
        Loop Until s.IndexOf("  ") = -1

        ' Eliminate known comments.
        ' We use three comments in our template. These comments
        ' go on every single page on the site.
        ' Obviously, we can kill them when they are going out.
        ' This way, the comments stay in for
        ' maintenance, but are trimmed before release.

        s = s.Replace("<!-- Page Content Goes Above Here -->", "")
        s = s.Replace("<!-- Page Content Goes Below Here -->", "")
        s = s.Replace("<!-- Do not get rid of this   on data pages -->", "")

        ' Eliminate some additional whitespace we can kill.
        ' For some reason, a single space gets emitted
        ' before each of our DOCTYPE directives.

        s = s.Replace(" <!DOCTYPE", "<!DOCTYPE")

        ' These are the most common excess whitespace items we can remove.

        s = s.Replace("<li> ", _ 
              "<li>").Replace("</td> ", _ 
              "</td>").Replace("</tr> ", _ 
              "</tr>").Replace("</ul> ", _ 
              "</ul>").Replace("</table> ", _ 
              "</table>").Replace("</li> ", "</li>")
        s = s.Replace("<LI> ", _ 
              "<LI>").Replace("</TD> ", _
              "</TD>").Replace("</TR> ", _ 
              "</TR>").Replace("</UL> ", _ 
              "</UL>").Replace("</TABLE> ", _
              "</TABLE>").Replace("</LI> ", "</LI>")
        s = s.Replace("<td> ", _
              "<td>").Replace("<tr> ", _
              "<tr>")
        s = s.Replace("<TD> ", _
              "<TD>").Replace("<TR> ", _
              "<TR>")
        s = s.Replace("<P> ", "<P>").Replace("<p> ", "<p>")
        s = s.Replace("</P> ", "</P>").Replace("</p> ", "</p>")
        s = s.Replace("style=""display:inline""> ", _ 
              "style=""display:inline"">")
        s = s.Replace(" <H", "<H").Replace(" <h", _ 
              "<h").Replace(" </H", _
              "</H").Replace(" </h", "</h")
        s = s.Replace("<UL> ", "<UL>").Replace("<ul> ", "<ul>")
        s = s.Replace(" <TABLE", _
              "<TABLE").Replace(" <table", _
              "<table")
        s = s.Replace(" <li>", _
              "<li>").Replace(" <LI>", "<LI>")
        s = s.Replace(" <br>", _
              "<br>").Replace(" <BR>", _
              "<BR>").Replace("<br> ", _
              "<br>").Replace("<BR> ", "<BR>")
        s = s.Replace(" <ul>", "<ul>").Replace(" <UL>", "<UL>")

        ' Replace long tags with short ones

        s = s.Replace("<STRONG>", "<B>").Replace("<strong>", "<b>")
        s = s.Replace("</STRONG>", "</B>").Replace("</strong>", "</b>")

        ' Replace some HTML entities with true character codes
        s = s.Replace("&brkbar;", "|")
        s = s.Replace("&brvbar;", "|")
        s = s.Replace("&shy;", "-")
        s = s.Replace("&nbsp;", Chr(160))
        s = s.Replace("&lsquor;", "'")
        s = s.Replace("&ldquor;", """")
        s = s.Replace("&lsquo;", "'")
        s = s.Replace("&rsquor;", "'")
        s = s.Replace("&rsquo;", "'")
        s = s.Replace("&ldquo;", """")
        s = s.Replace("&rdquor;", """")
        s = s.Replace("&rdquo;", """")
        s = s.Replace("&ndash;", "-")
        s = s.Replace("&endash;", "-")

        ' If we don't do this, JavaScript horks on the site

        s = s.Replace("<!--", "<!--" & ControlChars.Cr)
        s = s.Replace("}", "}" & ControlChars.Cr)

        ' Last chance to eliminate excess whitespace

        Do
            s = s.Replace("  ", " ")
        Loop Until s.IndexOf("  ") = -1

        ' Finally, we spit out what we have done.

        Dim outdata() As Byte = System.Text.Encoding.UTF8.GetBytes(s)
        _sink.Write(outdata, 0, outdata.GetLength(0))

    End Sub 'Write 


End Class

Points of Interest

Occasionally, you will find that you have one or more pages that you do not want to compress. For example, the pages may use pre-formatted text or the pages may emit binary data instead of HTML.

In that case, you would want to filter the filter, so to speak. On our site, we have one page that we don't compress, so our Application_BeginRequest looks a little bit like this...

    Sub Application_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)

        ' ...non-related code trimmed...


        ' Whitespace Reduction

        If Request.Url.PathAndQuery.ToLower.IndexOf("makethumbnail") = -1 Then
            Response.Filter = New WhitespaceFilter(Response.Filter)
        End If
    End Sub
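
If more than one page needs to bypass the filter, the test above could be moved into a small helper. The following is only a sketch: the module name FilterRules, the function ShouldFilter, and the "download" entry are made up for illustration and are not part of our site's code.

```vbnet
Imports System

Module FilterRules

    ' Hypothetical list of URL fragments that should bypass the filter.
    ' "makethumbnail" comes from the example above; "download" is invented.
    Private ReadOnly ExcludedFragments() As String = {"makethumbnail", "download"}

    ' Returns False when the URL contains any excluded fragment
    ' (compared without regard to case), True otherwise.
    Public Function ShouldFilter(ByVal pathAndQuery As String) As Boolean
        For Each fragment As String In ExcludedFragments
            If pathAndQuery.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(ShouldFilter("/MakeThumbnail.aspx?id=3")) ' False
        Console.WriteLine(ShouldFilter("/default.aspx"))            ' True
    End Sub

End Module
```

In Application_BeginRequest, the condition would then presumably read If ShouldFilter(Request.Url.PathAndQuery) Then, leaving the rest of the handler unchanged.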

Using this class will increase the amount of processing time used for each page. In our case, the reduction in bandwidth (7% on our main page, as much as 30% on some of our more complex pages) was worth the increased workload on the server.

All of the string operations are admittedly very inefficient. A rewrite to use StringBuilder is in the works. The only downside to StringBuilder is that you can't run regular expressions against it.

However, because the current version uses Strings, I do not recommend using it if the HTML on your pages averages more than about 80,000 bytes, due to the behavior of the .NET Framework's garbage collector. Essentially, any object of roughly 85,000 bytes or more is allocated directly on the Large Object Heap, which is collected only during full (generation 2) collections.
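
As a rough illustration of the StringBuilder direction mentioned above, the repeated Replace("  ", " ") loop could be swapped for a single pass that appends characters to a StringBuilder. This is a minimal sketch, not the planned rewrite: the names WhitespaceSketch and CollapseSpaces are hypothetical, and it handles only runs of spaces.

```vbnet
Imports System
Imports System.Text

Module WhitespaceSketch

    ' Hypothetical helper: collapses every run of spaces into a single space
    ' in one pass, instead of calling String.Replace in a loop.
    Public Function CollapseSpaces(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        Dim lastWasSpace As Boolean = False

        For Each c As Char In input
            If c = " "c Then
                ' Keep only the first space of a run.
                If Not lastWasSpace Then sb.Append(c)
                lastWasSpace = True
            Else
                sb.Append(c)
                lastWasSpace = False
            End If
        Next

        Return sb.ToString()
    End Function

    Sub Main()
        ' Prints: <td> a b </td>
        Console.WriteLine(CollapseSpaces("<td>   a    b </td>"))
    End Sub

End Module
```

Each input character is examined once and only one intermediate buffer is allocated, whereas every String.Replace call in the current version produces a new string.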

If you are using a server operating system, you can also enable HTTP compression on the server to reduce your bandwidth usage even further. If an HTTP/1.1 client connects to your server, Windows will compress the binary stream (similar to ZIP) before sending it out to the client.

To enable HTTP compression on Windows 2000, open the Internet Service Manager, right-click on your server, and pick "Properties". Select the "Service" tab, then check "Compress Application Files" and "Compress Static Files".

As far as I can tell, HTTP compression is automatically enabled on IIS 5.1 in Windows XP.

History

This sucks - performance wise!!
softer
15:07 14 Jan '09  
Michael,
Your point of view is very good - but your method is extremely bad - performance wise!
Basically you are just creating tons of strings in ASP.NET's memory!
Multiply that for each page, for each request, each time!
Bummer!

Here is a simpler method - WAY FASTER, no performance penalty at all!

The idea relies on three steps:
1. Replace the standard HtmlTextWriter with a non-indenting one!
2. Copy the app before deployment and remove the white spaces from all the copied files! (Don't worry, it is very simple to do this - see below.)
3. Precompile your ASP.NET application!

So here is the code:

#1
Put this in your BasePage class

public class BasePage
: System.Web.UI.Page
{
public class H : HtmlTextWriter
{
public H(TextWriter writer)
: base(writer, "")
{
this.NewLine = "\n";
}
public override void WriteLine()
{
//base.WriteLine();
}
protected override void OutputTabs()
{
//base.OutputTabs();
}
}
protected override HtmlTextWriter CreateHtmlTextWriter(System.IO.TextWriter tw)
{
if (((this.Context != null) && (this.Context.Request != null)) && (this.Context.Request.Browser != null))
{
return new H(this.Context.Request.Browser.CreateHtmlTextWriter(tw));
}
HtmlTextWriter writer = new H(tw);
if (writer == null)
{
writer = new HtmlTextWriter(tw);
}
return writer;
}
}

2. Create a small console application and put this code on the Program.cs file:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace CleanupWhiteSpace
{
class Program
{
static string[] Extensions = new string[] { "*.aspx", "*.master", "*.ascx", "*.asmx", "*.htm", "*.html" };
static Regex EndTag = new Regex(@"^(\<\/?\w+\s*\:?\w*\s*\>\s*)+$");

static void Main(string[] args)
{
if (args.Length > 0)
{
string path = args[0];
CleanUp(path);
}
}

private static void CleanUp(string path)
{
foreach (var ext in Extensions)
foreach (var file in Directory.GetFiles(path, ext))
{
CleanUpFile(file);
}
foreach (var dir in Directory.GetDirectories(path))
{
CleanUp(dir);
}
}

private static void CleanUpFile(string file)
{
StringBuilder sb = new StringBuilder();
using (var s = new StreamReader(file))
{
bool appendLine = false;
while (!s.EndOfStream)
{
var line = s.ReadLine();
line = line.TrimStart(' ', '\t');
if (line.Length == 0)
continue;
if (EndTag.Match(line).Success)
{
line = line.TrimEnd(' ', '\t');
appendLine = false;
}
if (appendLine)
sb.Append("\n");
sb.Append(line);
appendLine = true;
}
}
using (var s = new StreamWriter(file))
s.Write(sb.ToString());
}
}
}

Make sure you first copy your web site/application to a different folder and run the cleanup app on it before precompiling.

3. Precompile the application and deploy it!

Enjoy the white-space free outputs Smile

Happy Programming,
Laurentiu
Re: This sucks - performance wise!!
Michael Russell
15:21 14 Jan '09  
Agreed, it is always embarrassing to see code of your own after a long period of time...especially nearly five years.

That said, I wouldn't recommend going down this path anymore regardless. I keep running into all these wonderful issues with some whitespace being necessary to get IE6 to render things correctly compared to IE7 and other interesting side effects so at this point I'm just enabling HTTP compression and leaving it at that...


Suggestions
Thiago Rafael
18:45 1 Jun '05  
I got some ideas:

- remove all \t (tabs)
- remove all \n\n (dual new line)
- remove ms_positioning="FlowLayout" from <body>
- remove <table>'s id="..."
- remove Visual Studio's meta tags:

<meta content="JavaScript" name="vs_defaultClientScript">
<meta content="http://schemas.microsoft.com/intellisense/ie5" name="vs_targetSchema">
<meta content="Microsoft Visual Studio .NET 7.1" name="GENERATOR">
<meta content="VisualStudio.HTML" name="ProgId">
<meta content="Microsoft Visual Studio .NET 7.1" name="Originator">

Btw, good job dude Smile

[]s

--
Thiago Rafael
Como fazer um currículo
Re: Suggestions
Thiago Rafael
9:17 2 Jun '05  
Sorry, you already remove ID from table's...

--
Thiago Rafael
Como fazer um currículo
The code does not work with webservices
EvilPanda
5:55 24 Feb '05  
Hi,

Thanks for this great invention. I have some pages that call a web service using JavaScript, and after adding the filter, none of those pages work. I tried modifying the filter code so that it doesn't make any changes to the output, but that doesn't work either.

any ideas?

thanks
Bernie
Re: The code does not work with webservices
MartialWeb.com
14:01 16 Jan '07  
More than likely, those JS lines of code do not end with ";", so the filter merges all the lines.

cheers
Straight port to C# from 1-pass version (not tested)
superk
4:20 30 Jun '04  
/* WhitespaceFilter License
Copyright (c) 2004, Layton City Corporation
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted
provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions
and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions
and the following disclaimer in the documentation and/or other materials provided with the
distribution.

Neither the name of Layton City nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF
THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This filter gets rid of all unnecessary whitespace in the output.
*/


/* 06/30/2004
Port to C# by Karim Laurent
klaurent@ligne-bleue-cyber.com
France
*/



using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions ;


namespace MyNameSpace
{
/// <summary>
/// Summary description for WhiteSpaceFilter.
/// </summary>
public class WhiteSpaceFilter : Stream
{

private Stream _sink=null;
private long _position;

// Write is the method that actually does the filtering.
// NOTE: Larger pages will be passed in chunks. if we don't keep consistent state, we're hosed.
string lasttag="" ;// used for special casing during scripting
bool delcomments=false;
bool inpre=false;
bool intag=false;
bool insquote=false;
bool indquote=false;

// constructor
public WhiteSpaceFilter(Stream sink)
{
_sink=sink;
}

#region Code that will most likely never change from filter to filter.
public override bool CanRead
{
get {return true;}
}

public override bool CanSeek
{
get {return true;}
}

public override bool CanWrite
{
get {return true;}
}

public override long Length
{
get {return 0;}
}

public override long Position
{
get {return _position;}
set { _position = value;}
}

public override long Seek(long offset , System.IO.SeekOrigin direction)
{
return _sink.Seek(offset, direction);
}

public override void SetLength(long length )
{
_sink.SetLength(length);
}

public override void Close()
{
_sink.Close();
}

public override void Flush()
{
_sink.Flush();
}

public override int Read(byte[] buffer, int offset, int count)
{
return _sink.Read(buffer, offset, count);
}

#endregion

private bool inquote()
{
return (delcomments && (insquote || indquote) );
}

public string WhitespaceStrip(string S)
{
StringBuilder sb= new StringBuilder(S);
try {
// Replace some character entities with shorter types
sb = sb.Replace("&brkbar;", "|");
sb = sb.Replace("&brvbar;", "|");
sb = sb.Replace("&shy;", "-");
sb = sb.Replace("&nbsp;", Convert.ToChar(160).ToString());
sb = sb.Replace("&lsquor;", "'");
sb = sb.Replace("&ldquor;", "\"");
sb = sb.Replace("&lsquo;", "'");
sb = sb.Replace("&rsquor;", "'");
sb = sb.Replace("&rsquo;", "'");
sb = sb.Replace("&ldquo;", "\"");
sb = sb.Replace("&rdquor;", "\"");
sb = sb.Replace("&rdquo;", "\"");
sb = sb.Replace("&ndash;", "-");
sb = sb.Replace("&endash;", "-");

int sboff= 0;
bool trimmed=false;
bool kill=false;
int sboff2;
bool endsc;
while (sboff < sb.Length)
{
trimmed=false;
if (intag)
lasttag += Char.ToLower(sb[sboff]);

if (sb[sboff]== '<')
{
intag = true;
lasttag = "<" ;
}


if (sb[sboff]=='>')
{
intag = false ;
if (Regex.IsMatch(lasttag, @"^<pre[\s>]", RegexOptions.IgnoreCase))
inpre = true ;

if (lasttag == "</pre>")
inpre = false;

if (Regex.IsMatch(lasttag, @"^<script[\s>]", RegexOptions.IgnoreCase))
{
delcomments = true;
/*'sb.Insert(sboff + 1, ControlChars.Cr, 1)
'sboff += 2
'trimmed = True
*/
}
if (lasttag.ToLower() == "</script>")
delcomments = false;

lasttag = String.Empty ;
}

if (Char.IsControl(sb[sboff]) && !inpre)
{
kill=false;
if (delcomments)
{
if (sb[sboff]!= '\r' )
kill = true;
}
else {
if ( (sb[sboff] == '\r' ) && (sboff > 0) )
{
if ( ( sb[sboff - 1]==' ') || (sb[sboff - 1] == Convert.ToChar(160)) )
kill = true;
else sb[sboff]=' ';

}
else kill = true;
}

if (kill)
{
sb = sb.Remove(sboff, 1);
trimmed = true;
}

}

if (! (inpre || inquote()))
{
if (sboff > 0 && sb[sboff]==' ' && sb[sboff - 1]==' ' )
{
sb = sb.Remove(sboff, 1);
trimmed = true ;
}
}

if (delcomments)
{
if (sb[sboff] == '\t' ) //tab
{
sb = sb.Remove(sboff, 1);
trimmed = true;
}

if (sboff > 0 && sb[sboff] =='\r' && sb[sboff-1] == '\r' ) // carriage return
{
sb = sb.Remove(sboff, 1);
trimmed = true ;
}

if (sb[sboff]== '\n') // line feed
{
sb = sb.Remove(sboff, 1);
trimmed = true;
}

// Handle quotes and escaped quotes
if (sb[sboff]== '\'' && sboff > 0 && sb[sboff-1]!='\\' )
insquote = ! insquote;

if (sb[sboff]== '"' && sboff > 0 && sb[sboff - 1]!='\\')
indquote = ! indquote;

// Handle /* */ style script comments
if (sb[sboff]== '*' && sboff > 0 && !inquote() )
{
if (sb[sboff - 1] == '/') // Comment! MUST KILL!
{
sboff2 = sboff;
do
{
sboff2 += 1;
if (sboff2 == sb.Length)
{
sboff2 -= 1;
break;
}
}
while (!(sb[sboff2] == '/' && sb[sboff2 - 1] == '*')); // scan until the closing */

sboff -= 1;
sb = sb.Remove(sboff, sboff2 - sboff);
trimmed = true;
}
} // multiline comment

// Handle '//' style single line comments
if (sb[sboff]== '/' && sboff> 0 && !inquote() )
{
if (sb[sboff - 1]== '/') // Comment! MUST KILL!
{
// First scan ahead to next CR and see if this one ends a source comment
// i.e. // -->
sboff2 = sboff;
endsc = false;

do
{
sboff2 += 1;
if (sboff2 == sb.Length)
{
sboff2 -= 1;
endsc = true; // We don't want to remove if we're at the end of a chunk
break;
}
}
while (sb[sboff2]!= '\r' &&
sb[sboff2] != '\n' &&
sb[sboff2] != '>' );

if (sb[sboff2]=='>')
{
if (sb[sboff2- 1]=='-' && sb[sboff2-2]== '-' )
endsc = true;
}

if (!endsc)
{
sboff -= 1;
sb = sb.Remove(sboff, sboff2 - sboff);
trimmed = true;
}
}
} // single line comment

if (! inquote() )
{
if ( sb[sboff] == '<'
&& sboff < (sb.Length - 4)
&& sb[sboff + 1] == '!'
&& sb[sboff + 2] == '-'
&& sb[sboff + 3] == '-' )
{
sb.Insert(sboff + 4, "\r", 1);
sboff += 4;
trimmed = true;
}

if (sb[sboff]== ';' && sb[sboff + 1] == '\r')
sb = sb.Remove(sboff + 1, 1);

} // Not inquote

} // delcomments

if (!trimmed )
sboff += 1;

trimmed = false;
}
}
catch(Exception ex)
{
/* Do nothing here.
It is possible that we may try to read past the end of the StringBuilder. if we
try to, we just catch the exception and ignore it.
*/

}
return sb.ToString();
}



public override void Write(byte[] MyBuffer , int offset, int count)
{
byte [] data= new byte[count];
Buffer.BlockCopy(MyBuffer, offset, data, 0, count);

/* Don't use ASCII encoding here. The .NET IDE replaces some characters, such as &reg;
with a UTF-8 entity. If you use ASCII encoding, you'll get B. instead of the registered
trademark symbol.
*/
string s = System.Text.Encoding.UTF8.GetString(data);

// ROM: 1/26/2004: Moving to StringBuilder
StringBuilder sb = new System.Text.StringBuilder(WhitespaceStrip(s));

/* Eliminate known comments.

We use three comments in our template. These comments go on every single page on the site.
Obviously, we can kill them when they are going out. This way, the comments stay in for
maintenance, but are trimmed before release.
*/

/*sb = sb.Replace("<!-- Page Content Goes Above Here -->", String.Empty);
sb = sb.Replace("<!-- Page Content Goes Below Here -->", String.Empty);
sb = sb.Replace("<!-- Do not get rid of this " + Convert.ToChar(160) + " on data pages -->", String.Empty);
*/


// Replace long tags with short ones
sb = sb.Replace("<STRONG>", "<B>").Replace("<strong>", "<b>");
sb = sb.Replace("</STRONG>", "</B>").Replace("</strong>", "</b>");

// Finally, we spit out what we have done.
byte [] outdata = System.Text.Encoding.UTF8.GetBytes(sb.ToString());
_sink.Write(outdata, 0, outdata.GetLength(0));

} //Write



}
}

Issues
arthur dzhelali
4:22 3 Feb '04  
There are some issues with this approach.
1. ASP.NET forms do not work very well with that java code you have. It needs to be modified or not used at all.
2. Binary output. If the website does image processing on the fly,
this code totally destroys it.
<img src="imageProc.aspx">

String.Empty
Keith Farmer
12:18 28 Jan '04  
.. yet another way you can get some performance gain. Probably not much, but it's also not much to add.
Re: String.Empty
Michael Russell (Layton)
12:34 28 Jan '04  
After benchmarking it on the 1-pass version, while it does not improve performance, it does slightly reduce the memory used by the process.

I'll put up 1-pass 2.2 shortly on Layton's site, and submit an update to this article later this week on v2.2.
1-Pass Take 1 Available
Michael Russell (Layton)
10:08 27 Jan '04  
Okay, good news:

My first try at a single-pass version of the above is available. I still use StringBuilder.Replace to handle some entities, but whitespace and comment removal are handled via a single pass. Single and multiple line comments are removed.

This new version works well with all JavaScript and handles <pre> tags. Whitespace removal and script reformatting avoid strings in JavaScript.

On a test page with multiple validators and nearly 30k of JavaScript code, the page went from 72,944 bytes to 66,632 bytes.

Bad news:

While JavaScript comments are removed, HTML comments are not removed...yet. VBScript comments are not removed.

When you use HttpResponse.Filter, you do not get the entire page at once. Instead, you get anywhere from a 4k to a 32k chunk of the file. I moved some state variables to an object scope instead of a local scope to compensate.
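
To illustrate why state has to live at object scope, here is a stripped-down sketch of a filter that remembers between Write calls whether the previous chunk ended in a space. It is not the released 1-pass code: the class name ChunkAwareSpaceFilter is hypothetical, it collapses only runs of spaces, and it ignores <pre> blocks and script strings. It works on raw bytes, which is safe for spaces because no multi-byte UTF-8 sequence contains the byte 32.

```vbnet
Imports System
Imports System.IO
Imports System.Text

' Hypothetical sketch: collapses runs of spaces across chunk boundaries.
Public Class ChunkAwareSpaceFilter
    Inherits Stream

    Private ReadOnly _sink As Stream
    Private _lastWasSpace As Boolean = False ' state kept between Write calls

    Public Sub New(ByVal sink As Stream)
        _sink = sink
    End Sub

    Public Overrides Sub Write(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer)
        Dim output(count - 1) As Byte ' inclusive upper bound: exactly count bytes
        Dim n As Integer = 0
        For i As Integer = offset To offset + count - 1
            Dim isSpace As Boolean = (buffer(i) = 32)
            ' Drop a space only when the previous byte, possibly from the
            ' previous chunk, was also a space.
            If Not (isSpace AndAlso _lastWasSpace) Then
                output(n) = buffer(i)
                n += 1
            End If
            _lastWasSpace = isSpace
        Next
        _sink.Write(output, 0, n)
    End Sub

    Public Overrides Sub Flush()
        _sink.Flush()
    End Sub

    ' This sketch is write-only; the remaining members are not supported.
    Public Overrides ReadOnly Property CanRead() As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property CanSeek() As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property CanWrite() As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property Length() As Long
        Get
            Throw New NotSupportedException()
        End Get
    End Property

    Public Overrides Property Position() As Long
        Get
            Throw New NotSupportedException()
        End Get
        Set(ByVal value As Long)
            Throw New NotSupportedException()
        End Set
    End Property

    Public Overrides Function Read(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer) As Integer
        Throw New NotSupportedException()
    End Function

    Public Overrides Function Seek(ByVal offset As Long, ByVal origin As SeekOrigin) As Long
        Throw New NotSupportedException()
    End Function

    Public Overrides Sub SetLength(ByVal value As Long)
        Throw New NotSupportedException()
    End Sub
End Class

Module ChunkDemo
    Sub Main()
        Dim sink As New MemoryStream()
        Dim filter As New ChunkAwareSpaceFilter(sink)
        ' Two chunks whose boundary falls inside a run of spaces.
        Dim first() As Byte = Encoding.UTF8.GetBytes("<p>one  ")
        Dim second() As Byte = Encoding.UTF8.GetBytes("  two</p>")
        filter.Write(first, 0, first.Length)
        filter.Write(second, 0, second.Length)
        Console.WriteLine(Encoding.UTF8.GetString(sink.ToArray())) ' <p>one two</p>
    End Sub
End Module
```

With a local variable instead of the _lastWasSpace field, the second chunk would start with no memory of the first, and a space would survive at the boundary.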

It still isn't the most efficient version. Optimization tips are welcome.

Available on Layton City's released source page at http://www.laytoncity.org/public1/code/code.htm.
Re: 1-Pass Take 1 Available
dog_spawn
1:34 28 Jan '04  
I looked at that code. You are going in the right direction by eating characters and knowing what to do based on what state you are in. And of course the StringBuilder is still used in one place: to build up the 'current tag'. All you have to do is to extend that idea until you have no 'Replace' at any point.

It seems using filter is a bad solution compared to HttpModule and so on. This is now going a bit off topic from your article Smile

To be honest, the code was pretty hard to read. I use C# mostly so if I seem to misunderstand something it is probably because I am not used to VB.NET. Especially with no curly braces D'Oh! Blush

Anyway, I wish you continued progress and good luck!
Re: 1-Pass Take 1 Available
SimonGreen
2:27 28 Jan '04  
I think a filter is the correct way to do it - whether you package it up in a module or call it from global.asax events is irrelevant IMO, it's just how you package it up.

If memory serves, there is some SGML (HTML) parsing code on GotDotNet somewhere that I think works on streams. This would be ideal to use in a filter like this to keep track of where you are up to as the bytes come in / go past (i.e. am I in an element? in an attribute?) and remove the need to load the data into strings and do search & replace operations.

Re: 1-Pass Take 1 Available
Rocky Moore
17:02 2 Feb '04  
Yeah, this is one of the examples that would be best coded in VC++. A VC++ compression module would be a useful tool!



Rocky <><
www.HintsAndTips.com www.GotTheAnswerToSpam.com
One pass or loop: which is more efficient?
arthur dzhelali
7:09 27 Jan '04  
Here are two variants on how to remove extra chars.
It looks like the second one is faster, but I would think that the first one should be. Any thoughts?

Dim s As String = "  Some  string   more string    even more string...   ..."
Dim ssb As New System.Text.StringBuilder(s)
Dim i As Int32 = 0
' Single pass: drop the second space of any pair, then re-check the same index.
While i < ssb.Length - 1
    If ssb.Chars(i) = " "c And ssb.Chars(i + 1) = " "c Then
        ssb.Remove(i + 1, 1)
        i = i - 1
    End If
    i = i + 1
End While

'*********************

Dim s As String = "  Some  string   more string    even more string...   ..."
Dim ssb As New System.Text.StringBuilder(s)
Dim i As Int32 = 0
' Repeat the replace until the length stops changing.
Do While i <> ssb.Length - 1
    i = ssb.Length - 1
    ssb.Replace("  ", " ")
Loop
mmm
dog_spawn
13:51 27 Jan '04  
Obviously removing whitespace is a piece of p*ss. But you are forgetting we only want to remove some whitespace. This is HTML remember Smile

I will say it again (see my earlier comments), don't use StringBuilder. Use streams properly. This way you can directly write to the output and utilize ASP.NET's optimized buffering.
Considered Using Response.Filter & GZIP?
stevensk
18:09 26 Jan '04  
A few months back I found that Ben Lowery had done some interesting work with Response.Filter and GZIP, resulting in his HTTPCompressionModule, which he is kindly distributing freely (with source code and examples).

There is also an article he wrote on Filtering HTTP Requests with .Net, which you may find of interest.

I'd still be interested to see the results of your work using a one-pass algorithm with Response.Filter.

Best of luck with it,

Kim
Re: Considered Using Response.Filter & GZIP?
SimonGreen
2:34 27 Jan '04  
We also have a compression module for ASP.NET which, while not free, does have some additional features which may be interesting:

Hierarchical configuration options and a rule engine which allows it to cater for many of the 'gotchas' where different browser versions may have bugs that mean they can't accept compressed content when they say they can and also where they can accept it even when not specifically requesting it. The rules can be extended to suit your own application.

It compresses streamed content (using Buffering / Response.Flush)

Suppression feature (for sending 304 response if the content hasn't changed)

In-built performance statistics page.

We did have a look at doing some white-space removal as an extra option but the additional saving it bought when used with compression was negligible and not worth the extra CPU time IMO.

White-space removal works great on its own but the savings aren't cumulative, i.e. if WSR saves 7% and compression saves 80%, you don't save 87%, you may only get 81%.
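
One way to arrive at a figure like that, assuming purely for illustration that compression still removes 80% of whatever is left after white-space removal (in practice the ratio will shift, since white space itself compresses very well):

```latex
\text{remaining} = (1 - 0.07) \times (1 - 0.80) = 0.93 \times 0.20 = 0.186
\qquad\Rightarrow\qquad
\text{total saving} = 1 - 0.186 = 0.814 \approx 81\%
```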

http://www.intesoft.net/aspaccelerator/

BTW: I don't think IIS 5.1 has compression - it was a separate filter available on IIS 5.0 but not available on workstation. IIS 6.0 has it integrated (ie. not an ISAPI filter you have to load) but the legacy ISAPI/COM approach is not as fast as completely managed code.

- Simon Green MCSD.NET
blowery.org = cool
dog_spawn
13:47 27 Jan '04  
I use Lowery's module too: http://www.blowery.org/. It is better than any commercial solution I have found because it is open source. Normally, I don't mind about open source, but in this case it is very handy to customize the code.

Obviously, this is only useful for coder people.

SimonGreen wrote: legacy ISAPI/COM approach is not as fast as completely managed code
Well, I think many people would disagree with this (including myself)...

My experience is ISAPI is faster. But I use .NET modules because they are easier to code and maintain. Speaking of speed, the link above has a simple benchmark.

Here is another product worth looking at:
http://www.port80software.com/products/httpzip/
Re: blowery.org = cool
SimonGreen
14:12 27 Jan '04  
Yes, I agree. The blowery module *is* very good. I'd disagree that being open source necessarily makes anything better or worse though. It's only better if having the source is the most important thing to you, surely?

We've tried to make ours 'better' with useful features such as the rich configuration options, performance stats and support for streamed content but it's horses for courses ... one size does not always fit all.

The speed of a 100% managed solution vs an ISAPI / COM legacy approach is more science than opinion though - if you do some performance / load testing with ACT (or similar) then I would be extremely surprised if the ISAPI solution was faster - in all the tests we've done it has been much slower. Check out Ben's blog and he mentions similar tests with his component which bear out our own results - the .NET components beat the ISAPI ones for compression and CPU usage.

Port80 had a legacy / ISAPI solution and the reason I think they have now stopped doing it for Win 2003 (and gone for a 'config tool' for IIS 6.0 compression instead) is that the inbuilt compression in IIS is now better than the add-on ISAPI ones (which the IIS one used to be).

It still isn't as fast as the pure .NET ones though from all the testing we've done. Of course, there may be other reasons to go with an ISAPI solution such as support for other legacy code (ASP) and compression of static HTML content. One of the downsides is you need admin access to the server to use them which again, is where .NET solutions shine IMO.

There are lots of ISAPI compression filters for IIS besides the Port 80 one: Pipeboost, XCompress, TurboIIS and others.

Re: blowery.org = cool
dog_spawn
1:39 28 Jan '04  
Interesting!

SimonGreen wrote: There are lots of ISAPI compression filters for IIS besides the Port 80 one: Pipeboost, XCompress, TurboIIS and others
Link to them too then Poke tongue
Re: blowery.org = cool
SimonGreen
2:22 28 Jan '04  
"Link to them too then"

You really should learn how to use Google you know ... Poke tongue

Here's all the ones I know about:

.NET Modules
ASPAccelerator.NET
http://www.intesoft.net/aspaccelerator/
Blowery HttpCompression Module
http://www.blowery.org/code/HttpCompressionModule.html
ISAPI Filters
XCompress
http://www.xcompress.com/
IIS Accelerator
http://www.iisaccelerator.com/
SqueezePlay
http://www.innermedia.com/
jetNEXUS for IIS
http://www.preactholdings.com/performance/products/jetnexus/jet-nexus/
PipeBoost
http://www.pipeboost.com/
TurboIIS Pro
http://www.objectsfarm.com/turboiis/
Will that do you? I am more than happy to recommend people take a look at the ISAPI solutions if a pure .NET solution doesn't satisfy their requirements and have always done this.

However, for .NET applications the managed code solutions offer the best flexibility and performance IMO.

Thanks
dog_spawn
2:24 28 Jan '04  
SimonGreen wrote: You really should learn how to use Google
I searched for "google" at AltaVista and couldn't find it. What is the url? Big Grin
Seriously though,

I am going to look at them and also the difference between ISAPI and .NET. Thanks for posting those links, very useful.
Re: Thanks
SimonGreen
3:15 28 Jan '04  
No problem - let me know if I can be of any help if you do any comparative testing.

I'm sure I saw once that on one of the lists of top search phrases "google" was in the top 100 words searched for on google itself. Maybe we should be searching for intelligent life on Earth before we get sidetracked with Mars ! Smile
Re: blowery.org = cool
Ben Lowery
9:35 3 Feb '04  
Hah, thanks! I'm really glad people like it and find it useful.

I have to say though, the product offered by Simon's company is also very good and if you need a supported product, I'd definitely go with them. I see him all over the place on forums answering questions and I've had nothing but good experiences with him.


Last Updated 27 Jan 2004 | Copyright © CodeProject, 1999-2010