JavaScript Code Compressor

7 Jul 2005, 6 min read
In this article, we will be creating a JavaScript code compressor using C#, which compresses the JavaScript code into a single line.

Sample Image - JSCodeCompressor.gif

Introduction

In this article, we will be creating a JavaScript code compressor using C# and Regular Expressions.

Before we start, I'd like to mention that the best compression method for JavaScript is needing no compression at all. That is, our aim should be to write JavaScript files small enough to load in an instant, rather than to compress a big bunch of script to cut download time.

Code compression in JavaScript is generally used for JavaScript library files. (There is also a trade-off between putting everything you need into one huge library file and having no library at all, with small snippets tailored to each page that keep the code size small. The former increases page-load time considerably and may annoy visitors. The latter means rewriting and copy-pasting code around, which will annoy you as a developer. Finding the optimum point in between is up to you.)

Another benefit of compression is that, since it removes all the indentation, the code becomes harder to read. So if someone decides to reverse-engineer your library, he or she will need to spend some time re-indenting the code. It is not as good as obfuscating the code, but it is a level of protection anyway.

Although you can find several JavaScript-based compressors around, there are several advantages of using a compiled executable:

  • It will typically run noticeably faster than its JavaScript counterparts.
  • You can arrange the code to process files in batches.
  • You can utilize threads and do your compression in the background.

You may want to look at the uncompressed JavaScript sample and at what we get when we pass it through our compressor.

This class is a part of my JavaScript code utilities package that I am currently designing to make life easier on my side.

Note that I didn't take certain things into consideration when creating the class. You may want to have a look at the Known Issues section before proceeding.

The Compressor Interface

Here is our interface:

C#
public interface CodeCompressor 
{
    void Compress(ref String toBeCompressed);
    void Compress(String sourcePath, String destinationPath);
}

The first method will compress and overwrite the string passed by ref. The second method will compress the text in the file existing on sourcePath and create a compressed file on the destinationPath. sourcePath and destinationPath are fully-qualified pathnames to files. If a file on destinationPath does not exist, a new file will be created.

Next, we implement the interface (JSCompressor). This level of indirection allows us to implement, say, a VBScriptCodeCompressor adhering to the same interface. It lets us create the object with:

C#
CodeCompressor compressor = new JSCompressor();

and use compressor to refer to the object without caring whether it is a JSCompressor, a VBScriptCodeCompressor or an XCodeCompressor; hence polymorphism.
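
To make the idea concrete, here is a minimal sketch. TrimmingCompressor and CompressorDemo are hypothetical names invented for this illustration; the stand-in class only trims the text and is in no way the real JSCompressor. The point is that the caller works against the interface alone.

```csharp
using System;
using System.IO;

// The interface as given in the article.
public interface CodeCompressor
{
    void Compress(ref String toBeCompressed);
    void Compress(String sourcePath, String destinationPath);
}

// Hypothetical stand-in implementation, for illustration only:
// it merely trims the text; the real JSCompressor does much more.
public class TrimmingCompressor : CodeCompressor
{
    public void Compress(ref String toBeCompressed)
    {
        toBeCompressed = toBeCompressed.Trim();
    }

    public void Compress(String sourcePath, String destinationPath)
    {
        String text = File.ReadAllText(sourcePath);
        Compress(ref text);
        File.WriteAllText(destinationPath, text);
    }
}

public static class CompressorDemo
{
    // The caller depends only on the interface, not on the concrete class.
    public static String Run(CodeCompressor compressor, String code)
    {
        compressor.Compress(ref code);
        return code;
    }
}
```

Calling CompressorDemo.Run(new TrimmingCompressor(), "  var x;  ") returns "var x;", and any other class implementing the interface could be passed in the same way.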

Enhancing the Interface

After using the final application in real-life cases, we felt the need for a way to set certain levels of compression. For instance, in one case we needed to remove all the comments in the script file but leave the indentation untouched.

This was not possible in the initial design, so in version 1.0.3, I modified the interface to add compression option flags. So here follows the new interface:

C#
public interface CodeCompressor 
{
    void Compress(ref String toBeCompressed);
    void Compress(String sourcePath, String destinationPath);

    bool RemoveComments {set;}
    bool TrimLines {set;}
    bool RemoveCRLF {set;}
    bool RemoveEverthingElse {set;}
}

The GUI has also been modified accordingly. Four checkboxes have been added to set/reset those flags.
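
As a rough sketch of how such write-only flags can gate independent stages, consider the class below. FlagGatedSketch is a hypothetical name, only two of the four flags are shown, and the stages use plain string operations rather than the regular expressions of the actual class, so treat it as an illustration of the pattern only.

```csharp
using System;

// Hypothetical sketch of flag-gated stages; not the article's JSCompressor.
public class FlagGatedSketch
{
    private bool trimLines;
    private bool removeCRLF;

    // Write-only properties, mirroring the {set;} style of the interface.
    public bool TrimLines { set { trimLines = value; } }
    public bool RemoveCRLF { set { removeCRLF = value; } }

    public void Compress(ref String toBeCompressed)
    {
        String[] lines = toBeCompressed.Split(
            new String[] { "\r\n" }, StringSplitOptions.None);

        // Stage 1 runs only when the TrimLines flag is set.
        if (trimLines)
        {
            for (int i = 0; i < lines.Length; i++)
            {
                lines[i] = lines[i].Trim();
            }
        }

        // Stage 2: glue the lines together, or keep the original line breaks.
        toBeCompressed = String.Join(removeCRLF ? "" : "\r\n", lines);
    }
}
```

With TrimLines set and RemoveCRLF cleared, "  var x;  \r\n  var y;" becomes "var x;\r\nvar y;"; setting RemoveCRLF as well yields "var x;var y;".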

The JSCompressor Class

There are several things to mention in this class:

C#
/* block comments; Singleline lets . span line breaks */
regCStyleComment = new Regex("/\\*.*?\\*/",
    RegexOptions.Compiled|RegexOptions.Singleline);
/* line comments, up to and including the line break */
regLineComment = new Regex("//.*\r\n",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
regSpaceLeft = new Regex("^\\s*",
    RegexOptions.Compiled|RegexOptions.Multiline);
regSpaceRight = new Regex("\\s*\\r\\n",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
regWhiteSpaceExceptCRLF = new Regex("[ \\t]+",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
/* string literals and regular expression literals */
regSpecialElement = new Regex(
    "\"[^\"\\r\\n]*\"|'[^'\\r\\n]*'|/[^/\\*](?<![/\\S]/.)"+
    "([^/\\\\\\r\\n]|\\\\.)*/(?=[ig]{0,2}[^\\S])",
    RegexOptions.Compiled|RegexOptions.Multiline);
regLeftCurlyBrace = new Regex("\\s*{\\s*",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
regRightCurlyBrace = new Regex("\\s*}\\s*",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
regComma = new Regex("\\s*,\\s*",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
regSemiColumn = new Regex("\\s*;\\s*",
    RegexOptions.Compiled|RegexOptions.ECMAScript);
regNewLine = new Regex("\\r\\n",
    RegexOptions.Compiled|RegexOptions.ECMAScript);

These regular expressions are used to find and replace unnecessary text. They are precompiled with the Compiled option to boost their performance.
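
The sketch below shows how a few of these patterns could be chained. The class name RegexStagesSketch, the order of the steps and the replacement strings are my assumptions for illustration; the actual class may apply them differently, and it protects string literals first, which this sketch does not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: chains five of the patterns on unprotected text.
public static class RegexStagesSketch
{
    static readonly Regex regCStyleComment = new Regex("/\\*.*?\\*/",
        RegexOptions.Compiled | RegexOptions.Singleline);
    static readonly Regex regLineComment = new Regex("//.*\r\n",
        RegexOptions.Compiled | RegexOptions.ECMAScript);
    static readonly Regex regSpaceLeft = new Regex("^\\s*",
        RegexOptions.Compiled | RegexOptions.Multiline);
    static readonly Regex regSpaceRight = new Regex("\\s*\\r\\n",
        RegexOptions.Compiled | RegexOptions.ECMAScript);
    static readonly Regex regNewLine = new Regex("\\r\\n",
        RegexOptions.Compiled | RegexOptions.ECMAScript);

    public static String Squeeze(String code)
    {
        code = regCStyleComment.Replace(code, "");   // drop /* ... */ comments
        code = regLineComment.Replace(code, "\r\n"); // drop // comments, keep the break
        code = regSpaceLeft.Replace(code, "");       // strip leading white space
        code = regSpaceRight.Replace(code, "\r\n");  // strip trailing white space
        code = regNewLine.Replace(code, "");         // finally join the lines
        return code;
    }
}
```

For the input "/* header */\r\n  var x = 1; // one\r\n  var y = 2;\r\n", Squeeze returns "var x = 1;var y = 2;". A string such as "http://example.com" would be damaged by the line-comment step, which is why strings need to be set aside before these replacements run.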

The second thing to note is that quoted strings and regular expression literals are stored in a Hashtable before any conversion is made, because they must remain unchanged at all times. Here is how it is done:

C#
/*mark special elements for later replacement*/
MarkQuotesAndRegExps(toBeCompressed);
 
/* ... the compression logic comes here ... */
 
/* restore the formerly stored elements. */
RestoreQuotesAndRegExps(ref toBeCompressed);

That is, we first store them before doing any conversion (MarkQuotesAndRegExps) and restore them after all conversion is done (RestoreQuotesAndRegExps).
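
A minimal sketch of this mark-and-restore idea follows. MarkRestoreSketch, its method names and the @@n@@ placeholder format are all hypothetical, and for brevity it protects only quoted strings (the first two alternatives of regSpecialElement), not regular expression literals. The real methods may work differently.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

// Hypothetical sketch of the mark/restore technique; not the article's code.
public class MarkRestoreSketch
{
    // Quoted strings only: the first two alternatives of regSpecialElement.
    private static readonly Regex regQuote =
        new Regex("\"[^\"\\r\\n]*\"|'[^'\\r\\n]*'");

    // Assumed placeholder format: @@0@@, @@1@@, ...
    private static readonly Regex regPlaceholder = new Regex("@@\\d+@@");

    private readonly Hashtable stored = new Hashtable();

    // Swap every quoted string for a placeholder and remember the original.
    public String Mark(String code)
    {
        stored.Clear();
        return regQuote.Replace(code, delegate(Match m)
        {
            String key = "@@" + stored.Count + "@@";
            stored[key] = m.Value;
            return key;
        });
    }

    // Put the remembered strings back in place of their placeholders.
    public String Restore(String code)
    {
        return regPlaceholder.Replace(code, delegate(Match m)
        {
            return stored.ContainsKey(m.Value) ? (String)stored[m.Value] : m.Value;
        });
    }
}
```

Anything done between Mark and Restore cannot touch the contents of the strings. A sketch like this would break on source code that already contains text shaped like a placeholder, so a real implementation needs a marker that cannot occur in the input.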

And here is how we read from one file and compress the contents of it to another file:

C#
/* read and write with the same code page */
Encoding locale = System.Text.Encoding.GetEncoding(Constant.CODEPAGE);
String strCompress;
/* using closes the reader even if reading throws */
using (StreamReader sr = new StreamReader(sourcePath, locale)) {
    strCompress = sr.ReadToEnd();
}
Compress(ref strCompress);
using (StreamWriter sw = new StreamWriter(destinationPath, false, locale)) {
    sw.Write(strCompress);
}

The Encoding class takes care of the localization issue, a nightmare for developers who work with non-English text: the file is read and written with the same code page (Constant.CODEPAGE).

We read from a file with StreamReader, compress the contents, and write the compressed contents to another file with StreamWriter.

The GUI

The GUI can be seen at the top of this article.

After we choose a file and click the "compress" button, the code below runs:

C#
if(ThreadCompressor==null) {
    /* start compression. */
    StartCompression();
}
else {
    if(ThreadCompressor.ThreadState != ThreadState.Stopped) {
        LblProgress.Text =
            "The current operation is in progress.\r\nPlease wait...";
    }
    else {
        /* start compression. */
        StartCompression();
    }
}

ThreadCompressor is a background thread to do compression so that our GUI will not be suspended when the compression operation is in progress.

And here is the code that does the compression using a compressor instance:

C#
private void StartCompression() {
    ThreadCompressor = new Thread(new ThreadStart(Do_Compress));
    ThreadCompressor.Start();
}
 
private void Do_Compress() {
    /*create a filename instance */
    FileName theFileName = new FileName(TxtFileURL.Text);

    /*compress the code*/
    CodeCompressor compressor = new JSCompressor();

    /*set additional options for the compressor */
    compressor.TrimLines = ChkTrimBlankLine.Checked;
    compressor.RemoveComments = ChkRemoveComment.Checked;
    compressor.RemoveCRLF = ChkRemoveCRLF.Checked;
    compressor.RemoveEverthingElse = ChkRemoveOther.Checked;

    compressor.Compress(theFileName.Path,
        theFileName.Name + "_compressed" +
        theFileName.Extension);
}

com.sarmal.io.FileName is a utility class to separate the file name and file extension of a file path. As seen from the code above, various flags are set according to the checkbox states. For instance, if ChkTrimBlankLine is checked, then the application will trim lines in the text during compression, otherwise the lines will not be trimmed.
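
If you do not have the FileName class at hand, the standard System.IO.Path class can do a similar split. The sketch below is an assumption-laden equivalent, not the FileName implementation: OutputNameSketch and CompressedNameFor are hypothetical names, and unlike the call above it keeps the output next to the source file.

```csharp
using System;
using System.IO;

// Hypothetical helper built on System.IO.Path; not the FileName class itself.
public static class OutputNameSketch
{
    public static String CompressedNameFor(String sourcePath)
    {
        // GetDirectoryName may return null for a root path; treat that as "no directory".
        String directory = Path.GetDirectoryName(sourcePath) ?? "";
        String name = Path.GetFileNameWithoutExtension(sourcePath);
        String extension = Path.GetExtension(sourcePath); // includes the leading dot

        return Path.Combine(directory, name + "_compressed" + extension);
    }
}
```

For example, CompressedNameFor("library.js") returns "library_compressed.js".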

At times, this incremental compression feature may be useful. For example, you may want to reduce the file size but require some readability as well so you do not want to compress everything into a single line.

Known Issues

Listed below are special cases which, if they exist in the original code, may lead to JavaScript errors in the compressed code:

  1. You need to terminate your statements with a semicolon (;). It is normally optional in JavaScript, but the compressor assumes a semicolon at the end of each statement. If one is missing, the compressed code will generate JavaScript errors.
    JavaScript
    var x; // OK
    var y // not OK: missing semicolon
    
    var fn = function() {}; // OK
    var fn = function() {} // not OK: missing semicolon

    I did not implement handling for missing semicolons because, first of all, it would require significant additions to the code and, secondly, ending each statement with ; is good coding practice and I wanted to enforce it on myself.

    I plan to add a flag that, when checked, keeps line breaks in place. This will alleviate the restriction. On the minus side, the code will be less compressed; on the plus side, the compression will be faster.
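
To see why the missing semicolon matters, consider what happens when lines are simply glued together. The sketch below is a deliberately naive stand-in (SemicolonPitfallSketch is a hypothetical name, and the real compressor does more than a plain replace), but it shows the failure mode.

```csharp
using System;

// Hypothetical, deliberately naive line joiner used only to show the pitfall.
public static class SemicolonPitfallSketch
{
    public static String JoinLines(String code)
    {
        return code.Replace("\r\n", "");
    }
}
```

JoinLines("var x;\r\nvar y;") gives "var x;var y;", which is still valid JavaScript. JoinLines("var x\r\nvar y") gives "var xvar y", where the line break that used to separate the two statements is gone and the script no longer parses.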

Conclusion

In this project:

  • We created a class file to compress JavaScript files to a single line.
  • We generated a background thread to do our job.
  • We worked with text file IO and localization.
  • We dealt with regular expressions.

The code is well-documented and you may find further details commented in it.

History

  • 2005-06-18
    • Article created.
  • 2005-06-19
    • The "by ref" paragraph removed, since it was incorrect and misleading.
  • 2005-06-20
    • Fixed a bug; the regSpecial was matching incorrectly.
    • Got rid of two for loops as a bonus of the bug fix.
    • "Putting CRLF between consecutive keywords produces JS error" bug sorted out.
    • Updated version to 1.0.1.
    • Necessary parts of the article updated accordingly.
  • 2005-06-30
    • Added several modes of compression; GUI updated accordingly.
    • Bits and pieces of the code modified slightly.
    • The current version is 1.0.3.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.


Written By
Web Developer
Turkey
Volkan is a Java enterprise architect who left his full-time senior developer position to pursue his ideas and dreams. He codes in C# as a hobby, trying to combine the .NET concept with his Java and J2EE know-how. He also works as a freelance web application developer/designer.

Volkan is especially interested in database oriented content management systems, web design and development, web standards, usability and accessibility.

He was born in May '79. He graduated from one of the most reputable universities of his country (Bogazici University) in 2003 as a Communication Engineer. He also earned his Master of Business Administration degree from a second university in 2006.
