PowerShell Script for Reviewing Text Shown to Users

26 Jun 2014 | BSD | 3 min read
A script for extracting string literals from source code for review

 

Introduction

It's embarrassing when users see text containing spelling and grammar errors. Error-handling code is most likely to contain bad prose because developers are sloppy with code that isn't "supposed" to ever run. Imagine a snapshot of your dialog landing on your boss's desk. The message text reads "If you got here, your #$%@ed." How humiliating for your boss to see that you misspelled "you're".

String tables are supposed to avoid this problem, but error-handling strings often get left out. After all, why go to the effort to put a message string in a table when you know the code will never run? And while it's an exaggeration to say the code never runs, it may run so seldom that it is never seen in testing.

The PowerShell script described in this article searches through a source code tree and extracts string literals that may be visible to users. It tries to separate strings that are prose from strings that are code and reports only the prose. The script isn't perfect; a complete solution would require a lot more work than a little script. But it does a good job of finding errors that would otherwise go undetected. You will probably find that your source code has far more typos than you thought.

Other solutions to this problem have been proposed, such as spell checkers that run in the development environment. One advantage of the approach presented here is that the strings can be examined alone. The output could be given to an editor to review, someone with no desire to open thousands of source files. Another advantage is that the strings appear without context, just as the user sees them. It is easy to forget that the user doesn't see the source code we were working on when we inserted a message box. If a message doesn't make sense in the text review report, it probably wouldn't make sense to a user either.

Although the original intention of the script was to find spelling and grammar errors, the script is also a useful tool for code reviews. If a project has a large amount of redundant code from "clipboard inheritance", this will show up in the text review, particularly if the repetitive code contains distinctive typos. The script also makes it evident if a project is constructing HTML, SQL, or JavaScript by string concatenation.

Using the Code

The script takes one optional argument, the path of the root directory to search. If no argument is provided, the script searches the current working directory. The directory is searched recursively, and a list of file extensions in the script determines which kinds of files are examined. The script writes to standard output, so you will usually want to redirect the output to a text file.

PS C:\> .\TextReview.ps1 C:\foo\bar > out.txt

The output will list each file name followed by the string literals that are not filtered out, each with its line number. Files not containing strings are omitted.
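To make the output format concrete, here is a minimal sketch of pulling double-quoted literals out of a few lines of source and pairing each with its line number. The function name `Get-StringLiteral` and the regular expression are assumptions made for illustration; they are not taken from TextReview.ps1, which may extract strings differently.

```powershell
# Hypothetical helper, not part of TextReview.ps1.
function Get-StringLiteral {
    param([string[]]$Lines)

    # A double-quoted literal, allowing backslash-escaped characters inside.
    $pattern = '"((?:[^"\\]|\\.)*)"'

    for ($i = 0; $i -lt $Lines.Count; $i++) {
        foreach ($m in [regex]::Matches($Lines[$i], $pattern)) {
            [pscustomobject]@{
                Line = $i + 1              # report 1-based line numbers
                Text = $m.Groups[1].Value  # literal without the surrounding quotes
            }
        }
    }
}

$sample = @(
    'int main() {',
    '    MessageBox("If you got here, your in trouble.");',
    '    return 0;',
    '}'
)

$found = @(Get-StringLiteral -Lines $sample)
$found | ForEach-Object { "{0}: {1}" -f $_.Line, $_.Text }
```

A line-by-line regular expression like this one misses literals that span several lines and does not know about comments, which is one reason a complete solution would need more than a small script.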

You may want to examine the file out.txt in Microsoft Word to run its spelling and grammar check on the output.

The script is configured to search C++, C#, VB, ASP.NET, JavaScript, and XML files, along with .resx and .rc resource files. To change which file extensions are searched, modify this line:

$sourceExtensions = "\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$"
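As a rough illustration of how a pattern like this can select files, the sketch below matches it against file names found by a recursive directory listing. The helper name `Get-SourceFile` is an assumption for this example; the article does not show how the script itself applies `$sourceExtensions`.

```powershell
$sourceExtensions = "\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$"

# Hypothetical helper: lists files under $Root whose names match the pattern.
function Get-SourceFile {
    param([string]$Root = ".")
    Get-ChildItem -Path $Root -Recurse -File |
        Where-Object { $_.Name -match $sourceExtensions }
}

# The pattern is anchored at the end of the name, and -match ignores case,
# so each name below is tested on its final extension only.
"Program.cs", "notes.xml.bak", "main.CPP" | ForEach-Object {
    "{0} -> {1}" -f $_, ($_ -match $sourceExtensions)
}
```

Adding a language is then a matter of adding its extension to the alternation, for example `|ts` for TypeScript.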

You may also want to modify some of the regular expressions used to filter out strings that appear to be source code rather than text intended for human readers.
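The article does not list the script's filter expressions, so the following is only a sketch of the general idea, with made-up patterns: a string is rejected if it matches any pattern that suggests code, and kept otherwise. Both the patterns and the function name `Test-LooksLikeProse` are hypothetical.

```powershell
# Hypothetical filter patterns; the real script's expressions may differ.
$codePatterns = @(
    '^\S+$',          # a single token with no spaces (identifier, path, key)
    '^[^A-Za-z]*$'    # no letters at all (empty string, format codes, punctuation)
)

function Test-LooksLikeProse {
    param([string]$Text)
    foreach ($p in $codePatterns) {
        if ($Text -match $p) { return $false }
    }
    return $true
}

# Only the first of these three strings passes the filter.
"File not found. Please check the path.", "btnSubmit", "{0}: {1}" |
    Where-Object { Test-LooksLikeProse $_ }
```

Heuristics like these trade recall for a shorter report. The single-token rule, for instance, would also discard a one-word message such as "Done", so the patterns are worth tuning to the code base being reviewed.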

If this is the first PowerShell script you have run on your computer, you will need to set your execution policy to allow scripts to run.
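One way to do this on Windows is shown below. The choice of `RemoteSigned` and the per-user scope are assumptions made for this example; the article does not say which policy to use, and your organization may require a different one.

```powershell
# Show the policy currently in effect.
Get-ExecutionPolicy

# Allow locally written scripts to run for the current user only.
# RemoteSigned is an assumed choice here, not one named by the article.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

Under `RemoteSigned`, a script downloaded from the web may still be blocked until it is signed or unblocked, for example with `Unblock-File .\TextReview.ps1`.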

Points of Interest

This script began as a Perl script used for extracting strings from MFC code. I've since rewritten it in PowerShell and now use it mostly on ASP.NET and WinForms projects. The file extensions and pattern filtering have had to evolve as the script has been used with new languages.

History

  • 11th April, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The BSD License.


Written By
President, John D. Cook Consulting
United States
I work in the areas of applied mathematics, data analysis, and data privacy.

Check out my blog or send me a note.

 


Comments and Discussions

 
Very entertaining read.
Ashaman, 15-Apr-08 2:54
