It's embarrassing when users see text containing spelling and grammar errors. Error handling code is most likely to contain bad prose because developers are sloppy with code that isn't "supposed" to ever run. Imagine a screenshot of your dialog landing on your boss's desk. The message text reads "If you got here, your #$%@ed." How humiliating for your boss to see that you misspelled "you're".
String tables are supposed to avoid this problem, but error handling strings often get left out. After all, why go to the effort to put a message string in a table when you know the code will never run? And while it's an exaggeration to say the code never runs, it may run so seldom that it is never seen in testing.
The PowerShell script described in this article searches through a source code tree and extracts string literals that may be visible to users. It tries to separate strings that are prose from strings that are code and reports only the former. The script isn't perfect; a complete solution would require a lot more work than a little script. But it does a good job of finding errors that would otherwise go undetected. You will probably find that your source code has far more typos than you thought.
Other solutions to this problem have been proposed, such as spell checkers that run in the development environment. One advantage of the approach presented here is that the strings can be examined on their own. The output could be given to an editor to review, someone with no desire to open thousands of source files. Another advantage is that the strings appear without context, just as the user sees them. It is easy to forget that the user doesn't see the source code we were working on when we inserted a message box. If a message doesn't make sense in the text review report, it probably wouldn't make sense to a user either.
Using the Code
The script takes one optional argument, the path of the root directory to search. If no argument is provided, the script searches the current working directory. The search is recursive. The script contains a list of file extensions that specifies which kinds of files to search. The script writes to standard output, so you will usually want to redirect the output to a text file.
PS C:\> .\TextReview.ps1 C:\foo\bar > out.txt
The output will list each file name followed by the string literals that are not filtered out, each with its line number. Files not containing strings are omitted.
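The report format described above could be produced along these lines. This is only a minimal sketch, not the code from TextReview.ps1: the function name `Get-StringLiteral`, the regular expression, and the exact "line number: text" layout are assumptions made for illustration, and the pattern handles only C-style double-quoted literals with backslash escapes.

```powershell
# Hypothetical helper, not taken from TextReview.ps1: finds double-quoted
# literals in an array of source lines and reports each with its line number.
function Get-StringLiteral {
    param([string[]]$Lines)

    # An opening quote, then any run of ordinary characters or backslash
    # escapes, then the closing quote. The contents are captured in group 1.
    $pattern = '"((?:[^"\\]|\\.)*)"'

    for ($i = 0; $i -lt $Lines.Count; $i++) {
        foreach ($m in [regex]::Matches($Lines[$i], $pattern)) {
            # Line numbers are 1-based, as in an editor.
            '{0}: {1}' -f ($i + 1), $m.Groups[1].Value
        }
    }
}

$sample = @(
    'int main() {',
    '    MessageBox("The file could not be saved.");',
    '}'
)
Get-StringLiteral -Lines $sample   # 2: The file could not be saved.
```

A file that yields no entries would simply be skipped when the report is written, which matches the behavior of omitting files that contain no strings.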
You may want to examine the file out.txt in Microsoft Word to run its spelling and grammar check on the output.
To change which kinds of files are searched, edit the list of file extensions in the script:
$sourceExtensions = "\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$"
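As a rough illustration of how a pattern like this one can drive a recursive search, the sketch below tests file names against it. The variable is the one shown above; the helper names `Test-SourceFile` and `Get-SourceFile` are hypothetical and are not claimed to appear in the actual script.

```powershell
# Same pattern as in the article; the helpers below are a hypothetical sketch.
$sourceExtensions = "\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$"

function Test-SourceFile {
    param([string]$Name)
    # -match is case-insensitive by default, so "APP.RC" also qualifies.
    return $Name -match $sourceExtensions
}

function Get-SourceFile {
    # Default to the current directory when no root is given.
    param([string]$Root = '.')
    Get-ChildItem -Path $Root -Recurse |
        Where-Object { -not $_.PSIsContainer -and (Test-SourceFile $_.Name) }
}

Test-SourceFile 'Form1.cs'     # True
Test-SourceFile 'readme.txt'   # False
```

Adding a language is then a matter of adding its extension to the alternation, for example `|java` before the closing parenthesis.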
You may also want to modify some of the regular expressions used to filter out strings that appear to be source code rather than text intended for human readers.
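The kind of filtering involved might look something like the following. These patterns are invented for illustration and are not the ones used in TextReview.ps1; the function name `Test-LooksLikeProse` is likewise hypothetical. The idea is simply that a string is treated as code if it matches any pattern in a list, so tuning the filter means editing that list.

```powershell
# Hypothetical filters, not the ones in TextReview.ps1.
$codePatterns = @(
    '^\s*$',       # empty or whitespace only
    '^\S+$',       # a single token with no spaces (identifier, path, URL)
    '^[\W\d_]+$'   # no letters at all (punctuation and digits)
)

function Test-LooksLikeProse {
    param([string]$Text)
    foreach ($p in $codePatterns) {
        # Any one match is enough to classify the string as code.
        if ($Text -match $p) { return $false }
    }
    return $true
}

Test-LooksLikeProse 'The file could not be saved.'   # True
Test-LooksLikeProse 'System.Windows.Forms'           # False
```

Heuristics like these trade accuracy for simplicity. The single-token rule, for instance, would also discard a one-word message such as "Error", so each pattern is worth checking against your own code base.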
If this is the first PowerShell script you have run, you will need to set your execution policy to allow scripts to run on your computer.
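One common way to do this is shown below. The choice of `RemoteSigned` is an assumption made for the example, not a recommendation from the original script; it allows locally created scripts to run while still requiring a signature on downloaded ones. Changing the policy normally requires an elevated (administrator) PowerShell session, and your organization may mandate a different setting.

```powershell
# Show the current policy; a fresh installation often reports "Restricted".
Get-ExecutionPolicy

# Allow local scripts to run (run this from an elevated session).
Set-ExecutionPolicy RemoteSigned
```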
Points of Interest
This script began as a Perl script used for extracting strings from MFC code. I've since rewritten it in PowerShell and now use it mostly on ASP.NET and WinForms projects. The file extensions and pattern filtering have had to evolve as the script has been used with new languages.
- 11th April, 2008: Initial post