
PowerShell Script for Reviewing Text Shown to Users

John D. Cook, 26 Jun 2014
A script for extracting string literals from source code for review

 

Introduction

It's embarrassing when users see text containing spelling and grammar errors. Error-handling code is the most likely to contain bad prose because developers are sloppy with code that isn't "supposed" to ever run. Imagine a snapshot of your dialog landing on your boss's desk. The message text reads "If you got here, your #$%@ed." How humiliating for your boss to see that you misspelled "you're".

String tables are supposed to avoid this problem, but error-handling strings often get left out. After all, why go to the effort of putting a message string in a table when you know the code will never run? And while it's an exaggeration to say the code never runs, it may run so seldom that it is never seen in testing.

The PowerShell script described in this article searches through a source code tree and extracts string literals that may be visible to users. It tries to separate strings that are code from strings that are prose. The script isn't perfect; a complete solution would require a lot more work than a little script. But it does a good job of finding errors that would otherwise go undetected. You will probably find that your source code has far more typos than you thought.
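As a rough illustration of the extraction step, here is a minimal PowerShell sketch. It assumes C-style double-quoted literals with backslash escapes; the function name Get-StringLiteral is hypothetical, and the pattern is an illustration rather than the regular expressions the script actually uses.

```powershell
# Hypothetical helper (not the author's code): returns each double-quoted
# string literal found in the given lines, with its 1-based line number.
# Assumes C-style backslash escapes; VB doubled quotes (""), C# verbatim
# strings (@"..."), comments, and multi-line literals are not handled.
function Get-StringLiteral {
    param([string[]]$Lines)
    # Opening quote, then any mix of ordinary characters and backslash
    # escape pairs, then the closing quote. Group 1 is the literal's content.
    $literal = '"((?:[^"\\]|\\.)*)"'
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        foreach ($m in [regex]::Matches($Lines[$i], $literal)) {
            [pscustomobject]@{
                LineNumber = $i + 1
                Text       = $m.Groups[1].Value
            }
        }
    }
}

# Example: the two literals on this line are returned as separate objects.
$sample = 'MessageBox.Show("Unable to open file.", "Error");'
Get-StringLiteral -Lines $sample
```

To scan a real file, pass its lines, for example `Get-StringLiteral -Lines (Get-Content .\Form1.cs)`.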

Other solutions to this problem have been proposed, such as spell checkers that run in the development environment. One advantage of the approach presented here is that the strings can be examined on their own. The output could be handed to an editor to review, someone with no desire to open thousands of source files. Another advantage is that the strings appear without context, just as the user sees them. We can forget that the user doesn't see the source code we were working on when we inserted a message box. If a message doesn't make sense in the text review report, it probably wouldn't make sense to a user either.

Although the original intention of the script was to find spelling and grammar errors, the script is also a useful tool for code reviews. If a project has a large amount of redundant code from "clipboard inheritance," this will show up in the text review, particularly if the repeated code contains distinctive typos. The script also makes it evident if a project is constructing HTML, SQL, or JavaScript by string concatenation.

Using the Code

The script takes one optional argument: the path of the root directory to search. If no argument is provided, the script searches the current working directory. The directory is searched recursively, and a list of file extensions inside the script determines which kinds of files are examined. The script writes its results to standard output, so you will usually want to redirect the output to a text file.

PS C:\> .\TextReview.ps1 C:\foo\bar > out.txt
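The argument handling and recursive search described above might look something like the following minimal sketch. It assumes PowerShell 3.0 or later (for the -File switch on Get-ChildItem); the function name Get-SourceFile is hypothetical, and the extension pattern is the one given in this article.

```powershell
# Hypothetical sketch of the directory walk, not the author's script.
function Get-SourceFile {
    param(
        # Root of the tree to search; defaults to the current working directory.
        [string]$Root = (Get-Location).Path
    )
    # Same pattern as the article's $sourceExtensions, in single quotes so
    # the trailing $ is never mistaken for a variable.
    $sourceExtensions = '\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$'
    # -Recurse walks every subdirectory; -File skips directory entries.
    Get-ChildItem -LiteralPath $Root -Recurse -File |
        Where-Object { $_.Name -match $sourceExtensions }
}
```

Calling `Get-SourceFile C:\foo\bar` lists the matching files under that directory; calling it with no argument searches the current directory.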

The output will list each file name followed by the string literals that are not filtered out, each with its line number. Files not containing strings are omitted.
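The report layout could be produced along these lines. This is a sketch, not the author's code: Write-TextReview is a hypothetical name, the literal pattern is a simple illustration, and Select-String supplies the line numbers. It writes to the output stream rather than using Write-Host, which is what lets `> out.txt` capture the report.

```powershell
# Hypothetical sketch: print each file name followed by its string
# literals and their line numbers; files with no literals print nothing.
function Write-TextReview {
    param([System.IO.FileInfo[]]$Files)
    # Double-quoted literal with backslash escapes; group 1 is the content.
    $literal = '"((?:[^"\\]|\\.)*)"'
    foreach ($file in $Files) {
        # -AllMatches reports every literal on a line, not just the first.
        $hits = @(Select-String -LiteralPath $file.FullName -Pattern $literal -AllMatches)
        if ($hits.Count -eq 0) { continue }   # omit files without strings
        # Plain output (not Write-Host) so redirection to a file captures it.
        $file.FullName
        foreach ($hit in $hits) {
            foreach ($m in $hit.Matches) {
                '  {0}: {1}' -f $hit.LineNumber, $m.Groups[1].Value
            }
        }
    }
}
```

For a file whose second line is `Show("Hello world", "OK");`, this prints the file's full path followed by `  2: Hello world` and `  2: OK`.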

You may want to open out.txt in Microsoft Word and run its spelling and grammar checker on the output.

The script is configured to search C++, C#, VB, ASP.NET, JavaScript, and XML files, along with .resx and .rc resource files. You may want to modify the following line to change which file extensions are searched.

$sourceExtensions = "\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$"
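Before editing the pattern, it can help to test it against a few file names. This snippet is not part of the original script. It shows that -match is case-insensitive by default, so Form1.CS is accepted, and that the trailing $ anchors the match to the end of the name, so page.html is not mistaken for a .h file. It also shows that C++ headers named .hpp are skipped unless hpp is added to the alternation.

```powershell
# Same pattern as above; single quotes keep the trailing $ literal.
$sourceExtensions = '\.(cs|vb|aspx|resx|cpp|rc|h|js|xml)$'
# Print each sample name next to True/False for whether it would be searched.
'Form1.cs', 'Form1.CS', 'Default.aspx.cs', 'page.html', 'util.hpp', 'notes.txt' |
    ForEach-Object { '{0,-16} {1}' -f $_, ($_ -match $sourceExtensions) }
```

The first three names are accepted; page.html, util.hpp, and notes.txt are not.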

You may also want to modify some of the regular expressions used to filter out strings that appear to be source code rather than text intended for human readers.
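The article doesn't list the script's actual filter expressions, so the following is only a hypothetical example of the kind of heuristic that separates code from prose. Test-LooksLikeProse is an assumed name, and its three rules are illustrative choices. It deliberately keeps HTML and SQL fragments, since seeing those in the report is useful during code review.

```powershell
# Hypothetical filter: returns $true if a literal looks like text meant
# for a human reader, $false if it looks like code or data.
function Test-LooksLikeProse {
    param([string]$Text)
    # Reject empty or whitespace-only literals.
    if ([string]::IsNullOrWhiteSpace($Text)) { return $false }
    # Reject literals with no letters at all, e.g. "{0}: {1}" or ", ".
    if ($Text -notmatch '[A-Za-z]') { return $false }
    # Reject single tokens with no internal whitespace: identifiers,
    # resource keys, namespaces, file names.
    if ($Text.Trim() -notmatch '\s') { return $false }
    return $true
}

# Only 'Unable to open the file.' survives this filter.
'Unable to open the file.', 'btnOK', '{0}: {1}', 'System.Windows.Forms' |
    Where-Object { Test-LooksLikeProse $_ }
```

The single-token rule is a trade-off: it discards identifiers and resource keys, but it also discards one-word messages such as "Cancel".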

If this is the first PowerShell script you have run, you will need to set your execution policy to allow scripts to run on your computer.
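One way to do this on Windows, assuming you would rather not change the machine-wide setting, is shown below. Get-ExecutionPolicy, Set-ExecutionPolicy, Unblock-File, and the -ExecutionPolicy switch of powershell.exe are standard; the script path and arguments are the ones from the usage example above.

```powershell
# Show the effective execution policy at each scope.
Get-ExecutionPolicy -List

# Option 1: allow local scripts for the current user only (no admin rights needed).
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

# If the script was downloaded and RemoteSigned still blocks it, clear the download mark.
Unblock-File -Path .\TextReview.ps1

# Option 2: leave the policy unchanged and bypass it for a single run.
powershell.exe -ExecutionPolicy Bypass -File .\TextReview.ps1 C:\foo\bar > out.txt
```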

Points of Interest

This script began as a Perl script used for extracting strings from MFC code. I've since rewritten it as PowerShell and now use it mostly on ASP.NET and WinForms projects. The file extensions and pattern filtering have had to evolve as the script has been used with new languages.

History

  • 11th April, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under a Public Domain dedication.


About the Author

John D. Cook

United States
I am an independent consultant in software development and applied mathematics. I help companies learn from their data to make better decisions.
 

Comments and Discussions

 
Very entertaining read. (Ashaman, 15-Apr-08 2:54)
This is a nicely written article.


Last Updated 26 Jun 2014
Article Copyright 2008 by John D. Cook
Everything else Copyright © CodeProject, 1999-2014