This is the first of a four-part series of articles on a system that allows you to protect your software against unlicensed use. This series will follow the outline below:
- Describe the architecture of the system
- Describe the creation of the license key
- Describe the process of installing a license key
- Describe the process of validating an installation
Along the way, we will look at n-bit block feedback encryption (also known as cipher-block chaining; see the Wikipedia article), and an obfuscation method that should deter most attempts to understand how the system works. To add to the difficulty of reverse engineering the system, the result will be a set of COM objects written in unmanaged C++. Interop can be used to invoke the appropriate methods from the .NET Framework.
Protecting the investment of time and energy made to develop the next great software application is something that worries a lot of developers. If you are one of the software powerhouses, hackers / crackers will target you. But for everyone else, having a decent system that prevents casual copying should ward off all but those with the skill and determination to use your software without paying you for your services.
This was always an interesting problem in my mind, but it wasn't until I wrote a fairly large application (based on Leslie Sanford's awesome MIDI C# Toolkit) to manage my MIDI addressable music devices on stage during the shows my band played that I started taking the idea seriously. After all, it took me two years to get the system to a truly usable point, and since I thought the system was definitely marketable, I wanted to protect my investment.
Before we begin, your expectations need to be properly set. Specifically, no one reading this series should delude themselves into thinking this system, or any copy-protection mechanism for that matter, is ironclad. The question you should be asking yourself is not "can anyone crack this" but instead "does the person have the skills to do it and will they think it is a reasonable investment of their time to figure out how it works?" Remember: where there's a will, there's a way.
The overall objective of the system, obviously, is to prevent people from distributing your application in an unauthorized manner. More specific objectives that lead to this main goal are listed below:
- Ensure that reverse engineering the code responsible for providing these services is not trivial
- Ensure that any license keys are sufficiently "hidden" when stored, i.e., they are not easily found and cannot be easily changed
- Separate the code to create license keys from the code to validate them to allow your application to distribute only the validation routines
- Provide for easy integration with Microsoft's installation utility or a home grown installer
- Allow your application to determine if an installation is valid
- Allow your application to specify licensed capabilities, e.g., allow users to run the runtime but not the designer
- Allow your application to determine the date of installation so that evaluation periods may be supported
Since the prevalent means of distributing frameworks like this is COM, we will be using that for the create key and validate key functionality. By necessity, the install key functionality will simply be a DLL with the appropriate entry points exported. This is because Microsoft's installation utility does not allow for integration with external COM objects, ironically enough.
Let's match up the objectives with specific means to achieve those objectives.
- Reverse engineering should be non-trivial. The code will be written in an unmanaged language. While it is still possible to view the compiled output with a disassembler, understanding that output is considerably harder than reading decompiled managed code, especially given the skeletal constructs that the IDL compiler creates to support the necessary interfaces with the COM subsystem.
- Hidden license keys. We will be using the n-bit block feedback cipher to encrypt the data that the license keys provide, and will also use it to encrypt the key itself before storing it.
- Separation of code. We will have separate executable images for each of the main function groups.
- Easy integration with Microsoft's installation utility. We will discuss this in detail in part 3.
- Determine if an installation is valid, specify licensed capabilities and evaluation periods. This is all related to the data that we store in the key itself, and will be discussed in parts 2 and 4.
In addition to these, I also added the ability to specify a manufacturer identifier and a product identifier. This was to allow me, as was originally intended, to write a Web Service that would accept a license key printed on a packing slip and generate an installation key as a result. Having these identifiers would have allowed me to advertise this Web Service to software vendors that wanted to use this system.
Below is the logical architecture of the system:
The diagram above illustrates the final, logical design of the system. It evolved to this structure as the capabilities of the system matured. The shared functions reside in a statically linked library and contain the cipher functions in addition to the storage / retrieval system. The other components are fairly simple; typically, they convert the arguments to internal structures and call one or more functions in the shared library.
The COM interfaces are listed below. Since COM interfaces are easily inspected using any number of publicly available tools, the argument names are intentionally vague. Granted, this isn't as robust as code obfuscation, but since there are a total of three interfaces, I didn't feel that obfuscation was worth the hassle.
BSTR NbbfCLib.Create(LONG lID1, LONG lID2, LONG lID3);
bool NbbfVLib.Validate(LONG lID1, LONG lID2, LONG lID3);
LONG NbbfVLib.Elapsed(LONG lID1, LONG lID2);
As stated above, the install key functionality is - by necessity - a standard DLL with exported functions. These functions are listed below:
void InstallMSI(MSIHANDLE hInstall);
bool InstallDirect(LPSTR lpstrKey, LONG lID1, LONG lID2);
The meaning of each argument will become clear as we discuss the specific functionality of each component.
N-bit Block Feedback Encryption
As a tangent, I remember "discovering" this algorithm one morning in the shower as I prepared to go to work. I was obviously disappointed when I saw that I wasn't the first person to think of it, but given the difficulty that is usually associated with cryptography, I was secretly happy that I didn't look like a total idiot either. According to the Wikipedia entry, this type of cipher (also known in some places as cipher block chaining) was developed by IBM in 1976 and patented under the title "Message verification and transmission error detection by block chaining" (US Patent 4074066).
The cipher essentially works in the following manner. We'll use:
- schar to denote a character from the source (unencrypted),
- pchar to denote a character from a password-like overlay, and
- dchar to denote a character after the cipher operation has completed.
The steps are:
- For the first character (i = 0), XOR schar[0] and pchar[0] to get dchar[0]
- For each i > 0, XOR schar[i], pchar[i], and dchar[i-1] to get dchar[i]
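Decryption reverses the process: XOR dchar[i] with pchar[i] and, for each i > 0, with dchar[i-1] (the previous encrypted character) to recover schar[i].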
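The steps above can be sketched in a few lines of C++. This is only an illustration of the chaining idea, not the library's actual code: the function names ChainEncrypt and ChainDecrypt are hypothetical, and repeating the overlay when it is shorter than the source is my assumption, since the description does not say how the two lengths are reconciled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: d[0] = s[0] ^ p[0]; d[i] = s[i] ^ p[i] ^ d[i-1] for i > 0.
// Assumption: the overlay is reused cyclically if it is shorter than the source.
std::string ChainEncrypt(const std::string& source, const std::string& overlay)
{
    assert(!overlay.empty());
    std::string result(source.size(), '\0');
    unsigned char previous = 0;  // no feedback term for the first character
    for (std::size_t i = 0; i < source.size(); ++i) {
        unsigned char s = static_cast<unsigned char>(source[i]);
        unsigned char p = static_cast<unsigned char>(overlay[i % overlay.size()]);
        unsigned char d = static_cast<unsigned char>(s ^ p ^ previous);
        result[i] = static_cast<char>(d);
        previous = d;  // feed the encrypted character into the next step
    }
    return result;
}

// Hypothetical helper: s[i] = d[i] ^ p[i] ^ d[i-1], using the previous
// *encrypted* character as the feedback term.
std::string ChainDecrypt(const std::string& cipher, const std::string& overlay)
{
    assert(!overlay.empty());
    std::string result(cipher.size(), '\0');
    unsigned char previous = 0;
    for (std::size_t i = 0; i < cipher.size(); ++i) {
        unsigned char d = static_cast<unsigned char>(cipher[i]);
        unsigned char p = static_cast<unsigned char>(overlay[i % overlay.size()]);
        result[i] = static_cast<char>(d ^ p ^ previous);
        previous = d;
    }
    return result;
}
```

Calling ChainDecrypt(ChainEncrypt(text, overlay), overlay) returns the original text. Because each output character depends on the one before it, changing a single source character changes every encrypted character that follows.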
As a side note: n (in the name of the cipher) is 8, since one character is 8 bits, and that is the block size.
Since I am processing one byte at a time - and because I want the result to look like a professionally done registration key - I also take the opportunity to convert the data (which is numerical) from base 16 to base 32, using a modified character set to represent values above 9. Hexadecimal digits run from 0-9 and A-F, and base 32 would be 0-9 and A-V. But to mix it up, I extended the letters all the way to Z and removed 4 letters from the middle of the alphabet (G, J, O, and U). Then, I moved the digits 0-9 from the front to various places within the set of letters.
The conversion works this way: the code looks at the hexadecimal digit being processed and then finds its position in the array of pseudo base 32 "digits". The index within that array is taken as the numerical value for the original digit. The XOR operations are performed on the digits after conversion to ensure that a complete range of values is produced as a result.
It's important to note my choice of the size of the set of digits: it's a power of 2. Every index fits in 5 bits, so the unused bits of each byte (in this case, the 3 most significant bits) will never be set by an XOR of two indices, meaning that you never run the risk of ending up with a numerical value that cannot be represented using the set of digits. It's also important to note that, since a hexadecimal number is always used as an input to the conversion, every hexadecimal digit must be in the set of characters used in the base 32 conversion. Specifically, the set looks like the following:
#define BASE32_CHARSET (LPSTR)"AB0CD1EF2HI3KL4MN5PQ6RS7TV8WX9YZ"
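To make the lookup concrete, here is a minimal sketch of the digit conversion and the XOR step, assuming the character set shown above. The names kCharset, DigitValue, DigitChar, and EncodeDigit are hypothetical and are not the library's real entry points; the sketch also assumes uppercase hexadecimal input.

```cpp
#include <cassert>
#include <cstring>

// Same 32 characters as BASE32_CHARSET above, kept const for this sketch.
static const char kCharset[] = "AB0CD1EF2HI3KL4MN5PQ6RS7TV8WX9YZ";

// Returns the position of c in the character set (0..31), or -1 if absent.
int DigitValue(char c)
{
    if (c == '\0') return -1;  // strchr would otherwise match the terminator
    const char* pos = std::strchr(kCharset, c);
    return pos ? static_cast<int>(pos - kCharset) : -1;
}

// Maps a value in 0..31 back to its character in the set.
char DigitChar(int value)
{
    assert(value >= 0 && value < 32);
    return kCharset[value];
}

// XORs the indices of a source digit and an overlay digit with the previous
// output value (pass 0 for the first digit). All three operands are below 32,
// so the result is also below 32 and always maps back into the set.
char EncodeDigit(char sourceDigit, char overlayDigit, int previousValue)
{
    int s = DigitValue(sourceDigit);
    int p = DigitValue(overlayDigit);
    assert(s >= 0 && p >= 0);
    assert(previousValue >= 0 && previousValue < 32);
    return DigitChar(s ^ p ^ previousValue);
}
```

With this set, the hexadecimal digit 'F' sits at index 7 and '0' at index 2, so the numerical values being XORed have nothing to do with the digits' ordinary hexadecimal values.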
If you want to change things to prevent your use of the library from being "cracked" by readers of this article, simply rearrange the letters in the preprocessor macro definition.
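If you do rearrange the set, it is worth checking that the result still satisfies the two constraints described earlier: exactly 32 distinct characters, and every hexadecimal digit present. The helper below is a hypothetical sanity check of my own (IsUsableCharset is not part of the library) that you could run once in a debug build.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical check: true if charset has exactly 32 distinct characters
// and contains every uppercase hexadecimal digit (0-9, A-F).
bool IsUsableCharset(const char* charset)
{
    if (charset == nullptr || std::strlen(charset) != 32) return false;

    bool seen[256] = {false};
    for (int i = 0; i < 32; ++i) {
        unsigned char c = static_cast<unsigned char>(charset[i]);
        if (seen[c]) return false;  // a duplicate would make the lookup ambiguous
        seen[c] = true;
    }

    const char* hexDigits = "0123456789ABCDEF";
    for (const char* h = hexDigits; *h != '\0'; ++h) {
        if (!seen[static_cast<unsigned char>(*h)]) return false;
    }
    return true;
}
```

A rearranged set that passes this check keeps the power-of-2 property, so the XOR results will still always map back to a character in the set.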
The next installment will look at the mechanism used to create the keys themselves.
- September 7, 2009 - Initial version.