One of the most interesting features of the NTFS file system is that of hidden data streams. Well documented but poorly supported by the .NET framework, alternate data streams (ADS) allow information to be stored with files in ways not visible to traditional file viewing techniques.
I became interested in this feature of NTFS (the default file system for Windows 2000 and available for use by WinXP) when I tried to write a tool to manipulate files created by a third party application. I wanted to write an application that would allow a developer to customize their view of the data in the file, and needed a way to persist the additional metadata of the view state without mucking up the file structure of the original. In my particular case, the raw data is XML, and any extension of it by me to include the view metadata would cause it to fail the original author's validation scheme in the original tool.
I had a couple of choices: add a new file with a different extension to the directory with the original file, or keep a list of viewed files local to my application in a database of some sort (XML file, flat file, Access, whatever) and store the metadata in there.
Neither of the choices is very good. The first case may cause the original third party application to fail if it parses all the files in its data directory on execution and doesn't recognize the metadata file. The second case means a lot of extra infrastructure unrelated to the actual information in which I was interested. In both cases, there was a synchronization issue, where a user could rename one of the original data files in Windows Explorer, and then both my file -> metadata file mapping scheme and database location tracking scheme would fail to find the original file.
ADS is the solution. By writing data to an alternate stream of the original data file, you get:
- Transparency - the original data is not affected.
- Synchronization - the metadata stays with the relevant file.
I'll show you a class I wrote to help manage my interactions with ADS.
You can play with alternate data streams at the DOS prompt without code. Type the following at a DOS prompt:
notepad c:\test.txt
Click the Yes button when you’re prompted to create a new file. Once Notepad opens, type “obvious data” and save the file. Now type the following back at the DOS prompt:
notepad c:\test.txt:secret.txt
The colon separates the name of the file from the name of your stream. The stream didn’t exist yet, so you’re prompted to create it. Click Yes again. When Notepad becomes ready, type “secret stuff” and save.
The “secret stuff” text is saved with the test.txt file, but it’s hidden in your new ADS named “secret.txt”. If you double-click on c:\test.txt in Explorer, you’ll only see the “obvious data” text. It’s the same if you use the DOS type command. The only way to view and modify the data is to explicitly name the stream with the file when it gets opened, as you see above in the second DOS command using the colon syntax.
The example above uses text data written to a text file, but there is no restriction on the files to which you can append secret data in an alternate stream. You can read and write secret data to EXE files, DLLs, or any other file type. You can have as many streams in a file as you want, as long as they have different names of course.
Using the code
The basic class is a thin wrapper around a couple of Win32 API calls: CreateFile(), ReadFile() and WriteFile(). The Win32 functions are necessary because the .NET framework won’t let you use colons in file names. If you attempt to open a file with a colon in the name using a TextReader, you get the following error:
System.NotSupportedException: The given path's format is not supported.
at System.Security.Util.StringExpressionSet.CanonicalizePath(String path,
I’m only interested in saving and reading text data (specifically, XML encoded metadata), so I wrap the API calls with Write() and Read() methods. In addition to the name of the file and the stream to use, they accept or return the string of data to save to or read from the ADS. The class is implemented as a couple of static methods in a class that can’t be instantiated, because I don’t need to retain any state. I call it once to read the view metadata and once to write it for each of the views of the original document.
Despite the big looming changes in Longhorn and beyond, it never hurts to learn more about the Win32 API. Before you can read and write using the ReadFile() and WriteFile() functions, you need a file handle. To get a handle, call CreateFile():
static extern uint CreateFile(string filename,
The fileName parameter takes the fully qualified path to the stream using the colon syntax. Pass the OPEN_EXISTING constant to the creationDisposition parameter to get a handle for reading, and CREATE_ALWAYS to get a handle for writing. You use the unsigned integer returned by this function as a parameter to ReadFile() and WriteFile() to actually do the file IO. The read/write code is pretty straightforward, and is well commented in the downloadable code.
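For readers without the download at hand, here is a minimal sketch of how such a wrapper might look. It is not the downloadable class: the names AdsSketch, BuildStreamPath and Open are made up for illustration. It also swaps techniques. Rather than passing a raw uint handle to ReadFile() and WriteFile(), it asks CreateFile() for a SafeFileHandle and wraps that in a FileStream, which assumes .NET 2.0 or later. The constant values are the standard ones from the Windows SDK headers.

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper; class and method names are illustrative only.
public static class AdsSketch
{
    // Standard Win32 constants (values from the Windows SDK headers).
    private const uint GENERIC_READ = 0x80000000;
    private const uint GENERIC_WRITE = 0x40000000;
    private const uint FILE_SHARE_READ = 0x00000001;
    private const uint CREATE_ALWAYS = 2;
    private const uint OPEN_EXISTING = 3;

    // Assumption: returns a SafeFileHandle instead of the article's uint.
    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern SafeFileHandle CreateFile(
        string fileName, uint desiredAccess, uint shareMode,
        IntPtr securityAttributes, uint creationDisposition,
        uint flagsAndAttributes, IntPtr templateFile);

    // Joins file and stream names with the colon syntax.
    public static string BuildStreamPath(string file, string stream)
    {
        return file + ":" + stream;
    }

    // Creates (or overwrites) the named stream and writes the text as UTF-8.
    public static void Write(string file, string stream, string data)
    {
        using (SafeFileHandle handle = Open(file, stream, GENERIC_WRITE, CREATE_ALWAYS))
        using (FileStream fs = new FileStream(handle, FileAccess.Write))
        using (StreamWriter writer = new StreamWriter(fs, Encoding.UTF8))
        {
            writer.Write(data);
        }
    }

    // Opens an existing stream and returns its entire contents as text.
    public static string Read(string file, string stream)
    {
        using (SafeFileHandle handle = Open(file, stream, GENERIC_READ, OPEN_EXISTING))
        using (FileStream fs = new FileStream(handle, FileAccess.Read))
        using (StreamReader reader = new StreamReader(fs, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Calls CreateFile() on the colon-qualified path and throws on failure.
    private static SafeFileHandle Open(string file, string stream,
                                       uint access, uint disposition)
    {
        SafeFileHandle handle = CreateFile(BuildStreamPath(file, stream), access,
            FILE_SHARE_READ, IntPtr.Zero, disposition, 0, IntPtr.Zero);
        if (handle.IsInvalid)
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
        return handle;
    }
}
```

On an NTFS volume, calling AdsSketch.Write(@"c:\test.txt", "secret.txt", "secret stuff") and then AdsSketch.Read(@"c:\test.txt", "secret.txt") should round-trip the text. The sketch throws a plain Win32Exception when the stream is missing; a fuller version would translate that into a more specific exception.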
The managed assembly also contains a StreamNotFoundException class that extends FileNotFoundException and is thrown if the requested stream cannot be found on a read attempt. By extending FileNotFoundException, you automatically get the FileName property, and a logical exception to use. It just remains up to this implementation to store the additional information that a consumer of the exception would require, i.e., the name of the stream.
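An exception along those lines could be sketched as follows. This is a guess at the shape rather than the class from the download: the constructor signature, the message text and the StreamName property name are all assumptions.

```csharp
using System;
using System.IO;

// Hypothetical reconstruction; the downloadable class may differ.
public class StreamNotFoundException : FileNotFoundException
{
    private readonly string streamName;

    // FileNotFoundException(message, fileName) populates the inherited FileName property.
    public StreamNotFoundException(string fileName, string streamName)
        : base("Stream '" + streamName + "' was not found in file '" + fileName + "'.",
               fileName)
    {
        this.streamName = streamName;
    }

    // The extra piece of information a consumer needs: which stream was missing.
    public string StreamName
    {
        get { return streamName; }
    }
}
```

Because it derives from FileNotFoundException, existing catch blocks for that type (or for IOException) would still handle it, while callers that care can catch the more specific type and inspect StreamName.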
You can use the test project to read and write streams to and from arbitrary files on the file system to exercise the library and get comfortable manipulating alternate streams programmatically. If you run it from the IDE, don’t write to the DLLs and EXEs that your project is currently compiling for execution. You may get concerned that your ADS is disappearing when, in fact, what's happening is that the file is being recreated on every execution. Not that it happened to me of course...
When I use the library in my projects, I use the name of the client namespace for the stream name. That pretty much guarantees that the stream will not be overwritten by some other application.
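One way to follow that naming convention without hard-coding the string is to derive it from a type in the client code. In this small sketch, the namespace Example.ViewTool and the class ViewState are placeholders, not names from the article.

```csharp
using System;

namespace Example.ViewTool   // hypothetical client namespace
{
    public static class ViewState
    {
        // Resolves to "Example.ViewTool", the namespace this type is declared in.
        public static readonly string StreamName = typeof(ViewState).Namespace;
    }
}
```

The resulting stream path would look like c:\test.txt:Example.ViewTool.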
Points of Interest
I have found the use of ADS to keep metadata with the original data to be very helpful. The code isn’t rocket science, but it is fairly well organized and ready for your consumption.
This CodeProject article shows you how to write a tool to help identify and track alternate streams, which can be used for malicious purposes as well as helpful ones as I’ve demonstrated here.
Understanding the ways information can be encoded surreptitiously in your files can help you identify positive and negative implications.
Jan 23 – Initial revision.