In this article I present an approach for storing application-specific data (small or large amounts) inside MS Word files and hiding it from the end user.
A similar approach can also be used for MS Excel and MS PowerPoint files.
The mechanisms used to achieve these objectives are:
1) Compound file implementation of structured storage (for storing application-specific data in hidden form)
2) VSTO add-ins (for restoring the application-specific data after it is lost on a save operation)
1) Compound File Implementation of Structured Storage:
Compound File Format and MS Office File Format:
Compound files (OLE compound files) can be viewed as a file system within a file, that is, a file-based implementation of OLE Structured Storage.
They allow you to create files (known as streams) and directories (known as sub-storages) within a single file.
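To make the "file system within a file" idea concrete: every compound file starts with the same fixed 8-byte signature (D0 CF 11 E0 A1 B1 1A E1), so checking the first bytes tells you whether a document is a compound file at all. The following is a minimal, portable C++ sketch that does not use the Windows structured storage APIs; the function names are my own (hypothetical). Note that the Office 2007+ .docx, .xlsx and .pptx formats are ZIP packages, so this check returns false for them; the approach described here targets the binary .doc, .xls and .ppt formats.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// The 8-byte signature at offset 0 of every OLE compound file.
constexpr std::array<unsigned char, 8> kCfbSignature = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

// Returns true if the buffer begins with the compound file signature.
bool hasCompoundFileSignature(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < kCfbSignature.size()) return false;
    for (std::size_t i = 0; i < kCfbSignature.size(); ++i) {
        if (bytes[i] != kCfbSignature[i]) return false;
    }
    return true;
}

// Reads the first 8 bytes of a file (hypothetical helper) and checks them.
bool isCompoundFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // missing or unreadable file
    std::vector<unsigned char> head(kCfbSignature.size());
    in.read(reinterpret_cast<char*>(head.data()),
            static_cast<std::streamsize>(head.size()));
    head.resize(static_cast<std::size_t>(in.gcount()));  // file may be shorter
    return hasCompoundFileSignature(head);
}
```

For example, isCompoundFile("file1.doc") should return true for a Word 97-2003 document and false for a .docx file.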
More information on compound files is available at the following link:
OLE compound files were added to the Windows API many years ago to support OLE-enabled applications such as MS Word, Excel, PowerPoint and many others.
Each of the three file types (MS Word, MS Excel, MS PowerPoint) has its own compound file format in its binary form (.doc, .xls, .ppt).
The file formats of Word, Excel and PowerPoint are described at the following link:
Note: We do not need to understand these complex file formats. We only need to know the basics of compound files, that is, what storages and streams are.
To view the built-in storages and streams within .doc, .xls and .ppt files, download Compound File Explorer from the following link:
Install and run Compound File Explorer, then open an MS Word file in it. It shows the built-in storages and streams within the file. Similarly, you can open a .ppt or .xls file in Compound File Explorer to view its built-in storages and streams.
COM provides a standard implementation of structured storage called Compound Files. Compound files provide the following benefits:
Hiding of Data:
Using the compound file APIs, we can create our own storages and streams within MS Word, Excel and PowerPoint files. These storages and streams are not visible to the user when the document is opened in the respective application.
File-system and platform independence:
Because COM's Compound Files implementation runs on top of existing flat-file systems, compound files stored in the FAT file system, NTFS file system, or Macintosh file systems can be opened by applications using any one of the other file systems.
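Part of what makes this platform independence work is that the compound file header stores its integers in a fixed little-endian byte order: the byte-order field at offset 0x1C always holds 0xFFFE. The sketch below, based on the header layout in the [MS-CFB] specification, decodes a few header fields byte by byte so the result is the same on any host CPU. It is illustrative only; the struct and function names are hypothetical, and it checks just the fields it reads rather than validating the whole header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A few fields from the 512-byte compound file header.
struct CfbHeaderInfo {
    std::uint16_t majorVersion = 0;    // 3 or 4
    std::uint32_t sectorSize = 0;      // 512 (version 3) or 4096 (version 4)
    std::uint32_t miniSectorSize = 0;  // 64
};

// Decode a little-endian 16-bit value byte by byte, so the result does not
// depend on whether the host is little-endian or big-endian.
std::uint16_t readLe16(const std::vector<unsigned char>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

// Returns the decoded fields, or std::nullopt if the buffer is too short or
// the fields do not hold the values the format requires.
std::optional<CfbHeaderInfo> parseCfbHeader(const std::vector<unsigned char>& b) {
    if (b.size() < 512) return std::nullopt;                // header is 512 bytes
    if (readLe16(b, 0x1C) != 0xFFFE) return std::nullopt;   // byte-order mark

    CfbHeaderInfo info;
    info.majorVersion = readLe16(b, 0x1A);
    const std::uint16_t sectorShift = readLe16(b, 0x1E);
    const std::uint16_t miniSectorShift = readLe16(b, 0x20);

    // Version 3 uses 512-byte sectors (shift 9), version 4 uses 4096 (shift 12).
    const bool v3 = info.majorVersion == 3 && sectorShift == 9;
    const bool v4 = info.majorVersion == 4 && sectorShift == 12;
    if (!(v3 || v4) || miniSectorShift != 6) return std::nullopt;

    info.sectorSize = 1u << sectorShift;
    info.miniSectorSize = 1u << miniSectorShift;
    return info;
}
```

A .doc file written by Word is typically a version 3 file, so parsing its first 512 bytes should report 512-byte sectors.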
I used the IStorage compound file implementation in my sample application.
The compound file implementation of IStorage allows you to create and manage substorages and streams within a storage object residing in a compound file object. To create a compound file object and get an IStorage pointer, call the API function StgCreateStorageEx. To open an existing compound file object and get its root IStorage pointer, call StgOpenStorageEx.
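What StgCreateStorageEx and StgOpenStorageEx hand back is the root of a tree: a root IStorage containing streams and sub-storages, which are created and opened through IStorage methods such as CreateStream, OpenStream, CreateStorage, OpenStorage and DestroyElement. The portable C++ sketch below models only that tree in memory, so the idea can be tried without Windows. It is not the COM API: it ignores access modes and transactions, and it keeps stream and storage names separate even though in a real compound file they are entries in the same directory. The names "MyAppData" and "DocumentID" used in the example are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified in-memory model of a structured storage tree (hypothetical).
// Method names echo IStorage, but the semantics are deliberately minimal.
class Storage {
public:
    // Creates the stream, replacing any existing stream with the same name.
    void createStream(const std::string& name, std::vector<unsigned char> data) {
        streams_[name] = std::move(data);
    }

    // Returns a copy of the stream contents, or std::nullopt if absent.
    std::optional<std::vector<unsigned char>> openStream(const std::string& name) const {
        auto it = streams_.find(name);
        if (it == streams_.end()) return std::nullopt;
        return it->second;
    }

    // Returns the child storage with this name, creating it if needed.
    Storage& createStorage(const std::string& name) {
        auto& slot = children_[name];
        if (!slot) slot = std::make_unique<Storage>();
        return *slot;
    }

    // Returns the child storage, or nullptr if it does not exist.
    Storage* openStorage(const std::string& name) {
        auto it = children_.find(name);
        return it == children_.end() ? nullptr : it->second.get();
    }

    // Removes a stream or sub-storage; returns true if something was removed.
    bool destroyElement(const std::string& name) {
        return streams_.erase(name) + children_.erase(name) > 0;
    }

private:
    std::map<std::string, std::vector<unsigned char>> streams_;
    std::map<std::string, std::unique_ptr<Storage>> children_;
};
```

Usage: root.createStorage("MyAppData").createStream("DocumentID", bytes) mirrors the idea of adding a private storage holding the DocumentID next to Word's own streams. With the real APIs, the same steps would be performed on the root IStorage returned by StgOpenStorageEx for file1.doc.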
For the API reference, see the following URL:
Using these APIs, I successfully added application-specific data to an MS Word document. No problems were detected when copying or mailing the document.
However, one problem remained:
Testing showed that if data in the file is modified, added or deleted and the file is then saved, the application-specific data is removed from the file.
To solve this problem, I created the VSTO add-in described below.
2) VSTO Add-ins:
I created a VSTO add-in for MS Word. Similar add-ins can also be created for PowerPoint and Excel.
The add-in checks whether the application-specific data has been removed and, if so, reinserts it into the MS Word file.
Basically, an add-in makes it possible to capture the various events that Word raises for a document.
Similarly, once add-ins are created and installed for PowerPoint and Excel, they can capture the various events for Excel and PowerPoint files.
Visual Studio Tools for Office enables you to create application-level add-ins.
Application-level add-ins consist of a managed code assembly that runs as an add-in in a Microsoft Office application. Add-ins that are created by using Visual Studio Tools for Office have access to the Microsoft .NET Framework as well as the application's object model. When you build an add-in project, Visual Studio compiles the assembly into a .dll file and creates a separate application manifest file.
Visual Studio Tools for Office provides a loader for add-ins that are created by using Visual Studio Tools for Office. When a user starts the Microsoft Office application, this loader starts the common language runtime (CLR) and the Visual Studio Tools for Office runtime, and then loads the add-in assembly. The assembly can capture events that are raised in the application.
In a VSTO add-in solution, you can do the following:
Respond to events that are raised in the application itself (for instance, when a user clicks a menu item).
I handled the open, save and close events for the Office files and used them to implement a solution that reinserts the application-specific data after it is lost on a save operation.
So the compound file APIs and VSTO add-ins together solve the problem.
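The add-in source is not reproduced here, so the following is only a sketch of the check-and-reinsert idea under stated assumptions. Files are modelled as maps from stream name to contents instead of calling the compound file APIs. simulateHostSave is a hypothetical stand-in that reproduces the loss described above by keeping only the streams the host writes itself (WordDocument and 1Table are streams Word writes in .doc files). DocumentIdGuard plays the role of the add-in: onOpen stands in for a handler of a Word event such as Application.DocumentOpen, and restoreIfMissing stands in for whatever runs after Word has finished writing the file. The DocumentID stream name is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

using StreamMap = std::map<std::string, std::string>;  // stream name -> contents

// Hypothetical name of the stream holding the application-specific data.
const std::string kDocIdStream = "DocumentID";

// Hypothetical model of a host save: only the host's own streams survive.
StreamMap simulateHostSave(const StreamMap& before,
                           const std::set<std::string>& hostStreams) {
    StreamMap after;
    for (const auto& [name, data] : before) {
        if (hostStreams.count(name)) after[name] = data;
    }
    return after;
}

// Add-in style guard (hypothetical): remember the ID when the document is
// opened, then put it back if a later save has removed it.
class DocumentIdGuard {
public:
    void onOpen(const StreamMap& file) {
        auto it = file.find(kDocIdStream);
        if (it != file.end()) remembered_ = it->second;
        else remembered_.reset();
    }

    // Returns true if the ID had to be reinserted.
    bool restoreIfMissing(StreamMap& file) const {
        if (!remembered_ || file.count(kDocIdStream)) return false;
        file[kDocIdStream] = *remembered_;
        return true;
    }

private:
    std::optional<std::string> remembered_;
};
```

The ID is captured on open rather than at save time because by the time the save has completed, the stream is already gone.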
Source Code Organization
The proof of concept (POC) code available with this article is organized as follows:
Proof Of Concept
-FileStorage -> MFC regular DLL that uses the compound file implementation of structured storage to store the DocumentId (application-specific data) in hidden form.
Using the Code
-Adding the DocumentID to an MS Word file (hidden from the end user)
1. Open the \Proof Of Concept\FileStorage project and build FileStorage.dll in release mode.
2. Copy \Proof Of Concept\FileStorage\release\FileStorage.dll to \windows\system32.
3. Open the \Proof Of Concept\DocumentIDReader project, build it and run it.
4. It shows a form with three buttons: Browse, Add DocumentID and Read DocumentID.
5. Use the Browse button to select an MS Word file, say “file1.doc”.
6. Click the Add DocumentID button to insert the DocumentID into file1.doc.
If the ID is embedded successfully, the message “DocumentID added” is shown.
7. Click the Read DocumentID button. It shows the document ID inserted in the MS Word document.
8. Open file1.doc and you will find that the document ID is hidden from the user.
-Document ID lost on MS Word save operation
9. Open file1.doc in MS Word, type some text in the file and click the Save button.
10. Close file1.doc, run the DocumentIDReader application again and select file1.doc.
11. Click the Read DocumentID button. It shows “DocumentID not found.”
-Preventing the loss of the document ID on MS Word save operation
12. Open the \Proof Of Concept\Common project and build this C# assembly.
13. Open the \Proof Of Concept\SocketServer project and remove the existing reference to the C# assembly named “Common”.
14. Add a reference to the newly built C# assembly "Common" in SocketServer.
15. Build the SocketServer project and run the SocketServer exe.
16. Open the \Proof Of Concept\WordAddIn1 project and remove the existing reference to the C# assembly named “Common”.
17. Add a reference to the newly built C# assembly "Common" in the WordAddIn1 project.
18. Build WordAddIn1 in debug mode. Then run the Word add-in using Debug->Start Debugging.
19. When the add-in runs, Word opens a new blank document. Open some Word file, write some data, and close that file with or without saving.
Then also close Document1.
20. On closing the document, the following message may be shown:
This file is in use by another application or user
If this message is shown, click the OK button on the message box.
After you click OK, a Save As dialog box is shown. Click the Cancel button.
21. Now close the MS Word application.
On closing MS Word, the following message may be shown:
"Changes have been made that affect the global template, Normal.dot. Do you want to save those changes?"
If this message is shown, click No.
The warning messages shown in steps 20 and 21 may appear only the first time, right after a fresh installation of the Word add-in. After that, they are not displayed.
22. The Word add-in is now installed on the system.
23. Use the \Proof Of Concept\DocumentIDReader project to insert the document ID into file1.doc.
24. Make sure that the SocketServer exe is running.
25. Open file1.doc, make some modifications, save it and then close it.
26. Use the \Proof Of Concept\DocumentIDReader project to read the DocumentID from file1.doc.
27. You will see that the document ID has not been lost from file1.doc.
Removing the Word Add-in:
If you want to remove the Word add-in from the system after testing, follow these steps:
1. Open the registry editor (regedit).
2. Search for WordAddin1.
3. Delete the "WordAddin1" entries from the registry.
This will remove the addin from the system.
About The Contributors
This proof of concept was developed by Vinita Batra and Atul Suyal under the supervision of Mr Alok Srivastava, Head, Middleware Solutions Group at NEC HCL System Technologies Ltd, India (NEC HCL ST).
You can visit our website http://www.nechclst.in
First version. Future versions will be based on the comments and suggestions received.
About The Author
Occupation: Software Engineer
Company : NEC HCL System Technologies Ltd
Location : 4th Floor, Tower B, Logix Technopark
Sector 127, Noida 201301