Introduction

Originally, I wrote a C++ parser that parsed given MS Word documents and put their content into a structure that was more useful for data processing. After writing the parser, I started working with .NET and C# and re-created the parser there. In the process, I also wrote my first article for Code Project, Automating MS Word Using Visual Studio .NET. Several people have asked to see the C++ version of the application, and I finally got some time to put something together. I wrote this article to make things easier for someone who is looking for quick answers. I hope people can benefit from the information provided and that it helps them get started faster.

Background

No special background is necessary. You just need some hands-on experience with C++.

Using the code

I think the best way to present the code is to first give you the critical sections you need to get an instance of MS Word, and then give you snapshots of code that perform specific functions. I believe this will help you get started faster in developing your own programs.

The following block is the header portion of the CPP file. Note: The most important include files are <utilcls.h> and <comobj.hpp>. These are used for COM and OLE.

```cpp
// Vahe Karamian - 04-20-2004 - For Code Project
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop

// We need this for the OLE object
#include <utilcls.h>
#include <comobj.hpp>

#include "Unit1.h"
#include <except.h>
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
```

The following block creates the MS Word COM object. This is the object that will be used to access MS Word application functions. To see which functions are available, you can explore them from within MS Word; refer to the first article, Automating MS Word Using Visual Studio .NET.
As before, you can make either a Windows Forms application or a command-line application; the process is the same. The code below is based on a Windows Forms application that has a button to start the process. When the user clicks the button, the Button1Click event handler opens the selected document in Word. Note: To better understand the code, focus on the Word-related calls; the . . . lines stand for code omitted from the snippet.

```cpp
TForm1 *Form1;
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner)
    : TForm(Owner)
{
}
//---------------------------------------------------------------------------
void __fastcall TForm1::Button1Click(TObject *Sender)
{
    . . .
    // used for the file name
    OleVariant fileName;
    fileName = openDialog->FileName;

    Variant my_word;
    Variant my_docs;
    // create word object
    my_word = Variant::CreateObject( "word.application" );
    // make word visible, to make invisible put false
    my_word.OlePropertySet( "Visible", (Variant) true );
    // get document object
    my_docs = my_word.OlePropertyGet( "documents" );
    Variant wordActiveDocument = my_docs.OleFunction( "open", fileName );
    . . .
```

So, a brief explanation: we define an OleVariant, fileName, and assign it the file chosen in openDialog. Next, we define two Variant variables, my_word and my_docs. my_word holds the Word application object created by Variant::CreateObject( "word.application" ), which we then make visible by setting its Visible property; my_docs holds the application's Documents collection. Next, we define another Variant, wordActiveDocument, which holds the document returned by calling open on my_docs with fileName. Notice that most of the variables are of type Variant.

At this point, we have a Word document that we can start performing functions on. At first, it might take a while for you to see how it works, but once you get the hang of it, anything in the MS Word domain is possible.

Let's take a look at the following code, which deals with tables within an MS Word document.

```cpp
. . .
Variant wordTables = wordActiveDocument.OlePropertyGet( "Tables" );
long table_count = wordTables.OlePropertyGet( "count" );
. . .
```
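As I mentioned before, all your data types are going to be of type Variant.

```cpp
Variant wordTables = wordActiveDocument.OlePropertyGet( "Tables" );
```

The line above returns all the tables in our document as a Tables collection.

```cpp
long table_count = wordTables.OlePropertyGet( "count" );
```

The line above returns the number of tables in our document.

You might be wondering where I got this information from. The answer to that question is in the first article: Automating MS Word Using Visual Studio .NET.

The next block of code demonstrates how to extract content from the tables.

```cpp
. . .
int t, r, c;
try
{
    for( t=1; t<=table_count; t++ )
    {
        Variant wordTable1 = wordTables.OleFunction( "Item", (Variant) t );
        Variant tableRows = wordTable1.OlePropertyGet( "Rows" );
        Variant tableCols = wordTable1.OlePropertyGet( "Columns" );
        long row_count, col_count;
        row_count = tableRows.OlePropertyGet( "count" );
        col_count = tableCols.OlePropertyGet( "count" );

        // LET'S GET THE CONTENT FROM THE TABLES
        // THIS IS GOING TO BE FUN!!!
        for( r=1; r<=row_count; r++ )
        {
            Variant tableRow = tableRows.OleFunction( "Item", (Variant) r );
            tableRow.OleProcedure( "Select" );
            Variant rowSelection = my_word.OlePropertyGet( "Selection" );
            Variant rowColumns = rowSelection.OlePropertyGet( "Columns" );
            Variant selectionRows = rowSelection.OlePropertyGet( "Rows" );
            long rowColumn = rowColumns.OlePropertyGet( "count" );

            // Use this row's column count, not col_count:
            // rows in the same table can have different widths
            for( c=1; c<=rowColumn; c++ )
            {
                Variant rowCells = tableRow.OlePropertyGet( "cells" );
                Variant wordCell = wordTable1.OleFunction( "Cell", (Variant) r, (Variant) c );
                Variant cellRange = wordCell.OlePropertyGet( "Range" );
                Variant rangeWords = cellRange.OlePropertyGet( "Words" );
                long words_count = rangeWords.OlePropertyGet( "count" );
                AnsiString test = '"';
                for( int v=1; v<=words_count; v++ )
                {
                    test = test + rangeWords.OleFunction( "Item", (Variant) v ) + " ";
                }
                test = test + '"';
                // test now holds the quoted text of cell (r, c)
            }
        }
    }
    my_word.OleFunction( "Quit" );
}
catch( Exception &e )
{
    ShowMessage( e.Message + "\nType: " + __ThrowExceptionName() +
                 "\nFile: " + __ThrowFileName() +
                 "\nLine: " + AnsiString(__ThrowLineNumber()) );
}
. . .
```

Okay, so above we have the code that actually goes through all of the tables in the document. We have three nested for loops, over tables, rows, and columns, with one more loop inside them that collects the words of each cell. Note: Notice that every loop starts at 1, because Word's collections are indexed from 1, not 0.

Next, we get the table itself with Item, along with its Rows and Columns collections. Next, we get a count of rows and columns in the given table. For each row, we select it and read the Columns collection of the resulting Selection, which tells us how many columns that particular row has. Now, we have to define four new Variant variables, rowCells, wordCell, cellRange, and rangeWords, to get the cell at (r, c), its Range, and the Words in that range. In the innermost for loop, we walk through those words and concatenate them into the test string, wrapped in double quotes.

Let's sum up what we did so far:

1. Get the Tables collection of the document and the number of tables.
2. For each table, get its Rows and Columns collections and their counts.
3. For each row, select it and count the columns in that row through the Selection.
4. For each cell in the row, get its Range and the Words in that range.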
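The per-row walk summarized above can be mirrored in plain, portable C++ to show why the column count is taken per row. This is a minimal sketch under stated assumptions: there is no Word or COM here, the Table, Row, and Cell types and the quoteCell and extractTable functions are hypothetical stand-ins, and each cell is modeled as the list of words Word would report for it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for a Word table: a cell is the list of words
// in its range, and rows may hold different numbers of cells.
using Cell = std::vector<std::string>;
using Row = std::vector<Cell>;
using Table = std::vector<Row>;

// Mirrors the innermost loop of the article: each word is followed by
// a space, and the whole result is wrapped in double quotes.
std::string quoteCell(const Cell& words) {
    std::string text = "\"";
    for (const std::string& w : words) {
        text += w + " ";
    }
    text += "\"";
    return text;
}

// Walks the table row by row. Indices run from 1 to mirror Word's
// 1-based collections, and each row supplies its own cell count
// instead of one column count being reused for the whole table.
std::vector<std::vector<std::string>> extractTable(const Table& table) {
    std::vector<std::vector<std::string>> out;
    for (std::size_t r = 1; r <= table.size(); ++r) {
        const Row& row = table[r - 1];
        std::vector<std::string> cells;
        for (std::size_t c = 1; c <= row.size(); ++c) {
            cells.push_back(quoteCell(row[c - 1]));
        }
        out.push_back(cells);
    }
    return out;
}
```

For a table whose first row has three cells and whose second row has two, extractTable returns rows of sizes 3 and 2; a loop bounded by the first row's width would read past the end of the second row, which is the situation the per-row count in the Word code guards against.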
Note: Yes, some steps are repeated, but that is because not all tables in a given document are uniform! That is, if row 1 has 3 columns, it does not necessarily follow that row 2 has 3 columns as well. More than likely, it will have a different number of columns. You can thank the document authors/owners. So the final step simply steps through the cells, gets their content, and concatenates it into a single string. And finally, we want to quit Word and close all documents.

```cpp
...
my_word.OleFunction( "Quit" );
...
```
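In the table-extraction code, Quit is the last statement inside the try block, so if any OLE call throws, the catch handler reports the error but Word is left running. One common C++ way to guarantee cleanup on every exit path is a scope guard (RAII). The sketch below is a generic, hypothetical ScopeGuard class in standard C++, not part of the VCL or the original program; combining it with the VCL Variant class is an untested assumption.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Minimal scope guard: runs the stored action when the guard goes out
// of scope, whether the block exits normally or through an exception.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> action)
        : action_(std::move(action)) {
        assert(action_ && "ScopeGuard needs a callable");
    }

    // Copying would run the cleanup twice, so forbid it.
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

    ~ScopeGuard() {
        if (action_) {
            // Destructors must not throw, so swallow any cleanup error.
            try { action_(); } catch (...) {}
        }
    }

    // Cancels the cleanup, e.g. after it has been performed explicitly.
    void dismiss() { action_ = nullptr; }

private:
    std::function<void()> action_;
};
```

Assuming it works with the VCL Variant as it does with ordinary callables, you might declare something like `ScopeGuard quitWord([&] { my_word.OleFunction( "Quit" ); });` right after the CreateObject call and drop the explicit Quit at the end of the try block (or call quitWord.dismiss() after it), so Word is told to quit even when an exception reaches the catch handler.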
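That is pretty much it. The code does sometimes get pretty tedious and messy. The best way to approach automating Word is to first know exactly what you want to do. Once you know what you want to achieve, you need to find out which objects or properties to use to do it. That's the tricky part: you will have to read the documentation (see Automating MS Word Using Visual Studio .NET).

In the next code block, I will show you how to open an existing document, create a new document, select the content of the existing document and paste it into the new document using the Paste Special function, and then do some clean-up with the Find and Replace function. Before you look at the block of code, the following list identifies which variable represents which object, and the functions applied to it.

Variables and representations:

- vk_word_app: the Word application object; its Selection property is read after each Select call.
- vk_word_doc: the application's Documents collection; Open opens an existing file and Add creates a new document.
- vk_this_doc: the existing document being converted; Select and Close.
- vk_converted_document: the new document; Select, SaveAs, and Close.
- vk_filename and vk_converted_filename: the input file name taken from the OpenDialog and the output file name (input name + "_c.doc").
- vk_this_doc_selection and vk_converted_document_selection: the Selection objects; Copy, PasteSpecial, and Find.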
```cpp
// Get the filename from the list of files in the OpenDialog
vk_filename = openDialog->Files->Strings[i];
vk_converted_filename = openDialog->Files->Strings[i] + "_c.doc";

// Open the given Word file
vk_this_doc = vk_word_doc.OleFunction( "Open", vk_filename );
statusBar->Panels->Items[2]->Text = "READING";

// -------------------------------------------------------------------
// Vahe Karamian - 10-10-2003
// This portion of the code will convert the word document into
// unformatted text, and do extensive clean up
statusBar->Panels->Items[0]->Text = "Converting to text...";
vk_timerTimer( Sender );

// Create a new document
Variant vk_converted_document = vk_word_doc.OleFunction( "Add" );

// Select text from the original document
Variant vk_this_doc_select = vk_this_doc.OleFunction( "Select" );
Variant vk_this_doc_selection = vk_word_app.OlePropertyGet( "Selection" );

// Copy the selected text
vk_this_doc_selection.OleFunction( "Copy" );

// Paste selected text into the new document
Variant vk_converted_document_select = vk_converted_document.OleFunction( "Select" );
Variant vk_converted_document_selection = vk_word_app.OlePropertyGet( "Selection" );
// PasteSpecial( IconIndex, Link, Placement, DisplayAsIcon, DataType );
// DataType 2 is wdPasteText, i.e. paste as unformatted text
vk_converted_document_selection.OleFunction( "PasteSpecial", 0, false, 0, false, 2 );

// Re-Select the text in the new document
vk_converted_document_select = vk_converted_document.OleFunction( "Select" );
vk_converted_document_selection = vk_word_app.OlePropertyGet( "Selection" );

// Close the original document
vk_this_doc.OleProcedure( "Close" );

// Let's do our clean-up here ...
Variant wordSelectionFind = vk_converted_document_selection.OlePropertyGet( "Find" );

statusBar->Panels->Items[0]->Text = "Find & Replace...";
vk_timerTimer( Sender );

// Execute( FindText, MatchCase, MatchWholeWord, MatchWildcards, MatchSoundsLike,
//          MatchAllWordForms, Forward, Wrap (1 = wdFindContinue), Format,
//          ReplaceWith, Replace (2 = wdReplaceAll), ... )
// Replace every manual line break (^l) with a space
wordSelectionFind.OleFunction( "Execute", "^l", false, false, false, false, false,
                               true, 1, false, " ", 2, false, false, false, false );
// Replace every paragraph mark (^p) with a space
wordSelectionFind.OleFunction( "Execute", "^p", false, false, false, false, false,
                               true, 1, false, " ", 2, false, false, false, false );

// Save the new document
vk_converted_document.OleFunction( "SaveAs", vk_converted_filename );

// Close the new document
vk_converted_document.OleProcedure( "Close" );
// -------------------------------------------------------------------
```

So here is what the code above does: we open an existing document with the Open function of vk_word_doc, create a new document with Add, select the whole original document and copy it, paste it into the new document as unformatted text with PasteSpecial, and close the original. We then use the selection's Find object to replace every manual line break (^l) and every paragraph mark (^p) with a space, save the result with SaveAs under the "_c.doc" file name, and close it.

That's all there is to it!

Points of Interest

Putting structure to a Word document is a challenging task, given that people author documents in many different ways. Nevertheless, it would help organizations to start modeling their documents. This would allow them to apply an XML schema to their documents and make extracting content from them much easier. This is a challenging task for most companies; usually, they lack either the expertise or the resources. Such projects are also large in scale, because they affect more than one functional business area. But in the long run, they will benefit the organization as a whole. Having your documents driven by structured data, rather than by formatting and loose documents, adds a lot of value to your business.