Word automation (Part 1)

By padmanabhan N

C#, Windows, ASP.NET, Dev
Posted:24 Jun 2009
Introduction

This article covers two Word automation tasks: loading a document's Table of Contents into a TreeView, and exporting the tables in a Word document to Excel.

Background

Reading the plain text of a Word document is straightforward. Extracting the Table of Contents and converting tables to Excel is not, and I struggled with both requirements before arriving at the solutions shown here, so I thought of sharing them.

What Does this Article Do?

The automation has two tabs.

1) Converting the Table of Contents to TreeView

When automating a document of more than 400 pages, there is no built-in way to select only the part that we want to process. The Table of Contents helps here, because it lists the headings of the sections in the document. This article finds the Table of Contents and stores its entries in a TreeView as parent and child nodes.

2) Converting Tables from Word to Excel

This is the second issue that I faced. When automating a Word document that has a lot of tables, the alignment of the extracted content was bad, so at first I deleted the tables. Later, I had a requirement to export those tables as well, so I started automating the conversion of tables to Excel.

word4.JPG

Table of Contents

The uploaded document is opened and its Table of Contents is identified. The entries are then separated into parent and child nodes.

word1.JPG

if (doc.TablesOfContents.Count != 0)
{
    // Word collections are 1-based; only the first Table of Contents is used.
    doc.TablesOfContents[1].IncludePageNumbers = false;
    Table conTable = doc.TablesOfContents[1].Range.ConvertToTable
        (ref nullobj, ref nullobj, ref nullobj, ref nullobj,
         ref nullobj, ref nullobj, ref nullobj, ref nullobj,
         ref nullobj, ref nullobj, ref nullobj, ref nullobj,
         ref nullobj, ref nullobj, ref nullobj, ref nullobj);

    // Read the cell text. Each pass overwrites insertRange1, so only the
    // text of the last cell visited is kept.
    string insertRange1 = string.Empty;
    int tblrowa = conTable.Rows.Count;
    int tblcola = conTable.Columns.Count;
    for (int tblrow2 = 1; tblrow2 <= tblrowa; tblrow2++)
    {
        for (int tblcol2 = 1; tblcol2 <= tblcola; tblcol2++)
        {
            insertRange1 = conTable.Cell(tblrow2, tblcol2).Range.Text;
        }
    }

    // One Table of Contents entry per paragraph mark ('\r').
    string[] Content = insertRange1.Split(new char[] { '\r' });
    TreeNode Parenttree = new TreeNode();
    TreeNode treechild = new TreeNode();

    // Assign the parent and child nodes.
    for (int cou = 0; cou <= Content.Length - 1; cou++)
    {
        CopyWithProgress(Content.Length);
        // Skip empty entries and the end-of-cell marker ("\a").
        if (Content[cou] != "" && Content[cou] != "\a")
        {
            // An empty separator array splits on white space.
            string[] FurtherSplit = Content[cou].Split(new char[] { });
            if (FurtherSplit[0].Length <= 2)
            {
                // A short first token such as "1" or "12" marks a top-level heading.
                Parenttree = new TreeNode(Content[cou]);
                tree.Nodes.Add(Parenttree);
            }
            else if (Content[cou].Contains("."))
            {
                // An entry containing a period, such as "1.1 Scope",
                // goes under the current parent.
                treechild = new TreeNode(Content[cou]);
                Parenttree.Nodes.Add(treechild);
            }
            else
            {
                // Anything else is treated as a new top-level heading.
                Parenttree = new TreeNode(Content[cou]);
                tree.Nodes.Add(Parenttree);
            }
        }
        ProgressBar.PerformStep();
    }
}

Remember that the code assumes there is only one Table of Contents in each document.
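The parent and child rule used in the loop above can be tried out without Word or a TreeView. The following is only a minimal sketch of that rule on plain strings: TocEntry and TocOutline are hypothetical names that are not part of the article's project, and unlike the original loop, a child that appears before any parent is promoted to a top-level entry instead of being attached to an unlisted node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one top-level heading and its sub-headings.
public class TocEntry
{
    public string Title;
    public List<string> Children = new List<string>();
    public TocEntry(string title) { Title = title; }
}

// Hypothetical helper that applies the article's classification rule to plain strings.
public static class TocOutline
{
    public static List<TocEntry> Build(IEnumerable<string> lines)
    {
        var roots = new List<TocEntry>();
        TocEntry current = null;
        foreach (string line in lines)
        {
            // Skip empty entries and Word's end-of-cell marker.
            if (line == "" || line == "\a") continue;

            // An empty separator array splits on white space.
            string firstToken = line.Split(new char[] { })[0];

            // Same test as the article: a first token longer than two
            // characters plus a period somewhere in the entry means "child".
            bool isChild = firstToken.Length > 2 && line.Contains(".");

            if (isChild && current != null)
            {
                current.Children.Add(line);
            }
            else
            {
                // Assumption: a child with no preceding parent becomes a root.
                current = new TocEntry(line);
                roots.Add(current);
            }
        }
        return roots;
    }
}
```

For example, TocOutline.Build(new[] { "1 Introduction", "1.1 Scope", "2 Design" }) yields two top-level entries, with "1.1 Scope" stored under "1 Introduction". Each TocEntry can then be turned into a TreeNode in a separate step, which keeps the classification rule easy to test.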

Check and Uncheck the TreeNodes

When a parent node is checked, all of its child nodes are checked, and when any one of the child nodes is unchecked, the parent node is unchecked. Note that, as the code below is written, checking a single child node also checks its parent.

word3.JPG

// Guard flags that stop tree_AfterCheck from re-entering itself
// while the code below changes Checked on other nodes.
Boolean bChild = true;
Boolean bParent = true;

private void tree_AfterCheck(object sender, TreeViewEventArgs e)
{
    if (bChild)
    {
        CheckAllChildren(e.Node, e.Node.Checked);
    }
    if (bParent)
    {
        CheckMyParent(e.Node, e.Node.Checked);
    }
}

// Applies bCheck to every descendant of tn.
void CheckAllChildren(TreeNode tn, Boolean bCheck)
{
    bParent = false;
    foreach (TreeNode ctn in tn.Nodes)
    {
        bChild = false;
        ctn.Checked = bCheck;
        bChild = true;
        CheckAllChildren(ctn, bCheck);
    }
    bParent = true;
}

// Applies bCheck to every ancestor of tn.
void CheckMyParent(TreeNode tn, Boolean bCheck)
{
    if (tn == null) return;
    if (tn.Parent == null) return;
    bChild = false;
    bParent = false;
    tn.Parent.Checked = bCheck;
    CheckMyParent(tn.Parent, bCheck);
    bParent = true;
    bChild = true;
}
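The handlers above copy the changed node's state to every ancestor. If a parent should instead be checked only while all of its children are checked, the state of each ancestor can be recomputed from its children. The sketch below shows that alternative rule on a small stand-in class so that it runs without Windows Forms. CheckNode and its members are hypothetical names, and this is a different rule from the one the article's code implements.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for TreeNode, used only to illustrate the rule.
public class CheckNode
{
    public bool Checked;
    public CheckNode Parent;
    public List<CheckNode> Children = new List<CheckNode>();

    // Attaches a child and returns it for convenience.
    public CheckNode Add(CheckNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }

    // Sets this node and all descendants to value, then recomputes each
    // ancestor: an ancestor is checked only if all its children are checked.
    public void SetChecked(bool value)
    {
        SetSubtree(value);
        for (CheckNode p = Parent; p != null; p = p.Parent)
        {
            p.Checked = p.Children.TrueForAll(c => c.Checked);
        }
    }

    private void SetSubtree(bool value)
    {
        Checked = value;
        foreach (CheckNode child in Children)
        {
            child.SetSubtree(value);
        }
    }
}
```

With a parent that has two children, checking only the first child leaves the parent unchecked; the parent becomes checked once the second child is checked as well, and unchecking either child unchecks it again.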

Expand All

Expand All expands every node in the TreeView:

private void btnExpandAll_Click(object sender, EventArgs e)
{
    this.tree.ExpandAll();
}

Collapse All

Collapse All collapses every node in the TreeView:

private void btnCollapseAll_Click(object sender, EventArgs e)
{
    this.tree.CollapseAll();
}

Check All

Check All checks every parent and child node in the TreeView:

private void btnCheckAll_Click(object sender, EventArgs e)
{
    for (int node = 0; node < tree.Nodes.Count; node++)
    {
        tree.Nodes[node].Checked = true;
        if (bChild)
        {
            CheckAllChildren(tree.Nodes[node], tree.Nodes[node].Checked);
        }
        if (bParent)
        {
            CheckMyParent(tree.Nodes[node], tree.Nodes[node].Checked);
        }
    }
}

UnCheck All

UnCheck All unchecks every parent and child node in the TreeView:

private void btnUncheckAll_Click(object sender, EventArgs e)
{
    for (int node = 0; node < tree.Nodes.Count; node++)
    {
        tree.Nodes[node].Checked = false;
        if (bChild)
        {
            CheckAllChildren(tree.Nodes[node], tree.Nodes[node].Checked);
        }
        if (bParent)
        {
            CheckMyParent(tree.Nodes[node], tree.Nodes[node].Checked);
        }
    }
}

Tables to Excel

Every row of every table in the document is read, and each cell value is written to the corresponding cell of an Excel worksheet.

word5.JPG

if (doc.Tables.Count != 0)
{
    // Identify each table and read its values.
    int rowtbl = 0; // current Excel row
    System.Text.Encoding ascii = System.Text.Encoding.ASCII;
    for (int tables = 1; tables <= doc.Tables.Count; tables++)
    {
        // Advancing here as well as after every row leaves a blank row between tables.
        rowtbl = rowtbl + 1;
        Table tbl = doc.Tables[tables];
        CopyWithProgress(100);
        foreach (Microsoft.Office.Interop.Word.Row row in tbl.Rows)
        {
            CopyWithProgress(doc.Tables.Count);
            List<string> cellValues = new List<string>();
            foreach (Microsoft.Office.Interop.Word.Cell cell in row.Cells)
            {
                string cellContents = cell.Range.Text;
                // Cells containing "=" are skipped. The remaining cells lose the
                // two-character end-of-cell marker ("\r\a") that Word appends.
                if (!cellContents.Contains("="))
                    cellValues.Add(cellContents.Remove(cellContents.Length - 2));
            }
            // 65 is ASCII 'A', so the columns run A, B, C, ...
            // Only columns A to Z are addressed correctly this way.
            int ran = 65;
            for (int celval = 0; celval <= cellValues.Count - 1; celval++)
            {
                m_objRange = m_objSheet.get_Range(ascii.GetString(new byte[]
                    { (byte)ran }) + rowtbl.ToString(), m_objOpt);
                m_objRange.Value2 = cellValues[celval].Trim();
                ran++;
            }
            rowtbl = rowtbl + 1;
            ProgressBar.PerformStep();
        }
    }
    // Save the output Excel file.
    m_objBook.SaveAs(CurrentPath + "\\Temp.xlsx", m_objOpt, m_objOpt,
        m_objOpt, m_objOpt, m_objOpt,
        Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange,
        m_objOpt, m_objOpt, m_objOpt, m_objOpt, m_objOpt);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objBooks);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objExcel);
    //m_objBook.Close(false, TMPpath, false);
    // Close the Word document and quit Word.
    doc.Close(ref m_objOpt, ref m_objOpt, ref m_objOpt);
    a.Quit(ref m_objOpt, ref m_objOpt, ref m_objOpt);
    // Delete the uploaded file and open the generated workbook.
    File.Delete(file.ToString());
    Process.Start(CurrentPath + "\\Temp.xlsx");
}
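Two details of the export loop can be isolated into small helpers. The loop builds the column letter from an ASCII code starting at 65, which only works for columns A to Z, and it removes the end-of-cell marker by cutting the last two characters. The following is a minimal sketch, not part of the original project: ExcelExportHelpers, ColumnName and CleanCellText are hypothetical names.

```csharp
using System;

// Hypothetical helpers; they do not depend on the Office interop assemblies.
public static class ExcelExportHelpers
{
    // Converts a 1-based column number to A1-style letters:
    // 1 gives "A", 26 gives "Z", 27 gives "AA".
    public static string ColumnName(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException("column");
        string name = string.Empty;
        while (column > 0)
        {
            int remainder = (column - 1) % 26;
            name = (char)('A' + remainder) + name;
            column = (column - 1) / 26;
        }
        return name;
    }

    // Removes the end-of-cell marker that Word appends to Cell.Range.Text
    // (a carriage return followed by the bell character), then trims
    // the remaining white space.
    public static string CleanCellText(string raw)
    {
        if (raw == null) return string.Empty;
        return raw.TrimEnd('\r', '\a').Trim();
    }
}
```

Inside the export loop, the cell address could then be built as ExcelExportHelpers.ColumnName(celval + 1) + rowtbl.ToString(), which keeps working for tables wider than 26 columns.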

Limitations

The uploaded document must contain both a Table of Contents and tables; otherwise, an error message is shown. Tables with merged columns are not supported: when a table has merged columns, an error occurs and the automation produces no results for it. If a document has more than one Table of Contents, only the first one is used.

Conclusion

This is my third article on CodeProject. The project has been tested against a number of documents and corrected accordingly. If you run into further errors, please let me know so that they can be corrected in a future update.

History

  • 24th June, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

padmanabhan N


Member
Completed my MCA at Madras University.
MCTS in Microsoft® .NET Framework 2.0 - Web-based Client Development
MCTS in Microsoft® .NET Framework 2.0 - Application Development Foundation
Occupation: Software Developer
Location: India

Last Updated: 24 Jun 2009
Editor: Deeksha Shenoy
Copyright 2009 by padmanabhan N