
Word automation (Part 1)

24 Jun 2009 · CPOL

Introduction

In this article, we look at two Word automation tasks: converting a document's Table of Contents into a TreeView, and exporting the tables in a Word document to Excel.

Background

Reading text from a Word document through automation is straightforward. When I needed to extract the Table of Contents and convert the document's tables to Excel, however, I struggled for a while before finding a solution, so I am sharing it here.

What Does this Article Do?

The automation tool has two tabs:

1) Converting the Table of Contents to TreeView

When automating a large document (for example, one with more than 400 pages), there is no built-in way to select only the part we want to process. The Table of Contents helps here, because it lists the headings of the document's sections. This article locates the Table of Contents and loads it into a TreeView, with top-level headings as parent nodes and subheadings as child nodes.

2) Converting Tables from Word to Excel

This was the second problem I faced. When automating a Word document that contained many tables, the table content came out badly aligned, so I had to delete the tables. Later, those tables also had to be exported, so I automated exporting them to Excel.


Table of Contents

The uploaded document is opened and its Table of Contents is located. Each entry is then classified as a parent (a top-level heading) or a child (a subheading).


C#
// doc (the open Word Document), nullobj (System.Reflection.Missing.Value),
// tree (the TreeView), ProgressBar and CopyWithProgress are declared
// elsewhere in the downloadable project.
if (doc.TablesOfContents.Count != 0)
{
    // Only the first Table of Contents is used.
    doc.TablesOfContents[1].IncludePageNumbers = false;

    // Convert the TOC to a table so its entries can be read cell by cell.
    // This changes the open document, so close it without saving afterwards.
    Table conTable = doc.TablesOfContents[1].Range.ConvertToTable(
        ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj);

    // Collect the text of every cell. Each cell ends with Word's "\r\a"
    // end-of-cell marker, which is stripped; '\r' separates the entries.
    System.Text.StringBuilder tocText = new System.Text.StringBuilder();
    int rowCount = conTable.Rows.Count;
    int colCount = conTable.Columns.Count;
    for (int row = 1; row <= rowCount; row++)
    {
        for (int col = 1; col <= colCount; col++)
        {
            string cellText = conTable.Cell(row, col).Range.Text;
            tocText.Append(cellText.TrimEnd('\r', '\a')).Append('\r');
        }
    }
    string[] content = tocText.ToString().Split('\r');

    // Assign the parent and child nodes.
    TreeNode parentNode = null;
    foreach (string rawEntry in content)
    {
        CopyWithProgress(content.Length);
        string entry = rawEntry.Trim();
        if (entry.Length != 0)
        {
            // A short first token (such as "1") or an entry without a '.'
            // is a top-level heading. Other entries (such as "1.1 Scope")
            // become children of the most recent heading.
            string[] tokens = entry.Split(new char[] { }); // splits on whitespace
            bool isParent = tokens[0].Length <= 2 || !entry.Contains(".");
            if (isParent || parentNode == null)
            {
                parentNode = new TreeNode(entry);
                tree.Nodes.Add(parentNode);
            }
            else
            {
                parentNode.Nodes.Add(new TreeNode(entry));
            }
        }
        ProgressBar.PerformStep();
    }
}

Note that the code assumes each document has only one Table of Contents.
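The parent/child rule in the loop above does not depend on Word or WinForms, so it can be pulled out and tested on its own. The following is a minimal sketch under that assumption. TocEntry and TocParser are hypothetical names that are not part of the downloadable project, and the input is assumed to be the TOC text with entries separated by '\r'. The sketch keeps the article's heuristic. As a result, an entry whose first word is longer than two characters and which contains a '.' anywhere, such as "Version 2.0 notes", is treated as a child.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types (not part of the article's project) that apply the same
// parent/child rule as the TreeView loop, without Word or WinForms.
public sealed class TocEntry
{
    public TocEntry(string text) { Text = text; }
    public string Text { get; }
    public List<TocEntry> Children { get; } = new List<TocEntry>();
}

public static class TocParser
{
    // tocText: the TOC text with entries separated by '\r'; stray '\a'
    // (Word end-of-cell) characters are ignored.
    public static List<TocEntry> Parse(string tocText)
    {
        var roots = new List<TocEntry>();
        TocEntry parent = null;
        foreach (string raw in tocText.Split('\r'))
        {
            string entry = raw.Replace("\a", "").Trim();
            if (entry.Length == 0) continue;   // blank line or bare marker

            // Top-level heading: first token of at most 2 characters (e.g. "1")
            // or no '.' anywhere in the entry. Everything else is a child.
            string firstToken = entry.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)[0];
            bool isParent = firstToken.Length <= 2 || !entry.Contains(".");

            if (isParent || parent == null)
            {
                // Also used for a child that appears before any heading.
                parent = new TocEntry(entry);
                roots.Add(parent);
            }
            else
            {
                parent.Children.Add(new TocEntry(entry));
            }
        }
        return roots;
    }
}
```

For example, TocParser.Parse("1 Introduction\r1.1 Scope") returns one root entry with one child. In the form, each entry would become a TreeNode.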

Check and Uncheck the TreeNodes

When a parent node is checked or unchecked, all of its child nodes take the same state. When a child node is unchecked, its parent (and every ancestor above it) is unchecked; likewise, checking a child also checks its ancestors.


C#
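// Guard flags: setting TreeNode.Checked from code raises AfterCheck again,
// so propagation is switched off while the helpers below update nodes.
Boolean bChild = true;
Boolean bParent = true;

private void tree_AfterCheck(object sender, TreeViewEventArgs e)
{
    // Push the new state down to all descendants...
    if (bChild)
    {
        CheckAllChildren(e.Node, e.Node.Checked);
    }
    // ...and up through the chain of parents.
    if (bParent)
    {
        CheckMyParent(e.Node, e.Node.Checked);
    }
}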
C#
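// Sets every descendant of tn to bCheck.
void CheckAllChildren(TreeNode tn, Boolean bCheck)
{
    bParent = false;   // suppress upward propagation while children are updated
    foreach (TreeNode ctn in tn.Nodes)
    {
        bChild = false;   // the recursive call below handles ctn's children
        ctn.Checked = bCheck;
        bChild = true;
        CheckAllChildren(ctn, bCheck);
    }
    bParent = true;
}

// Sets the parent of tn, and each ancestor above it, to bCheck.
void CheckMyParent(TreeNode tn, Boolean bCheck)
{
    if (tn == null || tn.Parent == null) return;
    bChild = false;   // do not push the parent's new state back down
    bParent = false;
    tn.Parent.Checked = bCheck;
    CheckMyParent(tn.Parent, bCheck);
    bParent = true;
    bChild = true;
}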
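Setting TreeNode.Checked from code raises AfterCheck again. Without the bChild and bParent flags, the handlers above would keep triggering each other. The rules that result are easier to see in a model without events. This is a sketch only. CheckNode is a hypothetical class standing in for TreeNode, and SetChecked plays the role of a user click followed by tree_AfterCheck.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for TreeNode (not part of the project) that applies
// the same rules as tree_AfterCheck, CheckAllChildren and CheckMyParent.
public sealed class CheckNode
{
    public CheckNode(string text) { Text = text; }

    public string Text { get; }
    public bool Checked { get; private set; }
    public CheckNode Parent { get; private set; }
    public List<CheckNode> Children { get; } = new List<CheckNode>();

    public CheckNode Add(CheckNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }

    // Models a user click: the node, every descendant and every ancestor
    // take the new state. No events are raised, so no guard flags are needed.
    public void SetChecked(bool value)
    {
        Checked = value;
        CheckAllChildren(this, value);
        for (CheckNode p = Parent; p != null; p = p.Parent)
        {
            p.Checked = value;
        }
    }

    private static void CheckAllChildren(CheckNode node, bool value)
    {
        foreach (CheckNode child in node.Children)
        {
            child.Checked = value;
            CheckAllChildren(child, value);
        }
    }
}
```

In a real TreeView, a common alternative to the flags is to propagate only when e.Action != TreeViewAction.Unknown inside tree_AfterCheck. Changes made from code are reported with TreeViewAction.Unknown, so they do not trigger further propagation.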

Expand All

Expand All expands every node in the TreeView:

C#
private void btnExpandAll_Click(object sender, EventArgs e)
{
    this.tree.ExpandAll();
}

Collapse All

Collapse All collapses every node in the TreeView:

C#
private void btnCollapseAll_Click(object sender, EventArgs e)
{
    this.tree.CollapseAll();
}

Check All

Check All checks every parent and child node in the TreeView:

C#
private void btnCheckAll_Click(object sender, EventArgs e)
{
    // Check every root node and all of its descendants.
    // Root nodes have no parent, so no upward propagation is needed.
    foreach (TreeNode root in tree.Nodes)
    {
        root.Checked = true;
        CheckAllChildren(root, true);
    }
}

Uncheck All

Uncheck All unchecks every parent and child node in the TreeView:

C#
private void btnUncheckAll_Click(object sender, EventArgs e)
{
    // Uncheck every root node and all of its descendants.
    // Root nodes have no parent, so no upward propagation is needed.
    foreach (TreeNode root in tree.Nodes)
    {
        root.Checked = false;
        CheckAllChildren(root, false);
    }
}

Tables to Excel

Each table in the document is read row by row, and the text of each cell is written to a cell of an Excel worksheet.
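The row layout produced by the export loop can be modeled as a pure function, which makes it easy to check without Word or Excel. In this sketch, TableLayout is a hypothetical class. Each table is given as rows of raw Word cell texts ending in the "\r\a" end-of-cell marker. The result maps 1-based (row, column) positions to the text to write. Like the interop code, it skips cells containing '=', trims values, and leaves one blank row between tables.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the export loop (not part of the project).
// Each table is a sequence of rows; each row holds raw Word cell texts,
// which end with the "\r\a" end-of-cell marker.
public static class TableLayout
{
    // Returns the Excel cells to write, keyed by 1-based (row, column).
    public static Dictionary<(int Row, int Col), string> Layout(
        IEnumerable<IEnumerable<IEnumerable<string>>> tables)
    {
        var cells = new Dictionary<(int Row, int Col), string>();
        int excelRow = 0;
        foreach (var table in tables)
        {
            excelRow++;   // first table starts in row 1; later tables follow a blank row
            foreach (var row in table)
            {
                int col = 0;
                foreach (string raw in row)
                {
                    if (raw.Contains("=")) continue;   // skipped, as in the article
                    col++;
                    // Ordinal comparison: culture-aware matching may ignore control characters.
                    string text = raw.EndsWith("\r\a", StringComparison.Ordinal)
                        ? raw.Substring(0, raw.Length - 2)
                        : raw;
                    cells[(excelRow, col)] = text.Trim();
                }
                excelRow++;
            }
        }
        return cells;
    }
}
```

Each key can then be written in interop code with Worksheet.Cells[row, column].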


C#
// doc, a (the Word Application), m_objExcel, m_objBooks, m_objBook, m_objSheet,
// m_objOpt (System.Reflection.Missing.Value), CurrentPath, file, ProgressBar and
// CopyWithProgress are declared elsewhere in the downloadable project.
if (doc.Tables.Count != 0)
{
    // Excel row to write next; tables are separated by one blank row.
    int excelRow = 0;
    for (int tableIndex = 1; tableIndex <= doc.Tables.Count; tableIndex++)
    {
        excelRow++;
        Table tbl = doc.Tables[tableIndex];
        CopyWithProgress(100);
        foreach (Microsoft.Office.Interop.Word.Row row in tbl.Rows)
        {
            CopyWithProgress(doc.Tables.Count);

            // Read the row's cells, dropping the trailing "\r\a" end-of-cell marker.
            // Cells whose text contains '=' are skipped.
            List<string> cellValues = new List<string>();
            foreach (Microsoft.Office.Interop.Word.Cell cell in row.Cells)
            {
                string cellContents = cell.Range.Text;
                if (!cellContents.Contains("="))
                    cellValues.Add(cellContents.Remove(cellContents.Length - 2));
            }

            // Write the values to columns A, B, C, ... of the current Excel row.
            // Cells[row, column] is 1-based and is not limited to 26 columns.
            for (int col = 0; col < cellValues.Count; col++)
            {
                m_objSheet.Cells[excelRow, col + 1] = cellValues[col].Trim();
            }
            excelRow++;
            ProgressBar.PerformStep();
        }
    }

    // Save the output Excel file.
    string outputPath = Path.Combine(CurrentPath, "Temp.xlsx");
    m_objBook.SaveAs(outputPath, m_objOpt, m_objOpt,
        m_objOpt, m_objOpt, m_objOpt,
        Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange,
        m_objOpt, m_objOpt, m_objOpt, m_objOpt, m_objOpt);

    // Close the workbook and quit Excel before releasing the COM objects;
    // otherwise a hidden Excel process can keep running and keep the file open.
    m_objBook.Close(false, m_objOpt, m_objOpt);
    m_objExcel.Quit();
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objSheet);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objBook);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objBooks);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objExcel);

    // Close the Word document, quit Word and delete the uploaded file.
    doc.Close(ref m_objOpt, ref m_objOpt, ref m_objOpt);
    a.Quit(ref m_objOpt, ref m_objOpt, ref m_objOpt);
    File.Delete(file.ToString());

    // Open the generated workbook.
    Process.Start(outputPath);
}
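Excel A1-style addresses, such as the strings passed to get_Range, need the column number converted to letters. Adding a zero-based column index to 65 ('A') only works up to column Z: the 27th column becomes '['. A general conversion treats column names as a bijective base-26 number. ExcelColumns below is a hypothetical helper, not part of the project.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of the project) for building A1-style addresses.
public static class ExcelColumns
{
    // 1 -> "A", 26 -> "Z", 27 -> "AA", 702 -> "ZZ", 703 -> "AAA".
    // Column names form a bijective base-26 numbering (there is no zero digit),
    // hence the "- 1" before each step.
    public static string ToLetters(int column)
    {
        if (column < 1) throw new ArgumentOutOfRangeException(nameof(column));
        var letters = new StringBuilder();
        while (column > 0)
        {
            int digit = (column - 1) % 26;
            letters.Insert(0, (char)('A' + digit));
            column = (column - 1) / 26;
        }
        return letters.ToString();
    }

    // Builds an address such as "AB12" for use with get_Range.
    public static string Address(int row, int column)
    {
        return ToLetters(column) + row;
    }
}
```

For example, ExcelColumns.Address(12, 28) returns "AB12". Writing through Worksheet.Cells[row, column] avoids the conversion entirely.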

Limitations

The uploaded document must contain a Table of Contents (for the TreeView tab) or tables (for the Excel tab); otherwise, an error message is shown. Tables with merged cells are not supported: processing them raises an error, so this automation cannot export them. If a document contains more than one Table of Contents, only the first one is used.

Conclusion

This is my third article on CodeProject. The project has been tested with a number of documents and corrected where problems were found. If you run into further errors, please let me know so that they can be fixed.

History

  • 24th June, 2009: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
India
Completed my MCA at Madras University.
MCTS in Microsoft® .NET Framework 2.0 - Web-based Client Development
MCTS in Microsoft® .NET Framework 2.0 - Application Development Foundation
