Introduction
In this article, we look at two Word automation tasks: loading a document's Table of Contents into a TreeView and exporting the tables in a Word document to Excel.
Background
Reading the text of a Word document is straightforward, but when I was asked to extract the Table of Contents and to convert the document's tables to Excel, I struggled for some time before finding a solution. I am sharing it here.
What Does this Article Do?
The automation has two tabs.
1) Converting the Table of Contents to Treeview
When automating a document of more than 400 pages, there is no built-in way to select only the part we want to process. The Table of Contents helps here, because it lists the headings of the sections in the document. The application finds the Table of Contents and stores its entries in a TreeView as parent and child nodes.
2) Converting Tables from Word to Excel
This is the second problem I faced. When automating a Word document that contains many tables, the alignment of the extracted content was poor, so at first I deleted the tables. Later, I was required to export those tables as well, so I began automating the conversion of tables to Excel.
Table of Contents
The uploaded document is opened and its Table of Contents is located. The entries are then separated into parent and child nodes.
if (doc.TablesOfContents.Count != 0)
{
    // Hide page numbers so each entry contains only its heading text.
    doc.TablesOfContents[1].IncludePageNumbers = false;

    // Convert the TOC range to a Word table, leaving all sixteen
    // optional arguments at their defaults.
    Table conTable = doc.TablesOfContents[1].Range.ConvertToTable(
        ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj,
        ref nullobj, ref nullobj, ref nullobj, ref nullobj);

    // Walk the cells. Each pass overwrites insertRange1, so after the
    // loops it holds the text of the last cell visited.
    string insertRange1 = string.Empty;
    int tblrowa = conTable.Rows.Count;
    int tblcola = conTable.Columns.Count;
    for (int tblrow2 = 1; tblrow2 <= tblrowa; tblrow2++)
    {
        for (int tblcol2 = 1; tblcol2 <= tblcola; tblcol2++)
        {
            insertRange1 = conTable.Cell(tblrow2, tblcol2).Range.Text;
        }
    }

    // One TOC entry per paragraph mark.
    string[] Content = insertRange1.Split(new char[] { '\r' });
    TreeNode Parenttree = new TreeNode();
    TreeNode treechild = new TreeNode();
    for (int cou = 0; cou < Content.Length; cou++)
    {
        CopyWithProgress(Content.Length);

        // Skip empty entries and the end-of-cell marker.
        if (Content[cou] != "" && Content[cou] != "\a")
        {
            // Split on whitespace; the first token is the section number, if any.
            string[] FurtherSplit = Content[cou].Split(new char[] { });
            if (FurtherSplit[0].Length > 2 && Content[cou].Contains("."))
            {
                // A first token longer than two characters plus a dot in the
                // entry (for example "1.1 ...") marks a child of the last parent.
                treechild = new TreeNode(Content[cou]);
                Parenttree.Nodes.Add(treechild);
            }
            else
            {
                // Everything else becomes a new top-level node.
                Parenttree = new TreeNode(Content[cou]);
                tree.Nodes.Add(Parenttree);
            }
        }
        ProgressBar.PerformStep();
    }
}
Note that the code assumes only one Table of Contents per document.
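The parent/child rule used above can be tried without Word or a TreeView. The following is only a minimal sketch: TocOutline, IsParent and Build are hypothetical names that are not part of the article's project, and the sample entries in the usage note are made up. It applies the same test (a first token of at most two characters, or no dot in the entry, means a parent) to a plain list of strings. Unlike the code above, a child that appears before any parent is kept as a top-level entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the article's project.
public static class TocOutline
{
    // True when the entry would become a top-level node under the rule above:
    // the first whitespace-separated token has at most two characters,
    // or the entry contains no '.' at all.
    public static bool IsParent(string entry)
    {
        string firstToken = entry.Split(new char[0])[0];
        return firstToken.Length <= 2 || !entry.Contains(".");
    }

    // Groups entries into (parent, children) pairs in document order.
    public static List<KeyValuePair<string, List<string>>> Build(IEnumerable<string> entries)
    {
        var outline = new List<KeyValuePair<string, List<string>>>();
        foreach (string entry in entries)
        {
            // Skip empty entries and the end-of-cell marker.
            if (entry == "" || entry == "\a") continue;

            // Assumption: a child seen before any parent is promoted to top level.
            if (IsParent(entry) || outline.Count == 0)
                outline.Add(new KeyValuePair<string, List<string>>(entry, new List<string>()));
            else
                outline[outline.Count - 1].Value.Add(entry);
        }
        return outline;
    }
}
```

For example, Build applied to "1 Introduction", "1.1 Scope", "2 Design" yields two parents, with "1.1 Scope" listed under "1 Introduction".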
Check and Uncheck the TreeNodes
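Checking or unchecking a parent applies the same state to all of its child nodes. Checking or unchecking a child likewise applies the new state to its parent, so unchecking any child unchecks the parent. The two flags, bChild and bParent, stop the AfterCheck handler from re-running the propagation when the code itself changes a node's Checked state.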
// Guard flags: when false, the matching propagation step is skipped.
Boolean bChild = true;
Boolean bParent = true;

private void tree_AfterCheck(object sender, TreeViewEventArgs e)
{
    if (bChild)
    {
        CheckAllChildren(e.Node, e.Node.Checked);
    }
    if (bParent)
    {
        CheckMyParent(e.Node, e.Node.Checked);
    }
}

// Applies bCheck to every descendant of tn.
void CheckAllChildren(TreeNode tn, Boolean bCheck)
{
    bParent = false;
    foreach (TreeNode ctn in tn.Nodes)
    {
        // Setting Checked raises AfterCheck; bChild is cleared so the
        // handler does not start a second downward pass.
        bChild = false;
        ctn.Checked = bCheck;
        bChild = true;
        CheckAllChildren(ctn, bCheck);
    }
    bParent = true;
}

// Applies bCheck to every ancestor of tn.
void CheckMyParent(TreeNode tn, Boolean bCheck)
{
    if (tn == null) return;
    if (tn.Parent == null) return;
    bChild = false;
    bParent = false;
    tn.Parent.Checked = bCheck;
    CheckMyParent(tn.Parent, bCheck);
    bParent = true;
    bChild = true;
}
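As a variation on the code above, and not the article's own method, the propagation can also be written without guard flags when it runs on a plain node class. The sketch below uses a hypothetical CheckNode class rather than the WinForms TreeNode, so no events are involved. It also follows a stricter rule: a state change is pushed down to all descendants, and unchecking a node clears its ancestors, but checking a single child leaves the parent as it was.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a tree node; this is not the WinForms TreeNode class.
public class CheckNode
{
    public bool Checked;
    public CheckNode Parent;
    public readonly List<CheckNode> Children = new List<CheckNode>();

    // Attaches a child and returns it so calls can be chained.
    public CheckNode Add(CheckNode child)
    {
        child.Parent = this;
        Children.Add(child);
        return child;
    }

    // Applies a user click: push the state down to all descendants and,
    // when unchecking, clear every ancestor as well.
    public void SetChecked(bool value)
    {
        SetSubtree(this, value);
        if (!value)
        {
            for (CheckNode p = Parent; p != null; p = p.Parent)
                p.Checked = false;
        }
    }

    private static void SetSubtree(CheckNode node, bool value)
    {
        node.Checked = value;
        foreach (CheckNode child in node.Children)
            SetSubtree(child, value);
    }
}
```

In a real form, the same idea would still need a guard (or a check of the event's Action value) because changing Checked on a TreeNode raises AfterCheck again.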
Expand All
Expand All is used to expand the TreeView:
private void btnExpandAll_Click(object sender, EventArgs e)
{
    this.tree.ExpandAll();
}
Collapse All
Collapse All is used to collapse the TreeView:
private void btnCollapseAll_Click(object sender, EventArgs e)
{
    this.tree.CollapseAll();
}
Check All
Check All is used to check all the TreeView parent nodes and child nodes:
private void btnCheckAll_Click(object sender, EventArgs e)
{
    for (int node = 0; node < tree.Nodes.Count; node++)
    {
        tree.Nodes[node].Checked = true;
        if (bChild)
        {
            CheckAllChildren(tree.Nodes[node], tree.Nodes[node].Checked);
        }
        if (bParent)
        {
            CheckMyParent(tree.Nodes[node], tree.Nodes[node].Checked);
        }
    }
}
UnCheck All
UnCheck All is used to uncheck all the TreeView parent nodes and child nodes:
private void btnUncheckAll_Click(object sender, EventArgs e)
{
    for (int node = 0; node < tree.Nodes.Count; node++)
    {
        tree.Nodes[node].Checked = false;
        if (bChild)
        {
            CheckAllChildren(tree.Nodes[node], tree.Nodes[node].Checked);
        }
        if (bParent)
        {
            CheckMyParent(tree.Nodes[node], tree.Nodes[node].Checked);
        }
    }
}
Tables to Excel
Each table in the document is read row by row, and the cell values are written to an Excel worksheet.
if (doc.Tables.Count != 0)
{
    int rowtbl = 0; // current worksheet row
    for (int tables = 1; tables <= doc.Tables.Count; tables++)
    {
        // Advance one row: the first table starts at row 1 and
        // a blank row separates consecutive tables.
        rowtbl = rowtbl + 1;
        Table tbl = doc.Tables[tables];
        CopyWithProgress(100);
        foreach (Microsoft.Office.Interop.Word.Row row in tbl.Rows)
        {
            CopyWithProgress(doc.Tables.Count);
            List<string> cellValues = new List<string>();
            foreach (Microsoft.Office.Interop.Word.Cell cell in row.Cells)
            {
                string cellContents = cell.Range.Text;
                // Cells containing "=" are skipped; for the rest, the
                // two-character end-of-cell marker is removed.
                if (!cellContents.Contains("="))
                    cellValues.Add(cellContents.Remove(cellContents.Length - 2));
            }

            // Write the values to columns A, B, C, ... of the current row.
            // A single letter only covers columns A to Z.
            for (int celval = 0; celval < cellValues.Count; celval++)
            {
                string column = ((char)('A' + celval)).ToString();
                m_objRange = m_objSheet.get_Range(column + rowtbl.ToString(), m_objOpt);
                m_objRange.Value2 = cellValues[celval].Trim();
            }
            rowtbl = rowtbl + 1;
            ProgressBar.PerformStep();
        }
    }

    // Save the workbook and release the Excel COM objects.
    m_objBook.SaveAs(@CurrentPath + "\\Temp.xlsx", m_objOpt, m_objOpt,
        m_objOpt, m_objOpt, m_objOpt,
        Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange,
        m_objOpt, m_objOpt, m_objOpt, m_objOpt, m_objOpt);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objBooks);
    System.Runtime.InteropServices.Marshal.ReleaseComObject(m_objExcel);

    // Close the Word document and quit Word (a is the Word application object).
    doc.Close(ref m_objOpt, ref m_objOpt, ref m_objOpt);
    a.Quit(ref m_objOpt, ref m_objOpt, ref m_objOpt);
    File.Delete(file.ToString());

    // Open the generated workbook.
    Process.Start(@CurrentPath + "\\Temp.xlsx");
}
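The loop above builds each cell address from a single column letter, which only covers columns A to Z, and it removes the end-of-cell marker by cutting the last two characters. The sketch below shows one possible way to handle both steps more defensively. ExcelCellHelpers, ColumnName and CleanCellText are hypothetical names introduced for illustration and are not part of the article's project or of the Office interop libraries.

```csharp
using System;
using System.Text;

// Hypothetical helpers for illustration; not part of the article's project.
public static class ExcelCellHelpers
{
    // Converts a 1-based column index to its Excel letter name
    // (1 is A, 26 is Z, 27 is AA).
    public static string ColumnName(int index)
    {
        if (index < 1) throw new ArgumentOutOfRangeException("index");
        var name = new StringBuilder();
        while (index > 0)
        {
            int remainder = (index - 1) % 26;
            name.Insert(0, (char)('A' + remainder));
            index = (index - 1) / 26;
        }
        return name.ToString();
    }

    // Removes the trailing '\r' and '\a' characters that end the text of a
    // Word table cell, then trims surrounding whitespace.
    public static string CleanCellText(string raw)
    {
        return raw.TrimEnd('\r', '\a').Trim();
    }
}
```

With these helpers, a cell address for column index c and row r could be formed as ExcelCellHelpers.ColumnName(c) + r, where c starts at 1.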
Limitations
The uploaded document must contain both a Table of Contents and tables; otherwise, an error message is shown. Tables with merged columns cause an error, so this automation does not produce results for them. If a document has more than one Table of Contents, only the first one is used.
Conclusion
This is my third article on CodeProject. The project has been tested with a number of documents, and corrections have been made along the way. If you encounter further errors, please let me know so that they can be corrected in a future update.
History
- 24th June, 2009: Initial post
Completed my MCA at Madras University.
MCTS in Microsoft® .NET Framework 2.0 - Web-based Client Development
MCTS in Microsoft® .NET Framework 2.0 - Application Development Foundation