Importing Excel Clipboard Data INTO .NET via "XML Spreadsheet"

Casey Witt

5.00/5 (4 votes)

Apr 14, 2015

CPOL

4 min read

26057

Function for parsing Microsoft Excel "XML Spreadsheet" clipboard DataObject into .NET Array of Objects

Introduction

This tip presents a function to transfer data cut or copied from Microsoft Excel into VB.NET using the clipboard. The function will inspect the clipboard DataObject for an "XML Spreadsheet" object, and convert that to a .NET Array of Objects (which the user can then use as needed).

Background

I am developing an application and need to be able to copy and paste between my application and Microsoft Excel using the clipboard. Copying from my app to Excel is easy (it is already baked into the DevExpress GridView I am using). Copying data from Excel to my application turned out to be a tricky beast. My first try was copying the text from the clipboard and then parsing it, but that didn't work because my text has Asian characters and the text is copied as ASCII (I think). My next try was to grab the text from the clipboard as a "UnicodeText" object and then use a TextFieldParser (with VbTab deliminator), but that didn't work because the TextFieldParser would ignore blank cells (Microsoft says this is not a bug, but I think it is since the TextFieldParser actually performs a transform which deletes empty rows as it parses). I didn't not find any other standardized parser and wasn't confident to roll my own due to issues such as embedded deliminators or line breaks, etc. The other issue was that both of these methods transferred the data as it was "displayed" in Microsoft Excel which meant that according to the Microsoft Excel user's Excel formatting, the data could be transformed in arbitrary formats.

The solution I finally arrived at was to use the "XML Spreadsheet" format to transfer the data. The main benefits of this are that it ensures empty cells are not skipped, and it will always transfer the raw unformatted value. Although the array which is returned is an array of objects, the underlying object type will be either String, Decimal, or DateTime.

The "XML Spreadsheet" object seems to be a self contained XML spreadsheet with a single sheet containing only the cells that were copied to the clipboard. You can inspect the schema for this by setting a debugging breakpoint after the call to ReadXml and then click the magnifying glass in the watch window.

Using the Code

This is a function that will inspect the clipboard for a DataObject containing a "XML Spreadsheet" object and if found return an Array(,) of Object, or Nothing if not found.

The code is mostly self explanatory, and I would appreciate any comments or suggestions for improvement.

    Private Class XMLSpreadsheetCellData
        Public Property CellId As Int32
        Public Property RowIndex As Int32
        Public Property ColumnIndex As Int32
        Public Property DataType As String
        Public Property DataValue As String
    End Class

    Private Function ParseClipboard() As Object(,)
            Dim clipboardData = Clipboard.GetDataObject
            If clipboardData IsNot Nothing Then
                If clipboardData.GetFormats.Contains("XML Spreadsheet") Then
                    Dim spreadsheet = New DataSet
                    spreadsheet.ReadXml(clipboardData.GetData("XML Spreadsheet"))
                    Dim rowCount = spreadsheet.Tables("Table").Rows(0)("ExpandedRowCount")
                    Dim columnCount = spreadsheet.Tables("Table").Rows(0)("ExpandedColumnCount")
                    If rowCount > 0 AndAlso columnCount > 0 Then
                        Dim result(rowCount - 1, columnCount - 1) As Object
                        If spreadsheet.Tables.Contains("Data") Then 'if there is no "Data" table then all cells are empty and all array elements will be nothing
                            If Not spreadsheet.Tables("Cell").Columns.Contains("Index") Then spreadsheet.Tables("Cell").Columns.Add(New DataColumn("Index", GetType(Int32)))
                        If Not spreadsheet.Tables("Row").Columns.Contains("Index") Then spreadsheet.Tables("Row").Columns.Add(New DataColumn("Index", GetType(Int32)))

                        'Iterate through the Row table and set the row indexes
                        Dim rowIndex = 1
                        With spreadsheet.Tables("Row")
                            For i = 0 To .Rows.Count - 1
                                If IsDBNull(.Rows(i)("Index")) Then
                                    .Rows(i)("Index") = rowIndex
                                    rowIndex += 1
                                Else
                                    rowIndex = .Rows(i)("Index") + 1
                                End If
                            Next
                        End With

                        'Iterate through the cell table and set the column indexes
                        rowIndex = -1
                        Dim columnIndex = 0
                        With spreadsheet.Tables("Cell")
                            For i = 0 To .Rows.Count - 1
                                If .Rows(i)("Row_Id") <> rowIndex Then columnIndex = 1
                                rowIndex = .Rows(i)("Row_Id")
                                If IsDBNull(.Rows(i)("Index")) Then
                                    .Rows(i)("Index") = columnIndex
                                    columnIndex += 1
                                Else
                                    columnIndex = .Rows(i)("Index") + 1
                                End If
                            Next
                        End With

                        Dim cells = (From cellRecord In spreadsheet.Tables("Cell") Join rowRecord In spreadsheet.Tables("Row")
                                     On cellRecord("Row_Id") Equals rowRecord("Row_Id") Join dataRecord In spreadsheet.Tables("Data")
                                     On cellRecord("Cell_Id") Equals dataRecord("Cell_Id")
                                     Select New XMLSpreadsheetCellData With {.CellId = cellRecord("Cell_Id"),
                                                                             .RowIndex = rowRecord("Index") - 1,
                                                                             .ColumnIndex = cellRecord("Index") - 1,
                                                                             .DataType = dataRecord("Type"),
                                                                             .DataValue = dataRecord("Data_Text")})

                        For Each cell In (From entry In cells
                                          Order By entry.RowIndex, entry.CellId)
                            rowIndex = cell.RowIndex
                            columnIndex = cell.ColumnIndex
                            Select Case cell.DataType
                                Case "String"
                                    result(rowIndex, columnIndex) = cell.DataValue
                                Case "DateTime"
                                    result(rowIndex, columnIndex) = DateTime.Parse(cell.DataValue)
                                Case "Number"
                                    result(rowIndex, columnIndex) = Decimal.Parse(cell.DataValue)
                                    If Decimal.Floor(result(rowIndex, columnIndex)) = result(rowIndex, columnIndex) Then
                                        result(rowIndex, columnIndex) = Integer.Parse(result(rowIndex, columnIndex))
                                    End If
                                Case Else
                                    Throw New DataException(String.Format("XML Spreadsheet Type {0} not recognized.", cell.DataType))
                            End Select
                        Next
                        End If
                        Return result
                    End If
                    Return Nothing
                End If
            End If
            Return Nothing
        End Function

The three tables of interest in the xmlSpreadsheet DataSet schema are "Data" which holds the data values, "Row" which holds the row indexes, and "Cell" which hold the column indexes.

If there are no blank cells in the Excel data then the "Index" columns will be missing from the "Row" and "Cell" tables. In this case a new "Index" column is added to each of these tables.

Then the row indexes are calculated and stored in the "Index" column in the "Row" table using the rule that the first index is 1, and each time the Row_Id field changes the index is incremented.

The column indexes are calculated and stored in the "Index" column in the "Cell" table using the rule that each time the Row_Id changes the column index is reset to 1, and then incremented for each of the following records with the same Row_Id.

If the Excel data contains empty cells in a column then the "Cell" table will already include the "Index" column, so the rule for calculating the column indexes is each time the Row_Id changes the column index is reset to 1, and if the "Index" field is empty set it to the column index, but if it is not empty then don't change it.

If the Excel data contains empty rows then the "Row" table will already include the "Index" column, so the rule for calculating the row indexes is that the first index is 1, and for each row if the "Index" field has a value don't change it otherwise increment the "Index" field by 1 for each row.

History

14April 2015: First posted
16 April 2015: Updated code to fix a bug based on a wrong assumption about how row and column indexes should be calculated