Click here to Skip to main content
16,003,277 members
Please Sign up or sign in to vote.
5.00/5 (1 vote)
See more:
Hi,

Let me explain the title:
I am reading in a *.xmls file. I need to get put the cells into an object[][] before the next step. I had been adding each row in index order but I have found that if a cell is left blank then it is not included in the row object:

C#
foreach (OpenXmlWorksheetSelector selector in sheetIndexes)
{
    WorksheetPart wsPart =
        (WorksheetPart)(wbPart.GetPartById(selector.Id));

    if (wsPart != null)
    {
        object[][] cells =
            wsPart.Worksheet.Descendants<Row>()
            .Select(row => row.Descendants<Cell>().ToArray())
            .Select(cellsa => cellsa.Select(cellsaCell => GetCellValue(cellsaCell, wbPart)).ToArray())
            .Where(row => row.Any(c => !c.ToString().IsEmpty()))
            .Select(row => row
                .Concat(new object[] { selector.WorkBookName, selector.Name, selector.Id.ToString() }).ToArray())
            .ToArray();

        var p = Products(columnDetailSelectors, cells);
        if (p != null)
            products.AddRange(p);
    }

}


This means that my columns don't match up. The only identifier for each rows column position is the cell position attribute ('A1', 'B1', etc). Without parsing each row, can you think of a way that I could get the columns to match up for each row?

In the meantime I'll be writing a row-by-row parser to fix my issue :S

Thanks in advance
Andy ^_^


PS: in case you don't know what the *.xmlx sheet format looks like: here is an example of row 2:
XML
<row r="2" spans="1:28" x14ac:dyDescent="0.25">
    <c r="A2" t="s">
        <v>28</v>
    </c><c r="B2">
        <v>0</v>
    </c><c r="C2">
        <v>8000</v>
    </c><c r="D2" t="s">
        <v>29</v>
    </c><c r="E2" s="1">
        <v>42313</v>
    </c><c r="G2">
        <v>1</v>
    </c><c r="H2" t="s">
        <v>30</v>
    </c><c r="I2" t="s">
        <v>31</v>
    </c><c r="J2" t="s">
        <v>32</v>
    </c><c r="K2" t="s">
        <v>33</v>
    </c><c r="L2" t="s">
        <v>34</v>
    </c><c r="M2">
        <v>1</v>
    </c><c r="N2">
        <v>15363200</v>
    </c><c r="O2">
        <v>76.13</v>
    </c><c r="P2">
        <v>9.44</v>
    </c><c r="Q2" t="s">
        <v>34</v>
    </c><c r="AB2" t="s">
        <v>35</v>
    </c>
</row>

As you can see, columns F, R-Z and AA are missing
Posted
Comments
Richard Deeming 25-Jan-16 7:53am    
There are several suggestions in this StackOverflow thread[^].
Andy Lanng 25-Jan-16 7:58am    
Cool. It kinda works out the same as what I ended up doing in my row-by-row parser. It's actually pretty anyway, speedy so I'll post it as an answer ^_^

1 solution

Ok, so I wrote my row-by-row parser. I'll be dealing with the order of 1,000's of rows so it's speedy enough. Some suggestions on StackOverflow (datatable - reading Excel Open XML is ignoring blank cells - Stack Overflow[^]) may be more efficient:


C#
foreach (OpenXmlWorksheetSelector selector in sheetIndexes)
{
  WorksheetPart wsPart = (WorksheetPart)(wbPart.GetPartById(selector.Id));

  if (wsPart != null)
  {
    Row[] rows = wsPart.Worksheet.Descendants<Row>().ToArray();

    // in my case the max width will be the first (header) row
    var headerCells = rows.SelectMany(r => r.Descendants<Cell>()).ToArray();
    var maxColumnIndex = headerCells.Max(c => 
        // gets index from column letter
        GetColumnIndexByLetters(c.GetAttribute("r", "").Value));

    object[,] cells = new object[rows.Count(), maxColumnIndex + 4];

    foreach (Row row in rows)
    {
      foreach (Cell cell in row.Descendants<Cell>())
      {
        cells[
          int.Parse(row.GetAttribute("r", "").Value) - 1,
          GetColumnIndexByLetters(cell.GetAttribute("r", "").Value)] = GetCellValue(cell,
                                    wbPart);
      }
    }
    //ToEnumerable() is a function that, given an array on n dimensions, returns an enumerable<array[n-1>
    object[][] allObjects = cells.ToEnumerable()
        .Where(r => r.Any(c => !(c ?? "").ToString().IsEmpty()))
        .Select(r => r.Concat(new object[]
           {
                selector.WorkBookName, 
                selector.Name, 
                selector.Id.ToString()
           }).ToArray()).ToArray();
           
    var p = Products(columnDetailSelectors, allObjects);
  if (p != null)
     products.AddRange(p);
 }
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900