Click here to Skip to main content
14,297,845 members
Rate this:
Please Sign up or sign in to vote.
See more:
I have a byte array, which represents either XLS or XLSX file, and I need to find some string(s) in it using regex. It is easy to encode XLS into string and do what I need but XLSX files are in zipped xml format, so encoding doesn't do any good - string is unreadable. Is there any way to extract/read "XLSX" byte arrays? I don't want to use Interop, save Excel files an and open workbooks. Any ideas?

P.S. Just to make it clear - I get a byte array 9file content) from InfoPath attachments
Posted
Comments
Sergey Alexandrovich Kryukov 10-Sep-14 20:59pm
   
If course you can do it. This is trivial, but it depends on how the string/file is encoded in this byte array...
—SA
esb77 10-Sep-14 21:19pm
   
I thought I have been clear - this is an attachment in the InfoPath form
Sergey Alexandrovich Kryukov 10-Sep-14 23:21pm
   
Sorry, I misread it. XLSX is zipped XML in MS Open XML format, but XLS is proprietary to Microsoft obsolete format which is not standardized. As far as I remember, open-source LibreOffice has the code reading/writing it.
—SA
Rate this:
Please Sign up or sign in to vote.

Solution 2

Updated solution (after clarification by OP):

If you want to read in the attachment from an Infopath document, you can use this method: http://support.microsoft.com/kb/2517906. The code contains both an Encoder and Decoder.

The Decoder gives you the actual byte[] for the attachment. Then byte[] -> MemoryStream -> Unzip (using DeflateStream or ZipArchive -> then you get your XML files out of your archive.

Start by getting the attachment data from the infopath form as byte array:
XPathDocument document = new XPathDocument(@"infopathform.xml");
XPathNavigator navigator = document.CreateNavigator();

//Get the my namespace URI
navigator.MoveToFollowing(XPathNodeType.Element);
string myNamespace = navigator.GetNamespacesInScope(XmlNamespaceScope.All)["my"];

//Create an XmlNamespaceManager
XmlNamespaceManager ns = new XmlNamespaceManager(new NameTable());
ns.AddNamespace("my", myNamespace);

//Create an XPathNavigator object for the attachment node
XPathNavigator xnAttNode = navigator.SelectSingleNode("//my:Attachments", ns);

//Get the encoded value as string
string encodedAttachment = xnAttNode.InnerXml;

byte[] theData = Convert.FromBase64String(encodedAttachment);

Next, get the actual attachment data, because the raw data includes filename etc.:
using (MemoryStream ms = new MemoryStream(theData))
{
    BinaryReader theReader = new BinaryReader(ms);

    //Position the reader to obtain the file size.
    byte[] headerData = new byte[16];
    headerData = theReader.ReadBytes(headerData.Length);

    int fileSize = (int)theReader.ReadUInt32();
    int attachmentNameLength = (int)theReader.ReadUInt32() * 2;

    byte[] fileNameBytes = theReader.ReadBytes(attachmentNameLength);
    //InfoPath uses UTF8 encoding.
    Encoding enc = Encoding.Unicode;
    string attachmentName = enc.GetString(fileNameBytes, 0, attachmentNameLength - 2);
    byte[] decodedAttachment = theReader.ReadBytes(fileSize);
}

Once you get your byte[], put it into a stream, and you can use any of the following methods to extract your XML files.

.NET 4.5 - reference System.IO.Packaging.
using (MemoryStream memoryStream = new MemoryStream(decodedAttachment))
{
    ZipArchive zip = new ZipArchive(memoryStream, ZipArchiveMode.Read);
    foreach (var entry in zip.Entries)
    {
        //do something with each XML file
        var sw = new StreamReader(entry.Open());
        string xmlString = sw.ReadToEnd();
    }
}

Or pre-.NET 4.5 - reference WindowsBase:
using (MemoryStream memoryStream = new MemoryStream(decodedAttachment))
{
    ZipPackage p = (ZipPackage) ZipPackage.Open(memoryStream);
    foreach (var part in p.GetParts())
    {
        var s = new StreamReader(part.GetStream());
        string xmlString = s.ReadToEnd();
    }
}

Or using the OpenXML SDK:
using (MemoryStream memoryStream = new MemoryStream(decodedAttachment))
{
    SpreadsheetDocument document = SpreadsheetDocument.Open(memoryStream, false);
    foreach (var worksheetPart in document.WorkbookPart.WorksheetParts)
    {
        //do something with all worksheets
    }
}
   
v8
Comments
esb77 11-Sep-14 10:46am
   
good code but I guess only if you know an attachment path. Unfortunately in my case there is no path, attachments stored in the form and don't have a physical location. All I have is XmlNodeList from Infopath and a string from the <my:attachments> node. I can extract file name from that string and encode byte array of the file content.

byte[] b = Convert.FromBase64String(myString);
int namebufferlen = b[20] * 2;

byte[] filecontent = new byte[b.Length - (24 + namebufferlen)];
for (int i = 0; i <= filecontent.Length - 1; i++)
filecontent(i) = b[24 + namebufferlen + i];

Then I'm not sure what to do
string byteArrayToStr = Encoding.UTF8.GetString(filecontent, 0, filecontent.Length);
is good for xls files only. Tried ToBase64String function (from the example you provided) with file content as a parameter, it still returns zipped file.

Of course, I can save the file from file content and read Excel -

FileStream fs = new FileStream(tempfilename, FileMode.Create);
fs.Write(filecontent, 0, filecontent.Length);
fs.Close();

xApp = new Microsoft.Office.Interop.Excel.Application();
xApp.Visible = true;
wb1 = xApp.Workbooks.Open(tempfilename);

but that is not preferred method.
kbrandwijk 11-Sep-14 11:02am
   
You're spot on, right until the point where you see that you get the zipped file back (as byte stream). This needs to be the input for the call to unzip the contents.
esb77 11-Sep-14 11:51am
   
still doesn't work as I'm not sure how InfoPath deals with attachments. Tried System.IO.Compression to unzip byte array

MemoryStream ms = new MemoryStream();

using (compressedStream == new MemoryStream(filecontent)) {
using (zipStream == new GZipStream(compressedStream, CompressionMode.Decompress)) {
zipStream.CopyTo(ms);
}
}
and got "The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
kbrandwijk 11-Sep-14 12:28pm
   
I think you can use DeflateStream(stream, CompressionMode.NoCompression) but I'm not 100% sure because I don't have a working example of an Infopath form with xlsx attachment. I'll try to set something up later.
esb77 11-Sep-14 12:40pm
   
I could send one to you if I knew to where...
esb77 11-Sep-14 13:59pm
   
using (MemoryStream compressedStream = new MemoryStream(filecontent)) {
using (DeflateStream decompressedStream = new DeflateStream(compressedStream, CompressionMode.Decompress)) {
StreamReader reader = new StreamReader(decompressedStream);
decompressedStream.Close();
myStr = reader.ReadToEnd();
}}
returns - "Block length does not match with its complement"
kbrandwijk 12-Sep-14 10:38am
   
I have updated the solution. Your error was probably caused by the fact that the attachmentdata also contains filename information, that you have to strip from it first.
Rate this:
Please Sign up or sign in to vote.

Solution 1

If the XLSX are OpenXML format, you can use one of the following methods:

1. Use the OpenXML SDK (http://msdn.microsoft.com/en-us/library/office/bb448854(v=office.15).aspx[^])

2. Use an open source tool like ExcelPackage: http://excelpackage.codeplex.com/wikipage?title=Reading%20data%20from%20an%20Excel%20spreadsheet&referringTitle=Home[^]

3. The XLSX is a collection of XML files. Using built-in support for archives in .NET using System.Io.Compression.ZipArchive, you can easily access the XML files in there. This post: Read and write Open XML files (MS Office 2007)[^] contains information on how to extract the XML files from a XSLX and read them using a simple XmlReader.
   
Comments
esb77 10-Sep-14 21:39pm
   
kbrandwijk thanks, but I'm not sure I understood how that helps me. I don't have XLSX file, just a byte array (excel file has been previously attached to the InfoPath form and only that form xml file is available to me

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)




CodeProject, 503-250 Ferrand Drive Toronto Ontario, M3C 3G8 Canada +1 416-849-8900 x 100