Click here to Skip to main content
Licence Apache
First Posted 25 Oct 2010
Views 2,740
Bookmarked 1 time

CSV: Carelessly Saved Values

By | 25 Oct 2010 | Technical Blog
CSV: Carelessly Saved Values
A Technical Blog article. View original blog here.[^]

I currently work with some numeric methods that accept large quantities of data as inputs. Normally the input comes from a database. As I was writing the tests, I needed a suitable external format to represent relatively large quantities of data. CSV seemed like a natural choice, because it is widely used and supported by SQL Management Studio and Excel. However, I found CSV very inconvenient, for the following reasons:

  • There is no standard way to export a data table to CSV. CSV files can be read by OLE DB Jet driver, but apparently this driver cannot create new tables/files. Google is full of samples on how to write to CSV files by hand.
  • There is no standard data format. Should the data be included in quotes? How do you handles strings with embedded commas? What is the date format? Apparently, there is no such things as “CSV standard” and everyone does what they please.
  • There is no type information. OLE DB driver can use schema.ini, but it is non-standard and cumbersome to work with. Also, it combines type information for multiple tables. Without explicit type information, OLE DB driver tries to “guess” the types, but often does it wrong. It seems that it looks at the first couple of dozen rows, and makes its decision based on that. If further rows are incompatible, they will suffer. The driver will either complain (“cannot convert string to integer”), or silently truncate doubles to ints, etc., which is even worse.
  • There is no null value. Empty strings are treated as NULLs on read, even if included in quotes. However, SQL Management Studio when exporting NULL values puts literal NULL into the file, which is read back as string.

All this makes CSV format somewhat difficult to use – it is very hard to guarantee that the data read will be exactly equivalent (in type and value) to the data written. I gave up and switched to DataTable.WriteXml() and WriteXmlSchema(), but this leads to much bigger files and is not compatible with anything but .NET. I am still looking for an alternative.

License

This article, along with any associated source code and files, is licensed under The Apache License, Version 2.0

About the Author

Ivan Krivyakov

Architect
Sungard Consulting Services
United States United States

Member



Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board. (secure sign-in)
 
Search this forum  
 FAQ
    Noise  Layout  Per page   
  Refresh
-- There are no messages in this forum --
Permalink | Advertise | Privacy | Mobile
Web02 | 2.5.120517.1 | Last Updated 25 Oct 2010
Article Copyright 2010 by Ivan Krivyakov
Everything else Copyright © CodeProject, 1999-2012
Terms of Use
Layout: fixed | fluid