|
Please read this article ... In particular, section "Basic usage scenario".
|
|
|
|
|
I found an other free .NET utility that deals with CSV.
1)THe good point is that there is a detailed documentation.
2)But it is not open source
3)No information about the performances
4)No detailed information in the exception so you don't know what causes the error
Delimited Text File Utility
Renaud
|
|
|
|
|
The whole point of this article is having a correct and complete parser which is also performant.
The utility you mention has none of those qualities. It merely uses Regex.Split to do its parsing. So not only is it incorrect (does not handle quoted fields and escaped chars), but its speed is less than stellar.
So you would pay only to get "detailed" documentation (ie a CHM file) ? I would point out that every single property and method of this project is fully documented.
|
|
|
|
|
Looks very good, but when we tried it on an industrial set of data, a problem showed up.
The existing code does not appear to handle having standard delimiters in string values (e.g. we had double quotes in the data we processed).
Standard CSV will store any double quotes in the middle of a string as two double quotes (e.g. every " is stored as ""). The CsvReader code assumes the string is terminating at that point, which then throws an exception.
Our temporary fix is a little messy, but basically adds a doublequote flag to the CsvReader.cs (starting at line 1260). This flag controls whether we are in the middle of a delimiter pair:
bool escaped = false;<br />
bool doublequote = false;
To make the character checks more readible we added a nextchar variable, to hold the lookahead logic:
char c = _buffer[pos];<br />
char nextchar = (pos + 1 < _bufferLength) ? _buffer[pos + 1] : ((pos + 1 == _bufferLength) ? (char)_reader.Peek() : '\0');
and we changed the parser logic to read:
if (escaped)<br />
{<br />
escaped = false;<br />
start = pos;<br />
}<br />
else if (c == _escape && (_escape != _quote || nextchar == _quote))<br />
{<br />
if (!discardValue)<br />
value += new string(_buffer, start, pos - start);<br />
<br />
escaped = true;<br />
}<br />
else if (c == _quote)<br />
{<br />
if (doublequote)<br />
{<br />
doublequote = false;<br />
}<br />
else<br />
{<br />
if (nextchar != _quote)<br />
{<br />
quoted = false;<br />
doublequote = false;<br />
break;<br />
}<br />
else<br />
{<br />
doublequote = true;<br />
}<br />
}<br />
}
Finally at the new line 1328 (and we know this is a complete hack, not using the delimiter variable, but we were in a rush :
if (!_eof)<br />
{<br />
if (!discardValue && pos > start)<br />
value += new string(_buffer, start, pos - start);<br />
<br />
if (value != null)<br />
{<br />
value = value.Replace("\"\"", "\"");<br />
}
(Sorry about the lack of indentation).
Just ran these changes on a 20,000 record CSV file (13Mb) from an online retail catalogue and it finished in less than a second. Well done on the performance. We tested the fix with embedded, leading and trailing double-quote pairs and the results all appear correct with no exceptions thrown.
Thanks for publishing such a handy piece of code.
Cheers, Dave
|
|
|
|
|
Simply setting the Escape property was not working ??
using (CsvReader csv = new CsvReader(file, true, ',', '"', '"', '#', true))
{
}
|
|
|
|
|
Hi Sébastien,
We had a different escape character specified and could not see how to cater for both at the same time (our data sources vary enormously as they come from different customers/systems).
I guess making it use the quote OR the escape for matching in any strings (what we effectively did), or allowing an array of escape characters, would be ideal.
Many thanks, Dave
|
|
|
|
|
Hello,
Thank you so much for your code that is saving me a incredible time.
The file I need to read contains multiple headers (for example a first line with short names, and a second one with long names). So I need to skip one or the other.
I could just had a comment char at the begining of the line I don't want but I don't want to modify the original file, neither work on a copy (the file can be huge...).
Any idea of the best way to use your code to do that?
By the way is there any user manual somewhere?
Thank you so much for sharing your work!
Renaud Pelissier
|
|
|
|
|
I did not test it, but I guess you could add a method to the CsvReader class:
public bool SkipToNextLine()
{
return SkipToNextLine(ref _nextFieldStart);
}
|
|
|
|
|
Thank you for answering Sébastien!
Actually, I know how to skip a "data" line. My problem is how to deal with a file where the header is not at the first row if I still want to use your header fonctionnality.
I'll share my code when done : I am programming a "Excel like" CSV import windows application.
Renaud
|
|
|
|
|
The method SkipToNextLine does not require the CsvReader to be initialized and can then be called before any call to ReadNextRecord or any other method/property which requires the reader to be initialized.
using (CsvReader csv = new CsvReader(data, true))
{
csv.SkipToNextLine();
while (csv.ReadNextRecord())
{
}
}
|
|
|
|
|
You are right your solution works for skipping lines after the header.
Below is what I found to skip lines before the header (my addition to your code is between the "RPEL" and "END RPEL" comments...
I introduced a private field named "_headerLinePosition" wich tells me how many lines I have to skip before getting to the header line.
protected virtual bool ReadNextRecord(bool onlyReadHeaders, bool skipToNextLine)
{
if (_eof)
{
if (_firstRecordInCache)
{
_firstRecordInCache = false;
_currentRecordIndex++;
return true;
}
else
return false;
}
CheckDisposed();
if (!_initialized)
{
_buffer = new char[_bufferSize];
_fieldHeaders = new string[0];
if (!ReadBuffer())
return false;
for (int iHLP = 1; iHLP < _headerLinePosition; iHLP++)
{
if (!SkipToNextLine(ref _nextFieldStart))
return false;
}
... your code continuing here
Regards,
Renaud
|
|
|
|
|
|
Hi Sebastien, I am new to c# so I am working my way through the code, I have one question: I need to load a dataset with data from a CSV file, can I do that your solution? If so, it would be great if you can show how I should do it.
Thanks,
Ashley
|
|
|
|
|
I have since found a lot of discussion in the comments on using the csv reader with datatables - sorry, I hadn't seen it before.
I am still struggling with getting the data table populated because it fits 4 column data into 2 columns. The csv data does not have column headers, and looks like this:
,"Report"
"Sales Channel 1"
"Sales"
,"2004","2005","2006"
"A",721566080.000000,707081280.000000,754787520.000000
"B",35836208.000000,27874150.000000,24252956.000000
"C",78945720.000000,74349744.000000,76166552.000000
"D",30386580.000000,38601984.000000,47140400.000000
"E",491753536.000000,476412544.000000,505892384.000000
I am using the following code to load the table:
DataTable tbl=new DataTable();
using (CsvReader csv=new CsvReader(new StreamReader(filePath + fileName), false))
{
tbl.Load(csv);
}
I would be grateful for any help.
|
|
|
|
|
Sorry for the delay, was far away from any computer What do you mean by
ashhorner wrote: I am still struggling with getting the data table populated because it fits 4 column data into 2 columns.
|
|
|
|
|
Hi, no probs, thanks for coming back to me.
My dataset has 4 columns per row (after the first few lines which just define the data spec). However, when I load the data into a datatable and bind that table to a datagrid the 4 columns of data per row are displayed as 2 rows of 2 columns, with the data for column 3 and 4 or a row being reported in the next row in column 1 and 2 - so I end up with twice as many rows and half as many columns as I am supposed to have.
Ashley
|
|
|
|
|
Hi,
I want to simply load the contents of the csv file into the System.Data.DataTable Class as shown in the code below.
using (CachedCsvReader csv = new CachedCsvReader(new StreamReader("File.csv"), false))
{
DataTable result = new DataTable();
result.Load(csv);
}
CachedCsvReader class inherits CsvReader which Implements IDataReader, so I was expecting the code above to work.
But it throws an InvalidOperationExpression, with the message "No Current Record".
SJ
|
|
|
|
|
This is really great! Like most people, I started with the string.Split() method, and quickly found out that it didn't get everything... This was a very nice code, and I'm glad that I found it! It helped me out tremendously!! Thanks!
|
|
|
|
|
Thanks for taking the time to tell me your appreciation
|
|
|
|
|
Hi Sebastien,
I love this piece of code, it saved me so much time and it is very well written. I love the detailed exceptions it throws because I can catch them and tell my users exactly where their csv file is screwed up. Nice Job!
I did possibly find one very small bug. When a user accidentally includes two identical headers GetFieldHeaders() fails with a System.ArgumentException "An item with the same key has already been added." I was thinking you may want to catch this and throw a MalformedCsvException.
Anyway, just my 2 cents, thanks again for the code.
Isn't it nice how you only hear from users when they report bugs? ha ha
Regards,
Mike
|
|
|
|
|
From my point of view, the CSV is well formed, but with a logical error due to duplicate names in the header. I agree I could handle this case and throw a more appropriate exception. In next version
|
|
|
|
|
I've got a CSV file that was generated from an Excel file in the following format:
Date,Engine,Remote Campaign,AdGroup,Keyword,Contextual,Impressions,Clicks,Media Cost,PageViews : Count,Visits : Count
26-Feb-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
27-Feb-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
28-Feb-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
1-Mar-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
2-Mar-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
3-Mar-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
4-Mar-07,Yahoo,USA Today - Tech Evergreen,Default,Default,No,0,0,$0.00 ,,
26-Feb-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,3,0,$0.00 ,,
27-Feb-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,2,0,$0.00 ,,
28-Feb-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,2,0,$0.00 ,,
1-Mar-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,2,0,$0.00 ,,
2-Mar-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,6,0,$0.00 ,,
3-Mar-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,3,0,$0.00 ,,
4-Mar-07,Google,USATODAY_Tech,blogmaney,tech blogger,No,3,0,$0.00 ,,
27-Feb-07,Google,USATODAY_Tech,blogmaney,technology blog,No,0,0,$0.00 ,,
28-Feb-07,Google,USATODAY_Tech,blogmaney,technology blog,No,1,0,$0.00
2-Mar-07,Google,USATODAY_Tech,blogmaney,technology blog,No,2,0,$0.00
3-Mar-07,Google,USATODAY_Tech,blogmaney,technology blog,No,1,0,$0.00
4-Mar-07,Google,USATODAY_Tech,blogmaney,technology blog,No,1,0,$0.00
Using the latest
_csvReader = new CsvReader(new StreamReader(fileName), true);<br />
_csvReader.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
The above file fails to parse correctly. Can you offer any insight?
|
|
|
|
|
Hi, I use this code to read a lines from csv file:
<br />
using (CsvReader csv = new CsvReader(new StringReader(fFileName), false))<br />
<br />
int fieldCount = csv.FieldCount;<br />
string[] headers = csv.GetFieldHeaders();<br />
string cHeader = "";<br />
<br />
while (csv.ReadNextRecord())<br />
for (int i = 0; i < fieldCount; i++)<br />
{<br />
string s = csv[i];<br />
}<br />
}<br />
but don't work.
Using the csv file, called "test3.csv", do not come visualized the lines with the program demo.
It appears, of the dataGridView, only the written one "C:\\test3.csv". How I can do?
Thanks...
|
|
|
|
|
Please use the CachedCsvReader class for databinding purposes.
|
|
|
|
|
Hello,
I tried my luck to process a 25 column CSV file. Sadly without success. The run was interrupted because the program cannot agree that the mentioned dimension of "_fieldHeaderIndexes" in CsvReader.cs should be increased over value "10". The critical code is (line 1456 - 1462):
for (int i = 0; i < _fields.Length; i++)
{
_fieldHeaders[i] = _fields[i];
_fieldHeaderIndexes.Add(_fields[i], i);
}
What can I do to solve the problem?
The first lines of the CSV file are:
Company,Symbol,prop1,prop2,prop3,prop4,prop5,prop6,prop7,prop8,prop9,prop10,prop11,prop12,prop13,prop14,prop15,prop16,prop17,prop18,prop19,prop20,prop21,prop22,prop23
Pre Paid Legal Services,PPD,4496837,73.5,33.89,9.43,,,-14.44,14.44,-0.30,0.30,1.17,1.17,4377658,2.72,13267000,61200,14204958,$39.41,$559817393,NY,Services,Personal Services,234.89,2007-Jan
Holdrs Tr-Wireless,WMH,361996,69.6,,0.61,,,-6.98,6.98,4.38,4.38,3.46,3.46,370023,-2.17,,5200,1700000,$62.05,$105484999,AMX,,,,2007-Jan
Tasty Baking Company - Common Stock,TSTY,869314,68.2,,22.98,,,-10.78,10.78,1.56,1.56,-0.33,0.33,872018,-0.31,,12741,8215261,$9.00,$73937349,NAS,,,,2007-Jan
ZipRealty Inc - Common Stock,ZIPR,2294817,68.1,14.27,-13.91,,,-34.58,34.58,3.07,3.07,-1.34,1.34,2328538,-1.45,16080000,33686,21021021,$7.49,$157447442,NAS,Financial,Real Estate Development,-135.18,2007-Jan
Best regards
Michael
|
|
|
|
|