|
Using the reader directly is much faster because you are streaming the whole time. You were missing a using statement on the reader, though. Below is the code that should work. You are telling me that no data is written to the database... I did not test the reader with SqlBulkCopy, but I will try to find some time to test it this weekend.
protected void OnClick(object sender, EventArgs e)
{
    using (LumenWorks.Framework.IO.Csv.CsvReader reader = new LumenWorks.Framework.IO.Csv.CsvReader(new StreamReader(Server.MapPath("data.txt")), false, '\t'))
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.ConnectionStrings["TestDatabaseConnectionString"].ConnectionString))
        {
            bulkCopy.DestinationTableName = "Table1";
            bulkCopy.BulkCopyTimeout = 30;
            bulkCopy.BatchSize = 10000;
            bulkCopy.WriteToServer(reader);
        }
    }
}
|
|
|
|
|
Hi Sebastien
I managed to get it working with SQL Bulk Copy (my fault!). The only thing I think would really be useful is a property such as Count or HasRows, so that you can test whether the reader contains any data before performing a copy; doing a copy when there is nothing in the reader seems a bit pointless.
Thanks for your help
|
|
|
|
|
Sorry for not replying earlier, but I took some days off and my computer was FAR away.
Because the reader is streaming, I cannot have a RecordCount property. HasRecords is possible, but it would have to be a method, because I need to parse the headers first and then prefetch the first record.
That said, I don't think the overhead of doing an empty copy is going to kill your performance.
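In the meantime, one cheap way to skip the empty-file case is to peek at the underlying stream before handing it to the reader at all. This is a sketch, not part of the library; the file path is a placeholder:

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Hypothetical input path; substitute your own file.
        string path = "data.txt";

        using (StreamReader stream = new StreamReader(path))
        {
            // Peek() returns -1 when there is nothing to read, so we can
            // bail out before constructing the CSV reader or the bulk copy.
            if (stream.Peek() < 0)
            {
                Console.WriteLine("File is empty; skipping bulk copy.");
                return;
            }

            // Otherwise hand `stream` to CsvReader/SqlBulkCopy as usual.
        }
    }
}
```

This avoids needing a HasRecords member on the reader itself, at the cost of one extra buffered read on the stream.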
|
|
|
|
|
I just noticed that when debugging the above code, it doesn't copy the data across. Take the breakpoint off and it puts the data in the table with no problem. This is very weird. I also noticed that if you put a watch on reader[0,0] and reader[1,0] while debugging, all is OK; as soon as you set the SqlBulkCopy properties, reader[0,0] throws an exception while reader[1,0] is still OK. Any ideas?
|
|
|
|
|
The second question is the easier one: this is a forward-only (streaming) reader, so you cannot go backward.
As for the first question, I don't know. CsvReader behaves the same in debug and release, as far as I know. My guess is that you are spending too much time at the breakpoint before reaching the write command, and SqlBulkCopy has timed out.
|
|
|
|
|
I have a text file in the format below. I have set the delimiter to be a space, but as you can see, some fields have 3 spaces between them and others have 1. I have tried to map the columns using SQL Bulk Copy, but every time I try to write to the server, the reader throws a MissingFieldCsvException: "The CSV appears to be corrupt near record 1..."
Could you possibly advise what is happening? Thanks.
A 8 WSY £ 250
UPDATE: I have decided to use the code below, but to no avail. I removed the £ sign, as this was causing issues.
using (CsvReader reader = new CsvReader(new StreamReader(SavePath), false, ' '))
{
    reader.MissingFieldAction = MissingFieldAction.ReplaceByNull;

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.ConnectionStrings["ConnectionString"].ConnectionString))
    {
        SqlBulkCopyColumnMapping Step1 = new SqlBulkCopyColumnMapping(0, "Step1");
        bulkCopy.ColumnMappings.Add(Step1);

        SqlBulkCopyColumnMapping Step2 = new SqlBulkCopyColumnMapping(1, "Step2");
        bulkCopy.ColumnMappings.Add(Step2);

        SqlBulkCopyColumnMapping Step3 = new SqlBulkCopyColumnMapping(2, "Step3");
        bulkCopy.ColumnMappings.Add(Step3);

        SqlBulkCopyColumnMapping Price = new SqlBulkCopyColumnMapping(3, "Price");
        bulkCopy.ColumnMappings.Add(Price);

        bulkCopy.DestinationTableName = "fmr_Test";
        bulkCopy.BulkCopyTimeout = 30;
        bulkCopy.BatchSize = 10000;
        bulkCopy.WriteToServer(reader);
    }
}
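Since the reader treats every single space as a delimiter, runs of multiple spaces produce empty fields, which is what triggers the missing-field error. One workaround (a sketch, not part of the library) is to collapse whitespace runs before parsing:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample line from the question, with a variable number of spaces.
        string line = "A 8   WSY   250";

        // Collapse every run of spaces into a single space so that a
        // single-space delimiter yields exactly one field per column.
        string normalized = Regex.Replace(line, " +", " ");

        Console.WriteLine(normalized); // "A 8 WSY 250"
    }
}
```

You could apply this per line while copying the file to a temporary stream, or inside a wrapping TextReader, before handing the data to CsvReader.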
|
|
|
|
|
Hello
I have a problem with a csv file.
The data format may be:
'field1','field2','field,3'
Notice the comma within the two quotes.
When I put "" as the escape character, it doesn't work.
I tried to make a hack, and it is very ugly (shown in bold).
If you can confirm the problem, can you check this code? If you can provide a solution, that would be very, very cool.
Thanks
private string ReadField(int field, bool initializing, bool discardValue)
{
    if (!initializing)
    {
        if (field < 0 || field >= _fieldCount)
            throw new ArgumentOutOfRangeException("field", field, string.Format(CultureInfo.InvariantCulture, ExceptionMessage.FieldIndexOutOfRange, field));

        if (_currentRecordIndex < 0)
            throw new InvalidOperationException(ExceptionMessage.NoCurrentRecord);
    }

    if (_fields[field] != null)
        return _fields[field];
    else if (_missingFieldsFlag)
        return HandleMissingField(null, field, ref _nextFieldStart);

    CheckDisposed();

    int index = _nextFieldIndex;

    while (index < field + 1)
    {
        if (_nextFieldStart == _bufferLength)
        {
            _nextFieldStart = 0;

            ReadBuffer();
        }

        string value = null;
        bool eol = false;

        if (_missingFieldsFlag)
        {
            value = HandleMissingField(value, index, ref _nextFieldStart);
        }
        else if (_nextFieldStart == _bufferLength)
        {
            if (index == field)
            {
                if (!discardValue)
                {
                    value = string.Empty;
                    _fields[index] = value;
                }
            }
            else
            {
                value = HandleMissingField(value, index, ref _nextFieldStart);
            }
        }
        else
        {
            if (_trimSpaces)
                SkipWhiteSpaces(ref _nextFieldStart);

            if (_eof)
                value = string.Empty;
            else if (_buffer[_nextFieldStart] != _quote)
            {
                int start = _nextFieldStart;
                int pos = _nextFieldStart;

                for (; ; )
                {
                    while (pos < _bufferLength)
                    {
                        char c = _buffer[pos];

                        if (c == _delimiter)
                        {
                            _nextFieldStart = pos + 1;

                            break;
                        }
                        else if (c == '\r' || c == '\n')
                        {
                            _nextFieldStart = pos;
                            eol = true;

                            break;
                        }
                        else
                            pos++;
                    }

                    if (pos < _bufferLength)
                        break;
                    else
                    {
                        if (!discardValue)
                            value += new string(_buffer, start, pos - start);

                        start = 0;
                        pos = 0;
                        _nextFieldStart = 0;

                        if (!ReadBuffer())
                            break;
                    }
                }

                if (!discardValue)
                {
                    if (!_trimSpaces)
                    {
                        if (!_eof && pos > start)
                            value += new string(_buffer, start, pos - start);
                    }
                    else
                    {
                        if (!_eof && pos > start)
                        {
                            pos--;
                            while (pos > -1 && IsWhiteSpace(_buffer[pos]))
                                pos--;
                            pos++;

                            if (pos > 0)
                                value += new string(_buffer, start, pos - start);
                        }
                        else
                            pos = -1;

                        if (pos < 0)
                        {
                            pos = (value == null ? -1 : value.Length - 1);

                            while (pos > -1 && IsWhiteSpace(value[pos]))
                                pos--;

                            pos++;

                            if (pos > 0 && pos != value.Length)
                                value = value.Substring(0, pos);
                        }
                    }

                    if (value == null)
                        value = string.Empty;
                }

                if (eol || _eof)
                {
                    eol = ParseNewLine(ref _nextFieldStart);

                    if (!initializing && index != _fieldCount - 1)
                    {
                        if (value != null && value.Length == 0)
                            value = null;

                        value = HandleMissingField(value, index, ref _nextFieldStart);
                    }
                }

                if (!discardValue)
                    _fields[index] = value;
            }
            else
            {
                int start = _nextFieldStart + 1;
                int pos = start;

                bool quoted = true;
                bool escaped = false;

                for (; ; )
                {
                    while (pos < _bufferLength)
                    {
                        char c = _buffer[pos];

                        if (escaped)
                        {
                            escaped = false;
                            start = pos;
                        }
                        else if (c == _escape && (_escape != _quote || (pos + 1 < _bufferLength && _buffer[pos + 1] == _quote) || (pos + 1 == _bufferLength && _reader.Peek() == _quote)))
                        {
                            if (!discardValue)
                                value += new string(_buffer, start, pos - start);

                            escaped = true;
                        }
                        else if (c == _quote && !(pos + 1 < _bufferLength))
                        {
                            Console.WriteLine("quote reached at the end of the buffer at position " + pos);

                            if (!discardValue && !escaped)
                                value += new string(_buffer, start, pos - start);

                            start = 0;
                            pos = 0;
                            _nextFieldStart = 0;

                            if (!ReadBuffer())
                            {
                                HandleParseError(new MalformedCsvException(GetCurrentRawData(), _nextFieldStart, _currentRecordIndex, index), ref _nextFieldStart);
                                return null;
                            }

                            c = _buffer[pos];

                            if (c == _delimiter || c == '\r' || c == '\n')
                            {
                                quoted = false;
                                break;
                            }
                            else
                            {
                                continue;
                            }
                        }
                        else if (c == _quote && !(pos + 1 < _bufferLength && _buffer[pos + 1] == _delimiter) && !(pos + 1 < _bufferLength && _buffer[pos + 1] == '\r') && !(pos + 1 < _bufferLength && _buffer[pos + 1] == '\n'))
                        {
                            Console.WriteLine("comma found between two quotes at position " + pos);
                        }
                        else if (c == _quote)
                        {
                            quoted = false;
                            break;
                        }

                        pos++;
                    }

                    if (!quoted)
                        break;
                    else
                    {
                        if (!discardValue && !escaped)
                            value += new string(_buffer, start, pos - start);

                        start = 0;
                        pos = 0;
                        _nextFieldStart = 0;

                        if (!ReadBuffer())
                        {
                            HandleParseError(new MalformedCsvException(GetCurrentRawData(), _nextFieldStart, _currentRecordIndex, index), ref _nextFieldStart);
                            return null;
                        }
                    }
                }

                if (!_eof)
                {
                    if (!discardValue && pos > start)
                        value += new string(_buffer, start, pos - start);

                    _nextFieldStart = pos + 1;

                    SkipWhiteSpaces(ref _nextFieldStart);

                    bool delimiterSkipped;
                    if (_nextFieldStart < _bufferLength && _buffer[_nextFieldStart] == _delimiter)
                    {
                        _nextFieldStart++;
                        delimiterSkipped = true;
                    }
                    else
                    {
                        delimiterSkipped = false;
                    }

                    if (!_eof && !delimiterSkipped && (initializing || index == _fieldCount - 1))
                        eol = ParseNewLine(ref _nextFieldStart);
                }

                if (!discardValue)
                {
                    if (value == null)
                        value = string.Empty;

                    _fields[index] = value;
                }
            }
        }

        if (initializing || index < _fieldCount - 1)
            _nextFieldIndex = Math.Max(index + 1, _nextFieldIndex);
        else
            _nextFieldIndex = 0;

        if (index == field)
        {
            if (initializing && (eol || _eof))
                return null;
            else
                return value;
        }

        index++;
    }

    HandleParseError(new MalformedCsvException(GetCurrentRawData(), _nextFieldStart, _currentRecordIndex, index), ref _nextFieldStart);
    return null;
}
|
|
|
|
|
If I understand you correctly, you would like the reader to return 2 fields for 'field,3', with values "field" and "3", instead of 1 field with the value "field,3".
I would suggest that you split the content of "field,3" manually instead, because that is a clear case of a malformed CSV.
using (CsvReader csv = new CsvReader(...))
{
    while (csv.ReadNextRecord())
    {
        string[] values = csv[2].Split(',');
    }
}
|
|
|
|
|
No, it is one field (the file isn't malformed):
"header1","header2","header3","header4"
"field1","field2","field,3","field4"
Here I'd like 4 fields.
When I use the following call (VB .NET):
New Lumen.CsvReader(New StreamReader(csvFilePath), True, ",", """", "", "#", False)
delimiter : ,
quote : '
escape character : blank string ("")
I receive an error because I have too many fields.
When I look in the code, the case where the escape character is a blank string isn't handled.
I tried to correct that in the code, but it's ugly and I'm not 100% sure about it.
Thanks
Thanks
|
|
|
|
|
Is the quote ' or " ? It would help me if you were precise and consistent in your problem description. Providing me with the exact csv data and code would allow me to quickly test your scenario.
The quote, delimiter, escape, and comment are single chars, not strings, so you cannot pass a blank string for the escape. Use the null char '\0' instead.
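In C#, the call from the earlier post would then look something like the sketch below, mirroring the seven-argument overload used in the VB code (the file path is a placeholder; '\0' disables escaping):

```csharp
using System.IO;
using LumenWorks.Framework.IO.Csv;

class Example
{
    static void Main()
    {
        // Placeholder path; substitute your own file.
        string csvFilePath = "data.csv";

        // delimiter ',', quote '"', escape '\0' (no escape character),
        // comment '#', no space trimming -- matching the VB call above.
        using (CsvReader reader = new CsvReader(
            new StreamReader(csvFilePath), true, ',', '"', '\0', '#', false))
        {
            while (reader.ReadNextRecord())
            {
                string firstField = reader[0];
            }
        }
    }
}
```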
|
|
|
|
|
I sent you all the information by email.
Did you receive it?
Philippe
|
|
|
|
|
Yes and I replied to you with this email:
On the second line, I see
"CAIS. D"EPARGNE"
That might be causing your problem. The " needs to be escaped. For example:
"CAIS. D""EPARGNE"
or
"CAIS. D\"EPARGNE"
depending on what escape character you are using.
|
|
|
|
|
I didn't get your email.
That's not the problem; the separator is , not .
(I hope I didn't send you a bad file.)
Can you give me your email address by PM?
I'll send you a complete project that doesn't work.
|
|
|
|
|
I just replied to your email @gmail.com. If you do not receive an email, please post a reply here. The email content is:
The problem is not the ".", it is the quote inside the quoted field. You need to escape it. The reader handles using the same char for quote and escape, so you can double it if you wish.
"CAIS. D ""EPARGNE"
|
|
|
|
|
Well, you're right.
If a line is malformed, I see we can handle it with ParseErrorAction.
In that case, can we isolate the line that causes the problem, edit it, and reload that line into the parser?
I'd like to avoid a read/write pass over the file (the file is huge).
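For reference, the error-handling hook mentioned above looks roughly like this (a sketch; the exact event-args members may differ between library versions):

```csharp
using System;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class Example
{
    static void Main()
    {
        using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), true))
        {
            // Raise an event instead of throwing on malformed records.
            csv.DefaultParseErrorAction = ParseErrorAction.RaiseEvent;

            csv.ParseError += (sender, e) =>
            {
                // Inspect e.Error here (log it, collect the raw data, etc.);
                // AdvanceToNextLine skips the offending record and continues.
                e.Action = ParseErrorAction.AdvanceToNextLine;
            };

            while (csv.ReadNextRecord())
            {
                // process valid records...
            }
        }
    }
}
```

This lets you capture the bad lines as they are encountered, without a separate read/write pass over the whole file; re-feeding a corrected line would still require parsing it separately, though.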
|
|
|
|
|
Looks like I can just do this to fill a datatable?
Table = new DataTable();
using (CsvReader csv = new CsvReader(
new System.IO.StreamReader(FilePath), false))
{
Table.Load(csv);
}
However, how can I limit the number of rows read so that I just get the first 5 rows? I only need a quick sample of the data, not the whole file.
|
|
|
|
|
There is no built-in way to do that with the CsvReader class. However, you could use the CachedCsvReader class as follows:
- read the first N records
- close the stream
- move to start again (CachedCsvReader.MoveToStart)
- fill DataTable
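Those steps could be sketched like this (assuming the CachedCsvReader API; whether Load stops cleanly at the cached records once the stream is closed may depend on the version, so treat this as a starting point):

```csharp
using System.Data;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class Example
{
    static DataTable LoadSample(string path, int sampleSize)
    {
        DataTable table = new DataTable();

        StreamReader stream = new StreamReader(path);
        using (CachedCsvReader csv = new CachedCsvReader(stream, true))
        {
            // Step 1: read (and cache) only the first N records.
            int count = 0;
            while (count < sampleSize && csv.ReadNextRecord())
                count++;

            // Step 2: close the stream so only cached records remain.
            stream.Close();

            // Steps 3-4: rewind over the cache, then fill the DataTable.
            csv.MoveToStart();
            table.Load(csv);
        }

        return table;
    }
}
```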
|
|
|
|
|
What is the license on this software?
GPL?
(http://www.gnu.org/philosophy/categories.html)
Thanks,
Stuart
|
|
|
|
|
Aha, just read that it's MIT license.
BUT it is not included in the 1.1 source download.
Stuart
|
|
|
|
|
I just downloaded the 1.1 version to verify that, and the license is included at the top of every code file. But I will also include it at the solution root in the next release.
|
|
|
|
|
It's great to see a well-written, well-supported utility like this. It looks just what I need to deal with some Excel 2007 spreadsheets created by another department. I'm trying to use your code to parse a csv I created from one of them. My code begins:
using (LumenWorks.Framework.IO.Csv.CsvReader csv =
new LumenWorks.Framework.IO.Csv.CsvReader(new StreamReader(inputFileName), true))
{
csv.MissingFieldAction = LumenWorks.Framework.IO.Csv.MissingFieldAction.ReplaceByEmpty;
csv.SkipEmptyLines = true;
int fieldCount = csv.FieldCount;
As soon as it gets to csv.FieldCount, VS2005 throws an "An item with the same key has already been added" exception. Do you have any idea why?
The first three lines of the csv are (each on its own line):
Site,Division,Directorate,Department,Team,Directory Dept Name,Surname,Forename,Level 7,Extension,Official Job Title,Co. Mobile,Category,Phone ID,Note,
,,,,,,,,,,,,,,P = people,
,,,,,,,,,,,,,,J = roles,
|
|
|
|
|
I tried the code snippet you provided and I cannot reproduce the bug you describe. Here is my code:
string data = @"Site,Division,Directorate,Department,Team,Directory Dept Name,Surname,Forename,Level 7,Extension,Official Job Title,Co. Mobile,Category,Phone ID,Note,
,,,,,,,,,,,,,,P = people,
,,,,,,,,,,,,,,J = roles,";
using (CsvReader csv = new CsvReader(new System.IO.StringReader(data), true))
{
csv.MissingFieldAction = MissingFieldAction.ReplaceByEmpty;
csv.SkipEmptyLines = true;
int fieldCount = csv.FieldCount;
}
|
|
|
|
|
Thanks for trying. Ironically, neither can I, now. I recreated the csv file from the source (for the nth time) and now I'm not seeing the problem.
|
|
|
|
|
I am experiencing this same problem. In my case, if I do a raw dump of the csv data, the first line is:
"Product Group,Lit Number (short),Thumbnail URL,Title,Description,LIbraryLink,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
I don't know why it has all those empty headers, but that's why the error is being thrown. Is there any way to suppress "EmptyHeaders" like we can suppress EmptyLines?
-- Joel Clermont
|
|
|
|
|
An option in the constructor lets you set the reader to treat the header row as plain data. If you do so, you will need to access fields by their index, unless you build your own name->index dictionary (and skip those empty headers).
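A sketch of such a dictionary, assuming the header row comes back as the first data record when hasHeaders is false (the file path and the "Title" column are illustrative):

```csharp
using System.Collections.Generic;
using System.IO;
using LumenWorks.Framework.IO.Csv;

class Example
{
    static void Main()
    {
        // hasHeaders: false, so the header row is returned as record 0.
        using (CsvReader csv = new CsvReader(new StreamReader("data.csv"), false))
        {
            Dictionary<string, int> columns = new Dictionary<string, int>();

            if (csv.ReadNextRecord())
            {
                for (int i = 0; i < csv.FieldCount; i++)
                {
                    string name = csv[i];

                    // Skip the empty (and any duplicate) header cells that
                    // trigger the "same key has already been added" error.
                    if (!string.IsNullOrEmpty(name) && !columns.ContainsKey(name))
                        columns.Add(name, i);
                }
            }

            while (csv.ReadNextRecord())
            {
                string title = csv[columns["Title"]];
            }
        }
    }
}
```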
|
|
|
|
|