 |
|
 |
Very well done. Works like a charm for me. Thanks for sharing! Keep it up!
|
|
|
|
 |
|
 |
On a specially crafted input file where the buffer fills up before the very last \r\n, the program will crash with a ParsingException saying "MaxBufferSize exceeded. Try increasing the buffer size.".
NOTE: The workaround for this bug is to literally just add a character to the input file near where the exception is being thrown. This will cause the read to wrap around to the next line and be fine.
The buffer size is fine; it is the condition for throwing the exception that needs to be tweaked.
Here's the setup:
intCharactersRead = this.m_txtReader.ReadBlock(this.m_caBuffer, this.m_intStartIndexOfNewData, (this.m_intMaxBufferSize - this.m_intStartIndexOfNewData));
this.m_intNumberOfCharactersInBuffer = intCharactersRead + this.m_intStartIndexOfNewData;
this.m_intReadIndex = this.m_intStartIndexOfNewData;
this.m_caBuffer[0] = '\n'
Given those values, we try to read the next character (there are no more characters). Before this read is attempted we call _CopyRemainingDataToFront(0);
It is currently:
if (intStartIndex == 0)
throw this._CreateParsingException("MaxBufferSize exceeded. Try increasing the buffer size.");
But, that can happen during normal usage (albeit with a very low probability). It needs to be:
if (intStartIndex == 0 && (this.m_intNumberOfCharactersInBuffer == this.m_intMaxBufferSize))
throw this._CreateParsingException("MaxBufferSize exceeded. Try increasing the buffer size.");
I ran this fix against the unit tests and they all now pass, including a new test I created for my edge-case input file. PM me for details on crafting the input file. I can send a cleaned version.
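To make the difference between the two conditions concrete, here is a minimal standalone sketch. The class and method names are hypothetical and are not part of GenericParser; the parameters mirror the fields quoted above (intStartIndex, m_intNumberOfCharactersInBuffer, m_intMaxBufferSize), and the numbers in the usage note are purely illustrative.

```csharp
using System;

// Hypothetical helper, not part of GenericParser: it only models the two guard conditions.
public static class BufferGuardSketch
{
    // Guard as currently written: fires whenever the copy starts at index 0.
    public static bool OriginalGuard(int startIndex)
    {
        return startIndex == 0;
    }

    // Guard as proposed: additionally requires the buffer to be completely full.
    public static bool ProposedGuard(int startIndex, int charactersInBuffer, int maxBufferSize)
    {
        return startIndex == 0 && charactersInBuffer == maxBufferSize;
    }
}
```

With a start index of 0 and a buffer that holds only one character out of 4096, OriginalGuard returns true (the exception would be thrown), while ProposedGuard returns false. When the buffer really is full (4096 of 4096), both return true.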
|
|
|
|
 |
|
 |
Sorry this took so long to turn around, but I read this the first time too quickly and had to re-read this post a few times to capture all the information you conveyed here.
I've submitted a new version to CodeProject, so you should see version 1.1.2 here shortly.
Thanks again for your bug report.
|
|
|
|
 |
|
 |
I am parsing a file which has whitespace as the column delimiter.
The GenericParser seems to create a new column for each whitespace character, so whereas I actually have only 5 columns, the parser thinks there are 80+.
Is there a way to get the parser to ignore consecutive character delimiters?
Thanks
Ben
|
|
|
|
 |
|
 |
To answer my own question, not adding empty columns on line 2295 seems to do the trick:
this.m_lstData.Add(new string(this.m_caBuffer, intStartOfDataIndex, intEndOfDataIndex - intStartOfDataIndex + 1));
}
else
{
}
Any comments as to whether or not this is a good/dangerous thing to do?
Maybe an IgnoreEmptyData option could be added?
Thanks for any help,
Ben
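For readers who only need the "collapse consecutive whitespace" behaviour and can process one line at a time, a minimal sketch outside the parser is shown below. It does not use GenericParser at all; the class and method names are hypothetical, and the approach assumes that no field ever legitimately contains whitespace or is empty.

```csharp
using System;

// Hypothetical helper, independent of GenericParser.
public static class WhitespaceSplitSketch
{
    // Splits a line on runs of whitespace and drops the empty entries
    // that consecutive delimiters would otherwise produce.
    public static string[] SplitColumns(string line)
    {
        // A null separator array tells string.Split to split on any whitespace character.
        return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, SplitColumns("  1200.5   0.25  shale ") yields three entries: "1200.5", "0.25" and "shale". The trade-off is the same as with the change above: genuinely empty cells can no longer be represented.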
|
|
|
|
 |
|
 |
The above change will prevent any empty strings from being added. If your data includes empty 'columns', you will need to do this.
If it is just your header, I would add it to the SetColumnNames method - you could just strip out the empty columns there.
You might also approach this from the perspective of wrapping the class to just exclude any columns that do not have a column name. By the very definition of what you're saying, you have an improperly formatted file or just a ton of empty columns.
Be aware that if you've really got a FixedWidth file, I would recommend switching over to that mode, as it would make more sense.
|
|
|
|
 |
|
 |
Thank you for your answer.
In my domain (oil and gas), ASCII files are usually space separated, with extra spaces put in to make them more readable. So ignoring the empty columns/cells is OK.
Thanks again.
|
|
|
|
 |
|
 |
Hello Andrew, I love your parser; it is easy to use and fast. However, I've come across a small problem that I'm not able to fix using the parser itself. I'm importing a ;-delimited file with fixed-width columns. I'm importing them as delimited, though, and trimming the blank space. So far so good.
However, there is one column that contains string values such as:
Vo4+ Xo2o Yo3o Z+1o
When I import the file, everything in each value from the + character onward disappears. So the above will be imported as:
Vo4 Xo2o Yo3o Z
Do you have any idea how this is caused and how I can fix it? Or is this a small bug in the parser itself?
Thanks in advance, kind regards,
Marcel
The Netherlands
|
|
|
|
 |
|
 |
Copying and pasting what you posted doesn't cause any issue. I would bet it has to do with the encoding of the file. Use the overload of the constructor to set the proper encoding and see if that helps. If it doesn't, post back so that I can get hold of that text file for testing.
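As a rough illustration of why the encoding matters, the sketch below decodes the same bytes with two different encodings using only standard .NET classes. The class and method names are hypothetical. It deliberately does not call any GenericParser constructor overload; it only shows how to build a reader with an explicit encoding, which could then be handed to a constructor that accepts a TextReader.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, independent of GenericParser.
public static class EncodingSketch
{
    // Decodes raw file bytes with the given encoding via a StreamReader.
    public static string ReadAll(byte[] fileBytes, Encoding encoding)
    {
        using (var stream = new MemoryStream(fileBytes))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For instance, the two bytes 0x5A 0xE9 decode to "Zé" with Encoding.GetEncoding("iso-8859-1"), but not with Encoding.UTF8, because 0xE9 on its own is not valid UTF-8. If characters vanish or change on import, trying the encoding the file was actually saved with is a cheap first check.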
|
|
|
|
 |
|
 |
Hi Andrew, had another question for you. It would be great if you can help me out. Thanks.
This is my file (just 2 columns)
Year|Notes|[CR][LF]
2011|DVTIR[CR]
[CR][LF]
REF:8473[CR][LF]
[Loy,Jane]|[CR][LF]
2011|KKOEU|[CR][LF]
2011|IRUEN|[CR][LF]
My parser settings:
parser.FirstRowHasHeader = true;
parser.FirstRowSetsExpectedColumnCount = true;
parser.ColumnDelimiter = '|';
parser.CommentCharacter = null;
parser.TrimResults = true;
parser.TextQualifier = null;
parser.EscapeCharacter = null;
With this, I'm getting the right column headers, but when I read the first row (along with the header when .Read() is called), I don't get the whole content. This is what I need:
DVTIR[CR]
[CR][LF]
REF:8473[CR][LF]
[Loy,Jane]
Instead I'm getting only till [CR]:
DVTIR
But if I add a text qualifier (say, double quotes) to the content, then it works fine. That is what I had for another text file from the customer. This text file doesn't have any text qualifier or escape character.
Can you please let me know what I'm missing? Does null for the text qualifier work?
I debugged your source code and found that _SkipToEndOfText is only called when there is a text qualifier, and is not called for a null text qualifier. I was hesitant to change the code because a lot of other files are using this parser, and I wanted to check with you on this issue before making any change.
Cheers,
Gopi
|
|
|
|
 |
|
 |
The TextQualifier field is a char?, so it is designed to support nullable characters.
Your problem seems to be due to setting:
FirstRowSetsExpectedColumnCount = true;
Remove that setting and you should produce the following output:
| Year | Notes | Column1 |
| 2011 | DVTIR | |
| REF:8473 | | |
| [Loy,Jane] | | |
| 2011 | KKOEU | |
| 2011 | IRUEN | |
|
|
|
|
 |
|
 |
Hey, thanks for the quick reply. I removed everything and have the bare minimum as below.
parser.FirstRowHasHeader = true;
parser.ColumnDelimiter = '|';
parser.TrimResults = true;
parser.TextQualifier = null;
parser.EscapeCharacter = null;
Somehow I still don't get the desired result.
For the file I need the data to be parsed like this.
Row 1:
Year: 2011
Notes: DVTIR[CR]
[CR][LF]
REF:8473[CR][LF]
[Loy,Jane]
Above, I'm getting Notes as DVTIR only.
Row 2:
Year: 2011
Notes: KKOEU
I'm sure I'm missing something here. Thanks for all your help. Much appreciated.
Cheers,
Gopi
|
|
|
|
 |
|
 |
I forgot to mention. If that isn't what you were wanting, perhaps you can rephrase your problem.
The property FirstRowSetsExpectedColumnCount allows you to enforce the number of columns expected in the input.
Re-reading this, it looks like you've actually got multi-line input, where the final pipe at the end of a line is essentially used to indicate that a new row follows. If that is the case, I don't think it is easily supported by my code. You may want to take a look at A Fast CSV Reader as an alternative.
|
|
|
|
 |
|
 |
Hi, yes, that's exactly what I need. It is a multi-line input. Ah, I thought yours supported it. Thanks for the Fast CSV Reader link. Will take a look at it.
Cheers,
Gopi
|
|
|
|
 |
|
 |
It does support multi-line rows, but the problem is that your data isn't using a text qualifier to indicate them. Your format basically says: keep reading the column until you reach a pipe character (i.e. |). That's not a standard practice as far as I've seen.
You may have better luck with the CsvReader for this scenario.
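If the file is small enough to load into memory, one way to handle this particular layout by hand is sketched below. It does not use GenericParser or the CsvReader; the class and method names are hypothetical. It assumes that every record ends with a pipe immediately followed by [CR][LF], that this sequence never occurs inside a field, and that fields never contain a pipe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, independent of GenericParser and CsvReader.
public static class PipeTerminatedRecordsSketch
{
    // Treats "|\r\n" as the record terminator and '|' as the column delimiter,
    // so line breaks inside a field are preserved.
    public static List<string[]> ParseRecords(string text)
    {
        var rows = new List<string[]>();
        string[] records = text.Split(new[] { "|\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string record in records)
        {
            rows.Add(record.Split('|'));
        }
        return rows;
    }
}
```

Applied to the sample file, the first record is the header (Year, Notes) and the second record's Notes column keeps the embedded [CR] and [CR][LF] sequences between DVTIR and [Loy,Jane]. Note that a completely empty record would be dropped by RemoveEmptyEntries.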
|
|
|
|
 |
|
 |
Hey,
I used your library to do some customizations/enhancements to an open source Unit Testing / Automation Platform called MbUnit/Gallio. It's a completely open source, non-commercial project.
I know your stuff falls under the CPOL license, but I wanted to get your blessing before including it in the source code. Will you shoot me an email to discuss?
Thanks!
Aleks
modified on Wednesday, June 22, 2011 11:23 PM
|
|
|
|
 |
|
 |
Hey everyone,
I'm trying to get started with the GenericParser solution, yet I'm encountering a pretty big problem right off the bat. Every time I read a file, the character right before the delimiter gets cut off and the delimiter remains. Basically, when it reads in the following row (comma is the delimiter):
"and,bob,cat,dad,"
The generic parser will return the following columns:
Col1: an
Col2: ,bo
Col3: ,ca
Col4: ,da
I've tried changing the delimiter and several other properties, but no luck. Any help would be appreciated.
|
|
|
|
 |
|
 |
You probably need to show me a bit more of your code because the following works just fine:
using System.IO;
using GenericParsing;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            using (StringReader sr = new StringReader("and,bob,cat,dad,"))
            {
                using (GenericParser gp = new GenericParser(sr))
                {
                    // Reads the first (and only) row; the columns are then available as gp[0], gp[1], ...
                    gp.Read();
                }
            }
        }
    }
}
I created a new console project, added a reference to the GenericParser.dll from the binaries download, and got the following columns:
- gp[0] = "and"
- gp[1] = "bob"
- gp[2] = "cat"
- gp[3] = "dad"
- gp[4] = ""
Check the settings you are using for it, I'm sure that is where your problem lies. If not, it may not be in the GenericParser, but whatever you're using to feed it data.
|
|
|
|
 |
|
 |
Hi, first of all, your tool rocks! I've been using it for the past couple of days and am impressed. I ran into one issue.
This is my reader in .NET.
...
parser.ColumnDelimiter = ',';
parser.TextQualifier = '\"';
...
My sample 3 rows with [CR][LF] being the carriage return + line feed (row delimiter here).
Sample 1:
"1021","test data 1","hello there, what are you up to?"[CR][LF]
"1022","test data 2","hello there again, what are you up to?[CR]
[CR][LF]
i'm doing fine. thanks!"[CR][LF]
"1023","test data 1","hello there, what are you up to?"[CR][LF]
Reading these 3 rows works fine. Specifically the last column in 2nd row gives me correct value:
hello there again, what are you up to?[CR]
[CR][LF]
i'm doing fine. thanks!
But if I add a double quoted value in second row which has [CR][LF] in between the data, it fails to read the correct value.
Sample 2:
"1022","test data 2","hello there again, "what" are you up to?[CR]
[CR][LF]
i'm doing fine. thanks!"[CR][LF]
Note the double quotes added to "what". Now after parsing I get this value:
hello there again, "what" are you up to?
Everything after [CR] in the line gets stripped and it treats
i'm doing fine. thanks!"[CR][LF]
as a new row.
I may be overlooking something straightforward. I looked into the help/documentation and couldn't find anything to solve this problem. Any help would be appreciated.
Cheers,
Gopi
|
|
|
|
 |
|
 |
The double quotes actually need to be doubled up. Your sample line should have read as follows to be considered valid:
"1022","test data 2","hello there again, ""what"" are you up to?[CR]
[CR][LF]
i'm doing fine. thanks!"[CR][LF]
or, if you're using the escape character ('\\'):
"1022","test data 2","hello there again, \"what\" are you up to?[CR]
[CR][LF]
i'm doing fine. thanks!"[CR][LF]
I don't know any parsers that would work with the original text and behave as you were expecting. The case you provided is always assumed to be malformed data.
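If you control the code that writes the file, the doubling rule can be applied when each field is emitted. The following is a minimal sketch with hypothetical names; it is not part of GenericParser and covers only the quote-doubling convention, not the escape-character variant.

```csharp
using System;

// Hypothetical helper for the producer side, independent of GenericParser.
public static class QuoteFieldSketch
{
    // Wraps a value in double quotes and doubles any double quotes inside it.
    public static string QuoteField(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

For the problematic text from the sample, QuoteField turns hello there again, "what" are you up to? into "hello there again, ""what"" are you up to?", which is the form shown above. Embedded [CR] and [LF] characters are left untouched, since the surrounding qualifier already protects them.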
|
|
|
|
 |
|
 |
Thanks for the reply Andrew. Using double quotes or ('\\') to escape works fine. Awesome!
Cheers,
Gopi
|
|
|
|
 |
|
 |
Based on my understanding of the fixed-width scenario, it looks like the column headings are derived from the header row using the column widths specified in the XML. This may not always be practical. For example, say I have a fixed-width txt file with 2 columns of data, one with width 3 and another with width 25. I would want the generated DataTable to have 2 columns named MemberID and MemberName. How do I specify the header row in the input txt file?
A better approach would be to abstract the header row info into the input xml file. That is, allow the Column Name, Width (and also DataType) for all the columns to be specified in the input xml file instead of in the input txt file. That way, we are not forced to restrict the length of the column name to the width of the column.
Please let me know if my understanding is correct, and if I can expect an update on this anytime soon.
|
|
|
|
 |
|
 |
True. You are correct.
I'd go with one of the following three options:
1) Set the column names after calling GenericParserAdapter.GetDataTable().
2) Roll your own GetDataTable in a form similar to SqlDataAdapter.Fill. It'd essentially be the same, except you'd pass in the precreated DataTable.
3) Modify the GenericParserAdapter to read the column name/type from the XML, which means rewriting a bunch more code.
I'd probably go with option #2 for the minimal effort, but it depends on your situation somewhat. As for updating for this, it's not critical enough to warrant an update imo. If I put together an update later on, I'll put the below code in and possibly put together a more thorough solution. For today, I'll at least put together the Fill method for you. If that doesn't get you started in the right direction, let me know.
public DataTable GetDataTable()
{
    DataTable dtData;

    dtData = new DataTable();
    this.Fill(dtData);

    return dtData;
}

public void Fill(DataTable dtData)
{
    DataRow drRow;
    int intCreatedColumns, intSkipRowsAtEnd;

    dtData.BeginLoadData();
    dtData.Rows.Clear();

    foreach (DataColumn dc in dtData.Columns)
        this.m_lstColumnNames.Add(dc.ColumnName);

    intCreatedColumns = dtData.Columns.Count;

    while (this.Read())
    {
        if (this.m_lstColumnNames.Count > intCreatedColumns)
        {
            if (this.m_blnIncludeFileLineNumber && (intCreatedColumns < 1))
                dtData.Columns.Add(GenericParserAdapter.FILE_LINE_NUMBER);

            for (int intColumnIndex = intCreatedColumns; intColumnIndex < this.m_lstColumnNames.Count; ++intColumnIndex, ++intCreatedColumns)
                GenericParserAdapter.AddColumnToTable(dtData, this.m_lstColumnNames[intColumnIndex]);
        }

        if (!this.IsCurrentRowEmpty || !this.SkipEmptyRows)
        {
            drRow = dtData.NewRow();

            if (this.m_blnIncludeFileLineNumber)
            {
                drRow[0] = this.FileRowNumber;

                for (int intColumnIndex = 0; intColumnIndex < this.m_lstData.Count; ++intColumnIndex)
                    drRow[intColumnIndex + 1] = this.m_lstData[intColumnIndex];
            }
            else
            {
                drRow.ItemArray = this.m_lstData.ToArray();
            }

            dtData.Rows.Add(drRow);
        }
    }

    intSkipRowsAtEnd = this.m_intSkipEndingDataRows;

    while ((intSkipRowsAtEnd-- > 0) && (dtData.Rows.Count > 0))
        dtData.Rows.RemoveAt(dtData.Rows.Count - 1);

    dtData.EndLoadData();
}
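For comparison, option 1 (renaming the columns after the table has been built) needs nothing beyond System.Data. The sketch below uses hypothetical names and does not touch the parser; it assumes the DataTable has already been produced, for example by GetDataTable(), and that the new names do not collide with existing column names in the table.

```csharp
using System;
using System.Data;

// Hypothetical helper, independent of GenericParser.
public static class ColumnRenameSketch
{
    // Renames the first columns of the table, in order, to the supplied names.
    // Columns beyond the supplied names are left unchanged.
    public static void RenameColumns(DataTable table, params string[] names)
    {
        for (int i = 0; i < names.Length && i < table.Columns.Count; i++)
        {
            table.Columns[i].ColumnName = names[i];
        }
    }
}
```

With the fixed-width example from the question, a table whose first column name was truncated to the 3-character column width could be fixed up with RenameColumns(table, "MemberID", "MemberName"). A DuplicateNameException is thrown by DataTable if a new name matches another column's current name.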
|
|
|
|
 |
|
 |
Thanks for responding Andrew.
Not all of our input files contain a header line, and we need to parse a variety of input files based on different schemas. That's why we need the schema info to reside outside of the input txt/csv file. We are also looking for a parser that allows us to quickly specify the schema info via XML configuration files (instead of having another layer of code on top of the parser to do the complete job), i.e. where this functionality is encapsulated in the parser library. Hence option 3 would be ideal for us.
Since you are not planning to provide an update soon, I looked around for other solutions. Although your solution has a lot more cool and useful features, I found the following solution, which allows XML configuration of the schema and other parameters and so meets our main requirement:
A Modified C# Implementation of Tony Selke's TextFieldParser
So I am planning to go with this for now.
|
|
|
|
 |
|
 |
No problem. Sorry I couldn't be of more use.
|
|
|
|
 |