Comments and Discussions

Hi,
This looks almost perfect for my needs. I would like to be able to read lines from a file between 2 byte offsets. Would it be something you can add easily?
Background:
I want to be able to quickly divide a file into logical chunks which will then be loaded in parallel into SQL Server. This will be a major performance boost over the current BCP program and will negate the need to chainsaw the file first.
With my limited C# skills, I have already written the code to determine the start and end byte offsets of the lines that each thread will process.

I'm not 100% sure what you're asking for, but you could easily pass the parser whatever 'Stream' (see MemoryStream as a starting point) you want, which could contain the sectioned pieces to parse.
As for performance, I would think that having a thread to parse columns from each row would be horribly inefficient. Perhaps, a thread for every 200kb of data might be a good place to start.
Anyways, you likely don't need any help from me on this to use this parser in your scenario.
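As a rough illustration of the 'Stream' suggestion above, here is a minimal sketch that copies one byte range of a file into a MemoryStream and wraps it in a reader. The ByteRangeReader class and its OpenRange method are hypothetical names made up for this example and are not part of GenericParser. It assumes the offsets already fall on line boundaries and that one chunk fits comfortably in memory.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of GenericParser): exposes one byte range of a file as a reader.
public static class ByteRangeReader
{
    // Copies bytes [start, end) of the file into memory and returns a StreamReader over them.
    // Assumes start and end were computed to fall on line boundaries.
    public static StreamReader OpenRange(string path, long start, long end, Encoding encoding)
    {
        if (start < 0 || end < start)
            throw new ArgumentOutOfRangeException("start");

        var buffer = new byte[checked((int)(end - start))];
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            file.Seek(start, SeekOrigin.Begin);
            int total = 0;
            while (total < buffer.Length)
            {
                int n = file.Read(buffer, total, buffer.Length - total);
                if (n == 0) break; // the file ended before the requested end offset
                total += n;
            }
            return new StreamReader(new MemoryStream(buffer, 0, total), encoding);
        }
    }
}
```

Each worker thread could open its own range this way and hand the resulting reader to its own parser instance, presumably through the same constructor that accepts a StringReader.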

How would you recommend retaining error rows? Currently, GenericParser throws an exception for bad data including "Expected column count of 4 not found. [Row: 10, Column: 2]". My goal is to end up with both a DataTable of parsed rows, and perhaps...
Dictionary<row number, raw text>
I got as far as changing:
while (this.Read())
to
bool continueParsing = this.Read();
while (continueParsing)
{
    ...
    try
    {
        continueParsing = this.Read();
    }
    catch (ParsingException)
    {
    }
}
but discovered it is a character-by-character parser. How would I get the text of the entire row, and move the read index to the end of the row? Is there a better way you would recommend?

It really depends on what information you want to present back up the chain.
If you want the full text of the line (which might be quite large), you'll need to keep a running tally of this as you extract each column, 'reverse engineer' it by taking the parsed columns and reconstructing what the data would be, or just provide what the parsed values were.
From that point, your next question is how to continue the parsing. There are only two places in the code that call _CreateParsingException that allow for you to continue parsing (within _HandleEndOfRow and _ExtractColumn). You could just modify them to allow you to skip to the end of the row before throwing the exception. When the next read is called, it should progress as if nothing happened.
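If modifying the parser internals is not an option, a different technique is to pre-split the input into lines and validate each one yourself. The sketch below does that with String.Split only; it does not use GenericParser at all, and the BadRowCollector class is a hypothetical name for this example. It assumes one record per line, a comma delimiter, and no quoted fields containing commas or line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of GenericParser): keeps good rows and remembers the raw text of bad ones.
public static class BadRowCollector
{
    // Returns the rows that have the expected column count.
    // Rows with any other count are stored in badRows, keyed by 1-based row number.
    public static List<string[]> Parse(TextReader reader, int expectedColumns, IDictionary<int, string> badRows)
    {
        var good = new List<string[]>();
        string line;
        int row = 0;
        while ((line = reader.ReadLine()) != null)
        {
            row++;
            string[] columns = line.Split(',');
            if (columns.Length == expectedColumns)
                good.Add(columns);
            else
                badRows[row] = line; // keep the raw text for later reporting
        }
        return good;
    }
}
```

The good rows could then be added to a DataTable, while the dictionary carries the row number and raw text of everything that was rejected.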

Hello, I have a file whose header starts with a semicolon (;). The next segment of data starts with a colon (:Template), which represents an object definition. The next line also starts with a colon (:) and holds the header info separated by commas, followed by lines with the values that go into those columns, also separated by commas. How would I specify properties for this kind of file?

My gut feeling is that you probably will not be able to use this parser or any 'generic' parser out there for something like that, as the format sounds very unique. If you can provide a sample of what the data would look like, as your description is a little unclear, I could confirm my suspicions quite easily.

Here is the sample. I will cut it down a bit, but you will get the gist of it.
; Created 3/11/2012 Project Sample
:TEMPLATE=ShoeFinish.Black
:ObjectName,FieldA,FieldB,FieldC .... all the way to FieldXYZ
Toledos,leather,,Strings,brown,,,,Special
:TEMPLATE=ShoeFinish.Brown
:ObjectName,FieldA,FieldB,FieldC .... all the way to FieldXYZ
Toledos,leather,,Strings,brown,,,,Special
The first ':' is the beginning of the group of items; the second ':' is the group's field names, with their respective values following. BTW: I was thinking that if this would not work, I might need to read it in with StreamReader one line at a time and then parse with RegEx, but I am not sure about the best approach.

Yeah, this is a pretty unique file format.
I'd say your best bet is RegEx, String.Split, or parsing the strings manually; any of them would suffice. Depending on the complexity of the data encoded, you may find one approach more successful than another at parsing the file. But that will be up to you to decide.
Good luck.
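For what it's worth, the String.Split route could look roughly like the sketch below. It assumes each record sits on its own line: ';' lines are comments, ':TEMPLATE=' starts a group, the next ':' line holds the field names, and every other line is a comma-separated data row. TemplateGroup and TemplateFileParser are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container for one ':TEMPLATE=' group.
public sealed class TemplateGroup
{
    public string Name;
    public string[] Headers = new string[0];
    public List<string[]> Rows = new List<string[]>();
}

// Hypothetical line-by-line parser for the sample format described in this thread.
public static class TemplateFileParser
{
    private const string TemplatePrefix = ":TEMPLATE=";

    public static List<TemplateGroup> Parse(TextReader reader)
    {
        var groups = new List<TemplateGroup>();
        TemplateGroup current = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0 || line[0] == ';')
                continue; // skip blank lines and ';' comment lines

            if (line.StartsWith(TemplatePrefix, StringComparison.OrdinalIgnoreCase))
            {
                // A new group begins; everything after '=' is the template name.
                current = new TemplateGroup { Name = line.Substring(TemplatePrefix.Length) };
                groups.Add(current);
            }
            else if (current == null)
            {
                throw new FormatException("Data found before the first :TEMPLATE line.");
            }
            else if (line[0] == ':')
            {
                current.Headers = line.Substring(1).Split(','); // field names for this group
            }
            else
            {
                current.Rows.Add(line.Split(',')); // values; empty fields stay as empty strings
            }
        }
        return groups;
    }
}
```

This does not handle quoted values or commas inside fields; if the real files contain those, a RegEx or a hand-written tokenizer may be the better fit.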

If you use the GetDataTable method on a file that has a header row, but no content rows, the resulting DataTable will not have any Columns. This seems like incorrect behavior to me and led to an obscure bug in our software because it complained that a core column was missing from the DataTable. Can you resolve this please?

It is a pretty simple fix. Update the final return statement in the Read method, as such:
public bool Read()
{
    // ... existing method body unchanged; only the final return statement is updated ...
    return ((this.m_lstData.Count > 0) || !this.m_blnSkipEmptyRows || (this.m_blnHeaderRowFound && (this.m_intDataRowNumber == 0)));
}
This should fix the problem you're having. Oddly enough, it only manifests itself if you have a blank row after the header. I believe someone else had reported a similar issue, but I unfortunately only resolved it for their specific case.
Applying the above change should resolve your issue. I'll see if I can roll out this change sometime soon, but for now you can take the code and recompile it to suit your immediate needs.

Please feel free to redirect me if this is not an appropriate posting. I have been programming since the 80's, but sadly I am quite new to the Visual Studio environment (... well I used to program in VB4.0...).
I've spent about an hour attempting to figure out how to add this code (or project) to my existing project. I've then spent about another hour trying to learn about projects & solutions and adding references in VS, but again I have failed to get this code working when I compile. Can anyone give me step-by-step instructions for including this code?
Feel free to point me to something in this thread that I missed that already explains it. Thanks ahead of time!
-marcus

I'm not sure what exactly you are having a problem with, but I'd assume your issue is something at a very fundamental level. I'd recommend starting with a clean solution, adding a console application, and then including the references to the DLLs included in the article.
If that doesn't help you resolve your problem, feel free to post back with the specific error messages you're receiving and I'll try to help as best I can.

Hi, I have some files with a fixed number of rows at the top that contain document titles and other miscellaneous content. The real CSV header always starts on row 7 and the data follows.
I've modified the code and inserted a new property SkipStartingHeaderRows and modified the ParseRowType method to skip header rows before evaluating m_blnFirstRowHasHeader. I have sample files for testing. Would I be able to submit this for consideration into the project?
Thank you.

Sure, you definitely can.
You can either email me via this message or upload them via GitHub, whichever suits you.
I'll review the changes and, if everything looks good, incorporate them and add you as a contributor.
Thanks.

Clear and concise. I would love Andrew to add generic processing. What I mean is the use of an enumeration (or some other mechanism) to select the reader based on a text token that identifies which type of reader should process the input and return a dataset.
Much appreciated

Hi Andrew,
There seems to be a problem with the zip file of the source - neither WinZip nor WinRAR recognize it to be a valid file. Could you possibly post a working version?
Thanks so much,
Robert

Hi Andrew,
After using your excellent tool to import a flat file (Adapter mode), I have noticed the following:
bugs? (using Adapter mode)
Rows with a blank are considered in the final result and displayed to the user.
Rows with fewer or more columns than the ExpectedColumnCount property specifies are also displayed. Maybe a property named AddOnlyRowsFulFillExpectedColumns would be nice.
new functionality
Possibility of allowing the developer to assign a string array with the headers used in the table used to display data in adapter mode.
Create a collection with rows ignored in the importing process.
Thank you.
modified 5 Nov '12 - 14:39.

Hi Andrew,
I have just read your article, which is interesting. My question: once plotted, is it possible to save as well as print the chart? Does this offer only line charts or other chart types?
Thank you in advance for your reply

You may want to re-read the article one more time. The GenericParser is used to parse data from flat files and does not contain any charting capability.
The charts I presented in the article are used solely for a performance comparison and are generated using Excel.

Thank you for this great tool. Really useful.

I've found that the utility does not work if the text file only has the header row, and no other rows. It thinks there are no columns. I'm hoping this can be fixed as I use the utility to determine the column order, even if there is no data in the file.

using (var sr = new StringReader("a,b,c,d"))
using (var gp = new GenericParser(sr))
{
    gp.FirstRowHasHeader = true;
    var result = gp.Read();
    var columnNames = Enumerable.Range(0, gp.LargestColumnCount).Select(x => gp.GetColumnName(x)).ToArray();
}
Try that and let me know if you have any further problems.
GenericParser is a C# implementation of a parser for delimited and fixed width format files.
| Type | Article |
| Licence | CPOL |
| First Posted | 19 Sep 2005 |
| Views | 481,673 |
| Downloads | 11,555 |
| Bookmarked | 232 times |