|
Newbie here - any updates regarding the use of ADO.NET 2.0? I will try using the code now, but I am not too familiar with ADO yet. Would there be any improvements to the code with the use of ADO.NET 2.0?
|
|
|
|
|
Possibly, but it will run fine on 2.0 as-is. Remember it's only a sample, something designed for understanding the methods in question. If you just want to import delimited text files, there are several excellent libraries on CodeProject that are pre-built and very efficient for doing that, such as FileHelpers.
|
|
|
|
|
It has been great for me to use the sample you have provided. I am trying to learn C# and am trying to think of projects I can do to learn it! So viewing your sample has been a great experience. A good task might be for me to see if there are any changes needed to upgrade to ADO.NET 2.0!
The comments are excellent and really have helped also!
|
|
|
|
|
This is the first hit I got for parsing delimited files in C#, and it will be the last one I need...this is exactly what I needed. I've just gotta say, nice work with your commenting. As a heavy commenter myself, I really appreciate gaining a firm grasp of what is going on through the comments.
Sharpmike
|
|
|
|
|
I am a newbie and was wondering how you would modify this program to parse out certain data in a text file and output it into another text file.
|
|
|
|
|
Hello, the code illustrates two ways of parsing a text file: one to an XML document, the other to a DataSet. Either one could be saved to a file.
If you mean extracting certain fields and then writing out another delimited text file with just those fields, I would probably use the DataSet example as a starting point. Write a program that lets the user scan the first row of the input data and uses those fields or field names to build a selection row in the user interface, so the user can pick the fields they want to import. I would probably just import all the data into a DataSet without skipping any fields, as in the example; the point where I would skip the unrequired fields is when writing the data back out to a text file. That way you can check that the data source is "clean", i.e. self-consistent with no missing fields, without trying to pick and choose fields during the import part of the process (unless of course we're talking about gigabyte-sized files or something).
Although if you are in the habit of working with data in text files, then you may want to consider the XML version, since that's a natural fit for textual data storage.
If you are looking at writing an application that will repeatedly deal with the same format of text file and repeatedly write out another format as a result, then you could probably use much of the code in the example with very few changes, other than redefining the fields as you require them, from my old Apache web log fields to whatever works for you; see the comments in the source code about doing exactly that.
If you just need to do ad-hoc data transformation of text files, I wouldn't bother writing a program for that; there are so many tools out there, and Microsoft Access is excellent for general-purpose data conversion.
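To make the write-back step described above concrete, here is a minimal sketch (not from the article; the class, method, and column names are my own invention) of exporting only the user-selected columns from an already-imported DataTable to a comma-delimited file:

```csharp
using System;
using System.Data;
using System.IO;
using System.Linq;

public class ExportSelectedFields
{
    // Write only the user-selected columns back out as a comma-delimited
    // file, skipping the unrequired fields at write time rather than at
    // import time, as suggested above. Column names are hypothetical.
    public static void WriteSelected(DataTable table, string[] selected, string path)
    {
        using (StreamWriter sw = new StreamWriter(path))
        {
            // Header row first, so the output file is self-describing.
            sw.WriteLine(string.Join(",", selected));

            foreach (DataRow row in table.Rows)
            {
                string[] values = selected.Select(col => row[col].ToString()).ToArray();
                sw.WriteLine(string.Join(",", values));
            }
        }
    }
}
```

A real CSV writer would also need to quote fields that themselves contain commas; that is omitted here for brevity.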
|
|
|
|
|
Hi,
Will this tool convert a CSV comma-delimited file
to a text file? Some of the fields in the CSV file
do not contain data. Will the offsets be correct?
I'm attempting the same on a project.
Thanks.
|
|
|
|
|
Hi Boomer, well, for starters it's not really a tool, it's a technique you can adapt to your situation; if you're expecting a tool that you can run to immediately do what you want, it won't do that.
If a CSV file contains no data in some fields, that's no problem, depending on how you code it. The parsing looks for the delimiter, so as long as the empty fields are still delimited, you're OK. If they are not delimited, then nothing in the world is going to make that OK, because there is no way for any software to tell what goes in which field. The one exception is when all the missing delimiters are at the right side of the line of data; in that case you could code it so that when the parser reaches a carriage return or line feed character at the end of the line and hasn't counted enough fields, it substitutes empty data for the missing fields.
If, however, a comma is missing early in the line of fields but there is more data after it, that's a mess that is probably next to impossible to deal with in any automated way.
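The "missing delimiters at the right side of the line" case described above can be sketched like this (a hypothetical helper, not part of the article's code; the expected field count would typically come from the header row):

```csharp
using System;

public class FieldPadder
{
    // When delimiters are only missing at the right-hand end of a line,
    // we can substitute empty data for the missing trailing fields, as
    // described above. expectedFields would come from the header row.
    public static string[] PadLine(string line, int expectedFields)
    {
        string[] fields = line.Split(',');
        if (fields.Length >= expectedFields)
            return fields; // nothing missing (extra fields are a different problem)

        string[] padded = new string[expectedFields];
        fields.CopyTo(padded, 0);
        for (int i = fields.Length; i < expectedFields; i++)
            padded[i] = ""; // empty data for each missing field
        return padded;
    }
}
```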
"A preoccupation with the next world pretty clearly signals an inability to cope credibly with this one."
|
|
|
|
|
Thanks John,
I am going to look at the code to see
if I can use it as a preprocessor.
Isaac
|
|
|
|
|
As you said, you have good experience in C/C++, and that is reflected
in this program, even though you had just started with C# when you wrote it.
This code will be useful to me for sure!
Thanks a lot.
Srinivas Varukala
virginia, USA.
nivas.org
|
|
|
|
|
That code was very helpful.
|
|
|
|
|
Curious as to why you went through the trouble of regular expressions to parse the file when an OleDb connection object would have done the same thing. For instance:

OleDbConnection conn = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\\;Extended Properties='text;HDR=No;FMT=Delimited'");
OleDbDataAdapter da = new OleDbDataAdapter("SELECT * FROM [" + dataFile + "]", conn);

DataSet ds = new DataSet("File");
da.FillSchema(ds, SchemaType.Source, dataFile);
da.Fill(ds, dataFile);

Would have done the same thing...
Fear not my insanity, fear the mind it protects.
|
|
|
|
|
Probably because I didn't know that was a possibility. ;)
I'll have to experiment with that the next time I need to do something similar. However, I would be concerned about doing it that way in the case of an improperly formatted source file; doing it with regular expressions and looping allows for handling malformed "rows" in the source file. What would happen with the above if a "row" had an extra field, or there was an extra CR/LF at the end of a line?
There is much to be said in favor of modern journalism. By giving us the opinions of the uneducated, it keeps us in touch with the ignorance of the community.
- Oscar Wilde
|
|
|
|
|
If the data file was malformed it would throw an error, which you would need to catch and handle accordingly. I can see, though, times when either method might be used. Yours would be perfect for hand-written data files, or when you need extra control over the data parsing for one reason or another. Using ADO tends to lock you down a bit too much at times, I'll admit.
|
|
|
|
|
How do I convert tab-delimited files using this code? I tried using:
r = new Regex("\\s(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))");
for my tab-delimited files, and it doesn't work unless each field is enclosed in quotes. In my file, each field is separated only by tabs, and no quotation marks are used as "text qualifiers".
I tried using \t also, instead of \s, but it doesn't work.
HELP, anyone?
|
|
|
|
|
I should really update this article, the original method can be simplified greatly. The method in the article relies on finding the spaces between the fields whereas the easier way is to simply return the words that are separated by the separator characters.
Try this:
\S+
as your expression.
In my code I use it like this:
Regex r = new Regex(
    @"\S+",
    RegexOptions.IgnoreCase
    | RegexOptions.Multiline
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );
Which you can use with a MatchCollection like this:
MatchCollection mc = r.Matches(INPUTSTRING);
foreach (Match m in mc)
{
    string sTemp = m.Value;
}
Also, if you don't have it already, Expresso is your friend in these matters:
http://www.codeproject.com/dotnet/expresso.asp
CLIP CLOP CLIP CLOP BANG! CLIP CLOP - Amish Drive-by shooting
|
|
|
|
|
Thanks for the quick reply; I still have the same problem, though.
I have a row of data where fields are separated by tabs, for example:
henry stark <tab> 34, Santa Monica <tab> 11/24/1966 <tab> 556256
If I use \S+, it will consider the spaces as field separators as well, instead of only the tabs, so I get 7 fields instead of only 4. And still, using \t doesn't work.
Why isn't \t recognized? HELP?
|
|
|
|
|
juliusPH wrote:
why isn't \t recognized?
It *is* recognized; the problem is that it's matching the \t characters themselves instead of anything that is NOT \t. You could use the technique in the article with that regex, because the article shows a method that works by matching the separator character rather than just extracting the text you want.
That's why the S is capitalized: \S matches anything that is not whitespace, the negation of \s. Because there is no capital-T equivalent for tabs, you need to get a bit more complex.
What you want is this:
[^\t\n\r]+
I.e., match one or more (+) occurrences of any character except \t, \n or \r (assuming your file has CR/LF at the end of each record).
In code format:
Regex regex = new Regex(
    @"[^\t\n\r]+",
    RegexOptions.None
    );
CLIP CLOP CLIP CLOP BANG! CLIP CLOP - Amish Drive-by shooting
|
|
|
|
|
Oh, I meant ^\t
It's working now! It didn't work before with only ^\t and the RegEx options that I used!
Thanks a lot!
|
|
|
|
|
This works great unless there are consecutive tabs, i.e. one of the fields is missing on the line, so we might have:
henry white <tab><tab> 11/24/1966 <tab> 556256
instead of:
henry stark <tab> 34, Santa Monica <tab> 11/24/1966 <tab> 556256
Then the missing field gets overwritten by the next field, and you get a null value in the bound DataGrid for the last field, because everything got "bumped up" by one. If I import a tab-delimited text file into Excel, it recognizes this and preserves the "empty" field; I was looking for something similar here. Is it in the regular expression?
Thanks for any help you can provide.
|
|
|
|
|
In that case you would probably want to do it the original way in the article, which is to look for the separator character instead of the text between the separators.
I wouldn't be at all surprised if there is a Regex way to resolve the missing-field issue, but I don't know the answer.
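For what it's worth, one Regex-based possibility (my own suggestion, not from the article) is to split on the separator itself rather than match the field text; Regex.Split returns an empty string between two adjacent tabs instead of collapsing them, so missing fields keep their positions:

```csharp
using System;
using System.Text.RegularExpressions;

public class TabSplitter
{
    // Splitting on the separator itself (the same idea as the article's
    // original separator-matching approach) preserves empty fields:
    // Regex.Split yields an empty string between two adjacent tabs.
    public static string[] SplitRow(string line)
    {
        return Regex.Split(line, @"\t");
    }
}
```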
|
|
|
|
|
A solution is to use Split on the rows.
First split into rows, and then split each row on tabs. This keeps all the empty cells as empty strings as well.
string[] s = sClipreply.Split(new string[] { "\r\n" }, StringSplitOptions.None);
foreach (string str in s)
{
    string[] s2 = str.Split(new string[] { "\t" }, StringSplitOptions.None);
}
|
|
|
|
|
Did you get this done ?
I have the same problem with missing fields gets overwritten on datagrid control.
Thx.
|
|
|
|
|
There is nothing to "get done":
What I wrote was: "In that case you would probably want to do it the original way in the article which is to look for the separator character instead of the text between separators."
If you do it the way it is originally written in the article instead, you have full control over handling missing separators yourself.
Not to sound too harsh here, but I'm not sure why people keep asking about this, because there is obviously no magic way to handle badly formed source data.
If the input data is missing some fields, it's just missing some fields; there is nothing you can do about that except program for the situation by counting the number of fields you expect as you parse each line of text, and if some fields are missing, deal with it in your code: either abort and warn the user, or fill in something yourself.
The original way the parsing was done in the article is very handy for handling the missing fields situation. If your input data is known to be consistent with no missing fields then you can use the much faster second method I mentioned in these threads.
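The field-counting check described above might look something like this minimal sketch (the helper and its names are hypothetical, not from the article):

```csharp
using System;
using System.Collections.Generic;

public class LineValidator
{
    // Count the fields on each line against the expected number and
    // collect the (1-based) numbers of lines that don't match, so the
    // caller can abort and warn the user, or fill something in.
    public static List<int> FindBadLines(string[] lines, char separator, int expectedFields)
    {
        List<int> bad = new List<int>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].Split(separator).Length != expectedFields)
                bad.Add(i + 1);
        }
        return bad;
    }
}
```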
An election is nothing more than the advanced auction of stolen goods.
- Ambrose Bierce
|
|
|
|