My file has data with each line starting with a specific pattern

1000000179|abcd.....
1000000180|wedwedw...
1000000181|wnewedwed...


Each line starts with 10 digits followed by a pipe.

How can I find/replace lines that do NOT have this pattern? For example, the second line below is invalid:


1000000179|abcd.....
%d20000180|wedwedw...
1000000181|wnewedwed...
Posted 28-Mar-13 2:52am
Mohan M368

Solution 5

What you need is called negative lookahead.
The expression will be this one: ^(?!\d{10}).{10}\|.*$ (with MultiLine option).
Please note its logic: the first part is a negative lookahead that rejects any line starting with ten digits. The rest is a general pattern that accepts both good and bad strings, as long as the pipe is in the 11th position.
For more details about lookaround, read this article.
Comments
Brian A Stephens at 8-Apr-13 21:31pm
   
Ah, yes: negative lookahead; that's the elegant way to do it. However, the regex you provided doesn't match lines that fail the requirement of a pipe in the 11th position. With a slight modification, it will match those too: ^(?!\d{10}\|).*$
Zoltán Zörgő at 9-Apr-13 2:18am
   
You're right. I hadn't tested it with more input.
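To make the difference between the two patterns concrete, here is a minimal sketch that runs both against the question's sample lines plus the line 1234567890_abcde borrowed from Solution 4. The helper name Find and the variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Three lines from the question plus one line with ten digits but no pipe at position 11.
string data = "1000000179|abcd\n%d20000180|wedwedw\n1000000181|wnewedwed\n1234567890_abcde";

// Pattern from the solution: still requires a pipe in the 11th position.
string original = @"^(?!\d{10}).{10}\|.*$";
// Pattern from the comment: rejects any line not starting with ten digits and a pipe.
string corrected = @"^(?!\d{10}\|).*$";

// Hypothetical helper: returns every whole-line match of the pattern in Multiline mode.
static string[] Find(string input, string pattern) =>
    Regex.Matches(input, pattern, RegexOptions.Multiline)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string[] byOriginal = Find(data, original);
string[] byCorrected = Find(data, corrected);

Console.WriteLine("original:  " + string.Join(", ", byOriginal));
Console.WriteLine("corrected: " + string.Join(", ", byCorrected));
```

With this input the original pattern reports only the %d20000180 line, while the corrected one also reports 1234567890_abcde.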

Solution 4

The challenge here, as Mohan implied, is finding the "negative" match. It's straightforward to find a line that starts with 10 digits and a pipe, but how do you find one that doesn't?

Here's a regex that will do it:
^([^|]*[^|\d][^|]*\||.{10}[^|])
It matches only the last 4 of these input lines:
1000000179|abcd.....|abc|
1000000180|wedwedw...|234|
1000000179|abcd.....
%d20000180|wedwedw...
3214a23642|abcd
123456789|whatever
1234567890_abcde

Breaking down the regex, it's looking for one of two conditions at the beginning of a line:
1) Any non-digit before the first pipe ( [^|]*[^|\d][^|]*\| )
2) Any non-pipe character in the 11th position ( .{10}[^|] )
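As a quick check, the regex can be run against the seven input lines one at a time. This is only a sketch; the variable names are mine, and the extra short line at the end is a made-up input to show a case the regex does not cover.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] lines =
{
    "1000000179|abcd.....|abc|",
    "1000000180|wedwedw...|234|",
    "1000000179|abcd.....",
    "%d20000180|wedwedw...",
    "3214a23642|abcd",
    "123456789|whatever",
    "1234567890_abcde",
};

// Branch 1: a non-digit before the first pipe. Branch 2: a non-pipe in the 11th position.
var invalid = new Regex(@"^([^|]*[^|\d][^|]*\||.{10}[^|])");

string[] flagged = lines.Where(l => invalid.IsMatch(l)).ToArray();
foreach (string line in flagged) Console.WriteLine(line);

// Hypothetical extra input: fewer than 11 characters and no pipe at all.
bool shortLineFlagged = invalid.IsMatch("12345");
Console.WriteLine(shortLineFlagged);
```

One thing to be aware of: a line shorter than 11 characters that contains no pipe, such as 12345, satisfies neither branch and is therefore not flagged, even though it is not a valid line.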

Solution 3

The pattern to check for a valid line would be :

string pattern = @"^\d{10}\|.*$";

This matches if the line begins with 10 digits, followed by a pipe, followed by any number of any characters.
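Since the question asks for the lines that do not match, one simple way to use this pattern is to test each line separately and keep the ones where the match fails. A minimal sketch, assuming the lines have already been split (for example with File.ReadLines):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var valid = new Regex(@"^\d{10}\|.*$");

// Sample lines from the question; in practice they would come from the file.
string[] lines =
{
    "1000000179|abcd.....",
    "%d20000180|wedwedw...",
    "1000000181|wnewedwed...",
};

// Invert the test: a line is invalid when the "valid" pattern does not match it.
string[] invalidLines = lines.Where(l => !valid.IsMatch(l)).ToArray();

foreach (string line in invalidLines) Console.WriteLine(line);
```

Inverting the match in code avoids having to express "does not match" inside the regex itself.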

Solution 1

Just test for non-numeric characters in the first 10.
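A sketch of what that could look like without any regex. The function name IsValidLine is made up, and the check for the pipe in the 11th position is my addition based on the question, not part of this solution.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when the first 10 characters are ASCII digits
// and the 11th character is a pipe (the pipe check is taken from the question).
static bool IsValidLine(string line) =>
    line.Length > 10
    && line.Take(10).All(c => c >= '0' && c <= '9')
    && line[10] == '|';

Console.WriteLine(IsValidLine("1000000179|abcd....."));   // valid sample line
Console.WriteLine(IsValidLine("%d20000180|wedwedw..."));  // invalid sample line
```

The explicit '0'..'9' range is deliberate: in .NET, \d (like char.IsDigit) also accepts non-ASCII decimal digits unless RegexOptions.ECMAScript is used, so the two approaches can differ on unusual input.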

Solution 2

See regex for VALID line:
^\d{10}\|

Solution 6

It's easy to define the valid lines, but slightly more difficult to define the invalid lines.
So, define (in pseudo code) line = valid or invalid.
Even though it sounds trivial, this covers all the lines.

Loop over all lines and process the matches in which the invalid part was captured.
For example:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

string filePath = @"...";
string data = File.ReadAllText(filePath);
// Group 1 is set only when the valid alternative fails to match the line.
string linePattern = @"^(?:\d{10}\|.*|(.*))$";
var invalidLines = from m in Regex.Matches(data, linePattern, RegexOptions.Multiline).Cast<Match>()
                   where m.Groups[1].Success
                   select m.Groups[1].Value;
foreach (string invalidLine in invalidLines)
{
   //...
}

Because the regex engine tries the alternatives from left to right, each line is matched either by the valid alternative (\d{10}\|.*) or, if that fails, by the invalid one (.*). The two are separated by the alternation operator (|).
To get access to the invalid data, it is enclosed in parentheses ((.*)).
To limit the pattern to one line each, the whole pattern is enclosed in ^...$ and grouped by a non-capturing group (?:...), i.e. a group that does not count in the Groups array of a match.

Putting it all together results in ^(?:\d{10}\|.*|(.*))$.
Note that the match option is set to Multiline to give ^ and $ the needed meaning: begin/end of each line (without Multiline, ^ and $ would mean begin/end of the whole string).
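To try the pattern without a file, the same idea can be run on an in-memory string. This is only a sketch using the three sample lines from the question, joined with \n.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// In-memory stand-in for the file content (the three lines from the question).
string data = "1000000179|abcd.....\n%d20000180|wedwedw...\n1000000181|wnewedwed...";
string linePattern = @"^(?:\d{10}\|.*|(.*))$";

// Group 1 is set only when the first (valid) alternative did not match the line.
string[] invalidLines = Regex.Matches(data, linePattern, RegexOptions.Multiline)
    .Cast<Match>()
    .Where(m => m.Groups[1].Success)
    .Select(m => m.Groups[1].Value)
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, invalidLines));
```

If the file uses Windows line endings (\r\n), keep in mind that in .NET the dot also matches the carriage return, so the captured values may end with \r; calling TrimEnd('\r') on them is one way to deal with that.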

Cheers
Andi

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Last Updated 8 Apr 2013