See more: Visual-Studio
My file has data with each line starting with a specific pattern:
 
1000000179|abcd.....
1000000180|wedwedw...
1000000181|wnewedwed...
 

Each line starts with 10 digits followed by a pipe.
 
How can I find/replace lines that do NOT have this pattern? E.g., the second line below is invalid:
 

1000000179|abcd.....
%d20000180|wedwedw...
1000000181|wnewedwed...
Posted 28-Mar-13 2:52am
Mohan M361

Solution 5

What you need is called negative lookahead.
The expression is: ^(?!\d{10}).{10}\|.*$ (with the MultiLine option).
Note its logic: the first part is a negative lookahead that rejects lines starting with ten digits. The rest is a general pattern that accepts both good and bad strings with the pipe in the 11th position.
For more details about lookaround, read this article.
Comments
Brian A Stephens at 8-Apr-13 21:31pm
   
Ah, yes: negative lookahead; that's the elegant way to do it. However, the regex you provided doesn't match lines that fail the requirement of a pipe in the 11th position. With a slight modification, it will match those too: ^(?!\d{10}\|).*$
Zoltán Zörgő at 9-Apr-13 2:18am
   
You're right. I hadn't tested it with more input.
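
To see the corrected pattern from the comment above in action, here is a minimal C# sketch. The class name NegativeLookaheadDemo, the helper FindInvalidLines, and the sample text are made up for illustration; only the pattern ^(?!\d{10}\|).*$ comes from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NegativeLookaheadDemo
{
    // Pattern from the comment: a line matches only if it does NOT
    // start with ten digits followed by a pipe.
    private static readonly Regex InvalidLine =
        new Regex(@"^(?!\d{10}\|).*$", RegexOptions.Multiline);

    // Hypothetical helper: returns every line of 'text' that fails the rule.
    public static List<string> FindInvalidLines(string text)
    {
        return InvalidLine.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value.TrimEnd('\r')) // '.' also matches '\r' in CRLF text
                          .ToList();
    }

    public static void Main()
    {
        string text = "1000000179|abcd.....\n%d20000180|wedwedw...\n1000000181|wnewedwed...";
        foreach (string line in FindInvalidLines(text))
        {
            Console.WriteLine(line); // only the second sample line is reported
        }
    }
}
```

Note that an empty line also counts as "not starting with ten digits and a pipe", so this sketch would report blank lines as invalid too.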

Solution 4

The challenge here, as Mohan implied, is finding the "negative" match. It's straightforward to find a line that starts with 10 digits and a pipe, but how do you find one that doesn't?
 
Here's a regex that will do it:
^([^|]*[^|\d][^|]*\||.{10}[^|])
It matches only the last 4 of these input lines:
1000000179|abcd.....|abc|
1000000180|wedwedw...|234|
1000000179|abcd.....
%d20000180|wedwedw...
3214a23642|abcd
123456789|whatever
1234567890_abcde
 
Breaking down the regex, it's looking for one of two conditions at the beginning of a line:
1) Any non-digit before the first pipe ( [^|]*[^|\d][^|]*\| )
2) Any non-pipe character in the 11th position ( .{10}[^|] )
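
The claim about the seven sample lines can be checked with a small C# sketch like the one below. The class AlternationDemo and the method IsInvalid are hypothetical names; the regex and the sample lines are the ones quoted above. Each line is tested on its own, so no Multiline option is needed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Regex from this solution: a non-digit before the first pipe,
    // OR a non-pipe character in the 11th position.
    private static readonly Regex Invalid =
        new Regex(@"^([^|]*[^|\d][^|]*\||.{10}[^|])");

    // Hypothetical helper: true when the regex flags the line as invalid.
    public static bool IsInvalid(string line) => Invalid.IsMatch(line);

    public static void Main()
    {
        string[] lines =
        {
            "1000000179|abcd.....|abc|",
            "1000000180|wedwedw...|234|",
            "1000000179|abcd.....",
            "%d20000180|wedwedw...",
            "3214a23642|abcd",
            "123456789|whatever",
            "1234567890_abcde",
        };

        foreach (string line in lines)
        {
            Console.WriteLine($"{(IsInvalid(line) ? "invalid" : "valid  ")}  {line}");
        }
    }
}
```

One caveat worth knowing: a short line such as 123|ab satisfies neither condition (only digits before the pipe, and fewer than 11 characters), so this regex does not flag it even though it lacks the ten-digit prefix.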

Solution 3

The pattern to check for a valid line would be:
 
string pattern = @"^\d{10}\|.*$";
 
This matches if the line begins with 10 digits, followed by a pipe, followed by any number of any characters.
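
One way to get from the "valid" pattern to the invalid lines is to test each line separately and keep the ones that do not match. The following is only a sketch under that assumption; ValidPatternDemo and InvalidLines are illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class ValidPatternDemo
{
    // The "valid line" pattern from this solution.
    private static readonly Regex ValidLine = new Regex(@"^\d{10}\|.*$");

    // Hypothetical helper: keeps only the lines that do NOT match the valid pattern.
    public static IEnumerable<string> InvalidLines(IEnumerable<string> lines)
    {
        return lines.Where(line => !ValidLine.IsMatch(line));
    }

    public static void Main()
    {
        var sample = new[]
        {
            "1000000179|abcd.....",
            "%d20000180|wedwedw...",
            "1000000181|wnewedwed...",
        };

        foreach (string line in InvalidLines(sample))
        {
            Console.WriteLine(line); // reports the line starting with "%d"
        }
    }
}
```

For a real file, the lines could come from File.ReadLines(path) in System.IO, which yields one string per line without the line terminator.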

Solution 1

Just test for non-numeric characters in the first 10.
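
A regex-free version of this idea might look like the sketch below. It goes slightly beyond the one-line suggestion by also checking for the pipe in the 11th position, as the question requires; the names PrefixCheckDemo and IsValid are invented for the example.

```csharp
using System;

public static class PrefixCheckDemo
{
    // Hypothetical helper: true when the line starts with ten ASCII digits
    // followed by a pipe in the 11th position.
    public static bool IsValid(string line)
    {
        if (line.Length < 11 || line[10] != '|')
        {
            return false;
        }

        for (int i = 0; i < 10; i++)
        {
            // Compare against '0'..'9' directly so only ASCII digits are accepted.
            if (line[i] < '0' || line[i] > '9')
            {
                return false;
            }
        }

        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("1000000179|abcd....."));  // True
        Console.WriteLine(IsValid("%d20000180|wedwedw...")); // False
    }
}
```

The explicit range check is a deliberate choice here: char.IsDigit and the regex class \d in .NET also accept non-ASCII decimal digits by default, which may or may not be wanted for this file format.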

Solution 2

See regex for VALID line:
^\d{10}\|

Solution 6

It's easy to define the valid lines, but slightly more difficult to define the invalid lines.
So you define (in pseudocode) line = valid or invalid.
Even though it sounds trivial, this covers all the lines.
 
Loop over all matches and process those where the invalid part was captured.
E.g.
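using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

string filePath = @"...";
string data = File.ReadAllText(filePath);
// Group 1 is set only when the "valid" alternative did not match the line.
string linePattern = @"^(?:\d{10}\|.*|(.*))$";
var invalidLines = from m in Regex.Matches(data, linePattern, RegexOptions.Multiline).Cast<Match>()
                   where m.Groups[1].Success
                   select m.Groups[1].Value.TrimEnd('\r'); // '.' also matches '\r' in CRLF files
foreach (string invalidLine in invalidLines)
{
   //...
}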
 
Since the regex engine tries the alternatives from left to right, it takes the first one that matches: either the valid one (\d{10}\|.*) or the invalid one (.*). The two are separated by the alternation operator (|).
To give access to the invalid data, that alternative is enclosed in parentheses ((.*)).
To limit the pattern to one line each, the whole pattern is enclosed in ^...$ and grouped by a non-capturing group ((?:...)), i.e. a group that does not count in the Groups array of a match.
 
Putting it all together results in ^(?:\d{10}\|.*|(.*))$.
Note that the match option is set to Multiline to give ^ and $ the needed meaning: begin/end of each line (without Multiline, ^ and $ would mean begin/end of the whole string).
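
The effect of the Multiline option on ^ and $ can be illustrated with a short sketch. MultilineDemo and CountMatches are hypothetical names, and the three-line sample text is an assumption made for the example; the pattern is the one assembled above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    private const string LinePattern = @"^(?:\d{10}\|.*|(.*))$";

    // Hypothetical helper: counts how many matches the pattern finds with the given options.
    public static int CountMatches(string text, RegexOptions options)
    {
        return Regex.Matches(text, LinePattern, options).Count;
    }

    public static void Main()
    {
        string text = "1000000179|abcd\n%d20000180|wedwedw\n1000000181|xyz";

        // With Multiline, ^ and $ anchor at every line, so each of the three lines is matched.
        Console.WriteLine(CountMatches(text, RegexOptions.Multiline)); // 3

        // Without it, ^ and $ anchor at the whole string; '.' cannot cross '\n',
        // so nothing matches this multi-line text.
        Console.WriteLine(CountMatches(text, RegexOptions.None)); // 0
    }
}
```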
 
Cheers
Andi

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 8 Apr 2013
Copyright © CodeProject, 1999-2014