It's easy to define the valid lines, but slightly more difficult to define the invalid ones.
So you define a line (in pseudo code) as
line = valid or invalid
Even though it sounds trivial, this covers all lines.
Loop over all matches and process those where the invalid part was captured, e.g.:
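using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

string filePath = @"...";
string data = File.ReadAllText(filePath);
// Either a valid line (10 digits followed by '|') or anything else, captured in group 1.
string linePattern = @"^(?:\d{10}\|.*|(.*))$";
var invalidLines = from m in Regex.Matches(data, linePattern, RegexOptions.Multiline).Cast<Match>()
                   where m.Groups[1].Success
                   select m.Groups[1].Value;
foreach (string invalidLine in invalidLines)
{
    // process the invalid line here
}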
The regex engine tries the alternatives from left to right and takes the first one that matches, which is either the valid one (\d{10}\|.*) or the invalid one (.*). The two are separated by the or operator (|).
To get access to the invalid data, it is enclosed in parentheses ((.*)), which makes it a capturing group.
To limit the pattern to one line each, the whole pattern is enclosed in ^...$ and the alternatives are grouped by a non-capturing group ((?:...)), i.e. a group that does not count in the Groups array of a match.
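To illustrate what "does not count in the Groups array" means, here is a minimal sketch that compares the pattern with a variant whose outer group is capturing. The class name GroupCountDemo, the helper CountGroups and the sample line are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupCountDemo
{
    // Returns the number of entries in Match.Groups for the first match.
    // Groups[0] is always the whole match, so it is included in the count.
    public static int CountGroups(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.Multiline);
        return m.Groups.Count;
    }

    public static void Main()
    {
        string line = "not a valid line"; // hypothetical sample input

        // Non-capturing outer group: only Groups[0] and Groups[1] exist.
        Console.WriteLine(CountGroups(@"^(?:\d{10}\|.*|(.*))$", line)); // 2

        // Capturing outer group: the invalid part moves to Groups[2].
        Console.WriteLine(CountGroups(@"^(\d{10}\|.*|(.*))$", line));   // 3
    }
}
```

With the non-capturing group, the invalid part stays at index 1, which is what the query relies on.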
Putting it all together results in ^(?:\d{10}\|.*|(.*))$.
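If you want to try the pattern without a file, the following sketch runs it against an in-memory string. The class name InvalidLineFinder, the method FindInvalidLines and the sample data are assumptions made for this example; only the pattern itself is taken from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class InvalidLineFinder
{
    // A valid line (10 digits, then '|'), or anything else captured in group 1.
    private const string LinePattern = @"^(?:\d{10}\|.*|(.*))$";

    // Returns the content of every line that did not match the valid alternative.
    public static List<string> FindInvalidLines(string data)
    {
        return Regex.Matches(data, LinePattern, RegexOptions.Multiline)
                    .Cast<Match>()
                    .Where(m => m.Groups[1].Success)
                    .Select(m => m.Groups[1].Value)
                    .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data using '\n' line endings.
        string data = "0123456789|first\nbroken line\n9876543210|second\n12345|too short";

        foreach (string line in FindInvalidLines(data))
        {
            Console.WriteLine(line);
        }
        // prints:
        // broken line
        // 12345|too short
    }
}
```

Two things to keep in mind: an empty line also ends up in group 1 (with an empty value), and if the data uses "\r\n" line endings the captured value will probably carry a trailing '\r', so you may want to call TrimEnd('\r') on it.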
Note that the match option is set to Multiline to give ^ and $ the needed meaning: begin/end of each line. Without this option, ^ and $ would mean begin/end of the whole string. (Singleline is a separate option; it only makes . match line breaks as well.)
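The effect of the option can be checked with a small experiment. This is only a sketch: the class name MultilineDemo, the helper CountMatches and the two-line sample string are invented for the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    private const string LinePattern = @"^(?:\d{10}\|.*|(.*))$";

    // Counts how many matches the pattern produces with the given options.
    public static int CountMatches(string data, RegexOptions options)
    {
        return Regex.Matches(data, LinePattern, options).Count;
    }

    public static void Main()
    {
        // Hypothetical sample: one valid and one invalid line.
        string data = "0123456789|ok\nbad line";

        // ^ and $ anchor at every line: one match per line.
        Console.WriteLine(CountMatches(data, RegexOptions.Multiline));  // 2

        // ^ and $ anchor at the whole string, and . stops at '\n': no match at all.
        Console.WriteLine(CountMatches(data, RegexOptions.None));       // 0

        // . also matches '\n', so the whole text is taken as one "valid" line.
        Console.WriteLine(CountMatches(data, RegexOptions.Singleline)); // 1
    }
}
```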
Cheers
Andi