To parse source code, you should:
- treat the whole file as "singleline", i.e. pass RegexOptions.Singleline so that "." also matches newline characters and line breaks get no special treatment
- allow whitespace between language tokens (all these \s* look a bit ugly, but they are necessary to catch all entries)
- tokenize carefully, e.g. if you only expect non-escaped identifiers, use \w+; otherwise you need to be more creative ;-)
// needs: using System; using System.Text.RegularExpressions;
string text = "...";
// group 1 = word after "def.ghi.", group 2 = the word that follows it
string pattern = @"\bpublic\s+\w+\s*\.\s*def\s*\.\s*ghi\s*\.\s*(\w+)\s+(\w+)\s*;";
foreach (Match m in Regex.Matches(text, pattern, RegexOptions.Singleline))
{
    Console.WriteLine("1. {0}", m.Groups[1].Value);
    Console.WriteLine("2. {0}", m.Groups[2].Value);
}
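As a rough sketch of the "more creative" case: assuming the parsed language is C# and the only escaped identifiers you expect are the verbatim form with a leading @ (e.g. @class), each \w+ could be swapped for @?\w+. The class name, the helper FindTypeAndName and the sample text are made up for illustration, and Unicode escapes inside identifiers are not handled.

```csharp
using System;
using System.Text.RegularExpressions;

static class VerbatimIdentifierDemo
{
    // Assumption: an identifier is \w+ with an optional leading '@'.
    const string Id = @"@?\w+";

    // Hypothetical helper: returns the two captured identifiers, or an empty array.
    public static string[] FindTypeAndName(string text)
    {
        string pattern = @"\bpublic\s+" + Id + @"\s*\.\s*def\s*\.\s*ghi\s*\.\s*("
                       + Id + @")\s+(" + Id + @")\s*;";
        Match m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value }
            : new string[0];
    }

    static void Main()
    {
        // Made-up input with verbatim identifiers and a line break.
        string text = "public abc . def.ghi.@class\n    @event ;";
        foreach (string s in FindTypeAndName(text))
        {
            Console.WriteLine(s);
        }
    }
}
```

With this input the two captured groups are @class and @event.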
If you have multiple "Word." layers (e.g. A.B.C.def.ghi...), you may extend the pattern as follows:
string pattern = @"\bpublic\s+(?:\w+\s*\.\s*)+def\s*\.\s*ghi\s*\.\s*(\w+)\s+(\w+)\s*;";
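To try the extended pattern end to end, here is a small self-contained sketch. The class name, the Extract helper and the sample text are invented for illustration; only the pattern itself is taken from above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QualifiedNameDemo
{
    // Extended pattern: one or more "Word." layers before def.ghi.
    const string Pattern =
        @"\bpublic\s+(?:\w+\s*\.\s*)+def\s*\.\s*ghi\s*\.\s*(\w+)\s+(\w+)\s*;";

    // Hypothetical helper: returns "group1 group2" for every match.
    public static List<string> Extract(string text)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern, RegexOptions.Singleline))
        {
            result.Add(m.Groups[1].Value + " " + m.Groups[2].Value);
        }
        return result;
    }

    static void Main()
    {
        // Made-up sample: one plain entry, one spread over two lines
        // with extra whitespace, and one that must not match (private).
        string text =
            "public abc.def.ghi.Foo first;\n" +
            "public A . B . C . def\n    .ghi. Bar   second ;\n" +
            "private abc.def.ghi.Baz third;\n";
        foreach (string entry in Extract(text))
        {
            Console.WriteLine(entry);
        }
    }
}
```

For this sample the program prints "Foo first" and "Bar second". The second entry is found although it is split across two lines, because \s also matches line breaks.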
Cheers
Andi