|
Good one. Now we all know a bit more about a different regex engine (Emacs/sed).
Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
|
|
|
|
|
I have this string:
var Line = "Dev: 0 Model: TOSHIBA MK3265GSX Serial: 20FDF20WS FW: GJ002H STW: 0 MaxLBA: 625142447 FDESUPPORTED: 0 PREBOOT: 0 DRIVETRUSTENABLED: 0 DRIVETRUSTSUPPORTED: 0 w128: 41 FULLFW: GJ002H SERVOFW: SDLSUPPORTED: 1 PLATFORM: 0 SAFE: 1 DSTTIMEOUT: 103 ISBOOTORSYSTEM: 1";
I am trying to use RegEx to extract all Key/Value pairs. So far I have:
private static string getWord(ref string Line, string WordToFind)
{
    var retVal = string.Empty;
    if (Line.Contains(WordToFind))
    {
        var expression = @"(?:(?'Key'\S+): (?'Value'.*?)) ";
        Match match = Regex.Match(Line, expression, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            string key = match.Groups[1].ToString();
        }
    }
    return retVal;
}
When I pass in "Model:" I get back "Dev". What am I doing wrong here?
If it's not broken, fix it until it is
|
|
|
|
|
Is this C#? I've not seen that method of creating named capture groups. Though, if you are going to name your capture groups, you may as well retrieve them by name rather than by index.
Also, you only check if the line contains the string to find. Nowhere in your code do you actually look for that as a key or value.
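Also, keep in mind that regex quantifiers are greedy by default. Your "Value" group uses the lazy form (.*?), so it stops at the first space that follows, which means a value containing spaces (such as "TOSHIBA MK3265GSX") gets cut short.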
Also, according to your regular expression, you can have an empty value ("*" means zero or more). I suspect that is not what you want, but maybe you do.
Finally, I'm not really sure how "SERVOFW: SDLSUPPORTED: 1" fits into the key/value scenario. Maybe that's a typo.
Based on the assumption that values can have spaces, but keys can't have spaces, this may be a more appropriate regular expression:
(?<KEY>((?!:| ).)+): (?<VALUE>((?!:).)+)(?= (((?!:| ).)+:)|$)
Keep in mind that this will find all matches, so you don't want to just find the first match as you are doing in your example. You want to loop through them all and compare the key/value against the word you are looking for.
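To illustrate the "loop through them all" advice, here is a minimal sketch. The pattern is my own simplification, not the one posted above, and the names KeyValueDemo, PairRegex and ParsePairs are made up for the example. It assumes keys contain neither spaces nor colons, values contain no colons, and a key may have an empty value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // Assumed pattern: a key is a run of non-space, non-colon characters
    // followed by ':'. The optional value runs lazily up to the next " key:"
    // or to the end of the line.
    static readonly Regex PairRegex = new Regex(
        @"(?<Key>[^\s:]+):(?: (?<Value>[^:]*?))?(?= [^\s:]+:|$)");

    // Collects every key/value pair in the line into a dictionary.
    public static Dictionary<string, string> ParsePairs(string line)
    {
        var pairs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in PairRegex.Matches(line))
        {
            // Retrieve the groups by name; a value that did not match yields "".
            pairs[m.Groups["Key"].Value] = m.Groups["Value"].Value;
        }
        return pairs;
    }

    public static void Main()
    {
        var line = "Dev: 0 Model: TOSHIBA MK3265GSX Serial: 20FDF20WS SERVOFW: SDLSUPPORTED: 1";
        var pairs = ParsePairs(line);
        Console.WriteLine(pairs["Model"]);           // TOSHIBA MK3265GSX
        Console.WriteLine(pairs["SERVOFW"].Length);  // 0
    }
}
```

Once the pairs are in a dictionary, looking up "Model" is a plain key lookup, so the regex never needs to know which word you are after.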
modified 19-Jun-12 18:41pm.
|
|
|
|
|
AspDotNetDev wrote: I'm not really sure how "SERVOFW: SDLSUPPORTED: 1" fits into the key/value scenario
Now that I think about it, "SERVOFW" is probably a key with an empty value. That'd take a bit more time to figure out. I'll leave that as an exercise for the reader.
|
|
|
|
|
I'm piecing together code from various places because it all looks like Greek to me. I'm fairly frustrated at this point. This shouldn't be this difficult.
Anyway, here's my code:
if (Line.Contains(WordToFind))
{
    Regex r = new Regex(
        "(?:(?'Key'\\S+): (?'Value'.*?))",
        RegexOptions.RightToLeft
        | RegexOptions.CultureInvariant
        | RegexOptions.Compiled
    );
    Match m = r.Match(WordToFind);
    if (m.Success)
    {
    }
}
I get no matches for the word "Model".
Also, once I get a match, how do I get the Key/Value data???
If it's not broken, fix it until it is
|
|
|
|
|
Why did you edit your response? The code I see in the email notification I was sent for your reply to my message looks more correct than the code I now see.
For one, you're going to need a for loop to iterate over the result of Matches (not Match). Match finds a single result, and Matches finds all results.
For two, r.Match(WordToFind) is searching "WordToFind" rather than "Line".
Kevin Marois wrote: once I get a match, how do I get the Key/Value data
You can get groups based on the name. Something like this (I don't have my compiler open, so it may vary slightly):
String key = m.Groups["Key"].Value;
String value = m.Groups["Value"].Value;
Kevin Marois wrote: RegexOptions.RightToLeft
This makes the engine search from the end of the input toward the start, which may drastically change the results. You probably don't need it here, but you may want to Google this.
Kevin Marois wrote: I'm fairly frustrated at this point. This shouldn't be this difficult.
Regular expressions are complicated, but powerful. You'll get used to them over time. From what others say, Expresso is a good tool to learn regular expressions, though I use a custom tool I built for myself.
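To make the Match versus Matches point concrete, here is a small sketch that runs the expression from the original question unchanged. The class name and the FindValue helper are hypothetical, and the sample line is a shortened copy of the one in the question with a trailing space added, because that expression requires a space after each value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MatchVersusMatches
{
    // The expression from the original question, unchanged.
    public const string Expression = @"(?:(?'Key'\S+): (?'Value'.*?)) ";

    // Hypothetical helper: returns the value stored under the given key,
    // or null when no match has that key.
    public static string FindValue(string line, string wordToFind)
    {
        foreach (Match m in Regex.Matches(line, Expression))
        {
            if (string.Equals(m.Groups["Key"].Value, wordToFind,
                              StringComparison.OrdinalIgnoreCase))
            {
                return m.Groups["Value"].Value;
            }
        }
        return null;
    }

    public static void Main()
    {
        var line = "Dev: 0 Model: TOSHIBA MK3265GSX Serial: 20FDF20WS FW: GJ002H ";

        // Match returns only the first pair, whatever word you were looking for.
        Console.WriteLine(Regex.Match(line, Expression).Groups["Key"].Value);  // Dev

        // Looping over Matches finds the right key, but the lazy .*? stops at
        // the first space, so the value is cut short.
        Console.WriteLine(FindValue(line, "Model"));  // TOSHIBA
    }
}
```

The truncated "TOSHIBA" shows that switching to Matches is only half the fix; the value part of the expression also has to be allowed to run up to the next key.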
|
|
|
|
|
Pretty new to regex. This is a great article.
Can someone tell me what's wrong with my expression? I'm using this from C#. I'm getting the response from a blog in a malformed XML format and need to extract the entries from it. In simplified form, the input is similar to the following.
string input = @"<entry><id>tag:myblog.com try</entry><entry><id>tag:myblog.com tryagain</entry><entry><id>tag:myblog.com hello </enty>";
I need to identify the number of entries and then process each of them.
Regex blogsRegEx = new Regex(@"<entry><id>tag:myblog.*</entry>");
MatchCollection blogEntries = blogsRegEx.Matches(input);
I always get just 1 entry. It matches the whole thing instead of matching multiple strings of the form <entry><id>tag:myblog....</entry>.
Can someone tell me what I am missing here? Do I need to use subexpressions?
modified 29-Apr-12 19:28pm.
|
|
|
|
|
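Not sure about the regex (it makes my brain hurt), but you can use LINQ instead
string findText = @"<entry><id>tag:myblog";
// Count the start positions in the input at which findText occurs.
int entriesCount = Enumerable.Range(0, Math.Max(0, input.Length - findText.Length + 1))
    .Count(i => string.CompareOrdinal(input, i, findText, 0, findText.Length) == 0);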
Have not tested it but it should give you the correct results by just tweaking your findText variable.
|
|
|
|
|
By default, regex matching is greedy. That is, a quantifier such as .* will match the longest possible chunk of input, so you need to make your .* non-greedy. You do that by putting a ? after it, so your line becomes
Regex blogsRegEx = new Regex(@"<entry><id>tag:myblog.*?</entry>");
If you are going to do ANYTHING nontrivial with regexes, get a copy of Expresso. (See our Free Tools forum for details.)
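A quick way to see the difference is to count the matches both ways. This is only a sketch: the class and method names are made up, and the sample input is modelled on the question but with every closing tag spelled </entry>.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVersusLazy
{
    // Counts how many non-overlapping matches the pattern finds in the input.
    public static int CountEntries(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        var input = "<entry><id>tag:myblog.com try</entry>"
                  + "<entry><id>tag:myblog.com tryagain</entry>"
                  + "<entry><id>tag:myblog.com hello </entry>";

        // Greedy: .* runs to the last </entry>, so everything is one match.
        Console.WriteLine(CountEntries(input, @"<entry><id>tag:myblog.*</entry>"));   // 1

        // Lazy: .*? stops at the nearest </entry>, giving one match per entry.
        Console.WriteLine(CountEntries(input, @"<entry><id>tag:myblog.*?</entry>"));  // 3
    }
}
```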
Cheers,
Peter
Software rusts. Simon Stephenson, ca 1994.
|
|
|
|
|
I am trying to get the day-number format from a date format string. The day-number format is, if present, either a single d or a double d. But the whole string can also contain a placeholder for the day name, ddd or dddd. How can I get the d or dd, if present?
I tried [^d](?<pattern>d{1,2})[^d] which works well with the standard German format (dddd, d. MMMM yyyy ). But that fails when the day number comes first (say: d. dddd MMMM yyyy ; here, ^(?<pattern>d{1,2})[^d] does the trick), and it fails when the day number comes last.
Actually I need something like a "either start of the string or not a d", i.e. ^|[^d] - but that does not work. How can that be solved?
|
|
|
|
|
Solved it with @"\b(?<pattern>d{1,2})\b"
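For comparison, a variant with lookarounds instead of \b expresses "not preceded by a d and not followed by a d" directly, and it also works at the start and end of the string. It is only a sketch: the class and method names are made up, and it does not account for quoted literal text inside a format string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DayNumberFormat
{
    // Returns "d" or "dd", or "" when the format string has no day-number
    // placeholder. (?<!d) and (?!d) reject runs of three or more d's.
    public static string Find(string format)
    {
        return Regex.Match(format, @"(?<!d)(?<pattern>d{1,2})(?!d)")
                    .Groups["pattern"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(Find("dddd, d. MMMM yyyy"));  // d
        Console.WriteLine(Find("d. dddd MMMM yyyy"));   // d
        Console.WriteLine(Find("dddd MMMM dd"));        // dd
        Console.WriteLine(Find("ddMMyyyy"));            // dd
    }
}
```

The last case is where the two approaches differ: \b finds no word boundary between dd and MM, while the lookarounds only care about neighbouring d characters.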
|
|
|
|
|
Given the name of an SQL table, I am looking to add quotes as necessary (with a loose definition of "necessary"). With SQL Server, the parts of the name should be wrapped in brackets ([ and ] ), MySQL uses backticks (` ). So, for example:
database.schema.table
database.schema.[table]
database.[schema].table
[database].[schema].table
should all be transformed to:
[database].[schema].[table]
What I have working is capturing Wrapped and Unwrapped sections separately, wrapping the Unwrapped sections, and joining the sections back together. But it occurred to me that if I could Match only the Unwrapped sections, I could use Replace.
However, I have so far been unsuccessful in my attempts (otherwise I wouldn't be posting). Does anyone out there have an idea of how to do this? I'm thinking it may involve Balancing Groups, but I've never used them before so I'm finding them confusing.
This is not urgent.
Edit: I must have been over-thinking it. What I have now is (?<=^|\.)[^\[\]\.]+(?=$|\.)
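Plugged into Replace, that expression can be used roughly as follows. This is a sketch for the SQL Server bracket style only; the class and method names are made up, and it assumes the input is just the dotted name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteNames
{
    // Wraps every part of a dotted name that is not already in brackets.
    // The lookarounds require each match to span a whole part, from the
    // start or a dot up to the next dot or the end.
    public static string Wrap(string name)
    {
        return Regex.Replace(name, @"(?<=^|\.)[^\[\]\.]+(?=$|\.)", "[$0]");
    }

    public static void Main()
    {
        Console.WriteLine(Wrap("database.schema.table"));      // [database].[schema].[table]
        Console.WriteLine(Wrap("database.[schema].table"));    // [database].[schema].[table]
        Console.WriteLine(Wrap("[database].[schema].table"));  // [database].[schema].[table]
    }
}
```

Because the character class only excludes brackets and dots, parts that contain spaces or a trailing $ are wrapped as a whole as well.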
modified 24-Jan-12 11:00am.
|
|
|
|
|
Are you looking for this?
string sql = @"database.schema.table
database.schema.[table]
database.[schema].table
[database].[schema].table
should all be transformed to:
[database].[schema].[table]
";
string pattern = @"(?:\[?(\w+)\]?)?\.\[?(\w+)\]?";
Func<Match, string> replace = m =>
(m.Groups[1].Success ? "[" + m.Groups[1].Value + "]" : "") + ".[" + m.Groups[2].Value + "]";
Console.WriteLine("{0}", Regex.Replace(sql, pattern, m=>replace(m)));
|
|
|
|
|
Does that work for names that contain SPACEs? my database.my schema.my table
And Excel worksheet names that include a dollar sign ($) at the end?
(I realize those were not listed in the original spec.)
|
|
|
|
|
You will run into problems here.
The problem arises with spaces in the name, since the following pattern:
...([\w\s]+)...
matches
delete my database
as well as
delete database
In this case I guess you don't get away without a parser (use the Regex for tokenizing, use the parser to detect all commands and translate the arguments where needed).
Any names without spaces get easily translated, though, e.g.:
string pattern = @"(?:\[?([\w\$]+)\]?)?\.\[?([\w\$]+)\]?";
And if you have optional spaces around "[" and ".", the following slightly more complicated regex will do:
...
string open = @"(?:\[\s*?)?";
string close = @"(?:\s*?\])?";
string ident = @"([\w\$]+)";
string prefix = @"(?:" + open + ident + close + @"\s*?)?";
string suffix = @"(?:" + open + ident + close + @")";
string pattern = prefix + @"\.\s*?" + suffix;
...
|
|
|
|
|
Andreas Gieriet wrote: matches
delete my database
as well as
delete database
I expect the string to contain only the database, schema, and table names.
|
|
|
|
|
The Regex sees a line like
aaa bbb ccc . ddd . eee fff
What part of aaa bbb ccc is the database name? Only ccc or bbb ccc , etc.? You see the problem?
The same for eee fff .
With non-escaped, non-wrapped spaces in names, deciding which words to wrap into [...] is guesswork.
I.e., getting from aaa bbb ccc .... to aaa [bbb ccc] .... is rather difficult, unless you know what aaa means or you are told from outside that bbb ccc is a single name.
Quite a challenge.
Cheers
Andi
|
|
|
|
|
That should result in
[aaa bbb ccc ].[ ddd ].[ eee fff]
|
|
|
|
|
The line
aaa bbb ccc.ddd eee ...
could be
ALTER TABLE dbo.tVersion ADD ...
which in your approach would result in
[ALTER TABLE dbo].[tVersion ADD] ...
Forget about spaces or get as input the individual names (db name, table name, etc.) or make a parser that detects all language constructs and their db, table, etc. positions...
I still think it's not worth the effort with names that contain spaces - too fragile.
Cheers
Andi
|
|
|
|
|
No, the string contains only the database, schema, and table name separated by periods as per the original post.
|
|
|
|
|
I was confused since I understood (say: assumed...) that you have an SQL script that you want to patch... Never assume anything.
In that case your initial regex is probably the simplest solution.
Cheers
Andi
|
|
|
|
|
Wait till Smitha tackles that post!
|
|
|
|
|
Hi Luc,
Aaaah! You read my whine...
I was quite upset - but I cooled down again
Cheers
Andi
[Edit] PS: ...and the tip is reverted to the "original" state again... [/Edit]
modified 10-Apr-12 16:05pm.
|
|
|
|
|
So here I was, trying to figure out why I was having problems with my ASP RegEx validator control. The objective was to match to an IP address, but to allow it to end in "*" in the final octets (that is, 255.255.255.255 is valid, and so is 255.255.255.* or 255.255.*.* or 255.*.*.*).
But the validation was failing for 0.0.0.200, or anything over 199 in the last octet.
Here is what I had:
ValidationExpression="((1?\d?\d)|(2[0-4]\d)|(25[0-5]))\.(([*]\.[*]\.[*])|(((1?\d?\d)|(2[0-4]\d)|(25[0-5]))\.(([*]\.[*])|(((1?\d?\d)|(2[0-4]\d)|(25[0-5]))\.([*]|((1?\d?\d)|(2[0-4]\d)|(25[0-5])))))))"
Here is the fix:
ValidationExpression="((25[0-5])|(2[0-4]\d)|(1?\d?\d))\.(([*]\.[*]\.[*])|(((25[0-5])|(2[0-4]\d)|(1?\d?\d))\.(([*]\.[*])|(((25[0-5])|(2[0-4]\d)|(1?\d?\d))\.([*]|((25[0-5])|(2[0-4]\d)|(1?\d?\d)))))))"
By reversing the order (priority) of the octet matches, I solved the problem...
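The likely reason the order matters: alternatives are tried left to right, and since the expression is not anchored, 1?\d?\d is satisfied with just "20" out of "200", so the first match stops one character short. The sketch below reproduces that under one assumption, namely that the validator accepts a value only when the first match spans the whole input. The names are made up and the "*" wildcards are left out for brevity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OctetOrder
{
    // Octet alternatives in the original (failing) and the reversed (working) order.
    public const string ShortFirst = @"((1?\d?\d)|(2[0-4]\d)|(25[0-5]))";
    public const string LongFirst = @"((25[0-5])|(2[0-4]\d)|(1?\d?\d))";

    // Builds a pattern for four octets separated by literal dots.
    public static string Ip(string octet)
    {
        return string.Join(@"\.", new[] { octet, octet, octet, octet });
    }

    // Assumed validator behaviour: the first match must cover the entire value.
    public static bool IsWholeMatch(string pattern, string value)
    {
        Match m = Regex.Match(value, pattern);
        return m.Success && m.Index == 0 && m.Length == value.Length;
    }

    public static void Main()
    {
        // First match is "0.0.0.20", one character short of the input.
        Console.WriteLine(IsWholeMatch(Ip(ShortFirst), "0.0.0.200"));                  // False
        // With the longer alternatives first, the last octet matches "200".
        Console.WriteLine(IsWholeMatch(Ip(LongFirst), "0.0.0.200"));                   // True
        // Anchoring forces backtracking into the later alternatives.
        Console.WriteLine(IsWholeMatch("^(?:" + Ip(ShortFirst) + ")$", "0.0.0.200"));  // True
    }
}
```

Anchoring with ^ and $ is an alternative to reordering, since the engine then has to backtrack until the whole value is consumed.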
|
|
|
|
|
You might consider making it more readable, e.g.:
string b = @"25[0-5]|2[0-4]\d|1?\d?\d";
string n = @"(?:"+b+@")";
string w = @"(?:\*|"+b+@")";
string d = @"\.";
string ip = n+d+w+d+w+d+w;
Cheers
Andi
|
|
|
|