I'm not convinced by your Regex - you're matching the literal string " -~" and capturing it to a group. To indicate a range of characters, you need some square brackets:
([\x20-\x7e])
(I prefer to use hex values, since it's more obvious that they're not decimal values; you don't have to remember that the default is octal.)
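To make the difference concrete, here is a minimal C# sketch. The class and method names are made up for illustration, and the unbracketed pattern is my guess at what the original looked like, based on the description above.

```csharp
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // Without brackets the pattern is three literal characters: space, '-', '~'.
    public static bool MatchesLiteral(string input)
    {
        return Regex.IsMatch(input, @"( -~)");
    }

    // With brackets it is a character class covering U+0020 through U+007E.
    public static bool MatchesRange(string input)
    {
        return Regex.IsMatch(input, @"([\x20-\x7e])");
    }
}
```

With these, RangeDemo.MatchesLiteral("host") is false because the text " -~" never appears in it, while RangeDemo.MatchesRange("host") is true.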
Since you just want to remove any characters that aren't in the specified range, the simplest approach is to negate the range and use the Replace method:
set
{
    host = Regex.Replace(value, @"[^\x20-\x7e]", string.Empty);
}
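As a quick check of what the negated range removes, the same call can be wrapped in a small helper. This is only a sketch; HostCleaner and Clean are hypothetical names, not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class HostCleaner
{
    // Removes every character outside the printable US-ASCII range.
    public static string Clean(string value)
    {
        return Regex.Replace(value, @"[^\x20-\x7e]", string.Empty);
    }
}
```

HostCleaner.Clean("my\thost\u00e9") returns "myhost": the tab and the accented character are dropped. Note that Regex.Replace throws an ArgumentNullException for a null input, so a real setter would probably want a null check first.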
However, for a simple case like this, Regex is probably overkill; you could simply loop through the characters, discarding any that you don't want:
set
{
    if (!string.IsNullOrEmpty(value))
    {
        char[] validChars = new char[value.Length];
        int validCharIndex = 0;
        foreach (char c in value)
        {
            if ('\x20' <= c && c <= '\x7e')
            {
                validChars[validCharIndex] = c;
                validCharIndex++;
            }
        }
        if (validCharIndex == 0)
        {
            value = string.Empty;
        }
        else if (validCharIndex != value.Length)
        {
            value = new string(validChars, 0, validCharIndex);
        }
    }
    host = value;
}
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
Richard,
thank you very much for your answer. I have developed the Regex a bit further, so it parses only matching strings, causing an InvalidOperationException in case the string contains any invalid characters. This behavior makes sense, since a hostname with removed characters may cause confusion.
I have also added a length maximum of 255 characters, and the string must have at least a single character.
Regex regex = new Regex(@"(^[\40-\176]{1,254}$)");
Match match = regex.Match(value);
if (match.Groups.Count == 2)
{
    host = match.Groups[1].Value;
}
else
{
    throw new InvalidOperationException(string.Format("The string {0} for host is misformatted. It may only contain printable US-ASCII characters.", value));
}
I am not fully convinced by your idea of iterating through the string. The Regex makes the code easier to read, and easier to change if someone suddenly needs to allow special characters. In addition, I fear a performance loss when iterating through the string.
Clean-up crew needed, grammar spill... - Nagy Vilmos
|
You'd have to test it, but I doubt a Regex is going to be faster than a simple loop through the string.
I'd be inclined to move the length test out of the Regex, and possibly negate the Regex test:
if (value == null)
{
    throw new ArgumentNullException("value");
}
if (value.Length < 1 || value.Length > 254)
{
    throw new ArgumentException("The host must be between 1 and 254 characters long.", "value");
}
// Verbatim string (@) so the backslashes reach the Regex engine unchanged;
// in a regular C# string literal, \4 is not a valid escape sequence.
if (Regex.IsMatch(value, @"[^\40-\176]"))
{
    throw new InvalidOperationException(string.Format("The string {0} for host is misformatted. It may only contain printable US-ASCII characters.", value));
}
host = value;
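On the "you'd have to test it" point, a rough timing harness along these lines could be used. It is a sketch rather than a rigorous benchmark: HostCheckTiming and its members are hypothetical names, and the numbers will vary with machine, runtime and input.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class HostCheckTiming
{
    private static readonly Regex Invalid = new Regex(@"[^\x20-\x7e]");

    // Regex version: valid when no character falls outside the range.
    public static bool IsValidRegex(string value)
    {
        return !Invalid.IsMatch(value);
    }

    // Loop version: stops at the first character outside the range.
    public static bool IsValidLoop(string value)
    {
        foreach (char c in value)
        {
            if (c < '\x20' || c > '\x7e')
            {
                return false;
            }
        }
        return true;
    }

    // Returns the elapsed milliseconds for 'iterations' calls of 'check'.
    public static long Time(Func<string, bool> check, string value, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check(value);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Calling HostCheckTiming.Time(HostCheckTiming.IsValidRegex, host, 1000000) and the same with IsValidLoop gives two figures to compare for a representative host string.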
modified 7-Feb-14 8:14am.
|
Richard Deeming wrote: I'd be inclined to move the length test out of the Regex,
Why?
|
Checking the length of a string with the Length property is almost certainly going to be faster than using a Regex. It also makes the code easier to decipher - you don't have to work out the Regex syntax to see what the length restrictions are - and allows you to generate more granular exceptions, rather than having a single exception if the value doesn't match the Regex.
|
Gotcha! Thank you very much
|
Let me start by saying that I'm not the greatest with Regular Expressions, but from research it seems that they are the best way to go for what I need; I just don't fully understand how to use them with matching groups.
In my application I need to parse a boolean string that can have any number of parameters. The following are some examples:
Name=John or Name=Jane
Name=John or Name=Jane and Date=Now
I need to be able to parse any variation of these strings ('and' and 'or' are the only boolean operators that can be present) so that I get a list such as:
Name=John
or
Name=Jane
and
Date=Now
I don't need to parse the strings with the '=' sign, but I do need the 'or' and 'and' operators so that when I build my query in code I know how to connect each statement to the next (i.e. Name=John should be ORed with Name=Jane, which should be ANDed with Date=Now). So far I have only been able to get a list such as
Name=John
Name=Jane
Date=Now
by using the following code:
Dim Pattern As String = "(\S+\x3D\S+)"
' One call to Matches finds every "name=value" pair. Wrapping it in a
' While loop over an unchanged query would never terminate.
Dim oMatches As MatchCollection = Regex.Matches(query, Pattern)
For Each oMatch As Match In oMatches
    Dim Operand1 = oMatch.Groups(0).Value
Next
but I lose the boolean operators in the process. If anyone could please help me with a regular expression so I would get the groups I have now, but with the operators in between the appropriate expressions, it would be greatly appreciated.
Thanks.
|
I'd suggest doing the matches in phases so that you can track operator precedence: 'and' has higher precedence than 'or'.
In fact you probably don't need to use Regex; String.Split should be able to do the job.
First divide the input based on 'and':
Dim andSeparators() As String = {" and "}
Dim andTerms() As String
andTerms = query.Split(andSeparators, StringSplitOptions.None)
Next, for each of the andTerms, do a similar String.Split on " or ".
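The two phases can be sketched as follows. This is in C# rather than VB, and BooleanQuerySplitter is a hypothetical name; it also assumes that no value itself contains " and " or " or ", which would be split incorrectly.

```csharp
using System;
using System.Collections.Generic;

public static class BooleanQuerySplitter
{
    // Splits on " and " first, then splits each resulting piece on " or ".
    public static List<string[]> Split(string query)
    {
        string[] andSeparators = { " and " };
        string[] orSeparators = { " or " };
        List<string[]> groups = new List<string[]>();

        foreach (string andTerm in query.Split(andSeparators, StringSplitOptions.None))
        {
            groups.Add(andTerm.Split(orSeparators, StringSplitOptions.None));
        }
        return groups;
    }
}
```

For "Name=John or Name=Jane and Date=Now" the outer list has two entries: the first holds Name=John and Name=Jane (they were separated by "or"), the second holds Date=Now. Whether that nesting is the grouping you want depends on which operator you split on first.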
|
You can use the alternation syntax "(a|b)". A match in your case is either an expression in the form "parameter=value" or an operator. An operator can be "or" or "and" and must have a space before and after.
The following pattern produces the results you wanted. For the input string "Name=John or Name=Jane and Date=Now" you get the matches "Name=John", " or ", "Name=Jane", " and ", "Date=Now":
"(\w+=\w+|\s(or|and)\s)"
If you want to use regex for validation only, you can do it this way (note that you get only a single match with this pattern):
Regex.IsMatch(inputStr, @"^(\w+=\w+|\s(or|and)\s)+$");
And you can go even further with validation. The following pattern uses positive lookahead/lookbehind syntax to ensure that operators are enclosed by valid expressions:
Regex.IsMatch(inputStr, @"^(\w+=\w+|(?<=\w+=\w+)\s(or|and)\s(?=\w+=\w+))+$");
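Putting the tokenizing pattern and the stricter validation pattern together, a minimal C# sketch might look like this. BooleanQuery, Tokenize and IsValid are names invented for the example, and trimming the operators is my own choice, not something the patterns require.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BooleanQuery
{
    private const string TokenPattern = @"(\w+=\w+|\s(or|and)\s)";
    private const string ValidPattern =
        @"^(\w+=\w+|(?<=\w+=\w+)\s(or|and)\s(?=\w+=\w+))+$";

    // Returns expressions and operators in the order they appear.
    public static List<string> Tokenize(string query)
    {
        List<string> tokens = new List<string>();
        foreach (Match match in Regex.Matches(query, TokenPattern))
        {
            // Trim so that " or " is stored as "or".
            tokens.Add(match.Value.Trim());
        }
        return tokens;
    }

    // True only when every operator sits between two expressions.
    public static bool IsValid(string query)
    {
        return Regex.IsMatch(query, ValidPattern);
    }
}
```

For "Name=John or Name=Jane and Date=Now", Tokenize yields Name=John, or, Name=Jane, and, Date=Now, and IsValid returns true. A dangling operator such as "Name=John or " fails validation because the lookahead finds no expression after it.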
Gabriel Szabo
modified 8-Nov-13 3:57am.
|
Given this input pattern \[.+" and this string w2rddddd["oQookkkkkk"]rrrrrrrrrrr the pattern returns w2rddddd["oQookkkkkk"]rrrrrrrrrrr, that is, up to and including the second double-quote. What I'd like to return is up to and including the first double-quote like w2rddddd["oQookkkkkk"]rrrrrrrrrrr
Clearly my pattern is incorrect but I can't see what I'm doing wrong. Anybody see it?
If there is one thing more dangerous than getting between a bear and her cubs it's getting between my wife and her chocolate.
|
I get ["oQookkkkkk" when I try that, which is what I expected. Maybe you want .+\[" ?
|
Oops. My bad...
I screwed up my original question. What I meant was, given
yakkety [ "123"]
I want to get just the [ " portion of
yakkety [ "123"]
That is, everything from the opening square bracket up to and including the first double quote only. There could be any amount of white space between the bracket and the quote. That's the only bit I'm after: square bracket through double quote. I can't work out the pattern that will give me just that.
|
How about \[.*?"
Or use \s rather than the dot: \[\s*"
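A small C# sketch of the difference between the greedy original and the two alternatives, using the sample string from the question. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BracketQuoteDemo
{
    // Greedy: .+ runs on to the last double quote it can reach.
    public static string Greedy(string input)
    {
        return Regex.Match(input, @"\[.+""").Value;
    }

    // Lazy: .*? stops at the first double quote after the bracket.
    public static string Lazy(string input)
    {
        return Regex.Match(input, @"\[.*?""").Value;
    }

    // Stricter: only white space may sit between the bracket and the quote.
    public static string WhitespaceOnly(string input)
    {
        return Regex.Match(input, @"\[\s*""").Value;
    }
}
```

For the input yakkety [ "123"], Greedy returns [ "123" while Lazy and WhitespaceOnly both return [ " (bracket, space, quote).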
|
Thanks, that worked just perfick!
If the solution had been a snake it could have bitten me. It was so obvious it slid straight past me. (slaps self in face).
|
Just to complete things, I also wanted to pick up the other bit of text at the other end. That is, given:
yapyapyap ["fred"] moreyapyap
I want to get the closing double quote through the closing square bracket, whether there's white space between them or not. I did that using alternation, as in this regular expression:
("]|"\s*])
It probably doesn't teach anyone anything they don't know already but I thought it would be worth a mention if only to complete what I need to do.
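For what it's worth, \s* also matches zero characters, so the second alternative on its own appears to cover both cases. A minimal C# sketch to compare the two (ClosingBracketDemo is a made-up name):

```csharp
using System.Text.RegularExpressions;

public static class ClosingBracketDemo
{
    // The pattern from the post: either "] or a quote, white space, then ].
    public static string WithAlternation(string input)
    {
        return Regex.Match(input, @"(""]|""\s*])").Value;
    }

    // \s* allows zero or more white-space characters between quote and bracket.
    public static string WithoutAlternation(string input)
    {
        return Regex.Match(input, @"""\s*]").Value;
    }
}
```

Both methods return "] for yapyapyap ["fred"] moreyapyap, and both return the quote, the spaces and the bracket when white space separates them.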
|
Given an input string like this:
Some text containing % complete and not much else
If I use a regex of \bcomplete\b I can find the whole word complete just fine. What I want to do is find % complete. I've tried \b% complete\b and \b\% complete\b and \b\x25 complete\b and other variations I can think of but I can never get it to select % complete. Does anyone know if it's possible to do it and how? I can find nothing anywhere that says % cannot participate in an expression.
I've tried two different apps, RegexBuilder and RegexBuddy, but I can't get it working in either. Does anyone have any ideas? Thanks.
|
The % character won't work with the \b anchor. From MSDN:
The \b anchor specifies that the match must occur on a boundary between a word character (the \w language element) and a non-word character (the \W language element). Word characters consist of alphanumeric characters and underscores; a non-word character is any character that is not alphanumeric or an underscore. The match may also occur on a word boundary at the beginning or end of the string.
If you're using .NET, you could try something like:
((?<=\W)|^)%\s+\bcomplete\b
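To illustrate the \b behaviour described in the MSDN quote, here is a small C# sketch comparing the original attempt with the lookbehind version. PercentCompleteDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PercentCompleteDemo
{
    // Fails when % follows a space: neither the space nor % is a word
    // character, so there is no word boundary immediately before %.
    public static bool BoundaryBeforePercent(string input)
    {
        return Regex.IsMatch(input, @"\b% complete\b");
    }

    // % preceded by a non-word character or the start of the string,
    // then white space, then the whole word "complete".
    public static bool LookbehindBeforePercent(string input)
    {
        return Regex.IsMatch(input, @"((?<=\W)|^)%\s+\bcomplete\b");
    }
}
```

For "Some text containing % complete and not much else" the first method returns false and the second returns true. The two also differ on "50% complete": there is a boundary between 0 and %, so the first matches and the second does not. Which behaviour is wanted depends on the strings being filtered.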
|
Richard, I will try your example. I did find the answer though, and indeed the % has to be outside the \b symbols.
So, %~\bcomplete\b worked but \b%~complete\b did not. I use the tilde ~ to illustrate where a space character would be but it's not part of the expression itself.
|
Have you tried:
%\scomplete
The universe is composed of electrons, neutrons, protons and......morons. (ThePhantomUpvoter)
|
OG, that partly works, but it will also detect % completehorses, whereas I must find only the whole word % complete.
|
Sorry!
Did you try combining it with your solution?
%\s\bcomplete\b
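A short C# sketch of how the two suggestions differ on the completehorses case (WholeWordDemo is a made-up name):

```csharp
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // No boundaries: matches any text that merely starts with "complete".
    public static bool Loose(string input)
    {
        return Regex.IsMatch(input, @"%\scomplete");
    }

    // \b on both sides of the word: "completehorses" no longer qualifies.
    public static bool WholeWord(string input)
    {
        return Regex.IsMatch(input, @"%\s\bcomplete\b");
    }
}
```

Loose accepts both "% complete" and "% completehorses", while WholeWord accepts only the first.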
|
OG, the background is that I've been assigned the wonderful job of localising the strings in our app. In total, we have about 23,000 split over several projects. I'm using an extractor tool called lingobit, and it lets you put in filters to eliminate specific strings. Once I've harvested the strings and entered the regex filters, you run the scan and it discards all strings that match the regex. In theory, you are then left with the strings to be localised, and it will assign names to them and change the code while creating the resx file. There are a number of projects to be converted. Once I've eliminated the strings we don't want to convert, I can reuse the regex pattern on another project and add any additional patterns as I go along.
In effect then, the regex string is input to a third-party app. It just takes a while to harvest all the things that mustn't be converted.
|
By the way, the thing I found most terrible when doing globalization/localization was string concatenation, i.e. something like
label1.Text = "Progress: " + n + " % complete";
instead of
label1.Text = string.Format("Progress: {0} % complete", n);
The translators need the whole phrase for a useful translation, not single words. I had to change a few hundred such concatenations.
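To show why the whole phrase matters, here is a minimal C# sketch in which each language supplies one complete format string. The dictionary is only a stand-in for a real .resx lookup, the German text is an illustrative translation, and ProgressText is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class ProgressText
{
    // Hypothetical stand-in for a resource file: one whole phrase per language.
    private static readonly Dictionary<string, string> Formats =
        new Dictionary<string, string>
        {
            { "en", "Progress: {0} % complete" },
            { "de", "Fortschritt: {0} % abgeschlossen" }
        };

    // The translator controls word order; the code only fills in the number.
    public static string Format(string language, int percent)
    {
        return string.Format(Formats[language], percent);
    }
}
```

ProgressText.Format("en", 42) returns "Progress: 42 % complete". With concatenation, the pieces "Progress: " and " % complete" would reach the translator separately and without context.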
|
F*** off! Tell me it ain't so, Bernard.
Seriously, I have the same problem. There are few string.Format calls and 98% is concatenation. The context of the strings is the problem. I mean, there are SQL statements containing strings like " and " and elsewhere there are message boxes with " and ". There was one bastard of a string set to "Enter the " + something else and further on was something like reply.StartsWith("Enter the credit"). What a***hole would produce such awful stuff? And no, it wasn't me.