|
|
|
Richard Deeming wrote: or a Vulcan mating ritual.
Are you sure? There probably could be more than one but I suspect the names are going to be pretty unique.
|
|
|
|
|
It's simply not possible. There is no expression that will be able to tell you whether the name is a person or a city. NONE AT ALL.
If you're trying to extract the names from the XML file, you DO NOT USE A REGEX FOR THIS. You create classes to hold each type of record and deserialize the XML into a data structure using those classes.
But, since you get both city names and person names in the same record type (whatever "ab" means), there is no code you could ever write to tell you whether that is a person or a city.
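As a rough illustration of the "deserialize instead of regex" advice, here is a minimal Python sketch. It assumes the `ab` elements are self-closing and wrapped in a `<records>` root; both are assumptions, since the snippets posted in this thread are not complete XML. The `AbRecord` class and `parse_records` function are names made up for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AbRecord:
    """One <ab> element; field names mirror the attribute names."""
    id: int
    ov: str
    v: str


def parse_records(xml_text):
    """Parse the document and return one AbRecord per <ab> element."""
    root = ET.fromstring(xml_text)
    return [
        AbRecord(id=int(elem.get("id")), ov=elem.get("ov"), v=elem.get("v"))
        for elem in root.iter("ab")
    ]


# Hypothetical, well-formed sample (the <records> wrapper is assumed).
SAMPLE = (
    '<records>'
    '<ab ov="Jeff" v="Jeff" id="1"/>'
    '<ab ov="Birmingham" v="Birmingham" id="2"/>'
    '</records>'
)

records = parse_records(SAMPLE)
```

This gets the values out reliably, but it does nothing to decide whether a value is a person or a city.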
|
|
|
|
|
Based on your description, it would seem that New York is not a valid name for either a person or a city. I'm pretty sure it is a city. So is Stoke-on-Trent. There are probably other names for both people and cities that don't fit your expected pattern.
Consider, is Regina a person or a city? I know people named Regina. I know of a city named Regina. How would you differentiate between the two?
I don't think that a regex is the right tool for this. I'm pretty sure both person and city names are far more complex than you've allowed for.
"A little song, a little dance, a little seltzer down your pants"
Chuckles the clown
|
|
|
|
|
It's impossible. Suppose you are just looking at surname and city, then my local city defeats this. Am I looking for the magician with the surname Durham, or the city in England?
|
|
|
|
|
KiranKumar V 2024 wrote: regular expression for cityname and name of person
You stated in the other post
The XML element looks like this for the name of a person
<ab ov="Jeff" v="Jeff" id="1">
And in the same XML for cityname
<ab ov="Birmingham" v="Birmingham" id="2">
And in same XML cityname having all caps letter like
<ab ov="BIRMINGHAM" v="BIRMINGHAM" id="3">
As suggested in another response, it is NOT possible for you to determine from the above which is a city and which is a person's name.
HOWEVER, what you posted is not valid XML. It would seem possible to me that there are other XML elements that you can use.
But if not, then I would immediately point out to whoever assigned this to you that it is NOT deterministic. A computer can NOT solve the problem correctly, no matter how you do it.
But with what you posted, the ONLY solution you have right now would be the following.
- You must buy a city database. That is a product/service that one pays money for.
- You then use an XML parser to parse the data. You do NOT use regular expressions to parse it.
- You look up each value in the database. If you find it, it is a city. If you don't, it is a name.
The following is an actual list of cities named after people. So of course these are the ones where a computer cannot tell the difference. Actually, a human will not be able to tell either.
https://en.wikipedia.org/wiki/List_of_places_in_the_United_States_named_after_people
Now in terms of other possibilities.
- There is in fact a person name AND city name in each record. So you could use that in combination with the above.
- As I said there are other elements/attributes in the XML that define exactly what it is.
- You can request that they change the XML to make it clear which is a city and which is a name.
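The lookup step described above could be sketched in Python like this. The `KNOWN_CITIES` set is a tiny made-up stand-in for a real city database, and `classify` is a name invented for this example. The comparison is case-insensitive so that "BIRMINGHAM" and "Birmingham" are treated alike.

```python
# Hypothetical stand-in for a purchased city database.
KNOWN_CITIES = {"birmingham", "durham", "new york", "stoke-on-trent"}


def classify(value, cities=KNOWN_CITIES):
    """Return "city" if the value is in the city list, otherwise "name"."""
    return "city" if value.strip().casefold() in cities else "name"


labels = {value: classify(value) for value in ["Jeff", "Birmingham", "BIRMINGHAM"]}
```

Note that a value such as "Durham" comes back as "city" even when the record is about a person with that surname, so this is a best guess and not a correct answer.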
|
|
|
|
|
jschell wrote: If you find it is a city. If you don't it is a name.
Paris? Durham? London? Adelaide? Etc.
There are plenty of examples that could be either a city or a name. Using a "not in the list of cities === person" test might give you a good start, but it's never going to be 100% accurate.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Richard Deeming wrote: but it's never going to be 100% accurate.
Was the phrasing in my response not clear? I thought I was pointing that out in several places.
|
|
|
|
|
I am trying to create a filter for AH01264 errors. This is for bots trying to run standard "php"s or "pl"s off of my home server.
While I have figured out how to enter and activate new filters in fail2ban, the regex required is way beyond my capabilities.
So was hoping someone here has done this before and could help me out.
Thank you
|
|
|
|
|
This forum is for regex in general. So specifics to fail2ban are not likely to succeed.
Googling I found other posts about this although I did not look into it deeply.
fail2ban ah01264
Based on that, it seems possible at least.
But in general "error messages" across all applications are seldom well regulated. Thus attempts to capture them match either too few or too many, until one has enough actual examples to code to. Even then, updates might change that.
For this forum you can post a regex and examples and we can correct the regex from that.
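As a starting point for that kind of "regex plus examples" discussion, here is a hedged Python sketch. The sample log line is made up and assumes the usual Apache 2.4 error log layout with an IPv4 client address; a real log may differ, so the pattern should be checked against actual lines. The names `AH01264_LINE` and `offending_host` are invented for this example.

```python
import re

# Assumed layout: [timestamp] [module:level] [pid N] [client IP:port] AH01264: ...
AH01264_LINE = re.compile(
    r"^\[[^\]]+\] \[[^\]]+\] \[pid \d+(?::tid \d+)?\] "
    r"\[client (?P<host>\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\] "
    r"AH01264: script not found or unable to stat: \S*\.(?:php|pl)\b"
)


def offending_host(line):
    """Return the client IP if the line is an AH01264 hit on a .php/.pl script."""
    match = AH01264_LINE.search(line)
    return match.group("host") if match else None


# Made-up sample line for testing the pattern.
SAMPLE = (
    "[Sat Nov 11 14:11:02.123456 2023] [cgi:error] [pid 1234] "
    "[client 203.0.113.7:51234] AH01264: script not found or unable to stat: "
    "/usr/lib/cgi-bin/test.php"
)

host = offending_host(SAMPLE)
```

In a fail2ban filter the host part is normally written with the `<HOST>` tag instead of a named group, so the pattern would need adapting before it goes into a failregex.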
|
|
|
|
|
Good morning
I need to "purify" sentences to be able to use them in an app.
Thanks a lot for your help in helping me build the right code
I think the pattern is:
- 'any sentence' (0 or 1 time) {text here} (0-N times)
example
AB CD {xxxxx}
or AB CD {xxxxx,yyyy}
or {xxxxx}
or AB CD
or AB CD {xxxxx} AB CD {xxxxx}
or AB CD {xxxxx} {xxxxx} {xxxxx}
etc
- the {text here} block looks like {digit|some text}.
For example:
{1|xxxxxxxxx}
- the 'some text' block can (not mandatory) contain 'default=xxx' at any place in the text
ex: {3|abc=d,default=my value} or {2|a b c=d,default=my value,another=valueThatIDontNeed} or {1|default=my value}
I need to isolate the following parts and return them to a string.
- 'any sentence' text (if exists)
- xxx of the 'default=xxx' pattern, as per above explanation
This does not need to be done in one pass, I can script that in loops in Python for example.
Here are a few examples
Example 1
Store bulk masses greater than {0|message=<specify mass="" value="">|filter=^(_)?MASS_VALUE.+|add space after=false.+}{1|message=<specify mass="" unit="">|filter=^(_)?P413_MASS_UNIT.+} at temperatures not exceeding {2|message=<specify temperature="" value="">|filter=^(_)?TEMP_VALUE_.+|add space after=false.+}{3|message=<specify temperature="" unit="">|filter=^(_)?P413_TEMP_UNIT_.+}
this should give
Store bulk masses greater than at temperatures not exceeding
Example 2
Inhoud onder {0|message=<geschikt(e) vloeistof="" of="" gas="" specificeren="">|default=inert gas|filter=^(_)?P231_STORAGE_.+} gebruiken en bewaren. Tegen vocht beschermen.
Should give
Inhoud onder inert gas gebruiken en bewaren. Tegen vocht beschermen.
Example 3
EN CAS DE CONTACT AVEC LA PEAU: Laver abondamment{0|message=<préciser un="" produit="" de="" nettoyage="">|default=à l’eau|filter=^(_)?P352_WASH_.+}. Appeler immédiatement {1|message=<préciser qui="" pourra="" émettre="" comme="" il="" convient="" n="" avis="" médical="" en="" cas="" d’urgence="">|default=un CENTRE ANTIPOISON ou un médecin|filter=^(_)?P310_EMERGENCY_.+}.
Should give
EN CAS DE CONTACT AVEC LA PEAU: Laver abondamment à l’eau . Appeler immédiatement un CENTRE ANTIPOISON ou un médecin
Example 4
{0|message=<specificeren of="" dumpingvoorschriften="" van="" toepassing="" zijn="" op="" inhoud,="" container="">|default=Inhoud/verpakking|filter=^(_)?P501_REQUIREMENT_.+} afvoeren naar {1|message=<specificeer welke="" lokale="" regionale="" nationale="" internationale="" wetgeving="">|default=…|filter=^(_)?P501_DISPOSAL_.+}.
Should give
Inhoud/verpakking afvoeren naar … .
thanks !
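One possible two-step approach in Python, offered as a sketch: first find every {...} block, then look inside each block for default=xxx and keep only that value. It assumes blocks never contain nested braces and that a default value ends at the next "|", "," or "}", which fits the examples above but would break on a default that itself contains a comma. The name `purify` is made up for this example.

```python
import re

BLOCK = re.compile(r"\{[^{}]*\}")            # one {...} block, no nested braces
DEFAULT = re.compile(r"default=([^|,}]*)")   # value runs to the next | , or }


def purify(text):
    """Replace each {...} block by its default value (or nothing) and tidy spaces."""
    def replace_block(match):
        found = DEFAULT.search(match.group(0))
        return found.group(1) if found else ""

    cleaned = BLOCK.sub(replace_block, text)
    # Removing blocks can leave double spaces behind; collapse them.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


result = purify(
    "Inhoud onder {0|message=<x>|default=inert gas|filter=^(_)?P231_STORAGE_.+} "
    "gebruiken en bewaren."
)
```

Spacing around punctuation (for example a block directly followed by a full stop) is left as the source text has it.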
|
|
|
|
|
I've been able to isolate text vs {} blocks in RegExr: Learn, Build, & Test RegEx and Regex Tester and Debugger Online - Javascript, PCRE, PHP
using
((?![{}])\w| )*|(({.*?}))
I'd then expect to use a Python or PHP script to loop over the groups, and for each group launch another regex to grep the "default=xxx" text only.
But using the same regex in Python does not work.
It does not isolate the blocks the same way the two websites do.
Looking for a bit of help here
Thanks so much
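A possible reason for the difference, offered as a guess: in Python, re.findall returns tuples of the capture groups when a pattern contains groups, and the first alternative of that pattern can also match an empty string. A simpler pattern without capture groups, used with re.finditer, may be easier to work with. The sketch below assumes blocks never contain nested braces; `split_parts` is a name made up for this example.

```python
import re

# Either one {...} block, or a run of text containing no braces.
TOKEN = re.compile(r"\{[^{}]*\}|[^{}]+")


def split_parts(text):
    """Split the text into plain-text runs and {...} blocks, in order."""
    return [match.group(0) for match in TOKEN.finditer(text)]


parts = split_parts("AB CD {xxxxx} AB CD {yyyy}")
blocks = [part for part in parts if part.startswith("{")]
```

Each element of `parts` is either plain text or a complete block, so a second regex can then be run on the blocks only.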
|
|
|
|
|
Hello all,
I'm a newbie to regex and need some help. What I want to do is to check if a filename contains two or more consecutive _ characters.
For example: painting__test.pdf or painting___test2.pdf
those should be replaced by only one _
hope you can assist in solving.
kind regards
Franz-Georg
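A minimal Python sketch of one way to do this: the pattern _{2,} matches any run of two or more underscores, and re.sub replaces each run with a single one. The function name `collapse_underscores` is made up for this example.

```python
import re


def collapse_underscores(filename):
    """Replace every run of two or more underscores with a single underscore."""
    return re.sub(r"_{2,}", "_", filename)


fixed = collapse_underscores("painting__test.pdf")
```

The same pattern and replacement should carry over to most regex-capable tools, though the exact syntax for entering them varies.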
|
|
|
|
|
|
The original folder "datasets to optimize" contains thousands of files like this.
The names contain a number with either two dots (1.87.184) or one dot (0.35643634), and sometimes (#num#) follows at the end of the number.
How do I copy matching files from "datasets to optimize" to "result"?
~/Documents/optimizer/datasets to optimize #
-rwxr-xr-x 1 root root 170K Nov 11 14:11 'MethodStats 1.87.184_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 193K Nov 11 14:11 'MethodStats 0.206117(4)_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 199K Nov 11 14:11 'MethodStats 9.58.155_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 187K Nov 11 14:11 'MethodStats 9.61.114_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 181K Nov 11 14:11 'MethodStats 9.6.185_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 212K Nov 11 14:11 'MethodStats 9.64.191_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 171K Nov 11 14:11 'MethodStats 9.66.150_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 231K Nov 11 14:11 'MethodStats 9.72.194_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 241K Nov 11 14:11 'MethodStats 9.73.138_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 176K Nov 11 14:11 'MethodStats 9.83.123_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 179K Nov 11 14:11 'MethodStats 9.83.125_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 181K Nov 11 14:11 'MethodStats 9.83.195_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 176K Nov 11 14:11 'MethodStats 9.85.133_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 218K Nov 11 14:11 'MethodStats 9.85.167_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 195K Nov 11 14:11 'MethodStats 9.86.177_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 178K Nov 11 14:11 'MethodStats 23.92.166(1)_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 210K Nov 11 14:11 'MethodStats 9.89.189_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 218K Nov 11 14:11 'MethodStats 0.2048140_vardata_L6_probin.txt'
The output folder "result" has the same file names, with "FW Retest - " added in front.
/root/Documents/optimizer/result
-rwxr-xr-x 1 root root 3.4M Nov 11 14:14 'FW Retest - MethodStats 1.87.184_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 2.7M Nov 11 14:15 'FW Retest - MethodStats 0.2048140_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 2.5M Nov 11 14:15 'FW Retest - MethodStats 0.206117(4)_vardata_L6_probin.txt'
-rwxr-xr-x 1 root root 2.4M Nov 11 14:15 'FW Retest - MethodStats 23.92.166(1)_vardata_L6_probin.txt'
how to copy matching files from "datasets to optimize" to "result"?
|
|
|
|
|
I doubt this is a regular expression problem.
As best I can figure out, you have files in directory A and you want to copy some of those to B.
Beyond that I can't follow what you want, because it is either to overwrite existing files or to copy the missing ones.
In either case you will need a loop - which has nothing to do with regular expressions.
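To show what such a loop might look like, here is a Python sketch under one guessed interpretation: copy every source file that has a "FW Retest - " counterpart in the result folder. The `destination` folder, the function name `copy_retested`, and the interpretation itself are all assumptions, since the question does not say where the copies should go. The regex is only used to skip files that do not follow the naming scheme shown in the listing.

```python
import re
import shutil
from pathlib import Path

# Number with one or two dots, optional "(n)" suffix, then the fixed tail.
NAME = re.compile(r"MethodStats \d+(?:\.\d+){1,2}(?:\(\d+\))?_vardata_L6_probin\.txt")
PREFIX = "FW Retest - "


def copy_retested(source, result, destination):
    """Copy source files that have a 'FW Retest - ' counterpart in result."""
    source, result, destination = Path(source), Path(result), Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if not NAME.fullmatch(path.name):
            continue  # not one of the MethodStats files
        if (result / (PREFIX + path.name)).exists():
            shutil.copy2(path, destination / path.name)
            copied.append(path.name)
    return copied
```

To copy the files that have no counterpart yet, the `exists()` test would be negated.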
|
|
|
|
|
I am looking for a substitution RegEx which prefixes all words with a caret, except when followed by a bracket. E.g. "func(x)=y" must be replaced by "func(^x)=^y" .
I came up with (in Python code):
re.sub(r'([A-Za-z][A-Za-z0-9_]*)([^(]|$)', r'^\1\2', 'func(x)=y')
but that does not work. "f(x)=y" is replaced correctly, but it fails when the word before the bracket has more characters.
I suspect it can be done with a single substitution, but I can't figure it out. I guess my solution does not work because the first bracketed regular expression is not greedy enough. What am I doing wrong?
|
|
|
|
|
I asked ChatGPT and it suggested \b for word boundaries and (?!...) for a negative lookahead assertion, so:
re.sub(r'(\b[A-Za-z][A-Za-z0-9_]*\b)(?!\()', r'^\1', 'val1+function(x)=y+val2+f(xx)')
'^val1+function(^x)=^y+^val2+f(^xx)'
works.....
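For anyone wondering why the first attempt failed: the problem is backtracking, not greediness. When "func" is followed by "(", the [^(] part cannot match, so the engine gives back one character and matches "fun" as the word with "c" as the following character. The \b after the word prevents that, and the lookahead checks for the bracket without consuming anything. A small Python sketch of the working version, with a made-up function name `prefix_words`:

```python
import re

# A whole word (the \b on both sides stops partial matches) not followed by "(".
WORD_NOT_CALLED = re.compile(r"\b[A-Za-z][A-Za-z0-9_]*\b(?!\()")


def prefix_words(expression):
    """Put a caret in front of every word that is not directly followed by "("."""
    return WORD_NOT_CALLED.sub(lambda match: "^" + match.group(0), expression)


short = prefix_words("func(x)=y")
longer = prefix_words("val1+function(x)=y+val2+f(xx)")
```

A word followed by a space and then a bracket, such as "func (x)", would still get a caret with this pattern.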
|
|
|
|
|
When you want to search and replace specific patterns of text, use regular expressions. They can help you in pattern matching, parsing, filtering of results, and so on. Once you learn the regex syntax, you can use it for almost any language. Press Ctrl 0R to open the search and replace pane.
|
|
|
|
|
Good morning everyone,
I can't use regular formulas
I basically have to look for initials
(jolly)"se_meta":90
And I have to replace
Work09Document2023"se_meta":90
Thanks and best regards
Roberto Grigis
|
|
|
|
|
Quote: I can't use regular formulas
Because you aren't allowed to, or because you don't know how to? It makes a big difference in the answer ...
What have you tried?
Where are you stuck?
What help do you need?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
I can’t use regular formulas
I have all the files containing:
Work09Document2022"se_meta":9001
Work09Document2022"se_meta":9002
Work09Document2022"se_meta":9003
Work09Document2022"se_meta":9004
Work09Document2022"se_meta":9005
.
.
.
(and so on)
I basically have to look for initials
(Wildcard)"se_meta":90
And I have to replace
Work09Document2023"se_meta":90
Thanks for your help
Roberto Grigis
|
|
|
|
|
At the risk of repeating myself...
Quote: Because you aren't allowed to, or because you don't know how to? It makes a big difference in the answer ...
What have you tried?
Where are you stuck?
What help do you need?
|
|
|
|
|
For the REGEX program or also NOTEPAD++
Thanks and best regards
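If the goal is to replace whatever stands directly in front of "se_meta":90 with Work09Document2023, one possible pattern uses a lookahead so that the "se_meta":90 part itself is checked but left untouched. The sketch below is in Python and assumes the part to replace contains no spaces or double quotes; whether the same pattern behaves identically in other tools has not been checked here. The function name `update_prefix` is made up for this example.

```python
import re

# Any run without quotes or whitespace, but only directly before "se_meta":90
PREFIX_BEFORE_META = re.compile(r'[^"\s]+(?="se_meta":90)')


def update_prefix(text, new_prefix="Work09Document2023"):
    """Replace the text in front of "se_meta":90 with new_prefix."""
    return PREFIX_BEFORE_META.sub(new_prefix, text)


updated = update_prefix(
    'Work09Document2022"se_meta":9001\nWork09Document2022"se_meta":9002'
)
```

Lines whose number does not start with 90 are left as they are.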
|
|
|
|