Regular Expressions Discussion Boards

Re: Regular expression for City name

Dave Kreskowiak 8-Feb-24 3:56

Richard Deeming wrote:
or a Vulcan mating ritual.

Asking questions is a skill
CodeProject Forum Guidelines
Google: C# How to debug code
Seriously, go read these articles.
Dave Kreskowiak

Re: Regular expression for City name

Richard MacCutchan 8-Feb-24 4:36

Richard Deeming wrote:
"Paris" could be a city or a person.

Paris Hilton - Wedding, Photos, Videos, Celebrity, Entrepreneur, Advocate

Re: Regular expression for City name

jschell 8-Feb-24 4:44

Richard Deeming wrote:
or a Vulcan mating ritual.

Are you sure? There probably could be more than one but I suspect the names are going to be pretty unique.

Re: Regular expression for City name

Dave Kreskowiak 8-Feb-24 3:55

It's simply not possible. There is no expression that will be able to tell you whether the name is a person or a city. NONE AT ALL.

If you're trying to extract the names from the XML file, you DO NOT USE A REGEX FOR THIS. You create classes to hold each type of record and deserialize the XML into a data structure using those classes.
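A minimal Python sketch of the deserialization idea, assuming the records are the `ab` elements with `ov`, `v` and `id` attributes quoted elsewhere in this thread. The `AbRecord` class, the `load_records` helper and the wrapping `<root>` element are made up for illustration; the real file layout is not known.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AbRecord:
    """One <ab> element; field names mirror the attributes quoted in the thread."""
    id: int
    ov: str
    v: str


def load_records(xml_text):
    """Parse the XML with a real parser (no regex) and return typed records."""
    root = ET.fromstring(xml_text)
    return [
        AbRecord(id=int(elem.get("id")), ov=elem.get("ov", ""), v=elem.get("v", ""))
        for elem in root.iter("ab")
    ]


# Assumed well-formed wrapper around the posted fragments.
sample = '<root><ab ov="Jeff" v="Jeff" id="1"/><ab ov="Birmingham" v="Birmingham" id="2"/></root>'
records = load_records(sample)
```

This only gets the values out cleanly. It does nothing to decide whether a value is a person or a city.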

But, since you're getting both city names and person names in the same record type (whatever "ab" means), there is no code you could ever write to tell you whether that is a person or a city.


Re: Regular expression for City name

k5054 7-Feb-24 20:02

Based on your description, it would seem that New York is not a valid name for either a person or a city. I'm pretty sure it is a city. So is Stoke-on-Trent. There are probably other names for both people and cities that don't fit your expected pattern.

Consider, is Regina a person or a city? I know people named Regina. I know of a city named Regina. How would you differentiate between the two?

I don't think that a regex is the right tool for this. I'm pretty sure both person and city names are far more complex than you've allowed for.

"A little song, a little dance, a little seltzer down your pants"
Chuckles the clown

Re: Regular expression for City name

Pete O'Hanlon 8-Feb-24 4:10

It's impossible. Suppose you are just looking at surname and city, then my local city defeats this. Am I looking for the magician with the surname Durham, or the city in England?

Advanced TypeScript Programming Projects

Re: Regular expression for City name

jschell 8-Feb-24 4:58

KiranKumar V 2024 wrote:
regular expression for cityname and name of person

You stated in the other post

XML element looks like for name of person
<ab ov="Jeff" v="Jeff" id="1">

And in the same XML for cityname
<ab ov="Birmingham" v="Birmingham" id="2">

And in same XML cityname having all caps letter like
<ab ov="BIRMINGHAM" v="BIRMINGHAM" id="3">

As another response suggested, it is NOT possible for you to determine from the above which is a city and which is a person's name.

HOWEVER, what you posted is not valid XML. It would seem possible to me that there are other XML elements that you can use.

But if not then I would immediately point out to whoever assigned this to you that it is NOT deterministic. A computer can NOT solve the problem correctly. Doesn't matter how you do it.

But with what you posted, the ONLY solution you have right now would be the following.
- You must buy a city database. That is a product/service that one pays money for.
- You then use an XML parser to read the data. You do NOT use regular expressions to parse it.
- You look up each value in the database. If you find it, it is a city. If you don't, it is a name.
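The lookup steps above could be sketched in Python roughly as follows. The `KNOWN_CITIES` set is a hypothetical stand-in for a purchased city database, and `SAMPLE` is an assumed well-formed version of the posted fragments (the original elements were never closed). The labels are deliberately cautious, because a hit in the city list does not rule out a person with the same name.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a real (paid) city database.
KNOWN_CITIES = {"birmingham", "durham", "paris"}

# Assumed well-formed version of the fragments posted in the thread.
SAMPLE = """<root>
  <ab ov="Jeff" v="Jeff" id="1"/>
  <ab ov="Birmingham" v="Birmingham" id="2"/>
  <ab ov="BIRMINGHAM" v="BIRMINGHAM" id="3"/>
</root>"""


def classify(xml_text):
    """Map each element id to a label based on a case-insensitive city lookup."""
    root = ET.fromstring(xml_text)
    labels = {}
    for elem in root.iter("ab"):
        value = elem.get("v", "")
        if value.casefold() in KNOWN_CITIES:
            labels[elem.get("id")] = "possible city"
        else:
            labels[elem.get("id")] = "not a known city"
    return labels


labels = classify(SAMPLE)
```

A value such as "Paris" or "Durham" would be labelled "possible city" here even when the record is about a person, so this is a starting point and not a reliable classifier.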

The following is an actual list of cities named after people. These are, of course, the ones where a computer cannot tell the difference. Actually, a human will not be able to tell either.

https://en.wikipedia.org/wiki/List_of_places_in_the_United_States_named_after_people

Now in terms of other possibilities.
- There is in fact a person name AND city name in each record. So you could use that in combination with the above.
- As I said there are other elements/attributes in the XML that define exactly what it is.
- You can request that they change the XML to make it clear which is a city and which is a name.

Re: Regular expression for City name

Richard Deeming 8-Feb-24 5:07

jschell wrote:
If you find it is a city. If you don't it is a name.

Paris? Durham? London? Adelaide? Etc.

There are plenty of examples that could be either a city or a name. Using a "not in the list of cities === person" test might give you a good start, but it's never going to be 100% accurate. :)

"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer

Re: Regular expression for City name

jschell 9-Feb-24 12:02

Richard Deeming wrote:
but it's never going to be 100% accurate.

Was the phrasing in my response not clear? I thought I was pointing that out in several places.

Does anyone have experience with creating the regex rule for a fail2ban filter?

Member 16194878 2-Feb-24 4:44

I am trying to create a filter for AH01264 errors. This is for bots trying to run standard "php"s or "pl"s off of my home server.

While I have figured out how to enter and activate new filters in fail2ban, the regex required is way beyond my capabilities.

So I was hoping someone here has done this before and could help me out.

Thank you

Re: Does anyone have experience with creating the regex rule for a fail2ban filter?

jschell 7-Feb-24 5:38

This forum is for regex in general, so questions specific to fail2ban are not likely to succeed.

Googling, I found other posts about this, although I did not look into it deeply.

fail2ban ah01264

Based on that, it seems possible at least.

But in general, "error messages" across all applications are seldom well regulated. Attempts to capture them therefore match either too few or too many, until one has enough actual examples to code to. Even then, updates might change that.

For this forum you can post a regex and examples and we can correct the regex from that.

Help for a regexp

Member 16168304 21-Dec-23 23:43

Good morning

I need to "purify" sentences to be able to use them in an app.
Thanks a lot for your help in building the right code :)

I think the pattern is:

- 'any sentence' (0 or 1 time) {text here} (0-N times)

example
AB CD {xxxxx}
or AB CD {xxxxx,yyyy}
or {xxxxx}
or AB CD
or AB CD {xxxxx} AB CD {xxxxx}
or AB CD {xxxxx} {xxxxx} {xxxxx}
etc

- the {text here} block looks like {digit|some text}.
For example:
{1|xxxxxxxxx}

- the 'some text' block can (not mandatory) contain 'default=xxx' at any place in the text

ex: {3|abc=d,default=my value} or {2|a b c=d,default=my value,another=valueThatIDontNeed} or {1|default=my value}

I need to isolate the following parts and return them to a string.
- 'any sentence' text (if exists)
- xxx of the 'default=xxx' pattern, as per above explanation

This does not need to be done in one pass, I can script that in loops in Python for example.

Here are a few examples

Example 1
Store bulk masses greater than {0|message=<specify mass value>|filter=^(_)?MASS_VALUE.+|add space after=false.+}{1|message=<specify mass unit>|filter=^(_)?P413_MASS_UNIT.+} at temperatures not exceeding {2|message=<specify temperature value>|filter=^(_)?TEMP_VALUE_.+|add space after=false.+}{3|message=<specify temperature unit>|filter=^(_)?P413_TEMP_UNIT_.+}

this should give
Store bulk masses greater than at temperatures not exceeding

Example 2
Inhoud onder {0|message=<geschikt(e) vloeistof of gas specificeren>|default=inert gas|filter=^(_)?P231_STORAGE_.+} gebruiken en bewaren. Tegen vocht beschermen.

Should give
Inhoud onder inert gas gebruiken en bewaren. Tegen vocht beschermen.

Example 3
EN CAS DE CONTACT AVEC LA PEAU: Laver abondamment{0|message=<préciser un produit de nettoyage>|default=à l’eau|filter=^(_)?P352_WASH_.+}. Appeler immédiatement {1|message=<préciser qui pourra émettre comme il convient n avis médical en cas d’urgence>|default=un CENTRE ANTIPOISON ou un médecin|filter=^(_)?P310_EMERGENCY_.+}.

Should give
EN CAS DE CONTACT AVEC LA PEAU: Laver abondamment à l’eau . Appeler immédiatement un CENTRE ANTIPOISON ou un médecin

Example 4

{0|message=<specificeren of dumpingvoorschriften van toepassing zijn op inhoud, container>|default=Inhoud/verpakking|filter=^(_)?P501_REQUIREMENT_.+} afvoeren naar {1|message=<specificeer welke lokale regionale nationale internationale wetgeving>|default=…|filter=^(_)?P501_DISPOSAL_.+}.

Should give
Inhoud/verpakking afvoeren naar … .

thanks !

Re: Help for a regexp

Member 16168304 22-Dec-23 2:53

I've been able to isolate text vs {} blocks in RegExr: Learn, Build, & Test RegEx and Regex Tester and Debugger Online - Javascript, PCRE, PHP
using

((?![{}])\w| )*|(({.*?}))

I'd then expect to use a Python or PHP script to loop over the groups, and for each group run another regex to grab only the "default=xxx" text.

But using the same regex in Python does not work :(

The regex does not isolate the blocks the same way as the two websites do :(

Looking for a bit of help here Smile | :)

Thanks so much
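One possible way to do this in Python, sketched under assumptions drawn only from the examples in this thread: the `{...}` blocks never contain nested braces, and a default value ends at the next `|`, `,` or `}`. The `purify` function name is made up for illustration. Instead of looping over groups, it replaces each block in one pass with its default value (or nothing) and then collapses the leftover whitespace.

```python
import re

# A {...} block with no nested braces (assumption based on the posted examples).
BLOCK = re.compile(r"\{[^{}]*\}")
# The value after "default=", up to the next field separator or the end of the block.
DEFAULT = re.compile(r"default=([^|,}]*)")


def purify(text):
    """Replace every {...} block by its default value, or drop it if it has none."""

    def replace_block(match):
        found = DEFAULT.search(match.group(0))
        return found.group(1) if found else ""

    cleaned = BLOCK.sub(replace_block, text)
    # Collapse the runs of spaces left behind by removed blocks.
    return " ".join(cleaned.split())


print(purify("AB CD {1|a=b} {2|c=d,default=my value}"))
```

Note that this inserts the default exactly where the block stood, so it does not add a space when the block directly follows a word, as in "abondamment{0|...}" from Example 3.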

Regex: more than one identical character next to each other?

Member 16151681 29-Nov-23 7:07

Hello all,
I'm a newbie to regex and need some help. What I want to do is check whether a filename contains two or more _ next to each other.

For example: painting__test.pdf or painting___test2.pdf

those should be replaced by only one _

hope you can assist in solving.

kind regards
Franz-Georg

Re: Regex: more than one identical character next to each other?

Richard Deeming 29-Nov-23 22:28

Find _{2,} and replace with _.

regex101: build, test, and debug regex
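As a minimal Python sketch of that find-and-replace (the question does not say which language or tool is in use, so Python and the `collapse_underscores` name are assumptions):

```python
import re


def collapse_underscores(filename):
    """Replace every run of two or more underscores with a single underscore."""
    return re.sub(r"_{2,}", "_", filename)


print(collapse_underscores("painting__test.pdf"))    # painting_test.pdf
print(collapse_underscores("painting___test2.pdf"))  # painting_test2.pdf
```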


Help me regexp gurus please problem is matching files and copy

Gin Mador 11-Nov-23 15:27

The original folder "datasets to optimize" contains thousands of files like the ones listed below.

The numbers in the names vary: some have two or three dots (1.87.184), some have one dot (0.35643634), and sometimes there is a (#num#) at the end of the number.

How can I copy the matching files from "datasets to optimize" to "result"?

~/Documents/optimizer/datasets to optimize #

-rwxr-xr-x 1 root root 170K Nov 11 14:11 'MethodStats 1.87.184_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 193K Nov 11 14:11 'MethodStats 0.206117(4)_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 199K Nov 11 14:11 'MethodStats 9.58.155_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 187K Nov 11 14:11 'MethodStats 9.61.114_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 181K Nov 11 14:11 'MethodStats 9.6.185_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 212K Nov 11 14:11 'MethodStats 9.64.191_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 171K Nov 11 14:11 'MethodStats 9.66.150_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 231K Nov 11 14:11 'MethodStats 9.72.194_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 241K Nov 11 14:11 'MethodStats 9.73.138_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 176K Nov 11 14:11 'MethodStats 9.83.123_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 179K Nov 11 14:11 'MethodStats 9.83.125_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 181K Nov 11 14:11 'MethodStats 9.83.195_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 176K Nov 11 14:11 'MethodStats 9.85.133_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 218K Nov 11 14:11 'MethodStats 9.85.167_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 195K Nov 11 14:11 'MethodStats 9.86.177_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 178K Nov 11 14:11 'MethodStats 23.92.166(1)_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 210K Nov 11 14:11 'MethodStats 9.89.189_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 218K Nov 11 14:11 'MethodStats 0.2048140_vardata_L6_probin.txt'

The output folder "result" has the same file names, but with "FW Retest - " added in front:

/root/Documents/optimizer/result

-rwxr-xr-x 1 root root 3.4M Nov 11 14:14 'FW Retest - MethodStats 1.87.184_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 2.7M Nov 11 14:15 'FW Retest - MethodStats 0.2048140_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 2.5M Nov 11 14:15 'FW Retest - MethodStats 0.206117(4)_vardata_L6_probin.txt'

-rwxr-xr-x 1 root root 2.4M Nov 11 14:15 'FW Retest - MethodStats 23.92.166(1)_vardata_L6_probin.txt'

How can I copy the matching files from "datasets to optimize" to "result"?

Re: Help me regexp gurus please problem is matching files and copy

jschell 30-Nov-23 6:17

I doubt this is a regular expression problem.

As best I can figure out, you have files in directory A and you want to copy some of those to B.

Beyond that I can't follow what you want, because it could be either to overwrite existing files or to copy only the missing ones.

In either case you will need a loop - which has nothing to do with regular expressions.
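A rough Python sketch of such a loop, under one guessed reading of the question: pick the source files that already have a "FW Retest - " counterpart in the result folder, and copy them somewhere. The function names and the separate `target_dir` parameter are hypothetical, and the opposite selection (files without a counterpart) would only need the membership test flipped.

```python
import shutil
from pathlib import Path

PREFIX = "FW Retest - "


def files_with_retest(source_names, result_names, prefix=PREFIX):
    """Return the source names that have a '<prefix><name>' twin among result_names."""
    retested = {name[len(prefix):] for name in result_names if name.startswith(prefix)}
    return [name for name in source_names if name in retested]


def copy_retested(source_dir, result_dir, target_dir):
    """Copy every source file that has a retest twin in result_dir into target_dir."""
    source_dir, result_dir, target_dir = Path(source_dir), Path(result_dir), Path(target_dir)
    source_names = sorted(p.name for p in source_dir.iterdir() if p.is_file())
    result_names = [p.name for p in result_dir.iterdir() if p.is_file()]
    target_dir.mkdir(parents=True, exist_ok=True)
    for name in files_with_retest(source_names, result_names):
        shutil.copy2(source_dir / name, target_dir / name)
```

Plain string comparison is enough here, which is why no regular expression appears in the sketch.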

pattern replace with specific exception

Member 16122634 24-Oct-23 1:06

I am looking for a substitution regex which prefixes all words with a caret, except when the word is followed by a bracket. E.g. "func(x)=y" must be replaced by "func(^x)=^y".
I came up with (in Python code):
re.sub(r'([A-Za-z][A-Za-z0-9_]*)([^(]|$)', r'^\1\2', 'func(x)=y')
but that does not work. "f(x)=y" is replaced correctly, but it fails when the word before the bracket has more characters.
I suspect it can be done with a single substitution, but I can't figure it out. I guess my solution does not work because the first bracketed regular expression is not greedy enough. What am I doing wrong?

Re: pattern replace with specific exception

Member 16122634 26-Oct-23 1:02

I asked ChatGPT :-\ and it suggested \b for word boundaries and (?!...) for a negative lookahead assertion, so:

re.sub(r'(\b[A-Za-z][A-Za-z0-9_]*\b)(?!\()', r'^\1', 'val1+function(x)=y+val2+f(xx)')

'^val1+function(^x)=^y+^val2+f(^xx)'

works.....
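To see why the first attempt failed and the second works, here is a small Python comparison using the two patterns from this thread (the constant and function names are made up for illustration). In the first pattern the engine backtracks: when "func" is followed by "(", it gives back the "c" so that the word group matches "fun" and the second group matches "c". A one-letter word such as "f" has nothing to give back, which is why "f(x)=y" happened to work. The word boundary plus negative lookahead forbids that partial match.

```python
import re

# First attempt from the thread: the word group can shrink by backtracking.
NAIVE = r"([A-Za-z][A-Za-z0-9_]*)([^(]|$)"
# Second attempt: whole words only, and not directly followed by "(".
FIXED = r"(\b[A-Za-z][A-Za-z0-9_]*\b)(?!\()"


def prefix_words(expression):
    """Prefix every identifier with ^ unless it is immediately followed by '('."""
    return re.sub(FIXED, r"^\1", expression)


print(re.sub(NAIVE, r"^\1\2", "func(x)=y"))  # ^func(^x)=^y  (func wrongly prefixed)
print(prefix_words("func(x)=y"))             # func(^x)=^y
```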

Re: pattern replace with specific exception

Justice Marc 20-Nov-23 5:10

When you want to search and replace specific patterns of text, use regular expressions. They can help you in pattern matching, parsing, filtering of results, and so on. Once you learn the regex syntax, you can use it for almost any language. Press Ctrl 0R to open the search and replace pane.

Replacement of initial words

Luca Smith 12-Oct-23 0:47

Good morning everyone,
I can't use regular formulas

I basically have to find whatever comes at the start, before

(wildcard)"se_meta":90

and replace it so that the result is

Work09Document2023"se_meta":90

Thanks and best regards

Roberto Grigis

Re: Replacement of initial words

OriginalGriff 12-Oct-23 0:50

Quote:
I can't use regular formulas

Because you aren't allowed to, or because you don't know how to? It makes a big difference in the answer ...

What have you tried?
Where are you stuck?
What help do you need?

"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!

Re: Replacement of initial words

Luca Smith 12-Oct-23 0:54

I can’t use regular formulas

I have all the files containing:

Work09Document2022"se_meta":9001
Work09Document2022"se_meta":9002
Work09Document2022"se_meta":9003
Work09Document2022"se_meta":9004
Work09Document2022"se_meta":9005
.
.
.
(and so on)

I basically have to find whatever comes at the start, before
(wildcard)"se_meta":90

and replace it so that the result is

Work09Document2023"se_meta":90

Thanks for your help

Roberto Grigis
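One reading of this request, sketched in Python as an assumption (the thread never settles on a tool, and the names below are made up): on every line, replace whatever precedes `"se_meta":90` with the new prefix. The lookahead keeps the `"se_meta":90...` part itself untouched.

```python
import re

# Everything from the start of a line up to (but not including) "se_meta":90.
PREFIX_BEFORE_META = re.compile(r'^.*?(?="se_meta":90)', re.MULTILINE)


def replace_prefix(text, new_prefix="Work09Document2023"):
    """Swap the text in front of "se_meta":90 on each line for new_prefix."""
    # A function is used as the replacement so new_prefix is inserted literally.
    return PREFIX_BEFORE_META.sub(lambda match: new_prefix, text)


sample = 'Work09Document2022"se_meta":9001\nWork09Document2022"se_meta":9002'
print(replace_prefix(sample))
```

Lines that do not contain `"se_meta":90` are left unchanged.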

Re: Replacement of initial words

OriginalGriff 12-Oct-23 1:18

At the risk of repeating myself...

Quote:
Because you aren't allowed to, or because you don't know how to? It makes a big difference in the answer ...

What have you tried?
Where are you stuck?
What help do you need?

Re: Replacement of initial words

Luca Smith 12-Oct-23 1:40

For the REGEX program or also NOTEPAD++

Thanks and best regards
