Regular Expressions Discussion Boards

Richard Deeming

12-Oct-23 1:45

For a manual process, use something like PowerRename.

If you want to do this in code, you're going to need to provide a lot more detail. But since you keep ignoring OG's questions, I won't hold my breath.

"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer

Re: Replacement of initial words

Luca Smith

12-Oct-23 1:59

Thanks for your help,
but that's not what I'm interested in. I want to find and replace within the file.

Re: Replacement of initial words

Dave Kreskowiak

12-Oct-23 3:01

STOP TYPING AS LITTLE AS POSSIBLE AND ACTUALLY EXPLAIN WHAT YOU'RE HAVING A PROBLEM DOING!!!!

Asking questions is a skill
CodeProject Forum Guidelines
Google: C# How to debug code
Seriously, go read these articles.
Dave Kreskowiak

Re: Replacement of initial words

Terry R 2023

13-Oct-23 9:22

Given the conversation here, and that you were equally sparse on words in your question on the Notepad++ forum (https://community.notepad-plus-plus.org/topic/25021/regular-formulas), it does seem that you don't read the replies properly. You need to respond with a LOT more information.

If it wasn't for the pidgin English feel I'd almost say this is an AI conversation.

Terry

Re: Replacement of initial words

OriginalGriff

12-Oct-23 2:38

I think we are witnessing a Miracle Of Life: the birth of a Help Vampire. Fascinating - you don't often get to see them in the wild until they are at least 3 years old ... :-D

"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!

Re: Replacement of initial words

jschell

12-Oct-23 5:21

Luca Smith wrote:
NOTEPAD++

The following covers the basics of a replacement in Notepad++ using a regular expression. You will need to adjust it for your needs.

Find/Replace with Regex and Wildcard in the middle of the strings | Notepad++ Community

Regular Expression for a repeating pattern?

Les Stockton

6-Oct-23 7:50

I've got some data input by a user.
It's like <span>&nbsp; &nbsp; </span>, and there can be any number of these non-breaking spaces up until the close of the span tag, with nothing else.
I'm trying to figure out a reasonable way to detect an occurrence of a span like this. It could be the span and then 30 occurrences of the non-breaking space, or it could be 300 occurrences, or any number in between.
I was hoping there'd be a way to detect this repeating pattern within a regular expression.

XML

<span>&nbsp; &nbsp; </span>

Re: Regular Expression for a repeating pattern?

PIEBALDconsult

6-Oct-23 7:59

You might need to test for the Unicode value.
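
As a rough illustration of testing for the Unicode value: the non-breaking space can arrive either as the literal entity text &nbsp; or as the decoded character U+00A0, depending on where the input is captured. The sketch below uses Python's re module rather than whatever engine the original poster has, and the names EMPTY_SPAN and has_padding_span are made up for the example.

```python
import re

# Hypothetical name. Matches a span whose body holds only non-breaking spaces
# (as entity text or as the U+00A0 character) and ordinary spaces.
EMPTY_SPAN = re.compile(r"<span>(?:&nbsp;|\u00A0| )+</span>")

def has_padding_span(html: str) -> bool:
    """Return True if the text contains a span made up only of (nb)spaces."""
    return EMPTY_SPAN.search(html) is not None
```

Note that this also accepts a span holding nothing but ordinary spaces; tighten the alternation if that should not count.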

Re: Regular Expression for a repeating pattern?

RedDk

6-Oct-23 8:08

Scarf this down:
Regex Quantifier Tutorial: Greedy, Lazy, Possessive

modified 25-Oct-23 2:06am.

Re: Regular Expression for a repeating pattern?

jschell

9-Oct-23 5:26

The specifics of where the regex is used, and with which engine, matter.

But in general

{code}
\s*(&nbsp;\s*)+
{code}

Les Stockton wrote:
until the close of the span tag

In valid XML looking for the closing tag is pointless. But you can add it if you want.

Les Stockton wrote:
XML

Just noting that using regexes to parse XML is not a good idea. Primarily this comes down to blocks embedded in other blocks: you cannot parse that with a regex. There are other complex issues too, which would require hideous (and therefore slow) regexes.

Also there can be other variances in what you posted.
1. Multiline
2. Spaces in the tags
3. Attributes in the tag.
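
To make those three variances concrete, here is a minimal sketch in Python's re syntax (not necessarily the poster's engine). PADDED_SPAN and find_padded_spans are hypothetical names, and the pattern is only one way to loosen the match: it still inherits the general weakness of regex-on-markup noted above.

```python
import re

# Hypothetical pattern that tolerates the three variances listed above:
#   [^>]*    attributes (and spaces) inside the opening tag
#   \s*      line breaks or other whitespace around the entities
#   </span\s*>  spaces before the closing ">"
# Caveat: [^>]* is fooled by a ">" inside an attribute value.
PADDED_SPAN = re.compile(r"<span\b[^>]*>\s*(?:&nbsp;\s*)+</span\s*>", re.IGNORECASE)

def find_padded_spans(html: str) -> list:
    """Return every span whose body is only &nbsp; entities and whitespace."""
    return [m.group(0) for m in PADDED_SPAN.finditer(html)]
```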

Re: Regular Expression for a repeating pattern?

k5054

9-Oct-23 6:12

In general, trying to parse XML (or HTML) with regex is not a good idea, and almost certainly doomed to failure. However, to match this specific case you might try:

RegEx

<span>(&nbsp; *)+</span>

That's an extended POSIX regex, and seems to do the job. It matches any of the following:

XML

<span>&nbsp;</span>
<span>&nbsp; </span>
<span>&nbsp; &nbsp; </span>
<span>&nbsp; &nbsp; &nbsp; </span>
<span>&nbsp;&nbsp; &nbsp;</span>

If you need to accept any white space you might try using (&nbsp;[[:space:]]*) as the sub-pattern.
If you may have line breaks in the span text, then you may need to tell your regex engine not to treat them as end-of-text markers.
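
For anyone wanting to check the pattern against the five samples, a small sketch in Python follows. Two things here are assumptions rather than part of the post: Python's re module stands in for a POSIX ERE engine, and since re has no [[:space:]] class, \s is used as the nearest equivalent. The names STRICT, ANY_SPACE and SAMPLES are invented for the example.

```python
import re

# The posted pattern, with the unused capture group made non-capturing.
STRICT = re.compile(r"<span>(?:&nbsp; *)+</span>")

# "Any white space" variant. Python's re has no [[:space:]], so \s is used.
ANY_SPACE = re.compile(r"<span>(?:&nbsp;\s*)+</span>")

# The five samples listed above.
SAMPLES = [
    "<span>&nbsp;</span>",
    "<span>&nbsp; </span>",
    "<span>&nbsp; &nbsp; </span>",
    "<span>&nbsp; &nbsp; &nbsp; </span>",
    "<span>&nbsp;&nbsp; &nbsp;</span>",
]

def all_samples_match() -> bool:
    """Return True if every sample is matched in full by the strict pattern."""
    return all(STRICT.fullmatch(s) is not None for s in SAMPLES)
```

In Python, \s already matches line breaks, so the second variant needs no extra flag to cope with a span body spread over several lines; other engines may differ.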

Keep Calm and Carry On

Exclude Uppercase for conjoined names

Plastmannen

2-Oct-23 22:08

Hello! I'm new to learning regex and I can't seem to figure out how to exclude certain characters from my regex.
In my example I want to structure names as FIRST_NAME, LAST_NAME, meaning "Rog er , Green" becomes "Roger, Green".
I have accomplished that with the RegEx (?=[A-Z]). My current issue is that in Sweden, where I reside, conjoined names such as "Lars-Erik" are rather common, and with my current RegEx that becomes "Lars- Erik", which I don't want.
It also does not take into account Nordic uppercase letters such as Ö.
Is there a way of excluding uppercase letters that have a hyphen prefix, as well as including more than US characters?

Re: Exclude Uppercase for conjoined names

OriginalGriff

2-Oct-23 22:20

When you use square brackets, you are limiting the match to just the characters (and ranges) you include within the brackets: [A-Z] includes only the characters 'A' to 'Z'. If you want to extend that to accented characters, you either need to add them to the brackets: [A-Za-zÄÀÉÏÔÕÖÜßàâäèéêëîôõöûüçÇ] or use the less specific "alphanumeric character" code \w instead - this has the disadvantage of including '0' to '9' (and '_') as well, but ... it's a lot easier to read ... :-D

You can also include the hyphen in your "permitted characters" list to allow for hyphenated names: [\w-]

If you are going to use regular expressions, you need a helper tool. Get a copy of Expresso - it's free, and it examines and generates Regular expressions.
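
A short sketch of the character-class point, in Python's re syntax. It rests on an assumption the original poster never confirmed: that the workflow strips whitespace first and then inserts a space at each (?=[A-Z]) position. The UPPER list is illustrative and far from complete, and the negative lookbehind (?<!-) is one possible way to skip capitals that follow a hyphen, not something suggested in the thread.

```python
import re

# Illustrative, not exhaustive: extend as real data requires.
UPPER = "A-ZÅÄÖÆØÉÜ"

# [A-Z] is ASCII-only, while \w also accepts lowercase letters, digits and "_".
assert re.match(r"[A-Z]", "Öberg") is None
assert re.match(rf"[{UPPER}]", "Öberg")
assert re.match(r"\w", "7")

# Zero-width position before an uppercase letter, except at the very start
# of the string or directly after a hyphen.
BEFORE_CAPITAL = re.compile(rf"(?<!^)(?<!-)(?=[{UPPER}])")

def space_before_capitals(text: str) -> str:
    """Insert a space before each qualifying uppercase letter."""
    return BEFORE_CAPITAL.sub(" ", text)
```

With this, "Lars-Erik,Öberg" keeps its hyphenated first name intact while the surname is still split off after the comma.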

Re: Exclude Uppercase for conjoined names

Plastmannen

2-Oct-23 23:25

I see, \w would match all characters, right? Not just uppercase letters. So in my case I would need to individually add all uppercase variants of accented characters into my brackets after A-Z?

Re: Exclude Uppercase for conjoined names

OriginalGriff

3-Oct-23 0:42

Yes.

Re: Exclude Uppercase for conjoined names

Plastmannen

3-Oct-23 1:04

Alright thank you!

Re: Exclude Uppercase for conjoined names

Richard Deeming

2-Oct-23 23:53

It sounds like you're building a great way to thoroughly annoy your users. :doh:

Even ignoring the famous Falsehoods Programmers Believe About Names list, if Mr Green wants to be known as "Rog er", why should he be forced to change his name to "Rog-Er" or "Roger" just to fit in with your system's rules?

I understand that you're probably just trying to prevent typos. But you're building a system that can't cope with anything even slightly unusual "just in case" someone can't type their own name.

"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer

Re: Exclude Uppercase for conjoined names

Plastmannen

3-Oct-23 1:26

I see your concern, but this RegEx is to be implemented with an OCR engine which sometimes (unfortunately) adds whitespace where there is none.
So in my example the person has actually entered "Roger" and the OCR interprets it as two parts, "Rog er". I want to eliminate those cases; if an actual "Rog er" appears, then my customer is informed that they have to look for "Roger".
"Rog-er / Rog-Er" is accounted for and allowed!

Re: Exclude Uppercase for conjoined names

trønderen

3-Oct-23 8:08

Norwegian names frequently omit the hyphen: In my school class there were both Per Erik, Hans Petter, Gunn Marit and Marit Irene (all first names, not first+family names). Others had double first names, but used only one of them except at formal occasions. Both my parents had double, un-hyphenated first names (and my father's second first name was used so rarely that I didn't know of it until my mid-teens!).

(In Sweden, you are quite likely to get in contact with people with names Norwegian style.)

The list of false assumptions in the article linked by Richard Dennings is great!

Re: Exclude Uppercase for conjoined names

Richard Deeming

3-Oct-23 21:36

trønderen wrote:
Richard Dennings

Who?

"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer

Re: Exclude Uppercase for conjoined names

trønderen

4-Oct-23 8:40

Sorry. My mistake. When you open a reply window, the list of messages is by default hidden, and I took the name from my (incorrect) memory of the author of the message no longer in view.

I hope you were not terribly offended. I am sorry for my mistake anyway.

Re: Exclude Uppercase for conjoined names

jschell

4-Oct-23 4:58

So the real problem is the OCR solution.

The reality is that you are unlikely to be able to deal with all of the error cases. Your solution might introduce more errors.

So it is a trade off.

If error reduction is considered a significant issue, then it is perhaps better to look into getting a second, different OCR solution and using both of them. Then compare the output from both and only apply fixes when there is a difference.

If as I said it is a significant problem then any additional cost should not be a problem. But if the cost is a problem then perhaps it isn't as significant as thought.
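
The two-engine idea could be sketched roughly as below (Python). Everything in it is hypothetical: the function name, the choice to repair the first engine's text rather than the second's, and the fix callback are assumptions made purely to show the control flow.

```python
from typing import Callable

def reconcile(engine_a: str, engine_b: str, fix: Callable[[str], str]) -> str:
    """Trust the text when both OCR engines agree; repair it only when they differ."""
    if engine_a == engine_b:
        return engine_a        # agreement: leave the text untouched
    return fix(engine_a)       # disagreement: apply the cleanup (assumed: to engine A's text)

def strip_spaces(text: str) -> str:
    """Example cleanup: drop every space character."""
    return text.replace(" ", "")
```

One consequence of this design is that a name both engines read as "Rog er" is left alone, which goes some way towards the objection about people whose names really are unusual.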

Re: Exclude Uppercase for conjoined names

trønderen

4-Oct-23 9:02

Doing this in a language such as C# would have been a trivial task.

It would have given you a lot more flexibility in handling e.g. standard name parts that are not capitalized, such as Ludvigvan Beethoven, Charlesde Gaulle or Bengtaf Klintberg. Lots of other special cases and variations could be handled in a much more maintainable way.
I have linked to this several times earlier, but it cannot be repeated too often: Geek & Poke: Yesterday's regex

You cannot expect your name matching to be perfect on the first try. Or second. Or third. E.g. a list of prepositions such as "van", "de", "af" ... will grow and grow. Adding them to a C# list is far easier than updating your regex.
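
A minimal sketch of the list-driven approach, written in Python here rather than C#. The heuristic is an assumption made for illustration, not something stated in the thread: a token that starts with a lowercase letter and is not a known particle is treated as an OCR fragment and glued to the previous token. PARTICLES and normalize_name are hypothetical names.

```python
# Hypothetical, deliberately small list; expect it to grow.
PARTICLES = {"van", "de", "af"}

def normalize_name(raw: str) -> str:
    """Glue OCR fragments back together, token by token."""
    parts = []
    for token in raw.split():
        # Assumed heuristic: lowercase start and not a known particle => fragment.
        is_fragment = token[0].islower() and token.lower() not in PARTICLES
        if parts and (token == "," or is_fragment):
            parts[-1] += token      # comma or fragment: attach to the previous part
        else:
            parts.append(token)     # capitalised word or known particle: keep separate
    return " ".join(parts)
```

Supporting another particle is then a one-word change to the set, with no pattern to re-test.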

Re: Exclude Uppercase for conjoined names

OriginalGriff

12-Oct-23 1:02

There are also English (and Welsh) surnames that start with "ff": it indicated "son of" in Middle Age English and was a single letter - literally an uppercase "F" was written as "ff". Until the end of the Middle Ages the initial capitalization of any name wasn't a thing - names were all written in lowercase. Some rich people* kept the lowercase starter to this day (and can get very shirty if you use uppercase!)

* Who mostly were the only ones with surnames anyway, they didn't become common practice until the aftermath of the Black Death.

learning regex isn't easy :-)

Kardock

18-Sep-23 4:22

hi all,

so, I'm new to regex, trying to understand and i admit i'm lost.

here's what i need right now;

i have a list of strings from which i wish to extract the email address of users; each line looks like this:

DisplayName;Surname;Givenname;Mail;Company

which gives me something like:

$line = 'jsmith;john;smith;john.smith@someemail.com;acme'

since I'm new and not sure how this works, i ran these tests to learn, with the results shown. now i'm trying to understand why the last 2 shown here are failing.

$line -match '\w+' = True
$line -match '\w+;' = True
$line -match '\w+;\w+;' = True
$line -match '\w+;\w+;\w+' = True
$line -match '\w+;\w+;\w+;' = False
$line -match '\w+;\w+;\w+;\.*' = False

at first i thought that this regex would give me the email but it fails.

$regex = '\w+;\w+;\w+;(\w+@\w+);\w+'

thanks for helping me.
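
A possible explanation for the final pattern, sketched in Python's re rather than PowerShell: \w matches letters, digits and "_" only, so \w+@ cannot get past the "." in "john.smith". The replacement pattern and both helper names below are hypothetical, offered as one way to pull out the Mail field under the five-column layout shown above.

```python
import re

line = 'jsmith;john;smith;john.smith@someemail.com;acme'

# \w covers letters, digits and "_" only, so \w+@ stops at the "." in
# "john.smith" and the final pattern from the post finds nothing.
assert re.search(r'\w+;\w+;\w+;(\w+@\w+);\w+', line) is None

# Hypothetical replacement: skip three fields, then capture anything that is
# not ";" or "@" on either side of a single "@".
MAIL_FIELD = re.compile(r'^[^;]*;[^;]*;[^;]*;([^;@]+@[^;@]+);')

def extract_mail(record: str):
    """Return the Mail field, or None if the record does not fit the layout."""
    match = MAIL_FIELD.search(record)
    return match.group(1) if match else None

def extract_mail_by_split(record: str):
    """For a fixed five-column layout, splitting may be simpler than a regex."""
    fields = record.split(';')
    return fields[3] if len(fields) == 5 else None
```

For what it's worth, with the sample line exactly as posted, Python's re does find a match for '\w+;\w+;\w+;' and for '\w+;\w+;\w+;\.*', so the two False results reported above may come from data that differs from the sample.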
