|
For a manual process, use something like PowerRename[^].
If you want to do this in code, you're going to need to provide a lot more detail. But since you keep ignoring OG's questions, I won't hold my breath.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Thanks for your help, but that's not what I'm interested in. I want to find and replace within the file.
|
|
|
|
|
STOP TYPING AS LITTLE AS POSSIBLE AND ACTUALLY EXPLAIN WHAT YOU'RE HAVING A PROBLEM DOING!!!!
|
|
|
|
|
Given the conversation here, and that your question on the Notepad++ forum was equally sparse on words (https://community.notepad-plus-plus.org/topic/25021/regular-formulas), it does seem that you don't read the replies properly. Please respond with a LOT more information.
If it wasn't for the pidgin English feel, I'd almost say this is an AI conversation.
Terry
|
|
|
|
|
I think we are witnessing a Miracle Of Life: the birth of a Help Vampire. Fascinating - you don't often get to see them in the wild until they are at least 3 years old ...
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
|
I've got some data input by a user.
It's like <span> </span>, and there can be any number of these non-breaking spaces up to the closing span tag, with nothing else.
I'm trying to figure out a reasonable way to detect an occurrence of a span like this. It could be the span with 30 occurrences of the non-breaking space, or 300 occurrences, or any number in between.
I was hoping there'd be a way to detect this repeating pattern within a regular expression.
<span> </span>
|
|
|
|
|
You might need to test for the Unicode value.
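A minimal Python sketch of what testing for the Unicode value means: the non-breaking space is code point U+00A0, a different character from the ordinary space U+0020, so a space typed into the pattern will not match it. Whether the user data holds the raw character or the HTML entity &nbsp; is an assumption you would need to check, so the sketch accepts both; is_nbsp_only_span is a hypothetical helper name.

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, not the same character as " "

# A pattern containing a plain ASCII space does not match U+00A0.
ascii_space_span = re.compile(r"<span> +</span>")

# Match the character by its code point, or the HTML entity form.
nbsp_span = re.compile(r"<span>(?:\u00a0|&nbsp;)+</span>")


def is_nbsp_only_span(text: str) -> bool:
    """Hypothetical helper: True if text is a span holding only non-breaking spaces."""
    return nbsp_span.fullmatch(text) is not None


sample = "<span>" + NBSP * 3 + "</span>"
print(hex(ord(NBSP)))                                  # 0xa0
print(ascii_space_span.fullmatch(sample) is None)      # True
print(is_nbsp_only_span(sample))                       # True
print(is_nbsp_only_span("<span>&nbsp;&nbsp;</span>"))  # True
```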
|
|
|
|
|
|
The specifics of where the regex is used, and which regex engine, matter.
But in general:
{code}
\s*( \s*)+
{code}
Les Stockton wrote: until the close of the span tag
In valid XML, looking for the closing tag is pointless, but you can add it if you want.
Les Stockton wrote: XML
Just noting that using regexes to parse XML is not a good idea. Primarily this comes down to blocks embedded in other blocks, which you cannot parse with a regex. There are other complex issues as well that would require hideous (and therefore slow) regexes.
There can also be other variations on what you posted:
1. Multiline
2. Spaces in the tags
3. Attributes in the tag
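A Python sketch of how a pattern could allow for those three variations (multiline bodies, spaces inside the tags, attributes). It assumes the non-breaking spaces arrive either as the raw U+00A0 character or as &nbsp;, and it still cannot cope with nested tags, which is the limitation described above; find_nbsp_spans is a hypothetical name.

```python
import re

SPAN_WS = re.compile(
    r"<\s*span\b[^>]*>"   # opening tag: spaces in the tag and attributes allowed
    r"((?:\s|&nbsp;)*)"   # body: \s covers U+00A0 and line breaks in str patterns
    r"<\s*/\s*span\s*>",  # closing tag, tolerating spaces inside it
    re.IGNORECASE,
)


def find_nbsp_spans(html: str) -> list[str]:
    """Return spans whose body is only whitespace and holds at least one nbsp."""
    hits = []
    for m in SPAN_WS.finditer(html):
        body = m.group(1)
        # Reject spans that contain ordinary whitespace only.
        if "\u00a0" in body or "&nbsp;" in body.lower():
            hits.append(m.group(0))
    return hits


doc = '<p>a</p><span class="x">\u00a0 \n&nbsp;</span><span>text</span>< span >   </span>'
print(find_nbsp_spans(doc))
```

Checking for the non-breaking space in Python after the match, rather than inside the pattern, keeps the regex free of overlapping repetitions, which can otherwise backtrack badly on long runs of spaces that do not end in a closing tag.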
|
|
|
|
|
In general, trying to parse XML (or HTML) with regex is not a good idea, and almost certainly doomed to failure. However, to match this specific case you might try:
<span>( *)+</span>"
That's an extended POSIX regex, and seems to do the job. It matches any of the following:
<span> </span>
<span> </span>
<span> </span>
<span> </span>
<span> </span>
If you need to accept any whitespace, you might try using ( [[:space:]]*) as the sub-pattern.
If the span text may contain line breaks, you may need to tell your regex engine not to treat them as end-of-text markers.
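To illustrate the line-break point in Python (other engines and tools differ): "." does not cross a newline unless re.DOTALL is given, while an explicit whitespace class matches newlines without any flag. The last pattern is a rough translation of the suggested sub-pattern, assuming the character before [[:space:]]* is a non-breaking space and using [ \t\n\r\f\v] in place of POSIX [[:space:]], which Python's re does not support.

```python
import re

# A span whose non-breaking spaces are split across two lines.
text = "<span>\u00a0\u00a0\n\u00a0</span>"

# By default '.' does not match a newline, so this fails...
dot_default = re.fullmatch(r"<span>.*</span>", text)
# ...while re.DOTALL lets '.' match line breaks too.
dot_all = re.fullmatch(r"<span>.*</span>", text, re.DOTALL)

# Rough translation of "<span>(NBSP[[:space:]]*)+</span>". The ASCII
# whitespace class excludes U+00A0, so each repetition is unambiguous.
NBSP_RUN = re.compile(r"<span>(?:\u00a0[ \t\n\r\f\v]*)+</span>")
explicit = NBSP_RUN.fullmatch(text)

print(dot_default is None)   # True
print(dot_all is not None)   # True
print(explicit is not None)  # True
```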
Keep Calm and Carry On
|
|
|
|
|
Hello! I'm new to learning regex and I can't figure out how to exclude certain characters from my regex.
In my example, I want to format names as FIRST_NAME, LAST_NAME, meaning "Rog er , Green" becomes "Roger, Green".
I have accomplished that with the regex (?=[A-Z]). My current issue is that in Sweden, where I reside, conjoined names such as "Lars-Erik" are rather common, and my current regex turns that into "Lars- Erik", which I don't want.
It also doesn't take into account Nordic uppercase letters such as Ö.
Is there a way to exclude uppercase letters that are preceded by a hyphen, and to include more than just US characters?
|
|
|
|
|
When you use square brackets, you are limiting it to just the characters (and ranges) you include within the brackets: [A-Z] includes only the characters 'A' to 'Z'. If you want to extend that to accented characters, you either need to add them to the brackets: [A-Za-zÄÀÉÏÔÕÖÜßàâäèéêëîôõöûüçÇ] or use the less specific "alphanumeric character" code \w instead - this has the disadvantage of including '0' to '9' (and '_') as well, but ... it's a lot easier to read ...
You can also include the hyphen in your "permitted characters" list to allow for hyphenated names: [\w-]
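A Python sketch combining both requests. It assumes (the question does not say) that the current approach removes all whitespace and then inserts a space before every capital letter, which is what would turn "Lars-Erik" into "Lars- Erik". The uppercase set is an illustrative, incomplete list, and normalise_name is a hypothetical name.

```python
import re

# Illustrative, incomplete set of capitals: A-Z plus a few Nordic/accented ones.
UPPER = "A-ZÅÄÖÆØÉÜ"

# Zero-width position just before a capital, provided the preceding
# character is not a hyphen. The lookbehind needs a preceding character,
# so nothing is inserted before the very first letter.
BEFORE_CAPITAL = re.compile(rf"(?<=[^-])(?=[{UPPER}])")


def normalise_name(raw: str) -> str:
    """Hypothetical reconstruction: drop all whitespace, then re-split before capitals."""
    compact = re.sub(r"\s+", "", raw)
    return BEFORE_CAPITAL.sub(" ", compact)


print(normalise_name("Rog er , Green"))     # Roger, Green
print(normalise_name("Lars-Erik , Öberg"))  # Lars-Erik, Öberg
```

Names with an internal capital, such as "McDonald", would still be split by this rule.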
If you are going to use regular expressions, you need a helper tool. Get a copy of Expresso[^] - it's free, and it examines and generates Regular expressions.
|
|
|
|
|
I see, \w would match all characters, right? Not just uppercase letters. So in my case I would need to individually add all the uppercase variants of accented characters to my brackets after A-Z?
|
|
|
|
|
Yes.
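If listing every accented capital by hand becomes unwieldy, one alternative outside the regex is Python's str.isupper(), which uses Unicode character data. A sketch of the same "space before a capital not preceded by a hyphen" rule as a plain loop; the rule itself is an assumption about what the original regex does, and split_before_capitals is a hypothetical name.

```python
def split_before_capitals(compact: str) -> str:
    """Insert a space before each uppercase letter, in any script, unless it
    is the first character or directly follows a hyphen."""
    out = []
    for i, ch in enumerate(compact):
        # str.isupper() is Unicode-aware, so Å, Ø, Ł etc. count as capitals.
        if i > 0 and ch.isupper() and compact[i - 1] != "-":
            out.append(" ")
        out.append(ch)
    return "".join(out)


print(split_before_capitals("Lars-Erik,Øberg"))  # Lars-Erik, Øberg
```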
|
|
|
|
|
|
It sounds like you're building a great way to thoroughly annoy your users.
Even ignoring the famous Falsehoods Programmers Believe About Names[^] list, if Mr Green wants to be known as "Rog er", why should he be forced to change his name to "Rog-Er" or "Roger" just to fit in with your system's rules?
I understand that you're probably just trying to prevent typos. But you're building a system that can't cope with anything even slightly unusual "just in case" someone can't type their own name.
|
|
|
|
|
I see your concern, but this regex is to be used with an OCR engine which sometimes (unfortunately) adds whitespace where there is none.
So in my example, the person has actually written "Roger", and the OCR interprets it as two parts ("Rog er"). I want to eliminate those cases; if an actual "Rog er" appears, my customer is informed that they have to look for "Roger".
"Rog-er / Rog-Er" is accounted for and allowed!
|
|
|
|
|
Norwegian names frequently omit the hyphen: in my school class there were Per Erik, Hans Petter, Gunn Marit and Marit Irene (all first names, not first+family names). Others had double first names, but used only one of them except on formal occasions. Both my parents had double, un-hyphenated first names (and my father's second first name was used so rarely that I didn't know of it until my mid-teens!).
(In Sweden, you are quite likely to get in contact with people with names Norwegian style.)
The list of false assumptions in the article linked by Richard Dennings is great!
|
|
|
|
|
trønderen wrote: Richard Dennings
Who?
|
|
|
|
|
Sorry. My mistake. When you open a reply window, the list of messages is by default hidden, and I took the name from my (incorrect) memory of the author of the message no longer in view.
I hope you were not terribly offended. I am sorry for my mistake anyway.
|
|
|
|
|
So the real problem is the OCR solution.
The reality is that you are unlikely to be able to deal with all of the error cases. Your solution might introduce more errors.
So it is a trade off.
If error reduction is considered a significant issue, then it might be better to get a second OCR solution and use both of them. Then compare the output from both and only apply fixes when there is a difference.
If, as I said, it is a significant problem, then any additional cost should not be a problem. But if the cost is a problem, then perhaps it isn't as significant as thought.
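A minimal Python sketch of the compare-two-engines idea. The two readings are assumed to come from two different OCR engines (not shown), and treating a whitespace-only difference as an OCR artefact while flagging anything else for a human is one possible policy, not the only one; reconcile and collapse are hypothetical names.

```python
import re


def collapse(text: str) -> str:
    """Remove all whitespace so two readings can be compared ignoring spacing."""
    return re.sub(r"\s+", "", text)


def reconcile(first: str, second: str) -> tuple[str, bool]:
    """Combine readings of the same field from two OCR engines.

    Returns (value, needs_review).
    """
    if first == second:
        return first, False  # both engines agree: no fix applied
    if collapse(first) == collapse(second):
        # Readings differ only in spacing: assume the extra space is an OCR
        # artefact and keep the reading with fewer words (a policy choice).
        return min(first, second, key=lambda s: len(s.split())), False
    return first, True  # genuine disagreement: flag for a human


print(reconcile("Rog er Green", "Roger Green"))  # ('Roger Green', False)
```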
|
|
|
|
|
Doing this in a language such as C# would have been a trivial task.
It would have given you a lot more flexibility in handling e.g. standard name parts that are not capitalized, such as Ludvigvan Beethoven, Charlesde Gaulle or Bengtaf Klintberg. Lots of other special cases and variations could be handled in a much more maintainable way.
I have linked to this several times earlier, but it cannot be repeated too often: Geek & Poke: Yesterday's regex[^]
You cannot expect your name matching to be perfect on the first try. Or second. Or third. E.g. a list of prepositions such as "van", "de", "af" ... will grow and grow. Adding them to a C# list is far easier than updating your regex.
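The suggestion above is C#; to keep the examples in one language, here is the same idea sketched in Python. It assumes (not stated in the thread) that an OCR fragment can be recognized because it starts with a lowercase letter; the particle set is a hypothetical starting list, and keeping it as data rather than inside a regex is what makes it easy to extend.

```python
# Hypothetical starting list of lowercase name particles; expect it to grow.
PARTICLES = {"van", "von", "de", "der", "den", "af"}


def join_ocr_fragments(name: str) -> str:
    """Glue lowercase fragments onto the previous word ("Rog er" -> "Roger"),
    except known particles, which stay separate words."""
    words: list[str] = []
    for token in name.split():
        if words and token[0].islower() and token not in PARTICLES:
            words[-1] += token
        else:
            words.append(token)
    return " ".join(words)


print(join_ocr_fragments("Rog er Green"))          # Roger Green
print(join_ocr_fragments("Ludvig van Beethoven"))  # Ludvig van Beethoven
```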
|
|
|
|
|
There are also English (and Welsh) surnames that start with "ff": it indicated "son of" in Middle Age English and was a single letter - literally an uppercase "F" was written as "ff". Until the end of the Middle Ages, initial capitalization of names wasn't a thing - names were all written in lowercase. Some rich people* have kept the lowercase starter to this day (and can get very shirty if you use uppercase!)
* Who were mostly the only ones with surnames anyway; surnames didn't become common practice until the aftermath of the Black Death.
|
|
|
|
|
Hi all,
so, I'm new to regex and trying to understand it, and I admit I'm lost.
Here's what I need right now:
I have a list of strings from which I want to extract users' email addresses. Each line looks like this:
DisplayName;Surname;Givenname;Mail;Company
which gives me something like:
$line = 'jsmith;john;smith;john.smith@someemail.com;acme'
Since I'm new and not sure how this works, I ran these tests to learn, with the results shown. Now I'm trying to understand why the last two shown here fail.
$line -match '\w+'                 # True
$line -match '\w+;'                # True
$line -match '\w+;\w+;'            # True
$line -match '\w+;\w+;\w+'         # True
$line -match '\w+;\w+;\w+;'        # False
$line -match '\w+;\w+;\w+;\.*'     # False
At first I thought this regex would give me the email address, but it fails:
$regex = '\w+;\w+;\w+;(\w+@\w+);\w+'
Thanks for helping me.
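A sketch of one fix, in Python to match the other examples here; the pattern itself only uses syntax that .NET regexes (which PowerShell's -match uses) also support. It assumes every line has the five semicolon-separated fields shown. One observation, hedged: on the sample line as posted, '\w+;\w+;\w+;' does match, so the real data presumably differs from it somehow (for instance a display name containing a space); that part is a guess.

```python
import re

line = "jsmith;john;smith;john.smith@someemail.com;acme"

# \w matches only letters, digits and underscore, so (\w+@\w+) cannot cross
# the dots in "john.smith" or "someemail.com". [^;]+ ("anything except a
# semicolon") matches a whole field instead.
FOURTH_FIELD = re.compile(r"^(?:[^;]*;){3}([^;]+);")

match = FOURTH_FIELD.search(line)
print(match.group(1) if match else None)  # john.smith@someemail.com

# Without a regex: split on the separator and take the fourth field.
print(line.split(";")[3])                 # john.smith@someemail.com

# On this sample line, three word-runs followed by semicolons do match.
print(re.search(r"\w+;\w+;\w+;", line) is not None)  # True
```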
|
|
|
|