|
At the risk of repeating myself...
Quote: Because you aren't allowed to, or because you don't know how to? It makes a big difference in the answer ...
What have you tried?
Where are you stuck?
What help do you need?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
For the REGEX program or also NOTEPAD++
Thanks and best regards
|
|
|
|
|
What's a "regular formula"? Are you talking about "regular expressions"?
You're not going to get any answers to whatever question you have until you spell out, in detail, the actual question.
|
|
|
|
|
For a manual process, use something like PowerRename[^].
If you want to do this in code, you're going to need to provide a lot more detail. But since you keep ignoring OG's questions, I won't hold my breath.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Thanks for your help,
that's not what I'm interested in. Find and replace within the file
|
|
|
|
|
STOP TYPING AS LITTLE AS POSSIBLE AND ACTUALLY EXPLAIN WHAT YOU'RE HAVING A PROBLEM DOING!!!!
|
|
|
|
|
Given the conversation here, and that you were equally sparse on words in your question on the Notepad++ forum (https://community.notepad-plus-plus.org/topic/25021/regular-formulas), it does seem that you aren't reading the replies properly, and you need to respond with a LOT more information.
If it wasn't for the pidgin English feel I'd almost say this is an AI conversation.
Terry
|
|
|
|
|
I think we are witnessing a Miracle Of Life: the birth of a Help Vampire. Fascinating - you don't often get to see them in the wild until they are at least 3 years old ...
|
|
|
|
|
|
I've got some data input by a user.
It's like <span>&nbsp;&nbsp;&nbsp;</span> and this can be any number of this non-breaking space up until the close of the span tag, with nothing else.
I'm trying to figure out a reasonable way to detect an occurrence of a span like this. It would be the span and then 30 occurrences of the non-breaking space, or it could be 300 occurrences, or any number in between.
I was hoping there'd be a way to detect this repeating pattern within a regular expression.
<span>&nbsp;&nbsp;&nbsp;</span>
|
|
|
|
|
You might need to test for the Unicode value.
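To make that concrete, here is a minimal Python sketch. It assumes the non-breaking space may arrive either as an entity (&nbsp;, &#160;, &#xA0;) or as the raw U+00A0 character, and that the span has no attributes. The names NBSP and NBSP_ONLY_SPAN are made up for the example.

```python
import re

# One non-breaking space in any of its spellings: the named entity,
# the decimal or hex numeric entity, or the raw U+00A0 character.
NBSP = r"(?:&nbsp;|&#160;|&#[xX][aA]0;|\u00a0)"

# A bare <span> whose whole content is one or more non-breaking spaces.
NBSP_ONLY_SPAN = re.compile(rf"<span>{NBSP}+</span>")


def has_padding_span(text: str) -> bool:
    """True if text contains a <span> filled only with non-breaking spaces."""
    return NBSP_ONLY_SPAN.search(text) is not None
```

The + quantifier covers 30, 300 or any other number of repeats, so there is no need to count them.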
|
|
|
|
|
|
Specifics of where/which regex is used matters.
But in general
{code}
\s*(&nbsp;\s*)+
{code}
Les Stockton wrote: until the close of the span tag
In valid XML looking for the closing tag is pointless. But you can add it if you want.
Les Stockton wrote: XML
Just noting that using regexes to parse XML is not a good idea. Primarily this comes down to blocks embedded in other blocks: you cannot parse that with a regex. There are other complex issues too, which would require hideous (which means slow) regexes.
There can also be other variations on what you posted:
1. Multiline
2. Spaces in the tags
3. Attributes in the tag.
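The variations listed above are the sort of thing a parser copes with where a regex struggles. As a rough sketch only, assuming the input is HTML-ish and Python is available, the standard library's html.parser can do the detection. NbspSpanFinder and count_nbsp_spans are names invented for this example.

```python
from html.parser import HTMLParser

NBSP = "\u00a0"
PLAIN_WS = " \t\r\n"  # ordinary whitespace tolerated around the NBSPs


class NbspSpanFinder(HTMLParser):
    """Counts <span> elements whose only content is non-breaking spaces."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp; and &#160; into U+00A0 for us.
        super().__init__(convert_charrefs=True)
        self.open_spans = []  # per open span: [text_parts, has_child_element]
        self.hits = 0

    def handle_starttag(self, tag, attrs):
        # Any element opening inside a span means it holds more than text.
        for span in self.open_spans:
            span[1] = True
        if tag == "span":
            self.open_spans.append([[], False])

    def handle_data(self, data):
        if self.open_spans:
            self.open_spans[-1][0].append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self.open_spans:
            return
        parts, has_child = self.open_spans.pop()
        text = "".join(parts)
        only_padding = all(ch == NBSP or ch in PLAIN_WS for ch in text)
        if not has_child and NBSP in text and only_padding:
            self.hits += 1


def count_nbsp_spans(markup: str) -> int:
    finder = NbspSpanFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits
```

Attributes and line breaks inside the span need no special handling here, because the parser has already dealt with them before the check runs.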
|
|
|
|
|
In general, trying to parse XML (or HTML) with regex is not a good idea, and almost certainly doomed to failure. However, to match this specific case you might try:
<span>(&nbsp; *)+</span>
That's an extended POSIX regex, and seems to do the job. It matches a span containing one or more &nbsp; entities, each optionally followed by spaces, such as:
<span>&nbsp;</span>
<span>&nbsp;&nbsp;&nbsp;</span>
<span>&nbsp; &nbsp; &nbsp;</span>
If you need to accept any white space you might try using (&nbsp;[[:space:]]*) as the sub-pattern.
If you may have line breaks in the span text, then you may need to tell your regex engine to not treat them as end-of-text markers.
Keep Calm and Carry On
|
|
|
|
|
Hello! I'm new to learning regex and I can't seem to figure out how to exclude certain characters from my regex.
In my example I want to structure names as FIRST_NAME, LAST_NAME, meaning "Rog er , Green" becomes "Roger, Green".
I have accomplished this with the regex (?=[A-Z]). My current issue is that in Sweden, where I reside, hyphenated names such as "Lars-Erik" are rather common, and with my current regex that becomes "Lars- Erik", which I don't want.
It also does not take into account Nordic uppercase letters such as Ö.
Is there a way of excluding uppercase letters that have a hyphen prefix, as well as including more than US characters?
|
|
|
|
|
When you use square brackets, you are limiting it to just the characters (and ranges) you include within the brackets: [A-Z] includes only the characters 'A' to 'Z'. If you want to extend that to accented characters, you either need to add them to the brackets: [A-Za-zÄÀÉÏÔÕÖÜßàâäèéêëîôõöûüçÇ] or use the less specific "word character" code \w instead - this has the disadvantage of including '0' to '9' and '_' as well, but ... it's a lot easier to read ...
You can also include the hyphen in your "permitted characters" list to allow for hyphenated names: [\w-]
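For the "Lars- Erik" problem specifically, a lookbehind can refuse to split after a hyphen. Below is a small Python sketch of that idea. It assumes the original approach was to strip all whitespace and then insert a space before each uppercase letter via (?=[A-Z]); the UPPER list and the normalize_name helper are illustrative, not anything from the original post.

```python
import re

# Uppercase letters that may start a name part. This list is an
# assumption for the sketch; extend it with whatever your data needs.
UPPER = "A-ZÅÄÖÆØÉÜ"

# Zero-width position before an uppercase letter, provided the previous
# character exists and is not a hyphen. Requiring a previous character
# also stops a space being inserted at the very start of the string.
BOUNDARY = re.compile(rf"(?<=[^-])(?=[{UPPER}])")


def normalize_name(raw: str) -> str:
    compact = re.sub(r"\s+", "", raw)  # drop every space the OCR added
    return BOUNDARY.sub(" ", compact)  # re-insert one before each name part


print(normalize_name("Rog er , Green"))       # Roger, Green
print(normalize_name("Lars - Erik , Öberg"))  # Lars-Erik, Öberg
```

Bear in mind that any name with an internal capital (for example "McDonald") would still be split by this, so treat it as a starting point.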
If you are going to use regular expressions, you need a helper tool. Get a copy of Expresso[^] - it's free, and it examines and generates Regular expressions.
|
|
|
|
|
I see, \w would match all characters, right? Not just uppercase letters. So in my case I would need to individually add all uppercase variants of accented characters into my brackets after A-Z?
|
|
|
|
|
Yes.
|
|
|
|
|
|
It sounds like you're building a great way to thoroughly annoy your users.
Even ignoring the famous Falsehoods Programmers Believe About Names[^] list, if Mr Green wants to be known as "Rog er", why should he be forced to change his name to "Rog-Er" or "Roger" just to fit in with your system's rules?
I understand that you're probably just trying to prevent typos. But you're building a system that can't cope with anything even slightly unusual "just in case" someone can't type their own name.
|
|
|
|
|
I see your concern, but this regex is to be used with an OCR engine which sometimes (unfortunately) adds whitespace where there is none.
So in my example the person has actually entered "Roger" and the OCR interprets it as two parts, "Rog er". I want to eliminate those cases; if an actual "Rog er" appears, my customer is informed that they have to look for "Roger".
"Rog-er / Rog-Er" is accounted for and allowed!
|
|
|
|
|
Norwegian names frequently omit the hyphen: In my school class there were both Per Erik, Hans Petter, Gunn Marit and Marit Irene (all first names, not first+family names). Others had double first names, but used only one of them except at formal occasions. Both my parents had double, un-hyphenated first names (and my father's second first name was used so rarely that I didn't know of it until my mid-teens!).
(In Sweden, you are quite likely to get in contact with people with names Norwegian style.)
The list of false assumptions in the article linked by Richard Dennings is great!
|
|
|
|
|
trønderen wrote: Richard Dennings
Who?
|
|
|
|
|
Sorry. My mistake. When you open a reply window, the list of messages is by default hidden, and I took the name from my (incorrect) memory of the author of the message no longer in view.
I hope you were not terribly offended. I am sorry for my mistake anyway.
|
|
|
|
|
So the real problem is the OCR solution.
The reality is that you are unlikely to be able to deal with all of the error cases. Your solution might introduce more errors.
So it is a trade off.
If error reduction is considered a significant issue, then perhaps better to look into getting a different OCR solution and use both of them. Then compare the output from both and only apply fixes when there is a difference.
If as I said it is a significant problem then any additional cost should not be a problem. But if the cost is a problem then perhaps it isn't as significant as thought.
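The "use two engines and only fix on a difference" idea could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the two input strings stand for the outputs of two hypothetical OCR engines, and drop_spaces is a stand-in for whatever cleanup you would actually apply.

```python
import re
from typing import Callable, Optional


def drop_spaces(text: str) -> str:
    """Illustrative cleanup: remove all whitespace inside a name part."""
    return re.sub(r"\s+", "", text)


def reconcile(ocr_a: str, ocr_b: str,
              fix: Callable[[str], str] = drop_spaces) -> Optional[str]:
    """Return an agreed reading, or None if the two engines still disagree."""
    if ocr_a == ocr_b:
        return ocr_a      # engines agree: leave the text untouched
    fixed_a, fixed_b = fix(ocr_a), fix(ocr_b)
    if fixed_a == fixed_b:
        return fixed_a    # they differed only in what the fix repairs
    return None           # real disagreement: flag for manual review


print(reconcile("Roger", "Rog er"))      # Roger
print(reconcile("Per Erik", "Per Erik")) # Per Erik
print(reconcile("Roger", "Rager"))       # None
```

The point of the design is that a genuine space, as in "Per Erik", survives when both engines report it, and the cleanup only runs when there is evidence that one of them got it wrong.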
|
|
|
|