Regular Expressions

Regex search and replace? Tabs ...

14-Aug-23 6:01

trønderen wrote:
are you serious about using a regex to compare entire text lines for being identical

Myself?

No, I would not have attempted it with regex at all. I probably would have written a one-shot Perl script, not for the regex capabilities but because reading files is easier to set up, and rerunning it while iterating on tests is easier too.

And I would note that the editor I use does have a fairly decent regex engine, so the lack of one would not have affected my decision.

Member 14835146

10-Jul-23 9:57

I have a regexp that works. In my software I search for timestamps with this:

[01]?[0-9]:[0-5][0-9]

and a macro replaces the carriage return with a tab, and then I proceed from there. But it's very time-consuming when the timestamps go past 10 minutes, because then 2 tabs are required (it's a weird thing, but that's how it goes).

1. So, to outline: timestamps from 0:00 to 9:59 need one tab afterwards.

2. But timestamps from 10:00 and up, such as 22:46 and 1:35:05, require 2 tabs afterwards.

If it's any help, here is my script. It goes through the entire document, deletes the carriage return after each timestamp, and puts one tab in its place. What I need is one tab only for timestamps between 0:00 and 9:59, and 2 tabs for larger timestamps.

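// Find the next timestamp (regular-expression search)
document.selection.Find("[01]?[0-9]:[0-5][0-9]",eeFindNext | eeFindReplaceRegExp);
// Move the cursor to the end of that line
document.selection.EndOfLine(false,eeLineView);
// Insert one tab...
document.selection.Text="\x09";
// ...and delete the line break that follows it
document.selection.Delete(1);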

Thank you!

Re: Regex search and replace? Tabs ...

jschell

14-Aug-23 6:13

Member 14835146 wrote:
But it's very time-consuming

That is not specific. As in it takes 10 seconds? Or 10 hours?

Regexes meet specific needs, but speed is not necessarily one of them. For starters, a regex is always interpreted in the process. Even 'compiled' ones still end up in a form that is at best halfway to an actual compiled solution.

And your problem is in fact something that could likely be solved with regular code, which would likely be faster.

Other than that, it appears you might be attempting a regex solution over the entire file ('document') rather than going line by line. If your lines do in fact have a fixed number of timestamps, then looping might provide a better solution, especially if you can anchor the regex.
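A minimal sketch of the line-by-line idea in Python, under the assumption that each timestamp ends its line and the following line should be joined to it. The thread does not show the real document layout, and `tabs_for` and `join_timestamps` are hypothetical names used only for illustration.

```python
import re

# Anchored at the end of the line: optional hours, then minutes and seconds.
TIMESTAMP = re.compile(r"(?:(\d+):)?(\d{1,2}):([0-5]\d)$")

def tabs_for(line):
    """Return the tabs to put after a line ending in a timestamp, or None."""
    m = TIMESTAMP.search(line)
    if m is None:
        return None
    hours, minutes = m.group(1), int(m.group(2))
    # Below 10:00 (no hours part, single-digit minutes): one tab. Otherwise two.
    return "\t" if hours is None and minutes < 10 else "\t\t"

def join_timestamps(text):
    """Replace the line break after each timestamp line with one or two tabs."""
    out = []
    for line in text.splitlines():
        tabs = tabs_for(line)
        # Lines without a timestamp keep their line break.
        out.append(line + (tabs if tabs is not None else "\n"))
    return "".join(out)

if __name__ == "__main__":
    print(repr(join_timestamps("0:05\nHello\n12:30\nWorld\n")))
```

Because the pattern is anchored with `$` and applied one line at a time, each check only looks at a single short string instead of searching the whole document.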

Piotr Przeklasa

4-Jul-23 2:20

How to insert a space at the beginning of a line in a "for next" code loop (.NET regular expressions)

I am trying to add spaces at the beginning of lines matched with lookarounds. This is the input:

for
line of code
line of code
line of code
line of code
next


and this is the output I want to get


for
  line of code
  line of code
  line of code
  line of code
next


Please help me with .NET regular expressions

modified 5-Jul-23 3:49am.

Re: How to insert a space at the beginning of a line in a "for next" code loop (.NET regular expressions)

Andre Oosthuizen

5-Jul-23 1:15

You did not supply a lot of information but based on your question you can use the 'Regex.Replace' method to insert a space at the beginning of each line -

.NET

Imports System
Imports System.Text.RegularExpressions

Public Module Program
    Public Sub Main()
        Dim code As String = "For i As Integer = 0 To 9" & vbCrLf & "    Console.WriteLine(i)" & vbCrLf & "Next"

        ' With RegexOptions.Multiline, ^ matches at the start of every line
        Dim pattern As String = "^"

        ' Insert a space at the beginning of each line
        Dim myCode As String = Regex.Replace(code, pattern, " ", RegexOptions.Multiline)

        ' Output the modified code
        Console.WriteLine(myCode)
    End Sub
End Module

Re: How to insert a space at the beginning of a line in a "for next" code loop (.NET regular expressions)

Richard MacCutchan

5-Jul-23 2:21

Any of the many source code editors can do that with a couple of keystrokes. You need to explain where this text is coming from and what you are trying to do. Is this just a part of a file that you want to change, or something more complicated?

jschell

5-Jul-23 5:31

Piotr Przeklasa wrote:
How to insert a space at the beginning of a line in a "for next" code loop (.NET regular expressions)

Nope. Wrong way to attempt this.

It is a common assumption that regex can handle this, but the very nature of regex processing precludes it.

You need to do it with regular code.

The limits of regex start showing up with recursion problems, such as nested loops. For example, the following:

for
  line of code
  for
    line of code
  next
  line of code
  line of code
next
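As a rough illustration of what "regular code" could look like here, the Python sketch below tracks the nesting depth with a counter. It assumes that every block opens with a line starting with `for` and closes with a line starting with `next`; the function name `indent_for_next` and the two-space indent are hypothetical choices, not something from the thread.

```python
def indent_for_next(text, unit="  "):
    """Re-indent lines according to for/next nesting depth."""
    depth = 0
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            out.append("")
            continue
        keyword = line.split()[0].lower()
        # A closing "next" belongs to the outer level, so dedent first.
        if keyword == "next":
            depth = max(depth - 1, 0)
        out.append(unit * depth + line)
        # Everything after an opening "for" is one level deeper.
        if keyword == "for":
            depth += 1
    return "\n".join(out)

if __name__ == "__main__":
    sample = "for\nline of code\nfor\nline of code\nnext\nline of code\nnext"
    print(indent_for_next(sample))
```

The counter is the piece a single search-and-replace pattern cannot easily express: the indent of each line depends on how many loops are still open at that point.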

Need help with Regular expression

Member 16012634

23-May-23 9:36

I have several file name patterns, as follows.

ABCD_ABCDEFGH_PARB_ALLB_CCYYMMDD-HHMMSS.TXT
SDCD_NKEDHEI_ALLIA_PARTN_CCYYMMDD-HHMMSS.TXT
UN_URKSLJIE_EXTRACT_DATA_ALLT_PART_CCYYMMDD-HHMMSS.TXT

I was trying to use the following regex, but it doesn't work for all of the file name types above ...

^.*_(ALLB|ALLIA|ALLT|AMERI|BCBS|CCH|EASB|EASTP|EAST|SANDH|SANT|SANB|TRIB|TRILL|TRIT|UHC|VAYAH|VAYT|VAYB|WELLC)(?_PARTN|?_PART)_\d{8}-\d{6}"\.TXT$

Can someone help me with this?

Re: Need help with Regular expression

PIEBALDconsult

23-May-23 9:39

Maybe the " in -- {6}"\.

Unsure what the second ? is doing in -- (?_PARTN|?_PART)

Edit: (?_PARTN|?_PART) should maybe be (_PARTN?)?

modified 23-May-23 16:10pm.
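A quick Python check of the two suggestions above (drop the stray double quote, and write the optional suffix as `(_PARTN?)?`). The sample file names use made-up dates in place of the CCYYMMDD-HHMMSS placeholders, and `is_valid` is a hypothetical helper name; treat this as a sketch, not a verified fix for the original environment.

```python
import re

# The code list is copied from the question.
CODES = ("ALLB|ALLIA|ALLT|AMERI|BCBS|CCH|EASB|EASTP|EAST|SANDH|SANT|SANB|"
         "TRIB|TRILL|TRIT|UHC|VAYAH|VAYT|VAYB|WELLC")

# (_PARTN?)? accepts "_PART", "_PARTN", or nothing at all.
FILE_NAME = re.compile(r"^.*_(" + CODES + r")(_PARTN?)?_\d{8}-\d{6}\.TXT$")

def is_valid(name):
    """True if the file name matches the corrected pattern."""
    return FILE_NAME.match(name) is not None

if __name__ == "__main__":
    for name in (
        "ABCD_ABCDEFGH_PARB_ALLB_20230523-093600.TXT",
        "SDCD_NKEDHEI_ALLIA_PARTN_20230523-093600.TXT",
        "UN_URKSLJIE_EXTRACT_DATA_ALLT_PART_20230523-093600.TXT",
    ):
        print(name, is_valid(name))
```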

Re: Need help with Regular expression

jschell

24-May-23 5:58

Use code tags when you post code on this site.

Why is there a double quote in what you posted?

Redirection with exclusions RegEx

Member 15983113

19-Apr-23 3:07

I'm using the Redirection plugin by John Godley on our WordPress site, and I want to create a redirect rule that applies to multiple URLs, but also excludes 2 specific urls.

With a little help from Google as well as ChatGPT, I found that the syntax that should work is as follows:

location ~ ^/group/(?!members-only|harp-for-the-lord)(.*)$ {
    return 301 /courses/$1;
}

However, apparently that code is supposed to be added to the NGINX configuration file, but I’ve never SSH’d into a server.

So I thought I'd ask here in case anyone could help me set this up so I can get the same result using the Redirection plugin.

Thoughts?
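Before putting the pattern into any tool, it may help to check what the negative lookahead actually excludes. The Python sketch below only exercises the regex from the NGINX snippet; it does not show how to configure the Redirection plugin, and `redirect_target` and the sample paths are hypothetical.

```python
import re

# Same pattern as in the NGINX rule above.
RULE = re.compile(r"^/group/(?!members-only|harp-for-the-lord)(.*)$")

def redirect_target(path):
    """Return the /courses/ URL for a redirected path, or None if excluded."""
    m = RULE.match(path)
    return "/courses/" + m.group(1) if m else None

if __name__ == "__main__":
    for path in ("/group/intro", "/group/members-only", "/group/harp-for-the-lord/x"):
        print(path, "->", redirect_target(path))
```

Note that the lookahead rejects any path that merely starts with one of the two names after /group/, so sub-pages of the excluded URLs are excluded as well.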

I need help with a complex regex

Member 12746532

6-Apr-23 3:57

Challenge:
I have a file with genealogy information which I would like to extract (in Google Sheets) using regex.

Data:
One cell contains text information. Basically it is four main parts, two of which are optional and can have slightly different formats and contents

First comes always a number followed by a period. (This is the generation number.)
Second comes the name. It consists of one or more first and last names
These two are always there

They can be followed by birth and/or death information.
If there is birth information, it always comes directly after the name and starts with "b. ".
It can have a date and/or a location.
The date can be preceded by "circa", "before", or "before circa". It is then followed by either a 4-digit year or, more commonly, the month name, day, and year. Example: "March 4, 1888".
After the year, a location (free text) might follow.

If there is death information, it starts with "d. " and can contain the same information as above, i.e. a date and/or a location.

My best shot is close, but not handling the special cases of "before" etc too well:

=ARRAYFORMULA(IFERROR(SPLIT(REGEXREPLACE(A:A,"^(\d+)\.\s(.+?)(\s(b\.?\s?(\w+\s\d{1,2},\s\d{4})?,?\s?(.*?))?(; d\.\s(\w+\s\d{1,2},\s\d{4})?, \s?(.+)?)?)?$","$1|$2|$3|$4|$5|$6|$7|$8|$9"),"|")))

So the regex part of it is:

^(\d+)\.\s(.+?)(\s(b\.?\s?(\w+\s\d{1,2},\s\d{4})?,?\s?(.*?))?(; d\.\s(\w+\s\d{1,2},\s\d{4})?, \s?(.+)?)?)?$

It works well for entries like this one:

2. Gunnar Helg Andersson b. October 22, 1921, Ormöga No. 3, Bredsättra, Kalmar, Sweden; d. January 1, 2021, Köpingsvik

But not for entries like:

7. Kierstin Danielsdotter b. before circa 1706
9. Lussa Elofsdotter b. circa 1680; d. May 16, 1758, Bredsättra
7. Olof Jönsson b. 1742, Sverige (Sweden); d. September 4, 1811
9. Nils Knutsson b. circa 1676, Istad, Alböke; d. circa April 17, 1729

Terry R 2023

12-Apr-23 10:53

I have tried to decipher what your intent is. I can see you hope to get 9 fields by dividing the original information, but I fail to see where the different parts of the "born" and "death" fields occur.

What I have done thus far is to create a regex which gets the "record" number, the "name", the "birth" info if it exists and the "death" info if it exists. These last 2 fields can be further defined (and divided) if only I knew what your intent was.

Perhaps you can explain what should be in each of the 9 fields (if they exist). Perhaps show a "fully filled" out record as an example, then show what the result should look like.

But here is what I have thus far (this has been formulated on Notepad++):
^(\d+\.\s*)(.+?)(?=(?:b|d)\.)(b\.\s*.+?(?=(?:d\.|$)))?(d\.\s*.+?(?=$))?
To explain it we have:
^(\d+\.\s*) - start of line followed by number(s), a period and possible spaces
(.+?) - gather characters (as few as possible) until...
(?=(?:b|d)\.) - next character should be either a "b" or a "d" followed by a period. The (?: refers to a non-capturing group.
(b\.\s*.+?(?=(?:d\.|$)))? - gather characters until either a "d." follows or end of line.
(d\.\s*.+?(?=$))? - similar to previous line but for the "d." field. This assumes the "d." field will always be last.
Maybe it can give you some more inspiration. At the very least you can see how splitting the problem into smaller chunks may be beneficial. Even if you then have to further divide the "b." and "d." fields in a later step it may still be easier to define them.

Terry
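For readers who want to try the pattern outside Notepad++, here is a small Python sketch that applies it unchanged to entries from the question. The helper name `split_entry` and the trimming of spaces and semicolons are additions for illustration, and Python's `re` engine is assumed to behave like Notepad++ for this pattern.

```python
import re

# The pattern from the post above, unchanged.
ENTRY = re.compile(
    r"^(\d+\.\s*)(.+?)(?=(?:b|d)\.)(b\.\s*.+?(?=(?:d\.|$)))?(d\.\s*.+?(?=$))?"
)

def split_entry(entry):
    """Return (number, name, birth, death), or None if the entry does not match."""
    m = ENTRY.match(entry)
    if m is None:
        return None
    # Trim the separators that the lazy groups leave at the edges.
    return tuple(g.strip(" ;") if g else None for g in m.groups())

if __name__ == "__main__":
    print(split_entry("9. Lussa Elofsdotter b. circa 1680; d. May 16, 1758, Bredsättra"))
    print(split_entry("7. Kierstin Danielsdotter b. before circa 1706"))
```

One limitation worth knowing: because the lookahead requires a "b." or "d." somewhere after the name, an entry that has only a number and a name does not match at all.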

Richard MacCutchan

12-Apr-23 20:41

It would be easier to write a string parsing routine of your own.

jschell

13-Apr-23 5:29

Richard MacCutchan wrote:
It would be easier to write a string parsing routine of your own.

I strongly agree with this.

It is going to be easier to understand, easier to debug and quite possibly faster to run.

And just in case you think I have a bias: I have been using regexes extensively for 40 years (via Perl), which is why I understand both their advantages and disadvantages.

Richard MacCutchan

13-Apr-23 5:52

jschell wrote:
in case you think I have a bias

Nothing would be further from my mind, even if you advocated a Regex. I respect everyone's opinions here; after all most people know lots of things that I do not.

jschell

14-Apr-23 11:28

Richard MacCutchan wrote:
Nothing would be further from my mind

My post was phrased poorly since that part was not actually intended for you.

It was directed at the OP and/or other readers who might come across my comment.

Member 12746532

17-Apr-23 20:50

Richard MacCutchan wrote:
It would be easier to write a string parsing routine of your own.

Hi Richard,
Thanks for the tip. Although I don't fully understand what you mean by "string parsing routine", I solved the issue by writing a regex for each column needed instead of a "catch-all" regex. Perhaps that is what you meant.

Richard MacCutchan

17-Apr-23 22:04

No, my suggestion was to abandon the use of Regex patterns. You can easily split the string into an array of strings separated by spaces. All words before an entry of "b." are parts of the name. All words after the "b." and before "d." (or the end of the text) relate to the birth. All items after "d." relate to the death. And apart from anything else, it makes your code much clearer.
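The splitting approach described above could look roughly like this in Python. It is a sketch under the assumption that "b." and "d." always appear as separate words; the function name `parse_entry` and the dictionary keys are hypothetical, and splitting the birth and death text into date and location is left out.

```python
def parse_entry(entry):
    """Split one genealogy entry into generation, name, birth, and death text."""
    words = entry.split()
    parts = {"name": [], "birth": [], "death": []}
    field = "name"
    # words[0] is the generation number; the rest goes to the current field.
    for word in words[1:]:
        if word == "b.":
            field = "birth"
        elif word == "d.":
            field = "death"
        else:
            parts[field].append(word)
    result = {"generation": words[0].rstrip(".")}
    for key, collected in parts.items():
        # Drop the separator left at the end of a field, e.g. "1680;".
        result[key] = " ".join(collected).rstrip(";,")
    return result

if __name__ == "__main__":
    print(parse_entry("9. Nils Knutsson b. circa 1676, Istad, Alböke; d. circa April 17, 1729"))
```

Qualifiers such as "circa" and "before" need no special handling at this stage, since they simply stay inside the birth or death text.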

What is the regular expression for this string?

Member 15959121

22-Mar-23 11:29

I am looking for the regular expression for the following.

Microsoft.Sql/servers/*.*/databases

That can match

Microsoft.Sql/servers/some-text/databases
Microsoft.Sql/servers/some--other-text/databases

etc...

The words

"Microsoft.Sql/servers/"

"/databases"

should be an exact match, and where

*.*

can match any text.

Re: What is the regular expression for this string?

Richard MacCutchan

22-Mar-23 23:12

Try this:

Microsoft.Sql/servers/.*/databases

The .* matches any sequence of characters, including an empty one, so you may need to modify it if you want to exclude specific characters (e.g. any that are not valid in path names). You can get yourself a free copy of the Expresso Regular Expression Tool, which will help you develop REs.

[edit]
As suggested by k5054 below, the RE should have anchors so it is restricted to the actual text starting at Microsoft and ending at databases.

^Microsoft.Sql/servers/.*/databases$

[/edit]

modified 23-Mar-23 9:58am.

k5054

23-Mar-23 3:52

You might want to add anchors to that, too eg:

^Microsoft.Sql/servers/.*/databases$

so you don't also match my-Microsoft.Sql/servers/some-text/databases.info

Keep Calm and Carry On
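The effect of the anchors can be seen with a short Python check. Two details here go beyond the thread and are assumptions: the dot in "Microsoft.Sql" is escaped so that it matches only a literal period, and the `STRICT` variant with `[^/]+` is a hypothetical tightening for the case where the middle part must be exactly one non-empty path segment.

```python
import re

UNANCHORED = re.compile(r"Microsoft\.Sql/servers/.*/databases")
ANCHORED = re.compile(r"^Microsoft\.Sql/servers/.*/databases$")
# Hypothetical stricter form: one non-empty segment with no slash in it.
STRICT = re.compile(r"^Microsoft\.Sql/servers/[^/]+/databases$")

if __name__ == "__main__":
    samples = (
        "Microsoft.Sql/servers/some-text/databases",
        "Microsoft.Sql/servers/some--other-text/databases",
        "my-Microsoft.Sql/servers/some-text/databases.info",
        "Microsoft.Sql/servers/a/b/databases",
    )
    for s in samples:
        print(s, bool(UNANCHORED.search(s)), bool(ANCHORED.search(s)), bool(STRICT.search(s)))
```

Without anchors the pattern is found inside the longer string; with them, the whole string has to start at "Microsoft" and end at "databases".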

Re: What is the regular expression for this string?

Richard MacCutchan

23-Mar-23 3:55

Thanks, I forgot about those.

jschell

23-Mar-23 6:24