Hi. I am trying to create a regex for this string:

My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00
The value 99,00 varies; it could be as small as 1,00 or as large as 1000000000,00.

I need to extract the following elements:
[My super data plan 10gb mobile][99,00][1 jul][31 jul][2020][99,00]

For now I have this to start with:
([a-zA-Z0-9_ ]*(\d{1,9},[0-9]*)\s*([0-9]*\s*jul)\s*[-]\s*([0-9]*\s*jul)\s*[0-9]*\s*(\d{2,4},[0-9]*))

But this does not work, because it produces the following result:
[My super data plan 10gb mobile 9][9,00][1 jul][31 jul][2020][99,00]

What I have tried:

I need it to match everything up to the first price:
99,00
From my point of view, it should match the string until another regex takes over, something like this: match
[a-zA-Z0-9_ ]*
but only until
[0-9]*(,[0-9]*)
So 10gb (which contains a number) should be matched as part of the string because it belongs to the name, but 99,00 should not, because it is a price.
Can someone help?
Thx
Updated 21-Sep-20 21:33pm

Quote:
Can someone help?

The point of a regex is to match variable strings. To explain what you want, you need to give example inputs and the strings that should be matched; a few inputs that should be rejected may help too.
Technically, this regex will match what you told us:
My super data plan 10gb mobile

But I fear that is not what you want. Your explanation only makes sense to people who already know what you want.
Show a few examples like:
Input: "MyInput"
Match: "My" or not match
Condition: Because of ...


Just a few interesting links to help with building and debugging regexes.
Here is a link to regex documentation:
perlre - perldoc.perl.org[^]
Here are links to tools that help build and debug regexes:
.NET Regex Tester - Regex Storm[^]
Expresso Regular Expression Tool[^]
RegExr: Learn, Build, & Test RegEx[^]
Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript[^]
This one shows the regex as a nice graph, which is really helpful for understanding what a regex does: Debuggex: Online visual regex tester. JavaScript, Python, and PCRE.[^]
This site also shows the regex as a nice graph, but it can't test what the regex matches: Regexper[^]

[Update]
Force a space
([a-zA-Z0-9_ ]*\s(\d{1,9},[0-9]*)\s*([0-9]*\s*jul)\s*[-]\s*([0-9]*\s*jul)\s*[0-9]*\s*(\d{2,4},[0-9]*))
               ^ force space here
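
To see why the forced space helps, here is a small Python sketch. Python is used only for illustration (the thread itself works in regex101), and the sketch assumes each part sits in its own capturing group, which differs slightly from the parenthesization posted above. Without \s, the greedy name class gives back a single character, so the price is split; with \s, it has to back off all the way to the space in front of the price.

```python
import re

LINE = "My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00"

# Shared tail: first price, date range, year, final price.
# The grouping here is an assumption made for this sketch.
TAIL = (
    r"(\d{1,9},[0-9]*)\s*([0-9]*\s*jul)\s*-\s*([0-9]*\s*jul)"
    r"\s*([0-9]*)\s*(\d{2,4},[0-9]*)"
)

# Name class followed directly by the price: the split happens.
without_space = re.compile(r"([a-zA-Z0-9_ ]*)" + TAIL)

# Name class, then a mandatory whitespace character, then the price.
with_space = re.compile(r"([a-zA-Z0-9_ ]*)\s" + TAIL)

print(without_space.match(LINE).groups())
print(with_space.match(LINE).groups())
```

The first line printed should show the name ending in "mobile 9" and the price as "9,00"; the second should show the name ending in "mobile" and the full "99,00".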
   
Comments
csrss 22-Sep-20 2:03am
   
I have updated my question
csrss 22-Sep-20 2:36am
   
First off, thank you. It works.
This is strange: I am testing the regex on the regex101 website, and while it works on the computer I am on now, it does not work on the other laptop I was testing on. Magic.
Maciej Los 22-Sep-20 2:38am
   
Alternatively, you can match named groups ;) Please see my answer.
Maciej Los 22-Sep-20 2:36am
   
5ed!
Patrice T 22-Sep-20 2:39am
   
Thank you
[EDIT]
If you would like to optionally get the first occurrence of 99,00...
For lines:
My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00
My super data plan 10gb mobile 1 jul - 31 jul 2020 99,00


Try this:
^(?<text>[\w\s]+)(?<num1>[\d]{2,},[\d]{2}){0,}\s(?<fromdate>\d{1,}\s\w{3})\s-\s(?<todate>\d{1,}\s\w{3})\s(?<year>\d{4})\s(?<num2>\d{2,},\d{2})$


Example - version 5[^]

Matches:
1.
Group `text`	0-31	My super data plan 10gb mobile 
Group `num1`	31-36	99,00
Group `fromdate`	37-42	1 jul
Group `todate`	45-51	31 jul
Group `year`	52-56	2020
Group `num2`	57-62	99,00

2.
Group `text`	63-93	My super data plan 10gb mobile
Group `fromdate`	94-99	1 jul
Group `todate`	102-108	31 jul
Group `year`	109-113	2020
Group `num2`	114-119	99,00


As you can see, in both cases the text group is fetched correctly (up to the first occurrence of 99,00 - num1, or up to the first occurrence of 1 jul - fromdate). num1 is fetched only if it exists! The {0,} quantifier lets this group occur zero or more times, which is what makes it optional.

Good luck!
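
For readers who want to try this outside regex101, here is a minimal Python sketch of the same idea. It is an adaptation, not the exact pattern above: Python spells named groups as (?P<name>...), and the sketch uses ? in place of {0,} on num1. The parse helper and the stripping of the trailing space are assumptions added for illustration.

```python
import re

# Adapted pattern: (?P<name>...) is Python's named-group syntax,
# and "?" stands in for "{0,}" on the optional num1 group.
PATTERN = re.compile(
    r"^(?P<text>[\w\s]+)"
    r"(?P<num1>\d{2,},\d{2})?"
    r"\s(?P<fromdate>\d{1,}\s\w{3})"
    r"\s-\s(?P<todate>\d{1,}\s\w{3})"
    r"\s(?P<year>\d{4})"
    r"\s(?P<num2>\d{2,},\d{2})$"
)

LINES = [
    "My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00",
    "My super data plan 10gb mobile 1 jul - 31 jul 2020 99,00",
]


def parse(line):
    """Hypothetical helper: return the named groups as a dict, or None."""
    m = PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # When num1 is present, text keeps the space in front of it.
    fields["text"] = fields["text"].strip()
    return fields


for line in LINES:
    print(parse(line))
```

In Python a group that did not take part in the match comes back as None, so num1 is None for the second line. Note that \d{2,} would not match a single-digit price such as 1,00, which the question lists as a possible value; \d+ may be a better fit if such prices occur.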
   
Comments
Patrice T 22-Sep-20 2:40am
   
+5 too
Maciej Los 22-Sep-20 2:52am
   
Thank you.
csrss 22-Sep-20 2:47am
   
Yep, thanks. A space is still needed, of course:
^(?P<text>\w.*)\s+(?P<num1>[\d]{2,},[\d]{2})\s(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$
Maciej Los 22-Sep-20 2:56am
   
Nope. You don't need a space, because the [\w\s]+ class grabs the space too ;)
BUT (!) it depends on your needs :)
csrss 22-Sep-20 3:15am
   
Hmmm, I have modified your regex this way:
^(?P<text>\w.*)\s+(?:(?P<num1>[\d]{2,},[\d]{2}))?\s(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$
Because I want to make <num1> optional. But so far I have had no luck making it work: if <num1> is missing from the line, the regex then requires two spaces in a row. If I make one space optional, it does not work. Any idea how <num1> can be made optional?
Maciej Los 22-Sep-20 3:19am
   
^(?P<text>\w.*)(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$

;)
csrss 22-Sep-20 3:22am
   
Yes, but in this case, you have removed <num1> from regex
Maciej Los 22-Sep-20 3:36am
   
I thought that "optional" means "don't need this group". If you don't want to capture it, try this:

^(?P<text>\w.*)(?:[\d]{2,},[\d]{2}\s+)(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$

Example[^]
csrss 22-Sep-20 3:44am
   
Nope, doesn't work. If you use this regex and remove the first 99,00, nothing will be matched.
Maciej Los 22-Sep-20 3:48am
   
This works as well. Please follow the link.
The first occurrence of 99,00 is consumed but not captured.
csrss 22-Sep-20 3:53am
   
Sorry, maybe I expressed myself badly. Let's say there are 2 lines:
My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00
My super data plan 10gb mobile 1 jul - 31 jul 2020 99,00
I am trying to understand how to match both of them.
Maciej Los 22-Sep-20 4:02am
   
In this case you need to change the non-capturing group by adding an alternative (via |) this way:
(?:[\d]{2,},[\d]{2}\s+|\s+)

which means grab
- nn,nn_
or
- _

;)

Example v4[^]
csrss 22-Sep-20 4:11am
   
Thanks, is it possible to capture <num1> in first case, while matching both lines?
Maciej Los 22-Sep-20 4:15am
   
Sorry, I don't follow... What result do you expect to get (for each version of the line)?
csrss 22-Sep-20 4:31am
   
Yeah, probably hard to explain over the net. There is your regex:
^(?P<text>\w.*)(?P<num1>[\d]{2,},[\d]{2}|)\s(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$

And those are my 2 lines:
My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00
My super data plan 10gb mobile 1 jul - 31 jul 2020 99,00

When I put everything into the regex101 website, I can see the matches on the right side. With those 2 lines I get 2 matches with the above regex; however, <num1> is not captured. If I remove the 'or' from the <num1> group, it matches the first line and captures <num1>, but it does not match the 2nd line. I am trying to match both lines, capturing <num1> in the first case and omitting it in the 2nd, so I would end up with an empty group or no group. I have no idea how this works with regexes because I have little to no knowledge of them.
Maciej Los 22-Sep-20 6:32am
   
I think I understand what you meant... See Example - Version 5[^]
csrss 22-Sep-20 3:45am
   
And if you make it like this:
^(?P<text>\w.*)(?:[\d]{2,},[\d]{2}\s+)?(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$

Then 99,00 will be part of <text>
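
A short Python sketch of this effect, under the assumption that Python's re module treats these constructs the same way as the PCRE flavor used in the thread. Because \w.* is greedy and the price group is optional, the engine can leave 99,00 inside <text>. The lazy variant below is one possible fix and does not come from the thread.

```python
import re

LINES = [
    "My super data plan 10gb mobile 99,00 1 jul - 31 jul 2020 99,00",
    "My super data plan 10gb mobile 1 jul - 31 jul 2020 99,00",
]

# Date range, year and final price, shared by both patterns.
TAIL = (
    r"(?P<fromdate>\d{1,}\s\w{3})\s-\s(?P<todate>\d{1,}\s\w{3})"
    r"\s(?P<year>\d{4})\s(?P<num2>\d{2,},\d{2})$"
)

# Greedy text followed by an optional, non-capturing price.
greedy = re.compile(r"^(?P<text>\w.*)(?:\d{2,},\d{2}\s+)?" + TAIL)

# Lazy text (.*?) followed by an optional captured price: a suggested
# alternative for this sketch, not a pattern posted in the thread.
lazy = re.compile(r"^(?P<text>\w.*?)\s(?:(?P<num1>\d{2,},\d{2})\s)?" + TAIL)

for line in LINES:
    print(repr(greedy.match(line).group("text")))
    print(lazy.match(line).group("text", "num1"))
```

With the greedy pattern, the first line's text should still contain 99,00. With the lazy pattern, text stops before the price, and num1 is captured for the first line and None for the second.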
Maciej Los 22-Sep-20 3:52am
   
Nope. What language do you use (C#, PHP, JavaScript)?
csrss 22-Sep-20 3:54am
   
For now I am working in regex101 website, PCRE (PHP) is selected as Flavor
Jon McKee 22-Sep-20 3:34am
   
+5! Nice regex. I wasn't aware of the ?P<> syntax since I mostly do C# which uses just ?<>. TIL.
Maciej Los 22-Sep-20 3:40am
   
Thank you, Jon. The (?P<name>...) form originated in Python and is also supported by PCRE, which PHP uses. As you mentioned, in C# ?<group-name> is enough.
Jon McKee 22-Sep-20 4:07am
   
I've somehow managed to never use PHP, lol. On a side note, wouldn't it be wonderful if everyone followed the regex standard? It's so irritating learning a slightly different syntax for every language.
Maciej Los 22-Sep-20 4:13am
   
Let's say: there's a standard for each language ;)
Jon McKee 22-Sep-20 4:16am
   
*IEEE cries in a corner*

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



