I have a string input, and I want to use a regex to parse its block data:

<L
  <H1 Category1>
  <L
    <L
      <H1 Food1>
      <L
        <H2 Banana>
        <H2 Orange>
      >
    >
    <L
      <H1 Wine1>
      <L
        <H2 Whisky>
        <H2 Vodka>
        <H2 Beer>
      >
    >
  >
>


I want the regex to return 2 matches (blocks), like below:

<H1 Food1>
<L
   <H2 Banana>
   <H2 Orange>
>
	  	  
<H1 Wine1>
<L
   <H2 Whisky>
   <H2 Vodka>
   <H2 Beer>
>


What I have tried:

I tried the regex below, but it does not match what I expect. Demo link: regex101: build, test, and debug regex[^]

My Regex:

(<H1\s+\w+>)(\s+<L\s+){1}(<H2.+?>\s*>)
Updated 14-Sep-23 6:14am
Comments
Richard Deeming 18-Aug-23 3:50am    
REPOST
How many times do you need to ask the same question?!
How to get total element tag in C#?[^]
How to parse string by element tag?[^]
headshot9x 20-Aug-23 7:51am    
Hi, this question is different; it does not duplicate those.

1 solution

Start by simplifying your regex and see what you get: remove the second and third groups:
RegEx
(<H1\s+\w+>)
If you run that, you will see that you get three matches:
Results
<H1 Category1>
<H1 Food1>
<H1 Wine1>
And that points to why your regex doesn't work: the first group also captures the outer "Category1" header, so when you add the second and third groups, the pattern expects a single "L" element followed directly by an "H2" element, which it doesn't get there.

Add regex to detect and ignore the outer phrase and you'll get there.
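
One possible way to do that is sketched below in Python; the pattern syntax used here should carry over to .NET unchanged. It is only an illustration under stated assumptions, not the exact regex the answer has in mind: names are assumed to consist of word characters only, and the `BLOCK` pattern and `parse_blocks` helper are hypothetical names introduced for this example. Instead of detecting the outer header explicitly, the pattern requires the "L" element after an H1 header to contain nothing but H2 elements, so "Category1" (whose list starts with another "<L") never matches.

```python
import re

# Sample input copied from the question (indentation is not significant).
TEXT = """\
<L
  <H1 Category1>
  <L
    <L
      <H1 Food1>
      <L
        <H2 Banana>
        <H2 Orange>
      >
    >
    <L
      <H1 Wine1>
      <L
        <H2 Whisky>
        <H2 Vodka>
        <H2 Beer>
      >
    >
  >
>
"""

# Step 1: the simplified pattern from the answer finds every H1 header,
# including the outer "Category1" one.
headers = re.findall(r"<H1\s+\w+>", TEXT)

# Step 2 (an assumption, not the answer's exact regex): require the <L ...>
# after the header to contain only H2 elements. The list that follows
# "Category1" starts with another "<L", so that header is skipped.
BLOCK = re.compile(r"<H1\s+(\w+)>\s*<L\s*((?:<H2\s+\w+>\s*)+)>")


def parse_blocks(text):
    """Return a list of (h1_name, [h2_names]) for each innermost block."""
    return [
        (m.group(1), re.findall(r"<H2\s+(\w+)>", m.group(2)))
        for m in BLOCK.finditer(text)
    ]


blocks = parse_blocks(TEXT)

if __name__ == "__main__":
    print(headers)
    for name, items in blocks:
        print(name, items)
```

Separately, note that in the original pattern the `.+?` inside the third group cannot cross a line break unless single-line mode is enabled (`re.DOTALL` in Python, `RegexOptions.Singleline` in .NET), so it cannot span several H2 lines by default.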
Comments
headshot9x 18-Aug-23 2:41am    
Look at my solution for more detail. I can get the 2 blocks that I need.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


