I have the following XML:

<break name="article_1-1">
  <page num="1" />Some heading</h1>
  Human name Contributing Writer</bl>
<p>First Paragraph</p>
<p>Second Paragraph</p>
<p>Third Paragraph</p>
  Some value
  Fourth Paragraph with italic values
  <img src="images/img_1-1.jpg" width="1553" height="1050" alt="" />
	Image caption

I want it to look like this:

<break name="article_1-1">
<h1><page num="1" />Some heading</h1>
<bl>Human name Contributing Writer</bl>
<p>First Paragraph</p>
<p>Second Paragraph</p>
<p>Third Paragraph</p>
<bq>Some value</bq>
<p>Fourth Paragraph with italic values</p>
<fig><img src="images/img_1-1.jpg" width="1553" height="1050" alt="" /><fc>Image caption</fc><cr>PHOTOGRAPHS BY SOME HUMAN</cr></fig>
<h3>CITY, STATE</h3>

I remove the indentation at a later stage; my main goal is to bring each element's opening and closing tags onto the same line.

I would like to do this with a regex. I have tried something (shown below), but I think there is a better way.

Please help.


What I have tried:

string pattern = @"(?:(?:(<\w.>)|(<\w>)|(<\w..>|(<p>)|(\/>)))(\s+)|((<\/(?!(title)|(head)|(break)|(body))\w+>)(\s+)(<\/(?!(title)|(head)|(break)|(body))\w+>))|((<\/fc>)(\s+)(<cr>)))";

string substitution2 = @"$1$2$3$8$14$20$22";
Updated 11-Apr-19 0:51am


Solution 1

Your regex has 22 capturing groups, but only 7 of them are used in the substitution, so you can probably remove some of them.
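A quick way to check these numbers in .NET is to ask the compiled `Regex` for its group numbers and to count the `$n` references in the replacement string. This minimal sketch reuses `pattern` and `substitution2` exactly as posted in the question; `capturingGroups` and `usedGroups` are just illustrative names:

```csharp
using System;
using System.Text.RegularExpressions;

// pattern and substitution2 copied unchanged from the question.
string pattern = @"(?:(?:(<\w.>)|(<\w>)|(<\w..>|(<p>)|(\/>)))(\s+)|((<\/(?!(title)|(head)|(break)|(body))\w+>)(\s+)(<\/(?!(title)|(head)|(break)|(body))\w+>))|((<\/fc>)(\s+)(<cr>)))";
string substitution2 = @"$1$2$3$8$14$20$22";

// GetGroupNumbers() also returns group 0 (the whole match), so subtract one.
int capturingGroups = new Regex(pattern).GetGroupNumbers().Length - 1;

// Count the $n group references used in the replacement string.
int usedGroups = Regex.Matches(substitution2, @"\$\d+").Count;

Console.WriteLine($"{capturingGroups} capturing groups, {usedGroups} used"); // 22 capturing groups, 7 used
```

Every group that is never referenced in the substitution (the `(title)`, `(head)`, ... groups inside the lookaheads, for example) can be turned into a non-capturing `(?:...)` group or dropped.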
Use Debuggex (in the links below); it shows your RegEx as a nice graph.
As far as I can tell, the code you show does not produce the result you want, so it is difficult to tell which part is meant to do what.
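For instance, the pattern has no branch for whitespace between text and a following closing tag, so if `Some value` is on one line and `</bq>` on the next, that line break survives. One simpler alternative is to run a few small replacements in sequence, each with one or two groups. This is only a sketch under assumptions: the input layout (opening and closing tags on their own indented lines) is guessed, the skip list `title|head|break|body` is borrowed from the lookahead in the question, and `JoinTags`, `Skip` and `input` are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input: an assumed indented layout, not the poster's real file.
string input = string.Join("\n",
    "<break name=\"article_1-1\">",
    "  <bq>",
    "    Some value",
    "  </bq>",
    "  <fig>",
    "    <img src=\"images/img_1-1.jpg\" alt=\"\" />",
    "    <fc>",
    "\tImage caption",
    "    </fc>",
    "    <cr>PHOTOGRAPHS BY SOME HUMAN</cr>",
    "  </fig>",
    "</break>");

Console.WriteLine(JoinTags(input));
// Expected output (indentation is left for a later step):
// <break name="article_1-1">
//   <bq>Some value</bq>
//   <fig><img src="images/img_1-1.jpg" alt="" /><fc>Image caption</fc><cr>PHOTOGRAPHS BY SOME HUMAN</cr></fig>
// </break>

// JoinTags is a hypothetical helper: three small rules instead of one 22-group pattern.
string JoinTags(string xml)
{
    // Elements that keep their own lines (assumption, mirroring the question's
    // (?!(title)|(head)|(break)|(body)) lookahead).
    const string Skip = @"(?!(?:title|head|break|body)\b)";

    // Rule 1: opening or self-closing tag followed by whitespace -> keep the tag, drop the whitespace.
    xml = Regex.Replace(xml, @"(<" + Skip + @"[A-Za-z][\w.-]*(?:\s[^<>]*)?>)\s+", "$1");

    // Rule 2: whitespace followed by a closing tag -> keep the tag, drop the whitespace.
    xml = Regex.Replace(xml, @"\s+(</" + Skip + @"[A-Za-z][\w.-]*>)", "$1");

    // Rule 3: the question's special case, </fc> followed by <cr>.
    return Regex.Replace(xml, @"(</fc>)\s+(<cr>)", "$1$2");
}
```

Note that rule 1 also strips whitespace right after inline tags, so `<i> italic</i>` would become `<i>italic</i>`; if that matters, replace `[A-Za-z][\w.-]*` in rule 1 with the block-level tag names you actually use (for example `(?:h1|bl|bq|p|fig|fc)`).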

Here are a few useful links for building and debugging RegEx.
Here is a link to RegEx documentation:
perlre
Here are links to tools that help build and debug RegEx:
.NET Regex Tester - Regex Storm
Expresso Regular Expression Tool
RegExr: Learn, Build, & Test RegEx
Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript
This one shows the RegEx as a graph, which is really helpful for understanding what a RegEx is doing: Debuggex: Online visual regex tester. JavaScript, Python, and PCRE.
This site also shows the RegEx as a graph but cannot test what the RegEx matches: Regexper
CPallini 11-Apr-19 9:26am
Patrice T 11-Apr-19 9:48am
Thank you

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
