I have an XML document as follows:
<break name="article_1-1">
<h1>
<page num="1" />Some heading</h1>
<bl>
Human name Contributing Writer</bl>
<h3>
OPINION
</h3>
<p>First Paragraph</p>
<p>Second Paragraph</p>
<p>Third Paragraph</p>
<bq>
Some value
</bq>
<p>
Fourth Paragraph with italic values
</p>
<fig>
<img src="images/img_1-1.jpg" width="1553" height="1050" alt="" />
<fc>
Image caption
</fc>
<cr>PHOTOGRAPHS BY SOME HUMAN</cr>
</fig>
<h3>
CITY, STATE
</h3>
</break>
I want to transform it into this:
<break name="article_1-1">
<h1><page num="1" />Some heading</h1>
<bl>Human name Contributing Writer</bl>
<h3>OPINION</h3>
<p>First Paragraph</p>
<p>Second Paragraph</p>
<p>Third Paragraph</p>
<bq>Some value</bq>
<p>Fourth Paragraph with italic values</p>
<fig><img src="images/img_1-1.jpg" width="1553" height="1050" alt="" /><fc>Image caption</fc><cr>PHOTOGRAPHS BY SOME HUMAN</cr></fig>
<h3>CITY, STATE</h3>
</break>
I am removing the indentation at a later stage; my main focus here is on bringing each element's opening and closing tags onto the same line.
I want a regex for this. I have tried something, but I think there is a better way.
Please help.
Regards
What I have tried:
string pattern = @"(?:(?:(<\w.>)|(<\w>)|(<\w..>|(<p>)|(\/>)))(\s+)|((<\/(?!(title)|(head)|(break)|(body))\w+>)(\s+)(<\/(?!(title)|(head)|(break)|(body))\w+>))|((<\/fc>)(\s+)(<cr>)))";
string substitution2 = @"$1$2$3$8$14$20$22";
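One possible simplification, offered only as a sketch: instead of one large alternation, the job can be split into three small passes. The class name `TagJoiner`, the method `JoinTags`, and the field names are hypothetical. The code assumes that `break`, `body`, `head` and `title` (the names excluded in the pattern above) are the only elements whose children stay on separate lines, that `fig` is the only element whose child elements must all be joined, that `fig` is never nested, and that no attribute value contains `<` or `>`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagJoiner
{
    // Assumption: elements whose children should stay on separate lines.
    private const string Containers = "break|body|head|title";

    // An opening or self-closing tag (not a container, comment or declaration)
    // followed by whitespace that contains a line break.
    private static readonly Regex AfterOpen = new Regex(
        @"(<(?![/?!])(?!(?:" + Containers + @")\b)[^<>]+>)\s*\n\s*");

    // Whitespace containing a line break, followed by a non-container closing tag.
    private static readonly Regex BeforeClose = new Regex(
        @"\s*\n\s*(</(?!(?:" + Containers + @")\b)[^<>]+>)");

    // Assumption: <fig> blocks are never nested, so a lazy match is enough.
    private static readonly Regex FigBlock = new Regex(
        @"<fig\b.*?</fig>", RegexOptions.Singleline);

    // A line break sitting between two tags.
    private static readonly Regex BetweenTags = new Regex(@">\s*\n\s*<");

    public static string JoinTags(string xml)
    {
        // Pass 1: pull the content up onto the line of its opening tag.
        string result = AfterOpen.Replace(xml, "$1");
        // Pass 2: pull each closing tag up onto the line of its content.
        result = BeforeClose.Replace(result, "$1");
        // Pass 3: inside <fig>, also join sibling elements such as </fc> and <cr>.
        return FigBlock.Replace(result, m => BetweenTags.Replace(m.Value, "><"));
    }
}
```

Calling `TagJoiner.JoinTags(xml)` on the sample above should leave `<break ...>` and `</break>` on their own lines and put each child element on a single line. Only whitespace runs that contain a line break are removed, so spaces around inline tags inside a paragraph are left alone. A regex cannot track nesting in general, so any further element that wraps other elements would need to be added to one of the two lists.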