Hello!

I have an element in my XML as follows:

<p>This is a paragraph. < b >This is a bold tag.< /b > < b >This is another bold tag.< /b > < i >This is an italic tag.< /i ></p>.


My output should be:

<p>This is a paragraph. This is a bold tag. This is another bold tag. This is an italic tag.</p>


The output I am getting:

<p>This is a paragraph.This is a bold tag.This is another bold tag.This is an italic tag.</p>


All the values inside the <p> tag are being concatenated, and the spaces after the closing tags are missing.

Is there any way to remove the inner nodes and keep the values with the spaces? Or is there a way to prevent this concatenation without spaces?

I need the values to be the way they are in the main XML.

Please help.

Regards
Aman

What I have tried:

C#
XDocument xdoc = XDocument.Load("XMLFile.xml");
.
.
.
string value = xdoc.Element("p").Value;


I have only included the code where I read the value. Please tell me what changes I need to make here to get the values as described in the question.
Updated 29-Jan-19 23:41pm
Comments
BillWoodruff 30-Jan-19 2:33am    
have you considered using RegEx ?
Primo Chalice 30-Jan-19 2:40am    
Hello!

I am using XDocument and trying to avoid RegEx.

Regards
Aman
Richard MacCutchan 30-Jan-19 4:05am    
All the spaces are outside of any tags so they will be ignored by the XML reader. If you want to preserve them then you will need to do manual parsing (e.g. Regex).

1 solution

Assuming your input is valid XML (unlike the example you provided), this can be done with the following code:
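using System;
using System.Xml.Linq;

XDocument doc = XDocument.Parse(
    "<p>This is a paragraph. <b>This is a bold tag.</b> <b>This is another bold tag.</b> <i>This is an italic tag.</i></p>",
    LoadOptions.PreserveWhitespace // keep whitespace-only text nodes between elements
);

// Value concatenates all descendant text, dropping the <b> and <i> elements.
string valueWithoutXmlElements = doc.Root.Value;
// Replace the root's child nodes with that plain text.
doc.Root.SetValue(valueWithoutXmlElements);

// Serialize the root element, which now contains only text.
string valueWithRootXmlElement = doc.Root.ToString();

Console.WriteLine(valueWithRootXmlElement);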


Specifically, I had to remove the spaces inside the tag names (so </b> instead of < /b >, etc.). I also removed the trailing punctuation, as XML can't have text outside the root node. If you have to support invalid XML, you will not be able to manipulate it with anything that requires proper XML syntax (XmlReader, XDocument, XmlDocument, …), and would have to use regular expressions or similar, as already suggested.

The main trick you were missing is the flag:
LoadOptions.PreserveWhitespace
This instructs the XML parser to keep whitespace-only text nodes (that is, the spaces between XML element tags; spaces in and around text are always preserved).
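To make the effect of the flag visible, here is a minimal sketch that parses the same corrected sample string twice, once with the default options and once with LoadOptions.PreserveWhitespace. The variable names (xml, withoutFlag, withFlag) are only for illustration and are not taken from the question.

```csharp
using System;
using System.Xml.Linq;

// Sample input from the question, with the tag names corrected to valid XML.
string xml =
    "<p>This is a paragraph. <b>This is a bold tag.</b> <b>This is another bold tag.</b> <i>This is an italic tag.</i></p>";

// Default options: whitespace-only text nodes between elements are discarded,
// so the text of adjacent elements runs together.
string withoutFlag = XDocument.Parse(xml).Root.Value;

// PreserveWhitespace: those nodes are kept, so the separating spaces survive.
string withFlag = XDocument.Parse(xml, LoadOptions.PreserveWhitespace).Root.Value;

Console.WriteLine(withoutFlag);
Console.WriteLine(withFlag);
```

The second line printed keeps a single space between the sentences that came from neighbouring elements, which is the behaviour asked for in the question.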

Updating the value "with itself", then getting the XML by calling ToString() does look a bit ugly, but will get the result you want without doing any node manipulation on your own.

I use XDocument.Parse to keep the example in a single file - you can also specify the LoadOptions when using XDocument.Load.
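For the file-based case from the question, the same approach might look roughly like the sketch below. It writes the corrected sample to a temporary file first so that it can run on its own; the temporary path is an assumption made for this example, and in the original code you would pass "XMLFile.xml" instead.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

// Hypothetical location, used only so the sketch is self-contained.
string path = Path.Combine(Path.GetTempPath(), "XMLFile.xml");
File.WriteAllText(
    path,
    "<p>This is a paragraph. <b>This is a bold tag.</b> <b>This is another bold tag.</b> <i>This is an italic tag.</i></p>");

// Same flag as with Parse, passed as the second argument to Load.
XDocument xdoc = XDocument.Load(path, LoadOptions.PreserveWhitespace);

// Replace the child nodes of <p> with their concatenated text.
XElement p = xdoc.Root;
p.SetValue(p.Value);

string result = p.ToString();
Console.WriteLine(result);
```

If <p> is not the root element in the real file, the element would have to be located first (for example with Descendants("p")) before applying the same SetValue step.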
 
Comments
Maciej Los 6-Feb-19 4:23am    
5ed!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


