How can I remove invisible junk characters from an XML file using C#?


I want to read some XML files. When I read them, I find unwanted characters, such as stray symbols, that I need to remove. Can anyone help me?
Updated 8-Oct-19 22:40pm

Solution 2

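// Requires: using System.Text.RegularExpressions; using System.Xml;
internal static void RectifyXML()
{
    // Path to the XML file
    string path = @"C:\CodeProject\test.xml";
    // Load the file into an XmlDocument (Load throws if the file is not well-formed)
    XmlDocument CXML = new XmlDocument();
    CXML.Load(path);
    // Strip every non-ASCII character. Note that this also removes
    // legitimate accented or non-Latin text, not just junk.
    string correctedXMlString = Regex.Replace(CXML.InnerXml, @"[^\u0000-\u007F]", string.Empty);
    // Reload the cleaned markup and overwrite the original file.
    // Save replaces the file, so deleting it first is unnecessary and
    // would lose the data if LoadXml failed.
    CXML.LoadXml(correctedXMlString);
    CXML.Save(path);
}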
   

Solution 3

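// Requires: using System.Linq;
public static string RemoveInvalidXmlChars(string text)
{
    // Keep only characters that XML 1.0 allows. IsXmlChar tests one UTF-16
    // code unit at a time, so surrogate pairs (for example emoji) are dropped too.
    var validChars = text.Where(ch => System.Xml.XmlConvert.IsXmlChar(ch)).ToArray();
    return new string(validChars);
}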
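
If the files may contain characters outside the Basic Multilingual Plane, a variant of the same idea can keep well-formed surrogate pairs instead of dropping them. The following is only a sketch: the class name XmlSanitizer and the method name RemoveInvalidXmlCharsKeepingPairs are made up for illustration, and it assumes XML 1.0 character rules as implemented by XmlConvert.

```csharp
using System.Text;
using System.Xml;

public static class XmlSanitizer
{
    // Hypothetical helper (not from the original answer): removes characters
    // that XML 1.0 forbids but keeps valid surrogate pairs intact.
    public static string RemoveInvalidXmlCharsKeepingPairs(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            if (XmlConvert.IsXmlChar(ch))
            {
                // Ordinary valid character
                sb.Append(ch);
            }
            else if (i + 1 < text.Length
                     && XmlConvert.IsXmlSurrogatePair(text[i + 1], ch))
            {
                // IsXmlSurrogatePair takes the low surrogate first, then the high one
                sb.Append(ch).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            // Anything else (control characters, lone surrogates) is dropped
        }
        return sb.ToString();
    }
}
```

A call such as XmlSanitizer.RemoveInvalidXmlCharsKeepingPairs(File.ReadAllText(path)) would clean the raw text before it is handed to an XML parser, which matters when the junk characters prevent the file from loading at all.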
   

Solution 1

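// Note: this snippet is JavaScript, not C#.
// The "u" flag is needed so the surrogate pairs are read as the range U+10000-U+10FFFF;
// without it, the range \udc00-\udbff is out of order and the RegExp constructor throws.
var xmlPattern = "[^\u0001-\uD7FF\uE000-\uFFFD\ud800\udc00-\udbff\udfff]";

var newXml = xml.replace(new RegExp(xmlPattern, "gu"), "");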
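
Since the question asks for C#, here is a rough regex-based counterpart. It is a sketch under assumptions, not a direct port: the class name XmlRegexSanitizer and the method name Strip are invented for illustration, and the pattern targets the XML 1.0 character set, which is stricter than the range in the pattern above (it also removes control characters below U+0020 other than tab, line feed, and carriage return). Because .NET regular expressions work on UTF-16 code units, lone surrogates are matched with lookarounds so that well-formed pairs survive.

```csharp
using System.Text.RegularExpressions;

public static class XmlRegexSanitizer
{
    // Matches what XML 1.0 forbids:
    //  - a low surrogate not preceded by a high surrogate
    //  - a high surrogate not followed by a low surrogate
    //  - control characters other than tab, LF, CR, plus U+FFFE and U+FFFF
    private static readonly Regex InvalidXmlChars = new Regex(
        @"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]" +
        @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
        @"|[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]",
        RegexOptions.Compiled);

    // Hypothetical helper: returns the input with forbidden characters removed.
    public static string Strip(string xml)
    {
        return InvalidXmlChars.Replace(xml, string.Empty);
    }
}
```

Usage would look like string cleaned = XmlRegexSanitizer.Strip(rawXmlText); with rawXmlText read from the file as plain text before parsing.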
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



