I am writing a program to extract text from a PDF. The extraction works fine, and I have to convert a few symbols to their respective hex codes before saving the file as XML.

The issue is that, of all the symbols, only "■" goes wrong: when I save it to an XML file, it is converted to "¦".

For now, I replace it manually before saving to the final file.

Please help.

What I have tried:

I just need a basic idea of how to get rid of this problem.
Updated 12-Apr-19 23:50pm
Comments
[no name] 13-Apr-19 7:58am    
Check your XML encoding. Some reading here: XML Encoding
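
As a rough illustration of what "checking the encoding" could look like, here is a minimal Python sketch (the question does not say which language is in use, so Python is an assumption). It writes the text with an explicit UTF-8 encoding and a matching XML declaration, then reads it back. The function names and the `page` element name are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_xml_utf8(path, text):
    """Write text into a one-element XML file, encoded as UTF-8."""
    root = ET.Element("page")  # hypothetical element name
    root.text = text
    # encoding="utf-8" makes the serializer emit UTF-8 bytes, and
    # xml_declaration=True records that encoding in the file header.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_xml_text(path):
    """Parse the file and return the root element's text."""
    return ET.parse(path).getroot().text


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "out.xml")
    write_xml_utf8(path, "\u25a0")  # U+25A0 BLACK SQUARE
    print(hex(ord(read_xml_text(path))))
```

If the character survives this round trip but not the real program, the likely suspect is a step that writes or reopens the file in a single-byte code page rather than in UTF-8.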

1 solution

The character you see depends on the character set used to display it. Your program needs to look at the actual character code value and, where necessary, adjust it to the corresponding code in the character set you are using. The Character Map application in the Windows Accessories section shows the code values used.
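
To make "look at the actual character code value" concrete, here is a small Python sketch (Python is an assumption, as the question names no language). It replaces every non-ASCII character with a hexadecimal numeric character reference, so the saved file contains only ASCII and no longer depends on the code page used to write or view it. The helper name `to_hex_refs` is hypothetical.

```python
from xml.sax.saxutils import escape


def to_hex_refs(text):
    """Escape XML markup characters, then replace each non-ASCII
    character with a hex character reference such as &#x25A0;."""
    escaped = escape(text)  # handles &, < and >
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in escaped
    )


if __name__ == "__main__":
    print(hex(ord("\u25a0")))       # code point of the black square
    print(hex(ord("\u00a6")))       # code point of the broken bar
    print(to_hex_refs("a\u25a0b"))
```

The black square is U+25A0 and the broken bar is U+00A6, so comparing code points like this shows whether the character was already changed before the hex conversion ran. If decimal references are acceptable, Python's built-in `text.encode("ascii", "xmlcharrefreplace")` does a similar job without a helper.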
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


