I am extracting text from a PDF that contains both English and Urdu text. The English text is extracted as expected, but with the iTextSharp library the Urdu text comes out as special characters. Kindly guide me.

What I have tried:

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

string text = string.Empty;
PdfReader reader = new PdfReader(pdfpath);
int pageNum = reader.NumberOfPages;

for (int i = 177; i <= pageNum; i++)
{
    // This line returns the Urdu text as special characters
    text = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());
}

reader.Close();
Comments
Richard MacCutchan 8-Jun-19 4:01am    
No, iTextSharp does not convert anything. You need to use the correct font and character set to display the Urdu characters.
Noman Suleman 8-Jun-19 4:26am    
How can I change the font and character set?
Richard MacCutchan 8-Jun-19 5:01am    
Assuming the PDF file displays the text in Urdu, you can get the details from the file. Alternatively you just need to set the correct font and character set in your display code.
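
One rough way to tell whether the problem lies in extraction or in display is to inspect the code points of the extracted string and save it as UTF-8. The sketch below is only illustrative: `UrduTextCheck`, `ContainsArabicScript`, `DumpCodePoints` and the `extracted.txt` file name are hypothetical names, not part of iTextSharp, and the sample string stands in for whatever `GetTextFromPage` returned.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class UrduTextCheck
{
    // True if any character lies in the Unicode Arabic block (U+0600..U+06FF)
    // or in the Arabic Presentation Forms blocks (U+FB50..U+FDFF, U+FE70..U+FEFF).
    public static bool ContainsArabicScript(string text)
    {
        return text.Any(c =>
            (c >= '\u0600' && c <= '\u06FF') ||
            (c >= '\uFB50' && c <= '\uFDFF') ||
            (c >= '\uFE70' && c <= '\uFEFF'));
    }

    // Lists each UTF-16 code unit as U+XXXX so the raw values can be inspected.
    public static string DumpCodePoints(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // Assumption: stand-in for the string returned by GetTextFromPage.
        string text = "\u0627\u0631\u062F\u0648 test";

        Console.WriteLine(ContainsArabicScript(text));
        Console.WriteLine(DumpCodePoints(text));

        // Write as UTF-8 so an editor with an Urdu-capable font can show it.
        File.WriteAllText("extracted.txt", text, Encoding.UTF8);
    }
}
```

If the dump shows values in the Arabic ranges, the extracted string already holds Urdu characters, and what remains is to view it with a Unicode encoding and a font that has Urdu glyphs. If the values fall elsewhere, that would suggest the text did not come out of the PDF as Unicode Urdu in the first place.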

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


