Tags: C# ASP.NET HTML
Hi all
 
I use the iTextSharp library to convert HTML to PDF.
My users write Persian sentences in their HTML files, but the library can't convert the Persian words.
 
To solve this and the right-to-left problem, I use the code below:
 
Document document = new Document(PageSize.A4, 80, 50, 30, 65);
PdfWriter.GetInstance(document, new FileStream(strPDFpath, FileMode.Create));
document.Open();
ArrayList objects;
document.NewPage();

var stream = new StreamReader(strHTMLpath, Encoding.Default).ReadToEnd();
// styles: a StyleSheet created elsewhere (not shown in the question)
objects = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(
    new StreamReader(strHTMLpath, Encoding.UTF8), styles);
BaseFont bf = BaseFont.CreateFont("c:\\windows\\fonts\\Tahoma.ttf",
                                  BaseFont.IDENTITY_H, true);
for (int k = 0; k < objects.Count; k++)
{
    PdfPTable table = new PdfPTable(1);
    table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
    var els = (IElement)objects[k];
    // Only the text Chunks of each parsed element are copied into the new table;
    // nested content such as an HTML table may not survive this flattening.
    foreach (Chunk el in els.Chunks)
    {
        #region set persian font
        iTextSharp.text.Font f2 = new iTextSharp.text.Font(bf, el.Font.Size,
                                                           el.Font.Style, el.Font.Color);
        el.Font = f2;
        #endregion set persian font
        #region Set right to left for persian words
        PdfPCell cell = new PdfPCell(new Phrase(10, el.Content, el.Font));
        cell.BorderWidth = 0;
        table.AddCell(cell);
        #endregion Set right to left for persian words
    }
    //document.Add((IElement)objects[k]);
    document.Add(table);
}
document.Close();
Response.ClearContent();
Response.ClearHeaders();
// Send only the file name to the browser, not the full server path
Response.AddHeader("Content-Disposition", "attachment; filename=" + Path.GetFileName(strPDFpath));
Response.ContentType = "application/octet-stream";
Response.WriteFile(strPDFpath);
Response.Flush();
Response.Close();
if (File.Exists(strPDFpath))
{
    File.Delete(strPDFpath);
}
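The loop above sets RUN_DIRECTION_RTL on every table it builds, whatever text ends up inside. As a sketch only, the direction could instead be chosen per piece of text. The helper below is hypothetical (it is not part of iTextSharp), and treating any character from the Arabic-script Unicode blocks as a signal for right-to-left is an assumption that happens to suit Persian text.

```csharp
// Hypothetical helper, not part of iTextSharp.
public static class ScriptDirection
{
    // Returns true if the text contains at least one character from the
    // Arabic-script Unicode blocks (Persian letters live in the Arabic block).
    public static bool ContainsArabicScript(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }

        foreach (char c in text)
        {
            bool arabicScript =
                (c >= '\u0600' && c <= '\u06FF') ||   // Arabic (includes Persian letters such as پ چ ژ گ ک)
                (c >= '\u0750' && c <= '\u077F') ||   // Arabic Supplement
                (c >= '\uFB50' && c <= '\uFDFF') ||   // Arabic Presentation Forms-A
                (c >= '\uFE70' && c <= '\uFEFC');     // Arabic Presentation Forms-B (stops before U+FEFF, the BOM)
            if (arabicScript)
            {
                return true;
            }
        }

        return false;
    }
}
```

The result could then select between PdfWriter.RUN_DIRECTION_RTL and PdfWriter.RUN_DIRECTION_LTR for each cell's RunDirection. Mixed cells from the sample, such as سعيدSaeed, still count as right-to-left with this rule.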
 
This solved the right-to-left problem and the conversion of Persian words, but there is another problem.

My code can't parse and convert the content of the table tags used in the HTML file.

For example, here is an HTML file whose content is in Persian:
 
<html>
<head>
<meta name="charset" content="utf-8" />
</head>
<body>
 
<p style="text-align: right;"><span style="font-family: tahoma;">سلام<br />
<br />
نامه شماره 1<br />
<br />
<br />
<table cellspacing="1" cellpadding="1" align="center">
    <tbody>
        <tr>
            <td>شماره شناسنامه SHSH</td>
            <td>نام خانوادگيFamily</td>
            <td>نامName</td>
        </tr>
        <tr>
            <td>123456789</td>
            <td>حيدربزرگHeidarbozorg</td>
            <td>سعيدSaeed</td>
        </tr>
        <tr>
            <td>258</td>
            <td>رضاييRezaee</td>
            <td>عليAli</td>
        </tr>
        <tr>
            <td>654987</td>
            <td>علي مردان خانAliMardanKhan</td>
            <td>رضاReza</td>
        </tr>
    </tbody>
</table>
<br />
<br />
مشخصات بالا را دريافت کردم</span></p>
 
</body></html>
 
Now the question is: how do I parse an HTML file that contains table, div and paragraph tags with Persian sentences, and convert it to PDF?
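If the input is well-formed XHTML without a namespace, as the sample above happens to be, one possible workaround (a different technique from HTMLWorker) is to read the table cells directly with LINQ to XML and then add each cell text to a PdfPTable yourself. The sketch below covers only the extraction step. The class and method names are hypothetical, it assumes no nested tables, and XDocument.Parse will throw on HTML that is not valid XML (for example unclosed tags, or entities such as &nbsp;).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: extracts table cell texts from well-formed XHTML.
public static class HtmlTableReader
{
    // Returns one list of trimmed cell texts per <tr>, across all tables in
    // document order. Assumes no XML namespace and no nested tables.
    public static List<List<string>> ReadTableRows(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        return doc.Descendants("table")
                  .Descendants("tr")
                  .Select(tr => tr.Elements()
                                  .Where(cell => cell.Name.LocalName == "td" || cell.Name.LocalName == "th")
                                  .Select(cell => cell.Value.Trim())
                                  .ToList())
                  .ToList();
    }
}
```

Each returned string could then go into a PdfPCell built with the Tahoma BaseFont, in the same way the question's loop builds its cells.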
Posted 8-Feb-11 3:07am
kia.sos379

Solution 1

There are a few things to check.

What charset does the HTML declare? It should be something like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
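The sample HTML in the question uses <meta name="charset" content="utf-8" />, which is not a charset declaration that browsers or parsers recognize. As a rough sketch (a regular expression, not a real HTML parser; the class name is hypothetical), the declared charset can be checked like this:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper. Finds a charset declared either as <meta charset="...">
// or inside a content="...; charset=..." attribute of a <meta> tag.
public static class HtmlCharset
{
    static readonly Regex CharsetPattern = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-:.]+)",
        RegexOptions.IgnoreCase);

    // Returns the declared charset name, or null if no declaration is found.
    public static string FindDeclaredCharset(string html)
    {
        Match m = CharsetPattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the http-equiv line above it returns "utf-8"; for the question's <meta name="charset" ...> tag it returns null, because "charset" appears there only as an attribute value.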
 
It's not mandatory for the text file to have a BOM matching this charset, but does yours have one? (Your text editor should have options such as "Save as UTF-8" or "Save as Unicode"; see the Unicode standard for BOMs.) The System.IO.StreamReader constructor has a parameter detectEncodingFromByteOrderMarks; if it is true, the reader looks at the BOM at the beginning of the file.
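To see BOM detection at work, the sketch below (hypothetical helper name) writes text as UTF-8 preceded by a byte order mark and reads it back with ASCII as the fallback encoding. Because detectEncodingFromByteOrderMarks is true, the reader switches to UTF-8 when it sees the BOM, so the Persian text survives; without the BOM, the ASCII fallback would turn every non-ASCII byte into a question mark.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper showing StreamReader's byte-order-mark detection.
public static class BomDemo
{
    // Writes the text as UTF-8 preceded by the BOM (EF BB BF), then reads it
    // back with ASCII as the fallback encoding and BOM detection turned on.
    public static string WriteWithBomAndRead(string path, string text, out string detectedEncoding)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();   // EF BB BF
        byte[] body = Encoding.UTF8.GetBytes(text); // GetBytes never adds a BOM
        using (var output = new FileStream(path, FileMode.Create))
        {
            output.Write(bom, 0, bom.Length);
            output.Write(body, 0, body.Length);
        }

        // ASCII is used only if no BOM is found; here the BOM switches the reader to UTF-8.
        using (var reader = new StreamReader(path, Encoding.ASCII, detectEncodingFromByteOrderMarks: true))
        {
            string content = reader.ReadToEnd();
            detectedEncoding = reader.CurrentEncoding.WebName;
            return content;
        }
    }
}
```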
 
Why do you read this stream with the default encoding? Look at your line:

var stream = new StreamReader(strHTMLpath, Encoding.Default).ReadToEnd();

This could be a mistake: on .NET Framework, Encoding.Default is the system's ANSI code page, not UTF-8.
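A quick way to see why a mismatched encoding is a problem: the sketch below (hypothetical helper name) encodes Persian text as UTF-8 and decodes the bytes with a single-byte code page. ISO-8859-1 is used only as a stand-in for an ANSI code page, because it is built into every .NET version; on .NET Framework, Encoding.Default is the system ANSI code page, while on .NET Core and later it is UTF-8.

```csharp
using System.Text;

// Hypothetical helper illustrating an encoding mismatch.
public static class EncodingMismatchDemo
{
    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // which maps every byte to one character and so garbles multi-byte letters.
    public static string DecodeUtf8BytesAsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }
}
```

Each Persian letter takes two bytes in UTF-8, so the four-letter word سلام comes back as eight unrelated Latin-1 characters, and that garbled text is what a PDF library would receive.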
 
Persian is covered by Unicode just like most other languages, so processing Persian text usually causes no problems.
 
—SA
Comments
Henry Minute at 8-Feb-11 14:36pm
   
@SA I just corrected a typo. You had 'charser' instead of 'charset' :)
SAKryukov at 8-Feb-11 14:38pm
   
Thank you very much, Henry,
--SA
Sergey Alexandrovich Kryukov at 19-Jul-12 11:16am
   
Thank you very much, Henry.
--SA

Solution 2

Thank you for your response.
I changed my code to this:
 
var stream = new StreamReader(strHTMLpath, Encoding.UTF8).ReadToEnd();
 
and added this meta tag to my HTML file:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

 
but it is still not correct :((

My problem is that the data in the table tag is not parsed and converted to PDF.

Solution 4

Hi,
I have exactly the same problem.
Could you help me if you have solved yours?
I'm from Iran.

Solution 3

Is anybody here?
Please help me.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Last Updated 9 Oct 2011