Create a PDF from a Persian HTML file with iTextSharp

Question

Hi all,

I use the iTextSharp library to convert HTML to PDF.
My users write Persian sentences in their HTML files, and the library cannot convert the Persian words.

To fix this and the right-to-left problem, I use the code below:

Document document = new Document(PageSize.A4, 80, 50, 30, 65);
PdfWriter.GetInstance(document, new FileStream(strPDFpath, FileMode.Create));
document.Open();
ArrayList objects;
document.NewPage();

var stream = new StreamReader(strHTMLpath, Encoding.Default).ReadToEnd();
objects = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(
    new StreamReader(strHTMLpath, Encoding.UTF8), styles);
BaseFont bf = BaseFont.CreateFont("c:\\windows\\fonts\\Tahoma.ttf",
                                  BaseFont.IDENTITY_H, true);
for (int k = 0; k < objects.Count; k++)
{
    PdfPTable table = new PdfPTable(1);
    table.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
    var els = (IElement)objects[k];
    foreach (Chunk el in els.Chunks)
    {
        #region set persian font
        iTextSharp.text.Font f2 = new iTextSharp.text.Font(bf, el.Font.Size,
                                                           el.Font.Style, el.Font.Color);
        el.Font = f2;
        #endregion set persian font
        #region Set right to left for persian words
        PdfPCell cell = new PdfPCell(new Phrase(10, el.Content, el.Font));
        cell.BorderWidth = 0;
        table.AddCell(cell);
        #endregion Set right to left for persian words
    }
    //document.Add((IElement)objects[k]);
    document.Add(table);
}
document.Close();
Response.Write(strPDFpath);
Response.ClearContent();
Response.ClearHeaders();
Response.AddHeader("Content-Disposition", "attachment; filename=" + strPDFpath);
Response.ContentType = "application/octet-stream";
Response.WriteFile(strPDFpath);
Response.Flush();
Response.Close();
if (File.Exists(strPDFpath))
{
    File.Delete(strPDFpath);
}

This solved the right-to-left and Persian conversion problems, but there is another problem.

My code cannot parse and convert the content of a table tag used in the HTML file.

For example, here is an HTML file whose content is in Persian:

<html>
<head>
<meta name="charset" content="utf-8" />
</head>
<body>

<p style="text-align: right;"><span style="font-family: tahoma;">سلام<br />
<br />
نامه شماره 1<br />
<br />
<br />
<table cellspacing="1" cellpadding="1" align="center">
    <tbody>
        <tr>
            <td>شماره شناسنامه SHSH</td>
            <td>نام خانوادگيFamily</td>
            <td>نامName</td>
        </tr>
        <tr>
            <td>123456789</td>
            <td>حيدربزرگHeidarbozorg</td>
            <td>سعيدSaeed</td>
        </tr>
        <tr>
            <td>258</td>
            <td>رضاييRezaee</td>
            <td>عليAli</td>
        </tr>
        <tr>
            <td>654987</td>
            <td>علي مردان خانAliMardanKhan</td>
            <td>رضاReza</td>
        </tr>
    </tbody>
</table>
<br />
<br />
مشخصات بالا را دريافت کردم</span></p>

</body></html>

Now the question is: how can I parse an HTML file that contains table, div and paragraph tags with Persian sentences, and convert it to PDF?

Posted 8-Feb-11 2:07am

kia.sos

4 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2011-02-08T08:20:00

There are a few things to check.

What charset does the HTML declare? It should be something like this:

HTML

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

It's not mandatory to have a text-file BOM matching this charset, but do you have one? (Your text editor should have options such as "Save as UTF-8 files" or "Save as Unicode files"; see the Unicode standard for BOMs.) The constructor of the class System.IO.StreamReader has a parameter detectEncodingFromByteOrderMarks; if it is true, the reader looks for the BOM at the beginning of the file.
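To illustrate the BOM point, here is a minimal sketch, not taken from the question's code. The class name BomAwareReader and the temporary file are made up for the example. It writes a UTF-8 file with a BOM and reads it back with a deliberately wrong fallback encoding; the BOM detection should still pick UTF-8.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
static class BomAwareReader
{
    // Reads a text file. If the file starts with a BOM, the encoding the BOM
    // indicates is used; otherwise the fallback encoding is used.
    public static string ReadAllText(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback,
                                             detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Write Persian text as UTF-8 with a BOM.
            File.WriteAllText(path, "سلام", new UTF8Encoding(true));

            // The fallback is a single-byte encoding, which would garble the
            // text if it were actually used.
            Encoding wrongFallback = Encoding.GetEncoding("iso-8859-1");
            string text = ReadAllText(path, wrongFallback);

            Console.WriteLine(text == "سلام"); // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

If the file has no BOM, the reader falls back to the encoding you pass in, so that encoding still has to match the file.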

Why do you read this stream with the default encoding? Look at your line:

C#

var stream = new StreamReader(strHTMLpath, Encoding.Default).ReadToEnd();

This could be a mistake.
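A small sketch of why the wrong encoding matters; the class name EncodingMismatchDemo is made up for the example. On the .NET Framework, Encoding.Default is the system ANSI code page. ISO-8859-1 is used below only as a stand-in for a single-byte code page, which is an assumption of this example and not necessarily what your server uses.

```csharp
using System;
using System.Text;

// Hypothetical demo class, for illustration only.
static class EncodingMismatchDemo
{
    // Decodes bytes the way a reader would if told to use the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        string original = "سلام";

        // The bytes as they would be stored in a UTF-8 file without a BOM.
        byte[] bytes = new UTF8Encoding(false).GetBytes(original);

        // Stand-in for a single-byte ANSI code page (assumption, see above).
        Encoding singleByte = Encoding.GetEncoding("iso-8859-1");

        string wrong = Decode(bytes, singleByte);
        string right = Decode(bytes, Encoding.UTF8);

        Console.WriteLine(wrong == original); // False
        Console.WriteLine(right == original); // True
        Console.WriteLine(wrong.Length);      // 8: each 2-byte letter became 2 chars
    }
}
```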

Persian is covered by Unicode just like most other languages, so processing Persian text usually does not cause any problems.

—SA

kia.sos · Answer 2 · 2011-02-08T18:44:00

Thank you for your response.
I changed my code to this:

C#

var stream = new StreamReader(strHTMLpath, Encoding.UTF8).ReadToEnd();

and added this header to my HTML file:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

but it still does not work :((

My problem is that the data inside the table tag is not parsed and converted to PDF.

tulip_m · Answer 3 · 2011-10-08T19:06:00

Solution 4

Hi, I have exactly the same problem.
Could you help me if you have solved it?
I'm from Iran.

kia.sos · Answer 4 · 2011-02-15T01:16:00

Solution 3

Is anybody here?
Please help me.
