
Converting HTML to PDF using C# and Magic!

22 May 2017

Everyone at some point needs to convert an HTML page into a PDF for one reason or another. For this, I've always used a component that was cheap at the time (and is far from it now). It has worked well for the last 8(ish) years, with most of the "ish" coming in the last while because it doesn't deal well with HTTPS sites. When I contacted the vendor, they said "hey, buy our latest version", which I guess can be expected 8 years down the line, but then came the price: ALL OF THE $. The last issue is that the component I used doesn't work in Azure. That led me to some investigation and ultimately to this post.

What's the Problem?

We have 3 problems:

  1. We need to convert HTML to PDF
  2. We do not want to break the bank, as the feature potentially won't bring high returns
  3. The specific component I used doesn't work in Azure Web Apps because of the way permissions work there

Easy to solve, right? Well, if you are reading this because you need this component, then yes. But if you were me, Googling with Bing for 40 minutes and then writing some code that makes me feel all kinds of dirty, then that "yes" becomes a "ja, kind of". It works, though, so here it is.

What's the Primary Component?

The component doing all the magic is wkhtmltopdf, an open source project licensed under the LGPLv3. It is a command line tool that renders HTML into PDF using the Qt WebKit rendering engine. It runs entirely "headless" and does not require a display or display service.

With little work, you can, for example, generate a PDF of Google's home page with the command below:

wkhtmltopdf.exe "https://google.com" "google.pdf"

That will go off and think for a couple of seconds and return you a PDF.

Super simple, right? There are also a bunch of arguments that you can pass to the EXE, which you can find on the project's site. They range from basics like changing the orientation to passing in a username and password for authenticated pages that you want to generate PDFs for.
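As a rough sketch of what passing those arguments could look like from C#, the helper below builds an argument string using the `--orientation`, `--username` and `--password` options of wkhtmltopdf. The `WkArgs` class and its `Build` method are hypothetical names made up for this illustration and are not part of the article's code. The quoting is deliberately simplistic.

```csharp
using System;
using System.Collections.Generic;

public static class WkArgs
{
    // Hypothetical helper (not from the article): builds a wkhtmltopdf
    // argument string from a few optional settings.
    public static string Build(string url, string outputPath,
                               bool landscape = false,
                               string username = null, string password = null)
    {
        var parts = new List<string>();
        if (landscape)
        {
            parts.Add("--orientation Landscape");
        }
        if (username != null)
        {
            parts.Add("--username " + Quote(username));
            parts.Add("--password " + Quote(password ?? string.Empty));
        }
        // Options come first, then the input URL, then the output file.
        parts.Add(Quote(url));
        parts.Add(Quote(outputPath));
        return string.Join(" ", parts);
    }

    // Wraps a value in double quotes. Values that already contain a quote
    // are rejected to keep this sketch simple instead of escaping them.
    private static string Quote(string value)
    {
        if (value.Contains("\""))
        {
            throw new ArgumentException("Value must not contain double quotes.", nameof(value));
        }
        return "\"" + value + "\"";
    }
}
```

For example, `WkArgs.Build("https://google.com", "google.pdf", landscape: true)` produces `--orientation Landscape "https://google.com" "google.pdf"`, which can be handed to wkhtmltopdf.exe as its arguments.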

How Are We Using It?

As mentioned above, this is a command line utility, so we are basically just wrapping it in a standard process execution from C#, waiting for the PDF to generate, and returning it to the caller.

C#
using System;
using System.IO;
using System.Threading.Tasks;
using System.Web;

namespace Html2Pdf.Lib
{
    public static class TheMagic
    {
        public static async Task<byte[]> Go(string url, int timeoutInSeconds = 30, 
                                   string pathToExe = null)
        {
            if (pathToExe == null)
            {
                // Look for the executable next to this assembly first.
                pathToExe = Path.Combine(
                    Path.GetDirectoryName(typeof(TheMagic).Assembly.Location), "wkhtmltopdf.exe");
                if (!File.Exists(pathToExe))
                {
                    pathToExe = HttpContext.Current.Server.MapPath("~/bin/wkhtmltopdf.exe");
                }
            }

            var savePdfTo = Path.GetTempFileName();
            try
            {
                // GeneratePdf blocks until wkhtmltopdf exits, so the file is
                // complete once the task finishes; no polling is needed.
                var generate = Task.Run(() => GeneratePdf(url, savePdfTo, pathToExe));
                var timeout = Task.Delay(TimeSpan.FromSeconds(timeoutInSeconds));
                if (await Task.WhenAny(generate, timeout) != generate)
                {
                    // Note: the wkhtmltopdf process itself is not stopped here.
                    throw new TimeoutException(
                        $"PDF generation did not finish within {timeoutInSeconds} seconds.");
                }
                await generate; // surface any exception thrown while generating
                return File.ReadAllBytes(savePdfTo);
            }
            finally
            {
                try
                {
                    File.Delete(savePdfTo);
                }
                catch
                {
                    // oh well we tried
                }
            }
        }

        private static void GeneratePdf(string url, string targetLocation, string pathToExe)
        {
            // Quote both arguments; keep them on one line so no newline ends up in the command.
            ExecuteCommand(pathToExe, $@"""{url}"" ""{targetLocation}""");
        }

        public static string ExecuteCommand(string pathToExe, string args)
        {
            var procStartInfo = new System.Diagnostics.ProcessStartInfo(pathToExe, args)
            {
                UseShellExecute = false,
                CreateNoWindow = true,
                RedirectStandardOutput = true
            };
            using (var proc = System.Diagnostics.Process.Start(procStartInfo))
            {
                // Read the redirected output before waiting, so a full pipe
                // buffer cannot block the child process.
                var output = proc.StandardOutput.ReadToEnd();
                proc.WaitForExit();
                return output;
            }
        }
    }
}
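One thing the wrapper above does not do is stop wkhtmltopdf when the timeout passes, so a hung conversion keeps running in the background. A minimal sketch of one way to handle that is shown below, using `Process.WaitForExit(int)` and `Process.Kill()`. The `ProcessRunner` class and `TryRun` method are hypothetical names for this illustration, and treating a non-zero exit code as a failure is an assumption that you should check against how wkhtmltopdf behaves for your pages.

```csharp
using System;
using System.Diagnostics;

public static class ProcessRunner
{
    // Hypothetical helper (not from the article): runs an executable and
    // returns true only if it exits within the timeout with exit code 0.
    // If the timeout passes, the process is killed and false is returned.
    public static bool TryRun(string pathToExe, string args, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(pathToExe, args)
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };
        using (var proc = Process.Start(startInfo))
        {
            if (!proc.WaitForExit((int)timeout.TotalMilliseconds))
            {
                try
                {
                    proc.Kill();
                }
                catch (InvalidOperationException)
                {
                    // The process exited between the wait and the kill.
                }
                return false;
            }
            return proc.ExitCode == 0;
        }
    }
}
```

A caller could then use `ProcessRunner.TryRun(pathToExe, args, TimeSpan.FromSeconds(30))` and only read the output file when it returns true.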

You call this code as in the example below:

C#
public async Task<ActionResult> DownloadHomePageAsPdf()
{
    var bytes = await TheMagic.Go($"{Request.Url.GetLeftPart(UriPartial.Authority)}/Home");

    return File(bytes, "application/pdf");
}

I'm not sure how stable this code is in production, but in the basic testing I've done, it gets the job done without any issues so far.
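One cheap safety net, if you do run this in production, is to check that the bytes you get back actually look like a PDF before returning them. `Path.GetTempFileName` creates an empty file up front, so a failed conversion can come back as an empty or partial file instead of an error. The sketch below relies only on the fact that PDF files start with the ASCII marker `%PDF-`. The `PdfCheck` class is a hypothetical name for this illustration, and the check does not validate the rest of the file.

```csharp
using System;
using System.Text;

public static class PdfCheck
{
    // Hypothetical helper (not from the article): returns true if the
    // buffer starts with the "%PDF-" marker that opens a PDF file.
    public static bool LooksLikePdf(byte[] bytes)
    {
        var header = Encoding.ASCII.GetBytes("%PDF-");
        if (bytes == null || bytes.Length < header.Length)
        {
            return false;
        }
        for (var i = 0; i < header.Length; i++)
        {
            if (bytes[i] != header[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

In the controller action, you could call `PdfCheck.LooksLikePdf(bytes)` and return an error result instead of an empty download when it is false.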


So we've solved all of our problems.

  1. We are converting HTML to PDF
  2. It's free, can't get cheaper than that (unless you want to pay me for it)
  3. And although not shown in the blog (because I could have got the screenshots from anywhere), this code works in Azure.

Conclusion

Anything is possible. The interesting thing about this solution is that you get to see the guts of what looks like a messy solution. What you may not realize is that, under the covers, a 3rd party component is probably doing something similar (or worse). The key takeaway is that it works.

The code used for this sample will be available on GitHub shortly if you want to download it and see it working for yourself.

Do you know of any cool converting components? Why not share them below in the comments?

Happy converting!

License

This article, along with any associated source code and files, is licensed under The MIT License


Written By
Architect SSW
South Africa
