How to do xHTML to xHTML Transformations for PDF Conversion Purposes in ASP.NET.





This article discusses dynamic xHTML-to-xHTML XSL transformations for PDF output.
Introduction
Converting HTML pages to other formats, especially PDF, has become a routine task for web developers. The process itself is fairly straightforward, because there are plenty of PDF libraries and services available. However, one day you may need not just to make a PDF copy of a page, but to automatically modify the resulting PDF output (for example, you may want to access SVG data on the page). In this article, I’m going to show a simple example of accomplishing this task in ASP.NET using a few .NET and XSL tips and the PD4ML PDF library.
Step 1: Searching for xHTML Markup
ASP.NET makes it easy to build complicated pages. However, the server controls you work with have very little in common with the resulting xHTML markup that is rendered and sent to the client. That’s why the first thing we are going to do is bring that markup to light. The markup is produced by the page’s “Render” method, which is part of the page life cycle, so we need to override this method.
    // Requires: using System.IO; using System.Web.UI;
    protected override void Render(HtmlTextWriter output)
    {
        // Render the page into an in-memory writer so the markup can be captured
        using (StringWriter writer = new StringWriter())
        using (HtmlTextWriter htmlWriter = new HtmlTextWriter(writer))
        {
            // Create the HTML markup with the help of our "fake" HtmlTextWriter
            base.Render(htmlWriter);
            htmlWriter.Flush();

            // Copy the markup to a string and save it to disk for the XSL transformation
            string htmlMarkup = writer.ToString();
            File.WriteAllText(Server.MapPath("HTMLoutput.xml"), htmlMarkup);

            // Send the actual HTML markup to the client for display
            output.Write(htmlMarkup);
        }
    }
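Because the XSL transformation in the later steps needs well-formed XML as input, it can help to check the captured markup before relying on it. The following is a minimal sketch, not part of the original sample: the class name MarkupCheck and the method FindXmlError are hypothetical, and it assumes .NET Framework 4.0 or later, where DtdProcessing and XmlPreloadedResolver are available. The preloaded resolver supplies the XHTML 1.0 DTDs from copies shipped with the framework, so named entities such as &nbsp; can be resolved without downloading the DTD.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

// Hypothetical helper, for illustration only.
public static class MarkupCheck
{
    // Returns null when the markup parses as well-formed XML,
    // otherwise the message reported by the XML parser.
    public static string FindXmlError(string markup)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // Allow the DOCTYPE declaration that ASP.NET pages usually emit.
        settings.DtdProcessing = DtdProcessing.Parse;
        // Resolve the XHTML 1.0 DTDs from the framework's built-in copies.
        settings.XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);

        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(markup), settings))
            {
                // Reading to the end forces the parser to check the whole document.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

Inside the Render override, a call such as MarkupCheck.FindXmlError(htmlMarkup) could be made before the markup is written to disk, and a non-null result logged. Note that this only checks well-formedness; it does not validate the page against the DTD.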
Step 2: Getting Ready for XSL Transformation
Now we need to prepare our XSLT file. ASP.NET produces valid xHTML markup, so we only need to change it according to our needs, but there are still some problems you may face:
- First, don’t forget that xHTML markup uses the default namespace xmlns="http://www.w3.org/1999/xhtml", so we need to declare a prefix for it in our XSLT file in order to reach the nodes. That’s why we add xmlns:xhtml="http://www.w3.org/1999/xhtml" to our XSLT file and add xhtml to “exclude-result-prefixes” to keep the prefix out of the result document.
- Second, now we are able to do transformations, but there is another problem: lots of xmlns="" declarations in the output document. They appear because elements created by the stylesheet are in no namespace by default. To get rid of them, add xmlns="http://www.w3.org/1999/xhtml" to the XSLT file's namespace declarations.
- Third, HTML pages contain plain text. XSLT's built-in template rule copies text nodes to the output, so any text that your own templates do not handle ends up in the result. To get rid of these text nodes, put an empty <xsl:template match="xhtml:body//text()"/> template in your XSLT style sheet.

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:msxsl="urn:schemas-microsoft-com:xslt"
        xmlns:xhtml="http://www.w3.org/1999/xhtml"
        xmlns="http://www.w3.org/1999/xhtml"
        exclude-result-prefixes="msxsl xhtml">
    <!-- the rest of the xsl file -->
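To see the three tips working together, here is a small self-contained sketch. It is not the stylesheet from the original sample: the class XhtmlToXhtmlDemo, the embedded stylesheet, and the rule that turns h1 into h2 with a pdf-title class are all hypothetical and chosen only for illustration. The input is assumed to have no DOCTYPE, so no DTD settings are needed here.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical demo class, for illustration only.
public static class XhtmlToXhtmlDemo
{
    // Illustrative stylesheet, embedded as a string to keep the sketch self-contained.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform""
    xmlns:xhtml=""http://www.w3.org/1999/xhtml""
    xmlns=""http://www.w3.org/1999/xhtml""
    exclude-result-prefixes=""xhtml"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>

  <!-- Identity rule: copy everything that no other template handles -->
  <xsl:template match=""@*|node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:copy>
  </xsl:template>

  <!-- Tip 1: the xhtml prefix reaches elements in the default xHTML namespace -->
  <xsl:template match=""xhtml:h1"">
    <!-- Tip 2: this literal element inherits the stylesheet's default namespace,
         so the output carries no empty xmlns declaration -->
    <h2 class=""pdf-title""><xsl:value-of select="".""/></h2>
  </xsl:template>

  <!-- Tip 3: empty template, drops every text node below body -->
  <xsl:template match=""xhtml:body//text()""/>
</xsl:stylesheet>";

    // Applies the embedded stylesheet to an xHTML string and returns the result.
    public static string Transform(string xhtml)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xhtml)))
        using (StringWriter output = new StringWriter())
        {
            transform.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Passing a small page whose body holds some loose text followed by an h1 to XhtmlToXhtmlDemo.Transform returns markup in which the heading has become an h2 in the xHTML namespace, the loose text is gone, and no xmlns="" declaration appears. Because the empty template removes all text below body, the h1 rule uses xsl:value-of to pull in the text it wants to keep.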
Step 3: Creating PDF File
This is where we reach our final goal. All we need to do is perform the XSL transformation and create the PDF file. I’ll use the PD4ML HTML-to-PDF converting library, because it can be used from different programming languages, such as Java, PHP, Ruby, etc. I’m going to use a MemoryStream for the transformed markup, because I don’t want to save any intermediate data to the hard drive.
    // Requires: using System; using System.IO; using System.Xml; using System.Xml.Xsl;
    protected void MakePDFButton_Click(object sender, EventArgs e)
    {
        // Locating the stylesheet and the markup saved by the Render override
        string XSLTFile = Server.MapPath("XSLTFile.xslt");
        string XMLFile = Server.MapPath("HTMLoutput.xml");

        // Allowing DTD in our xHTML markup
        // (ProhibitDtd is obsolete from .NET 4.0 on; there, use
        // settings.DtdProcessing = DtdProcessing.Parse instead)
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ProhibitDtd = false;

        XslCompiledTransform XSLTransform = new XslCompiledTransform();
        XSLTransform.Load(XSLTFile);

        using (MemoryStream memoryStream = new MemoryStream())
        {
            // Transforming the initial HTML markup and outputting it to the
            // MemoryStream for further PDF conversion
            using (XmlReader reader = XmlReader.Create(XMLFile, settings))
            {
                XSLTransform.Transform(reader, null, memoryStream);
            }

            // Positioning the cursor at the beginning of the data in the stream
            memoryStream.Position = 0;

            // Showing the markup on the page
            StreamReader streamReader = new StreamReader(memoryStream);
            string output = streamReader.ReadToEnd();
            HTMLoutput.Text = Server.HtmlEncode(output);

            // Converting the resulting HTML page to PDF
            PD4ML PDFcreator = new PD4ML();
            PDFcreator.PageSize = PD4Constants.A4;
            PDFcreator.DocumentTitle = "The result PDF file";
            string path = Server.MapPath("Output.pdf");

            // Rewinding again, because reading the markup moved the cursor to the end
            memoryStream.Position = 0;
            using (StreamWriter streamWriter = new StreamWriter(path))
            {
                PDFcreator.render(memoryStream, streamWriter);
            }
        } // Disposing the MemoryStream also releases the data used by streamReader
    }
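The handler above resets the stream position twice, once after the transformation and once after the markup has been read for display. The reason is that a MemoryStream keeps a single cursor for both writing and reading. The following minimal sketch shows the effect in isolation; the class StreamRewindDemo and its method are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, for illustration only.
public static class StreamRewindDemo
{
    // Writes text to a MemoryStream and reads it back,
    // optionally moving the cursor to the start before reading.
    public static string WriteThenRead(string text, bool rewind)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            stream.Write(bytes, 0, bytes.Length);

            if (rewind)
            {
                // Without this, the cursor stays at the end of the written data.
                stream.Position = 0;
            }

            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Calling WriteThenRead("<p>hi</p>", false) returns an empty string, because the reader starts at the end of the data, while WriteThenRead("<p>hi</p>", true) returns the original text. The same applies before handing the stream to the PDF converter.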
Conclusion
That's it! Now let's come up with a short summary:
- Override the “Render” method to obtain and manipulate the xHTML markup.
- Use a custom XML namespace prefix to reach the non-prefixed xHTML nodes.
- Use the little XSLT xmlns="http://www.w3.org/1999/xhtml" hack to get rid of the numerous xmlns="" declarations.
- Use an empty <xsl:template match="xhtml:body//text()"/> if you need to get rid of plain text that your own templates do not handle.
I hope that the combination of valid xHTML markup, which Visual Studio developers take for granted, and the few easy tips described above will give you countless possibilities for manipulating your document's data.
History
- 1st March, 2011: Initial post