OpenOffice 2.4.1 was used during writing of article. At the end of the article there is link to the instructions that will make code work with OpenOffice 3.
I must confess that I’m not a big fan of PDF. Still, it somehow manages to wiggle in almost every project I'm on – clients want to send out documents, Word is bounded to Windows, HTML is lame, PDF it is. Unfortunately, the situation with it and C# haven’t changed much in couple past years - if there were no new, fancy, priced components, I would conclude that it’s almost the same as it was in .NET 1.1 times – it is a pain to create PDFs.
For those of you who have access to components which can convert popular formats to PDF, this article is pretty much useless. But, for those who don’t want or simply can’t shell out over $1000 for a chance to convert other formats to PDF – I hope that this solution will prove as an attractive alternative.
During a talk with my friend Toni Ruža (who is primarily a Python developer) about a way to easily convert some WordML reports to PDF, he pointed me to the headless OpenOffice mode. It seems that it has been around for quite some time, but as it is mainly targeted at Java developers, it is no wonder that there were no big fuss about it in C# groups. Still, it promises much – you install OpenOffice, start it in Service mode, send commands over the API, and get to use any feature it provides. More than anything else, my interest was to load any supported format into OpenOffice and then export it as PDF.
Just to note, in this article, I'll talk about creating PDF from other documents, not from scratch. If you are looking for a way to do that, I'm encouraging you to first take a look at my Generating Word Reports / Documents article. Follow it, and you'll easily create WordML files (like the one used here) from a database or XML.
I was saddened to find out that the headless mode of OpenOffice just minimizes GUI operations, not totally avoiding them. As someone who has a pretty nasty experience with
Word.Application.Open() (using interactive applications such as Word by programmatically mimicking user actions), I started thinking on how to isolate OpenOffice and query it independently of the main application process, thus enabling loose coupling and a more stable environment. The result was a Windows Service which wraps the OpenOffice process, taking care of the security context and the usage, while providing the needed functionality over Remoting (am I a service-oriented freak or what? :)).
Here is a diagram presenting the classes used in the process:
Figure 1 – Class diagram
ConversionToPDF is the main class when it comes to performing useful work. It employs various classes from the
unoidl.com.sun namespace to communicate with OpenOffice, and mimics operations such as opening a file, exporting it as PDF, etc. It also uses the
OfficeController class which is responsible for the lifecycle of OpenOffice’s process – it starts, monitors, and finally kills soffice.exe when not used, to preserve resources.
Receiver is the class that my Windows Service registers for usage over Remoting. It implements the
IReceiver interface (needed functionality), and serves as the bridge between the main applications and OpenOffice.
Finally, I’ve created the
GenericSender class for those not familiar with Remoting. It provides the
Init method that accepts an address on which the Windows Service wrapper listens (by default, it is tcp://localhost:6543/OpenOfficeServiceReceiver) and initializes a proxy receiver (available as a property). From that point forwards, everything is simple as
How to start everything on my machine?
Let’s do this in a step-by-step fashion:
- Download OpenOffice from here, and perform the standard installation. I’ve developed and tested a solution using version 2.4 of OO. If you are setting up everything on an x64 machine, be sure to add the OO program directory (by default: c:\Program Files (x86)\OpenOffice.org 2.4\program) to the PATH environment variable as described in this forum post. If you change the environment variable, be sure to restart the machine in order to commit and reload the changes.
- Download the source code that accompanies this article, and Build Solution using the Release configuration in Visual Studio 2008. When the build is complete, check OpenOfficeService/bin/Release and run svc_inst.bat. After that, you should see the OpenOffice Wrapper Service in the list of services when you go to Control Panel -> Administrative Tools -> Services. Right click on it, select Properties, go to the Log on tab, and check Allow service to interact with desktop.
and replacing it with (note that
LicenseAcceptDate must be later than the OpenOffice installation time):
<prop oor:name="ooSetupInstCompleted" oor:type="xs:boolean">
<prop oor:name="LicenseAcceptDate" oor:type="xs:string">
<prop oor:name="FirstStartWizardCompleted" oor:type="xs:boolean">
This step is taken from here, and I would like to thank Mirko Nasato for his great guide.
Be sure to start any OpenOffice application (Start -> Programs -> OpenOffice.org -> OpenOffice.org Writer, for example), and validate that it loads without any glitches in order to be sure that OO is properly installed and setup.
- Verify the service configuration (next chapter), start the OpenOffice Wrapper Service, and use it to convert a document. If you have downloaded the source code, you can right click on Default.aspx (Test Applications -> PDFWeb project) in Solution Explorer and choose View in Browser... Here is a code excerpt that uses the
GenericSender from OpenOfficeService.Objects.dll to perform the conversion:
protected void GiveMePDFButton_Click(object sender, EventArgs e)
string source = Server.MapPath("~/SomeWordML.xml");
byte wordML = File.ReadAllBytes(source);
byte result =
Figure 2 – Testing page
Believe it or not, that’s it! You now have a functioning PDF converter which can be queried from C#, by Remoting.
During the wrapper implementation, I thought about multi-threading and (hopefully) made calling the
ConvertToPDF thread safe. Conversion requests are queued and processed one by one, so the Open Office Wrapper Service can be used by more than one application and, why not, from multiple machines too (the generic sender for the application running on other machines should then be initialized with tcp://%machineHostingService%:6543/OpenOfficeServiceReceiver).
Currently, there are the following settings for the Open Office Wrapper Service:
Port – It’s the port on which the service will listen for requests. By default, it is 6543.
ProcessName – The name of the OpenOffice process (used when searching the process list to see if OO is alive). When you start OpenOffice in headless mode, it is soffice.bin (instead of soffice.exe).
PathToOpenOffice – Self-explanatory, eh? If you have installed OpenOffice on a path other than the default, you should change this setting (the default path is c:\Program Files\OpenOffice.org 2.4\program\soffice.exe; on x64 machines, add (x86) after Program Files).
SecondsIdleAllowed – When a conversion request is submitted, OpenOfficeController checks if OO is running in the background, and if not, starts soffice.exe in headless mode. By default, if no new request is made in 60 seconds, the OpenOffice process will be killed.
CheckIntervalInSeconds – The interval in which the service evaluates OpenOffice usage (bound to the previous setting). By default, it is 30 seconds.
RequestTimeoutInSeconds – The time in which a response is expected from OpenOffice. If the item stays in the queue for too long or OpenOffice gets too big a file for processing, a Timeout Exception will be thrown. The default wait is 30 seconds.
I would like to mention once again that the Windows Service I wrote is only there to provide a security context and serve as a bridge to OpenOfficeWrapper.dll that implements the main stuff when it comes to communicating with OpenOffice. If you wish, you can directly reference OpenOfficeWrapper.dll and perform PDF conversions in-process, but you must be sure that your application will be run with sufficient security privileges! In my testing, the conversion was successful only if I run the application under an account that belonged to the Administrator group.
Also, you could run into trouble when trying to run OpenOfficeWrapper on x64 versions of Windows. I’ve had tons of trouble trying to get my Web Application to convert PDF by using the OpenOfficeWrapper in process on a Windows 2003 x64 machine. So, if you really do not need to have everything in your application’s process, leave the code that wraps OpenOffice separated, and use it through a Windows Service.
Words of warnings and words of thanks
To me, the documentation of OpenOffice is terrible. OK, I could be another C# "quasi-developer" who finds it easier to look at examples than to crawl through bunch of Wiki pages, diagrams, and forum posts just to get a couple lines of code that opens a document. But, for me - after an absolute champion of useless information, unrelated links, and broken searches in the MSDN - the OO developer portal is one more example of how you do not want your documentation to be organized. From what I’ve seen, OpenOffice is a great product considering the cost ($0), and it is a shame that I can’t say the same for its documentation.
On the other hand, posts of server users on the OOoForum are really helpful; I would specifically like to thank LarsB, tcedi, and DannyB. Most of the conversion code in ConvertToPDF.cs is taken from LarsB’s Commandline PDF convertor; so, thank you man – I hope you’ll continue to post useful snippets.
With this article, I aimed at a simple goal – to provide an easy-to-follow, free, and versatile solution for converting documents to PDF by using C#. I am aware that there are technically more robust solutions, but I do not know any of them that’s free. If you know – please post it in the comments section along with an impression of this article.
- January 8th, 2009 - Added references.
- July 22nd, 2008 - Initial version of the article.