Introduction
Using Open Office, it is possible to convert many kinds of documents to PDF in batch mode, including Microsoft Office Word, Excel and PowerPoint files, plain text files, Open Office documents, JPEG, GIF, and many others. Using UNO (Universal Network Objects), the Open Office component model, a conversion process can connect to a running backroom instance of Open Office, load documents and save them as PDF. This article and the accompanying Java program show how to accomplish this for multiple inputs and then optionally merge the results into a single PDF file using GPL Ghostscript.
Using the Code
The program is a command line Java program. It was developed and tested on Windows XP using the NetBeans IDE, and it was built and tested on Fedora Core 7 and Ubuntu 8.04 using only the command line.
Usage: java -jar pdfcm.Main [-m mergeFile] [-d] file1 [file2 [file3...]]
Usage (jarfile): java -jar pdfcm.jar [-m mergeFile] [-d] file1 [file2 [file3...]]
Converts all given input files to PDF. Output filenames have the same base
filenames as the input files and the extension "pdf". PDF files on the
input are not processed.
INPUT OPTIONS
-m mergeFile
Causes converted PDF files and existing PDF (unprocessed) files on the
input to be merged into a single PDF file given by mergeFile as a final
step.
-d
Causes input files to be removed after successful processing. When used
in conjunction with the -m option, all intermediate files as well as any
PDF files on the input are removed after successful processing.
In case of a name collision between an input filename and the merge filename,
in the case that the -d option is given, the collision will be resolved. If
the -d option is not given, then an error will be generated to prevent accidental
overwrite of a file.
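The option handling described in the usage text can be sketched roughly as follows. This is not the parser from the accompanying source; the class name CliOptions and its fields are hypothetical, and the collision check here is a plain string comparison, which is an assumption about how a "name collision" is detected.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the options described in the usage text. */
final class CliOptions {
    String mergeFile;        // value of -m, or null if -m was not given
    boolean deleteInputs;    // true if -d was given
    final List<String> inputs = new ArrayList<>();

    /** Parses [-m mergeFile] [-d] file1 [file2 ...]; throws on malformed input. */
    static CliOptions parse(String[] args) {
        CliOptions opts = new CliOptions();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.equals("-m")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-m requires a merge filename");
                }
                opts.mergeFile = args[++i];   // consume the option's value
            } else if (arg.equals("-d")) {
                opts.deleteInputs = true;
            } else {
                opts.inputs.add(arg);         // anything else is an input file
            }
        }
        if (opts.inputs.isEmpty()) {
            throw new IllegalArgumentException("at least one input file is required");
        }
        // Without -d, refuse to overwrite an input that has the merge file's name
        if (opts.mergeFile != null && !opts.deleteInputs
                && opts.inputs.contains(opts.mergeFile)) {
            throw new IllegalArgumentException("merge file collides with an input file");
        }
        return opts;
    }
}
```

Failing early on a collision, before any document is opened, keeps a mistyped merge filename from costing a full conversion run.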
Points of Interest
The class that does the PDF conversion is the
PDF Conversion using Open Office
For the benefit of those readers cutting and pasting as they read, the following imports are required.
import com.sun.star.beans.PropertyValue;
import com.sun.star.uno.XComponentContext;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XStorable;
import com.sun.star.io.IOException;
import com.sun.star.util.XCloseable;
import com.sun.star.lang.XComponent;
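import ooo.connector.BootstrapSocketConnector;
// Standard library classes used by the snippets below
import java.io.File;
import java.io.InputStream;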
The conversion method has the signature
The classes used to talk to Open Office are:
// Declare Open Office components
XComponentContext xContext = null;
XMultiComponentFactory xMCF = null;
XComponentLoader xComponentLoader = null;
XStorable xStorable = null;
XCloseable xCloseable = null;
Object desktop = null;
Object document = null;
Details about each of the above components can be found in the Open Office SDK documentation at documentation.openoffice.org. For our purposes, it suffices to say that these objects encapsulate the API we need to talk to Open Office.
In the next code snippet, the BootstrapSocketConnector is used to connect to Open Office. Open Office will be running in another process and listening on a socket, and the BootstrapSocketConnector takes care of the connection details. The BootstrapSocketConnector is in a jar file included with the source code with kind permission of Hol.sten, a regular contributor in the Open Office forums. There are many excellent resources on connection alternatives in the Open Office forums, but the bootstrap connector should be all that is needed unless more than one instance of Open Office will be running, the Open Office instance is on another server, or there are other special requirements.
To follow the next code snippets, keep in mind that conversion status is kept in two properties of the conversion class, isError and statusText.
// Try to get reference to an Open Office process
try {
// Should use OO installation lib/programs directory on your system
String ooLibFolder = ooLibPath;
// Load the Open Office context
xContext = BootstrapSocketConnector.bootstrap(ooLibFolder);
// Load the Open Office object factory
xMCF = xContext.getServiceManager();
// Get a desktop instance
desktop = xMCF.createInstanceWithContext(
"com.sun.star.frame.Desktop", xContext);
// Get a reference to the desktop interface that can load files
xComponentLoader = (XComponentLoader) UnoRuntime.queryInterface(XComponentLoader.class, desktop);
} catch (Exception ex) {
// Open Office error
statusText = "Could not get usable OpenOffice: " + ex.toString();
isError = true;
return false;
}
An exception thrown here is almost certainly due to misinstallation or misconfiguration of Open Office or of the connection to it. There is really nothing for the program to do here but report that it could not connect, after which Open Office can be reconfigured, reinstalled, or restarted before trying again. To debug Open Office connection problems, make sure that the intended user can run Open Office offline and open the intended files without errors or popup registration windows.
The next bits of code are in the loop processing input files and also appear in
// Set the document opener to not display an OO window
PropertyValue[] loaderValues = new PropertyValue[1];
loaderValues[0] = new PropertyValue();
loaderValues[0].Name = "Hidden";
loaderValues[0].Value = Boolean.TRUE;
Then we convert the current file name to the URI format required by Open Office.
// Convert file path to URL name format and escape spaces
String docURL = "file:///" + inputFiles[i]
.replace(File.separatorChar, '/')
.replace(" ", "%20");
lastDot = docURL.lastIndexOf('.');
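The string replacement above escapes only spaces, so a filename containing characters such as '#' or '%' would produce a malformed URL. A minimal alternative sketch, assuming the standard java.io.File.toURI() conversion is acceptable for the files being processed, is shown here. The class and method names (DocUrl, toFileUrl) are hypothetical and are not part of the accompanying source.

```java
import java.io.File;

/** Hypothetical helper that turns a local path into a file URL. */
final class DocUrl {
    /** Converts a local path to a file URL with special characters percent-escaped. */
    static String toFileUrl(String path) {
        // toURI() works on the absolute path and escapes spaces, '#', '%', and so on
        String uri = new File(path).getAbsoluteFile().toURI().toString();
        // File.toURI() yields "file:/..."; expand it to the "file:///..." form
        if (uri.startsWith("file:/") && !uri.startsWith("file://")) {
            uri = "file:///" + uri.substring("file:/".length());
        }
        return uri;
    }
}
```

With this helper, the extension can still be located with lastIndexOf('.') on the returned string, exactly as in the snippet above.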
Next, the document is opened and an interface is obtained that can store the file using a filter appropriate to the kind of file it is. The program then saves the result to a file of the same name, but with the PDF extension, and appends the filename to a list of converted files. If the file is one of the "native" types, for example if it is already a PDF file, it is added directly to the list.
// If it is already PDF, add it to the list of "converted" files
if (StringArrayContains(nativeTypes, ext)) {
convertedFiles.add(docURL);
} else {
// Open the document in Open Office
document = xComponentLoader.loadComponentFromURL(
docURL, "_blank", 0, loaderValues);
// Get a reference to the document interface that can store files
xStorable = (XStorable) UnoRuntime.queryInterface(
XStorable.class, document);
// Set the arguments to save to PDF.
PropertyValue[] saveArgs = new PropertyValue[2];
saveArgs[0] = new PropertyValue();
saveArgs[0].Name = "Overwrite";
saveArgs[0].Value = Boolean.TRUE;
// Choose appropriate output filter
saveArgs[1] = new PropertyValue();
saveArgs[1].Name = "FilterName";
if (StringArrayContains(writerTypes, ext)) {
saveArgs[1].Value = "writer_pdf_Export";
} else if (StringArrayContains(calcTypes, ext)) {
saveArgs[1].Value = "calc_pdf_Export";
} else if (StringArrayContains(drawTypes, ext)) {
saveArgs[1].Value = "draw_pdf_Export";
} else {
buf.append("File " + i + " has unknown extension: " + ext);
isError = true;
continue; // Skip to the next file
}
// The converted file will have the same name with a PDF extension
String sSaveUrl = docURL.substring(0, lastDot) + ".pdf";
// Save the file
xStorable.storeToURL(sSaveUrl, saveArgs);
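The chain of StringArrayContains tests above can also be expressed as a single table lookup. The sketch below is one possible arrangement, not the code from the accompanying source: the class name PdfFilters is hypothetical, and the extension lists are illustrative assumptions only, since the actual contents of writerTypes, calcTypes and drawTypes may differ. The three filter names are the ones used in the snippet above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical lookup table from file extension to PDF export filter name. */
final class PdfFilters {
    private static final Map<String, String> FILTER_BY_EXT = new HashMap<>();
    static {
        // Illustrative extension lists only; the real lists may differ
        for (String ext : new String[] {"doc", "odt", "txt"}) {
            FILTER_BY_EXT.put(ext, "writer_pdf_Export");
        }
        for (String ext : new String[] {"xls", "ods"}) {
            FILTER_BY_EXT.put(ext, "calc_pdf_Export");
        }
        for (String ext : new String[] {"odg", "jpg", "gif"}) {
            FILTER_BY_EXT.put(ext, "draw_pdf_Export");
        }
    }

    /** Returns the export filter for a filename, or null if the extension is unknown. */
    static String filterFor(String fileName) {
        int lastDot = fileName.lastIndexOf('.');
        if (lastDot < 0 || lastDot == fileName.length() - 1) {
            return null;   // no extension at all
        }
        // Compare case-insensitively so that REPORT.DOC matches "doc"
        String ext = fileName.substring(lastDot + 1).toLowerCase(Locale.ROOT);
        return FILTER_BY_EXT.get(ext);
    }
}
```

A null result corresponds to the "unknown extension" branch above, so the caller can record the error and skip to the next file.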
Various exceptions are handled, but it is always important to close a file when done, so the code to do that goes into a finally block.
finally {
// Make sure the file is closed before going to the next one
if (document != null) {
// Get a reference to the document interface that can close a file
xCloseable = (XCloseable) UnoRuntime.queryInterface(
XCloseable.class, document);
// Try to close it or explicitly dispose it
// See http://doc.services.openoffice.org/wiki/Documentation/
// DevGuide/OfficeDev/Closing_Documents
if (xCloseable != null) {
try {
xCloseable.close(false);
} catch (com.sun.star.util.CloseVetoException ex) {
XComponent xComp = (XComponent) UnoRuntime.queryInterface(
XComponent.class, document);
xComp.dispose();
}
} else {
XComponent xComp = (XComponent) UnoRuntime.queryInterface(
XComponent.class, document);
xComp.dispose();
}
}
document = null; // Javanauts, please pardon my CSharpery
Merging Using Ghostscript
The converted files are kept in the array convertedFiles, from which the Ghostscript command line is constructed.
Then a process is opened with the constructed command line and the program waits for it to finish. Note that the output and error streams need to be handled in order for the child process to finish cleanly. Though there may be more elegant means to do it, for this use case those streams are simply closed.
try {
// Execute the command
Process mProc = Runtime.getRuntime().exec(cmd.toString());
// Voodoo - In order to wait for an external process, you
// have to handle its stdout(getInputStream) and stderr (getErrorStream)
// I'm just going to close them as I'm only interested in if it succeeded or not
InputStream iStr = mProc.getInputStream();
iStr.close();
InputStream eStr = mProc.getErrorStream();
eStr.close();
// Now wait
int exCode = mProc.waitFor();
if ( exCode == 0 ){
buf.append("Merge succeeded: exit code was zero.");
}
else {
isError = true;
buf.append("Merge failed: exit code was " + exCode);
}
} catch (java.io.IOException ex) {
buf.append("Merge failed: " + ex.toString());
isError = true;
statusText = buf.toString();
return false;
} catch (java.lang.InterruptedException ex) {
buf.append("Merge interrupted: " + ex.toString());
isError = true;
statusText = buf.toString();
return false;
}
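One of the "more elegant means" of handling the child's streams is to drain them rather than close them, so that the child never writes into a closed pipe. The sketch below assumes the commonly documented Ghostscript invocation for merging PDF files (-dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=...), which is not necessarily the exact command line the accompanying source builds. The class name GsMerge and the gsExe parameter are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for merging PDF files with Ghostscript. */
final class GsMerge {
    /** Builds the argument list; gsExe is the Ghostscript executable, e.g. "gs" on Unix. */
    static List<String> buildCommand(String gsExe, String mergeFile, List<String> pdfs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExe);
        cmd.add("-dBATCH");              // exit when the inputs are processed
        cmd.add("-dNOPAUSE");            // do not prompt between pages
        cmd.add("-q");                   // suppress startup messages
        cmd.add("-sDEVICE=pdfwrite");    // write PDF output
        cmd.add("-sOutputFile=" + mergeFile);
        cmd.addAll(pdfs);
        return cmd;
    }

    /** Runs the command, discards its combined output, and returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);    // fold stderr into stdout: one stream to drain
        Process proc = pb.start();
        try (InputStream out = proc.getInputStream()) {
            byte[] scratch = new byte[4096];
            while (out.read(scratch) != -1) {
                // Discard; reading keeps the child from blocking on a full pipe
            }
        }
        return proc.waitFor();
    }
}
```

Passing the arguments as a list, instead of a single string to Runtime.exec, also means that filenames containing spaces reach Ghostscript as single arguments without any quoting.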
Note that Ghostscript is a venerable and well maintained program. If the program fails at this step, then it is again almost certainly because GPL Ghostscript was misinstalled or misconfigured or because an input file was bad. As with Open Office, this can be checked offline by explicitly running GPL Ghostscript in a command shell on the suspect input, so there is no need to reproduce the exact error messages.
Other Program Details
Requirements and Compilation Details
Open Office
Open Office can be obtained from www.openoffice.org. This program has been tested with Open Office versions 2.3.0 on Windows XP, 2.4.0 on Fedora Core 7, and 2.4.0 on Ubuntu 8.04. Earlier versions of Open Office can also create PDF, and as long as some flavor of Open Office version 2.X is being used, the connection mechanism used in this code should be supported. However, this code has not been built or tested against earlier Open Office versions. (See threads referenced in this thread at the Open Office forums site for more information.)
GPL Ghostscript
GPL Ghostscript can be obtained at pages.cs.wisc.edu/~ghost. This code has been tested with Ghostscript versions 8.61 on Windows, 8.62 on Fedora Core 7, and 8.62 on Ubuntu 8.04. Earlier versions of GPL Ghostscript should work as long as they support the command line options
bootstrapconnector.jar
The bootstrapconnector.jar, created by Hol.sten at the Open Office developers' forums, provides a simplified mechanism for connecting to an Open Office instance. It is included here with the source code for convenience, but is ultimately available from this thread.
Setting up your Build Environment
The code was compiled and tested using Sun JDK 1.6 on Fedora Core 7, Ubuntu 8.04, and Windows XP SP2. It will neither compile nor run with versions of GCJ that are not Java 1.5 compliant. Make sure that the
Configuration
There are several configuration constants that need to be set for your system in the config.properties file.
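Reading such settings with java.util.Properties might look like the sketch below. The key names ooLibPath and gsExePath are hypothetical stand-ins, as is the class name PdfcmConfig; the actual constants are defined by the config.properties file shipped with the source.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical view of the settings in config.properties. */
final class PdfcmConfig {
    final String ooLibPath;   // hypothetical key: Open Office program directory
    final String gsExePath;   // hypothetical key: Ghostscript executable

    private PdfcmConfig(Properties props) {
        ooLibPath = require(props, "ooLibPath");
        gsExePath = require(props, "gsExePath");
    }

    /** Loads the settings from any character source, such as a FileReader. */
    static PdfcmConfig load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException ex) {
            throw new UncheckedIOException("Could not read configuration", ex);
        }
        return new PdfcmConfig(props);
    }

    /** Fails fast with a clear message when a setting is absent or blank. */
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing setting in config.properties: " + key);
        }
        return value.trim();
    }
}
```

Keep in mind that a backslash is an escape character in properties files, so Windows paths need either doubled backslashes or forward slashes.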
There are probably many more, but these are the popular ones.
Discussion
As noted above, the merge process fails when ESP Ghostscript 8.15 is used. The symptom is that for several test files, ESP Ghostscript begins to peg the CPU at 100% usage and makes very slow progress creating output, only a few bytes per second on a 2.7 GHz dual core machine. The GPL Ghostscript codebase was merged with ESP Ghostscript 8.15 at GPL version 8.57, and so GPL 8.61 can be regarded as an updated product anyway.
Beware of the difference between Windows and Unix style line endings when processing plain text files. Although there are no problems within a single platform, text files created on Windows with CRLF line endings will try to open as Calc files using Open Office on Unix. In interactive mode, this has the further joyful effect of popping a dialog box to inquire about
A few corners were cut in this example. Nonetheless, this program is being used in production for backroom processing. In particular, for general use:
However, the ROI on this code is at the point of diminishing returns. The program itself is just glue and the true usefulness lies in Open Office and Ghostscript.
History