
How to Read, Write and Edit PDF Files and Metadata using LEADTOOLS

By LEADTOOLS Support, 7 Jan 2013
 

Editorial Note

This article is in the Product Showcase section for our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

PDF is arguably one of the most influential and widely used file formats in the world. It is no surprise that software developers go to great lengths to provide solutions that support PDF and its many features.

Developers using the LEADTOOLS Document and Medical Imaging SDKs can use LEAD's Advanced PDF Plug-in to add robust PDF support to their .NET applications. In addition to loading and saving text-searchable and image-based PDF files, LEADTOOLS can extract and edit text (without requiring OCR), merge and split pages, and read and update bookmarks, links, jumps, metadata and much more.

In this article, we will walk through several of the core features included with the new LEADTOOLS Advanced PDF Plug-in.

Key Features in the LEADTOOLS Advanced PDF Plug-in

PDF Document Features

  • Load and view any PDF document
  • Extract text (characters, words and lines), fonts, images, rectangles and hyperlinks with location and size
  • Full Unicode support including Chinese, Japanese, Arabic and Hebrew
  • Parse the document structure by reading PDF bookmarks (Table of Contents) and internal links (jumps)
  • Generate a raster image or thumbnail of any page

PDF File Features

  • Comprehensive multipage support including
    • Merge existing PDF files into a single PDF
    • Split a single PDF into multiple PDF files
    • Extract, delete, insert or replace any page in existing PDF files
  • Read, write and update the Table of Contents (TOC) of existing PDF files
  • Convert any existing PDF to PDF/A
  • Linearize (optimize for web viewing) any existing PDF
  • Encrypt/decrypt documents and convert to and from any PDF version
  • Read, write and update all PDF metadata such as author, title, subject and keywords
  • Convert (Distill) PostScript to PDF with optimization for eBook, Screen and Prepress

SDK Products the Advanced PDF Plug-in can be added to

Using the Code

LEADTOOLS Advanced PDF features are built upon two classes within the Leadtools.Pdf namespace: PDFFile and PDFDocument. The PDFFile class is used for modifying metadata and pages and for performing conversions. The PDFDocument class handles parsing and modifying the document object structure of PDF files.

In the example below, we use the PDFFile and PDFDocumentProperties classes to load a PDF and modify its metadata.

using System;
using Leadtools.Pdf;

string fileName = @"C:\Document.pdf";
// Create a PDFFile object for the existing document
PDFFile file = new PDFFile(fileName);
// Replace the document properties with a fresh set of values
file.DocumentProperties = new PDFDocumentProperties();
file.DocumentProperties.Author = "Me";
file.DocumentProperties.Title = "My Title";
file.DocumentProperties.Subject = "My Subject";
file.DocumentProperties.Creator = "My Application";
file.DocumentProperties.Modified = DateTime.Now;
// Save the properties; null is passed as the destination so the original file is updated
file.SetDocumentProperties(null);
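To check the result, the properties can be read back from the file. The following is a minimal sketch rather than a definitive implementation. It assumes that PDFFile exposes a Load() method that populates DocumentProperties from the file on disk, that the project references the Leadtools.Pdf assembly, and that any license setup your LEADTOOLS edition requires has already been done. The class name is hypothetical, and the Producer property is the one LEADTOOLS Support mentions in the comments at the end of this article.

```csharp
using System;
using Leadtools.Pdf;

class ReadMetadataSketch
{
    static void Main()
    {
        // Same path as the example above
        string fileName = @"C:\Document.pdf";

        PDFFile file = new PDFFile(fileName);

        // Assumption: Load() reads the existing document properties from the file
        file.Load();

        // Print the metadata fields that were written in the previous example
        PDFDocumentProperties props = file.DocumentProperties;
        Console.WriteLine("Author:   {0}", props.Author);
        Console.WriteLine("Title:    {0}", props.Title);
        Console.WriteLine("Subject:  {0}", props.Subject);
        Console.WriteLine("Creator:  {0}", props.Creator);
        Console.WriteLine("Producer: {0}", props.Producer);
        Console.WriteLine("Modified: {0}", props.Modified);
    }
}
```

Reading the values back this way is a quick sanity check that the update reached the file and not just the in-memory object.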

Similarly, the PDFFile class exposes several high-level functions for inserting, deleting and merging pages in PDF files, and for performing document conversions such as linearization (optimizing for web viewing) and conversion to PDF/A. The following example merges three files and converts the result to PDF/A.

using Leadtools.Pdf;

string fileName1 = @"C:\File1.pdf";
string fileName2 = @"C:\File2.pdf";
string fileName3 = @"C:\File3.pdf";
string finalFileName = @"C:\Final.pdf";

// Load first file
PDFFile file = new PDFFile(fileName1);
// Merge with second and third files, put the result in final
file.MergeWith(new string[] { fileName2, fileName3 }, finalFileName);

// Convert final file to PDF/A; null is passed as the destination so the file is converted in place
file = new PDFFile(finalFileName);
file.ConvertToPDFA(null);
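Splitting, which the feature list mentions alongside merging, can be sketched in the same style. This example is an illustration under stated assumptions, not the article's own code: it assumes PDFFile provides GetPageCount() and ExtractPages(firstPage, lastPage, destinationFileName) with 1-based page numbers, and the class name and output folder are hypothetical.

```csharp
using System;
using System.IO;
using Leadtools.Pdf;

class SplitPagesSketch
{
    static void Main()
    {
        string sourceFileName = @"C:\Final.pdf";
        // Hypothetical output folder; created if it does not exist yet
        string outputFolder = @"C:\Pages";
        Directory.CreateDirectory(outputFolder);

        PDFFile file = new PDFFile(sourceFileName);

        // Assumption: GetPageCount() returns the number of pages in the source file
        int pageCount = file.GetPageCount();

        // Assumption: page numbers are 1-based, as in the ParsePages call later in this article
        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++)
        {
            string destinationFileName =
                Path.Combine(outputFolder, string.Format("Page{0}.pdf", pageNumber));

            // Write a one-page PDF that contains only this page
            file.ExtractPages(pageNumber, pageNumber, destinationFileName);
        }

        Console.WriteLine("Wrote {0} single-page files to {1}", pageCount, outputFolder);
    }
}
```

Passing a wider range to ExtractPages would produce multi-page chunks instead of single pages.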

Probably the most important feature of a PDF is its searchable text, and this is where the PDFDocument class comes in. The PDFParsePagesOptions values let you choose what to parse from the PDF, including objects, fonts, hyperlinks and more. In the following example, we load a PDF and display the searchable text of its first page in a MessageBox.

using System.Text;
using System.Windows.Forms;
using Leadtools.Pdf;

string fileName = @"C:\Document.pdf";
// Create a PDF document
PDFDocument document = new PDFDocument(fileName);

// Parse the objects of the first page
document.ParsePages(PDFParsePagesOptions.Objects, 1, 1);

// Get the page
PDFDocumentPage page = document.Pages[0];

// Use a StringBuilder to gather the text
StringBuilder text = new StringBuilder();

// Loop through the objects
foreach (PDFObject obj in page.Objects)
{
   switch (obj.ObjectType)
   {
      case PDFObjectType.Text:
         // Add the text character code
         text.Append(obj.Code);

         // If this is the last object in a line, add a line terminator
         if (obj.TextProperties.IsEndOfLine)
            text.AppendLine();
         break;

      case PDFObjectType.Image:
      case PDFObjectType.Rectangle:
      default:
         // Do nothing
         break;
   }
}

// Show the text
MessageBox.Show(text.ToString());
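The line-building logic in the loop above does not depend on the SDK, so it can be isolated and tried without a PDF file. The sketch below is an assumption-laden illustration: the TextLineAssembler class and its ToLines method are hypothetical names, and the (Code, IsEndOfLine) tuples stand in for the obj.Code and obj.TextProperties.IsEndOfLine values of each text object. It requires C# 7 or later for tuple syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class TextLineAssembler
{
    // Groups characters into lines, closing a line whenever the end-of-line flag is set
    public static List<string> ToLines(IEnumerable<(char Code, bool IsEndOfLine)> characters)
    {
        var lines = new List<string>();
        var current = new StringBuilder();

        foreach (var (code, isEndOfLine) in characters)
        {
            current.Append(code);
            if (isEndOfLine)
            {
                lines.Add(current.ToString());
                current.Clear();
            }
        }

        // Keep trailing characters that were never closed by an end-of-line flag
        if (current.Length > 0)
            lines.Add(current.ToString());

        return lines;
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in data for the text objects of a parsed page
        var sample = new (char Code, bool IsEndOfLine)[]
        {
            ('H', false), ('i', true),
            ('P', false), ('D', false), ('F', true)
        };

        // Prints "Hi" and then "PDF", each on its own line
        foreach (string line in TextLineAssembler.ToLines(sample))
            Console.WriteLine(line);
    }
}
```

Returning a list of lines rather than one string makes it easier to search or index the text line by line.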

Conclusion

LEADTOOLS provides developers with access to the world's best performing and most stable imaging libraries in an easy-to-use, high-level programming interface enabling rapid development of business-critical applications.

PDF is only one of the many technologies LEADTOOLS has to offer. For more information on our other products, be sure to visit our home page, download a free fully functioning evaluation SDK, and take advantage of our free technical support during your evaluation.

Download the Full Example

The demo from which the screenshots and code snippets were taken is available within the main LEADTOOLS evaluation. To run this example you will need the following:

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

LEADTOOLS Support
Help desk / Support LEAD Technologies, Inc.
United States

With a rich history of over twenty years, LEAD has established itself as the world's leading provider of software development toolkits for document, medical, multimedia, raster and vector imaging. LEAD's flagship product, LEADTOOLS, holds the top position in every major country throughout the world and boasts a healthy, diverse customer base and strong list of corporate partners including some of the largest and most influential organizations from around the globe. For more information, contact sales@leadtools.com or support@leadtools.com.

Comments and Discussions

 
Question: Open in reader properties (member Tevenyereg, 16 Jan '13 - 22:42)
How do you change the way the PDF opens in a reader?
For example, "Display one page at a time with Sidebar -> Thumbnails".

Are there properties for it?
Answer: Re: Open in reader properties (member Tevenyereg, 17 Jan '13 - 22:21)
I think I have found the answer to my question.
According to "DOCWRTPDFOPTIONS", it is not implemented. :(

It is a shame; even "DEBENU" can do it: http://www.debenu.com/docs/pdf_library_reference/SetPageMode.php
General: Re: Open in reader properties (group LEADTOOLS Support, 20 Jan '13 - 21:15)
Tevenyereg,
We have support for this feature in one of our products. You can see the options implemented on this page:
http://www.eprintdriver.com/help/v5.0/Professional-Edition/prnDlg/PDFOpenTab.htm

But as you found out, we don't have it in the PDF SDK module.

We have a pending feature request to add similar features to the PDF module in the toolkit. If you'd like to be notified when the feature is added, please email support@leadtools.com and mention this thread.
LEADTOOLS Support
LEAD Technologies Inc.
LEADTOOLS Imaging SDK Home Page

Question: Printing (member Benny S. Tordrup, 16 Jan '13 - 2:00)
Hi

Does this product allow direct printing of an existing PDF file?
Answer: Re: Printing (group LEADTOOLS Support, 16 Jan '13 - 6:22)
What exactly do you mean by "direct print"? If you mean printing and saving the result as PDF, you can do this using the LEADTOOLS Virtual Printer with ShellExecute, as shown in the following forum post:
http://support.leadtools.com/CS/forums/29238/ShowPost.aspx

However, if you mean that you want to print a PDF file to a system printer, you can load the PDF file as a raster image using the RasterCodecs.Load() method and then print it using the Leadtools.WinForms.RasterImagePrinter class.
For more information, see the following online help topics:
http://www.leadtools.com/help/leadtools/v175/dh/co/leadtools.codecs~leadtools.codecs.rastercodecs~load.html
 
http://www.leadtools.com/help/leadtools/v175/dh/wi/leadtools.winforms~leadtools.winforms.rasterimageprinter.html
 
If you have any questions, feel free to contact us by sending an email to support@leadtools.com.

Question: Producer metatag (member Kalpana Volety, 11 Jan '13 - 7:39)
There is a metadata item called "Producer". Do you know if it is possible to explicitly define it?
 
Thanks a lot in advance,
Kalpana Volety
PDF Tools Online
Answer: Re: Producer metatag (group LEADTOOLS Support, 16 Jan '13 - 2:12)
Kalpana,
We do support writing the Producer metadata to the PDF. With reference to the code in the article, you can add this line:
file.DocumentProperties.Producer = "Producer";

If you have any questions, feel free to contact us by sending an email to support@leadtools.com.

Last Updated 7 Jan 2013
Article Copyright 2012 by LEADTOOLS Support