
How to Read, Write and Edit PDF Files and Metadata using LEADTOOLS

By LEADTOOLS Support, 7 Jan 2013

Editorial Note

This article is in the Product Showcase section for our sponsors at CodeProject. These reviews are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

PDF is arguably one of the most influential and widely used file formats in the world. It is no surprise that software developers go to great lengths to provide solutions that support PDF and its many features.

Developers using the LEADTOOLS Document and Medical Imaging SDKs can use LEAD's Advanced PDF Plug-in to add robust PDF support to their .NET applications. In addition to loading and saving text-searchable and image-based PDF files, LEADTOOLS can extract and edit text (without requiring OCR), merge and split pages, and read and update bookmarks, links, jumps, metadata and much more.

In this article, we will walk through several of the core features included with the new LEADTOOLS Advanced PDF Plug-in.

Key Features in the LEADTOOLS Advanced PDF Plug-in

PDF Document Features

  • Load and view any PDF document
  • Extract text (characters, words and lines), fonts, images, rectangles and hyperlinks with location and size
  • Full Unicode support including Chinese, Japanese, Arabic and Hebrew
  • Parse the document structure by reading PDF bookmarks (Table of Contents) and internal links (jumps)
  • Generate a raster image or thumbnail of any page

PDF File Features

  • Comprehensive multipage support, including:
    • Merge existing PDF files into a single PDF
    • Split a single PDF into multiple PDF files
    • Extract, delete, insert or replace any page in existing PDF files
  • Read and update the Table of Contents (TOC) of existing PDF files
  • Convert any existing PDF to PDF/A
  • Linearize (optimize for web viewing) any existing PDF
  • Encrypt/decrypt documents and convert to and from any PDF version
  • Read, write and update all PDF metadata such as author, title, subject and keywords
  • Convert (distill) PostScript to PDF with optimization for eBook, screen and prepress output

SDK Products the Advanced PDF Plug-in can be added to

Using the Code

LEADTOOLS Advanced PDF features are built upon two classes within the Leadtools.Pdf namespace: PDFFile and PDFDocument. The PDFFile class is used for modifying metadata and pages and for converting files. The PDFDocument class handles parsing and modifying the document object structure of PDF files.

In the example below, we use the PDFFile and PDFDocumentProperties classes to load a PDF and modify its metadata.

// Namespaces used: System, Leadtools.Pdf
string fileName = @"C:\Document.pdf";
// Load it
PDFFile file = new PDFFile(fileName);
// Update the properties
file.DocumentProperties = new PDFDocumentProperties();
file.DocumentProperties.Author = "Me";
file.DocumentProperties.Title = "My Title";
file.DocumentProperties.Subject = "My Subject";
file.DocumentProperties.Creator = "My Application";
file.DocumentProperties.Modified = DateTime.Now;
// Save it
file.SetDocumentProperties(null);
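
To confirm the update, the metadata can be read back from the file. The snippet below is a minimal sketch and is not part of the original demo. It assumes that PDFFile.Load() fills DocumentProperties from the file on disk, as described in the Leadtools.Pdf documentation, and that the SDK license has already been set. The class name and file path are placeholders.

```csharp
using System;
using Leadtools.Pdf;

class ReadMetadataSketch
{
    static void Main()
    {
        // Placeholder path; point this at the file updated above
        string fileName = @"C:\Document.pdf";

        PDFFile file = new PDFFile(fileName);

        // Assumption: Load() reads the document properties from the file on disk
        file.Load();

        // Print the fields that the previous example wrote
        PDFDocumentProperties props = file.DocumentProperties;
        Console.WriteLine("Author:  {0}", props.Author);
        Console.WriteLine("Title:   {0}", props.Title);
        Console.WriteLine("Subject: {0}", props.Subject);
        Console.WriteLine("Creator: {0}", props.Creator);
    }
}
```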

Similarly, the PDFFile class exposes several high-level functions for inserting, deleting and merging pages of PDF files, and for document conversions such as linearization (optimizing for web viewing) and PDF/A. The following example merges three files into one and converts the result to PDF/A.

string fileName1 = @"C:\File1.pdf";
string fileName2 = @"C:\File2.pdf";
string fileName3 = @"C:\File3.pdf";
string finalFileName = @"C:\Final.pdf";

// Load first file
PDFFile file = new PDFFile(fileName1);
// Merge with second and third files, put the result in final
file.MergeWith(new string[] { fileName2, fileName3 }, finalFileName);

// Convert final file to PDF/A
file = new PDFFile(finalFileName);
file.ConvertToPDFA(null);
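
The other page-level operations mentioned above follow the same pattern of a source PDFFile plus a destination file name. As a hedged sketch, the code below extracts a page range and writes a linearized copy. ExtractPages and Linearize are taken from the Leadtools.Pdf documentation rather than from this article, so check their signatures against your SDK version. The class name and output paths are hypothetical.

```csharp
using Leadtools.Pdf;

class PageOperationsSketch
{
    static void Main()
    {
        // Hypothetical paths, used only for illustration
        string sourceFileName = @"C:\Final.pdf";
        string extractedFileName = @"C:\FirstTwoPages.pdf";
        string linearizedFileName = @"C:\FinalLinearized.pdf";

        PDFFile file = new PDFFile(sourceFileName);

        // Assumption: ExtractPages(firstPage, lastPage, destination) copies
        // the 1-based page range into a new PDF and leaves the source unchanged
        file.ExtractPages(1, 2, extractedFileName);

        // Assumption: Linearize(destination) writes a web-optimized copy
        file.Linearize(linearizedFileName);
    }
}
```

Writing to explicit destination files, rather than passing null as in the examples above, keeps the source file untouched while experimenting.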

Searchable text is arguably the most important feature of a PDF, and this is where the PDFDocument class comes in. The PDFParsePagesOptions enumeration lets you choose what to parse from each page, including objects, fonts, hyperlinks and more. In the following example, we load a PDF, parse the objects on its first page and display the page's text in a MessageBox.

// Namespaces used: System.Text, System.Windows.Forms, Leadtools.Pdf
string fileName = @"C:\Document.pdf";
// Create a PDF document
PDFDocument document = new PDFDocument(fileName);

// Parse the objects of the first page
document.ParsePages(PDFParsePagesOptions.Objects, 1, 1);

// Get the page
PDFDocumentPage page = document.Pages[0];

// Use a StringBuilder to gather the text
StringBuilder text = new StringBuilder();

// Loop through the objects
foreach (PDFObject obj in page.Objects)
{
   switch (obj.ObjectType)
   {
      case PDFObjectType.Text:
         // Add the text character code
         text.Append(obj.Code);

         // If this is the last object in a line, add a line terminator
         if (obj.TextProperties.IsEndOfLine)
            text.AppendLine();
         break;

      case PDFObjectType.Image:
      case PDFObjectType.Rectangle:
      default:
         // Do nothing
         break;
   }
}

// Show the text
MessageBox.Show(text.ToString());
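
To gather the text of the whole document rather than a single page, the same loop can run over every page. This is a sketch under two assumptions that the article does not confirm: that document.Pages exposes a Count property, and that PDFDocument implements IDisposable. The class and method names are made up for illustration.

```csharp
using System;
using System.Text;
using Leadtools.Pdf;

class ExtractAllTextSketch
{
    // Returns the text of every page, with a blank line between pages
    static string ExtractText(string fileName)
    {
        StringBuilder text = new StringBuilder();

        // Assumption: PDFDocument is disposable
        using (PDFDocument document = new PDFDocument(fileName))
        {
            // Assumption: Pages.Count gives the number of pages in the document
            document.ParsePages(PDFParsePagesOptions.Objects, 1, document.Pages.Count);

            foreach (PDFDocumentPage page in document.Pages)
            {
                // Skip pages that produced no objects
                if (page.Objects == null)
                    continue;

                foreach (PDFObject obj in page.Objects)
                {
                    // Only text objects carry a character code
                    if (obj.ObjectType != PDFObjectType.Text)
                        continue;

                    text.Append(obj.Code);
                    if (obj.TextProperties.IsEndOfLine)
                        text.AppendLine();
                }

                text.AppendLine();
            }
        }

        return text.ToString();
    }

    static void Main()
    {
        // Placeholder path
        Console.WriteLine(ExtractText(@"C:\Document.pdf"));
    }
}
```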

Conclusion

LEADTOOLS provides developers with access to the world's best performing and most stable imaging libraries in an easy-to-use, high-level programming interface enabling rapid development of business-critical applications.

PDF is only one of the many technologies LEADTOOLS has to offer. For more information on our other products, be sure to visit our home page, download a free fully functioning evaluation SDK, and take advantage of our free technical support during your evaluation.

Download the Full Example

The demo from which the screenshots and code snippets were taken is available within the main LEADTOOLS evaluation. To run this example you will need the following:

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).

About the Author

LEADTOOLS Support
Help desk / Support LEAD Technologies, Inc.
United States
With a rich history of over twenty years, LEAD has established itself as the world's leading provider of software development toolkits for document, medical, multimedia, raster and vector imaging. LEAD's flagship product, LEADTOOLS, holds the top position in every major country throughout the world and boasts a healthy, diverse customer base and strong list of corporate partners including some of the largest and most influential organizations from around the globe. For more information, contact sales@leadtools.com or support@leadtools.com.

Comments and Discussions

 
Question: Work with sdk 17.0 (Member 10233263, 26-Aug-13 7:24)
  Answer: Re: Work with sdk 17.0 (LEADTOOLS Support, 27-Aug-13 5:29)
  General: Re: Work with sdk 17.0 (Member 10233263, 19-Sep-13 9:41)
  General: Re: Work with sdk 17.0 (LEADTOOLS Support, 23-Sep-13 21:27)
  General: Re: Work with sdk 17.0 (Member 10233263, 2-Oct-13 5:28)
  General: Re: Work with sdk 17.0 (LEADTOOLS Support, 3-Oct-13 4:37)
Question: Open in reader properties (Tevenyereg, 16-Jan-13 22:42)
  Answer: Re: Open in reader properties (Tevenyereg, 17-Jan-13 22:21)
  General: Re: Open in reader properties (LEADTOOLS Support, 20-Jan-13 21:15)
Question: Printing (Benny S. Tordrup, 16-Jan-13 2:00)
  Answer: Re: Printing (LEADTOOLS Support, 16-Jan-13 6:22)
Question: Producer metatag (Kalpana Volety, 11-Jan-13 7:39)
  Answer: Re: Producer metatag (LEADTOOLS Support, 16-Jan-13 2:12)
Kalpana,
We do support writing the Producer metadata to the PDF. With reference to the code of the article, you can add this code line:
file.DocumentProperties.Producer = "Producer";
 
If you have any questions, feel free to contact us by sending an email to support@leadtools.com
LEADTOOLS Support
LEAD Technologies Inc.
LEADTOOLS Imaging SDK Home Page

Last Updated 7 Jan 2013
Article Copyright 2012 by LEADTOOLS Support