How to Read, Write and Edit PDF Files and Metadata using LEADTOOLS

7 Jan 2013 | CPOL | 3 min read
In this article, we will walk through several of the core features included with the new LEADTOOLS Advanced PDF Plug-in

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Introduction

PDF is arguably one of the most influential and widely used file formats in the world. It is no surprise that software developers go to great lengths to provide solutions that support PDF and its many features.

Developers using the LEADTOOLS Document and Medical Imaging SDKs can use LEAD's Advanced PDF Plug-in to add robust PDF support to their .NET applications. In addition to loading and saving text-searchable and image-based PDF files, LEADTOOLS can extract and edit text (without requiring OCR), merge and split pages, and read and update bookmarks, links, jumps, metadata and much more.

In this article, we will walk through several of the core features included with the new LEADTOOLS Advanced PDF Plug-in.

Key Features in the LEADTOOLS Advanced PDF Plug-in

PDF Document Features

  • Load and view any PDF document
  • Extract text (characters, words and lines), fonts, images, rectangles and hyperlinks with location and size
  • Full Unicode support including Chinese, Japanese, Arabic and Hebrew
  • Parse the document structure by reading PDF bookmarks (Table of Contents) and internal links (jumps)
  • Generate a raster image or thumbnail of any page

PDF File Features

  • Comprehensive multipage support including
    • Merge existing PDF files into a single PDF
    • Split a single PDF into multiple PDF files
    • Extract, delete, insert or replace any page in existing PDF files
  • Read and update the Table of Contents (TOC) of existing PDF files
  • Convert any existing PDF to PDF/A
  • Linearize (optimize for web viewing) any existing PDF
  • Encrypt/decrypt documents and convert to and from any PDF version
  • Read, write and update all PDF metadata such as author, title, subject and keywords
  • Convert (Distill) PostScript to PDF with optimization for eBook, Screen and Prepress

SDK Products the Advanced PDF Plug-in can be added to

Using the Code

LEADTOOLS Advanced PDF features are built upon two classes within the Leadtools.Pdf namespace: PDFFile and PDFDocument. The PDFFile class is used for modifying metadata and pages and for performing conversions. The PDFDocument class handles parsing and modifying the document object structure of PDF files.

In the example below, we use the PDFFile and PDFDocumentProperties classes to load a PDF and modify its metadata.

C#
using System;
using Leadtools.Pdf;

string fileName = @"C:\Document.pdf";
// Load it
PDFFile file = new PDFFile(fileName);
// Update the properties
file.DocumentProperties = new PDFDocumentProperties();
file.DocumentProperties.Author = "Me";
file.DocumentProperties.Title = "My Title";
file.DocumentProperties.Subject = "My Subject";
file.DocumentProperties.Creator = "My Application";
file.DocumentProperties.Modified = DateTime.Now;
// Save it (passing null writes the changes back to the original file)
file.SetDocumentProperties(null);

Similarly, the PDFFile class exposes several high-level functions for inserting, deleting, and merging pages in PDF files and for performing document conversions such as linearization (optimizing for web viewing) and PDF/A. The following example merges three files and converts the result to PDF/A.

C#
using Leadtools.Pdf;

string fileName1 = @"C:\File1.pdf";
string fileName2 = @"C:\File2.pdf";
string fileName3 = @"C:\File3.pdf";
string finalFileName = @"C:\Final.pdf";

// Load first file
PDFFile file = new PDFFile(fileName1);
// Merge with second and third files, put the result in final
file.MergeWith(new string[] { fileName2, fileName3 }, finalFileName);

// Convert final file to PDF/A (passing null converts the file in place)
file = new PDFFile(finalFileName);
file.ConvertToPDFA(null);
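
The same calls extend naturally to a whole folder of PDF files. The following is a minimal sketch rather than code from the LEADTOOLS demo: it relies only on the PDFFile constructor, MergeWith and ConvertToPDFA calls shown above, while the MergeFolder helper, its parameters and the choice to sort inputs by name are hypothetical. It assumes the LEADTOOLS Advanced PDF assemblies are referenced and the toolkit has been unlocked as usual.

```csharp
using System;
using System.IO;
using System.Linq;
using Leadtools.Pdf;

static class MergeFolderSketch
{
   // Hypothetical helper (not part of the SDK): merges every PDF found in
   // sourceFolder into finalFileName, then converts the result to PDF/A.
   public static void MergeFolder(string sourceFolder, string finalFileName)
   {
      string finalFullPath = Path.GetFullPath(finalFileName);

      // Collect the inputs in a predictable order, skipping the output file
      // in case an earlier run left it in the same folder
      string[] inputs = Directory.GetFiles(sourceFolder, "*.pdf")
         .Where(f => !string.Equals(Path.GetFullPath(f), finalFullPath,
                                    StringComparison.OrdinalIgnoreCase))
         .OrderBy(f => f, StringComparer.OrdinalIgnoreCase)
         .ToArray();

      if (inputs.Length < 2)
         throw new InvalidOperationException("Need at least two PDF files to merge.");

      // The first file is the base; the remaining files are passed to
      // MergeWith in sorted order
      PDFFile file = new PDFFile(inputs[0]);
      file.MergeWith(inputs.Skip(1).ToArray(), finalFileName);

      // Convert the merged file to PDF/A, as in the example above
      PDFFile merged = new PDFFile(finalFileName);
      merged.ConvertToPDFA(null);
   }
}
```

A call such as MergeFolderSketch.MergeFolder(@"C:\Invoices", @"C:\Invoices\All.pdf") would then produce a single file from the folder's contents; both paths here are placeholders.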

Probably the most important feature of a PDF is its searchable text, and this is where the PDFDocument class comes in. Using PDFParsePagesOptions, you can choose what to parse from the PDF, including objects, fonts, hyperlinks and more. In the following example, we will load a PDF and display the searchable text of its first page in a MessageBox.

C#
using System.Text;
using System.Windows.Forms;
using Leadtools.Pdf;

string fileName = @"C:\Document.pdf";
// Create a PDF document
PDFDocument document = new PDFDocument(fileName);

// Parse the objects of the first page
document.ParsePages(PDFParsePagesOptions.Objects, 1, 1);

// Get the page
PDFDocumentPage page = document.Pages[0];

// Use a StringBuilder to gather the text
StringBuilder text = new StringBuilder();

// Loop through the objects
foreach (PDFObject obj in page.Objects)
{
   switch (obj.ObjectType)
   {
      case PDFObjectType.Text:
         // Add the text character code
         text.Append(obj.Code);

         // If this is the last object in a line, add a line terminator
         if (obj.TextProperties.IsEndOfLine)
            text.AppendLine();
         break;

      case PDFObjectType.Image:
      case PDFObjectType.Rectangle:
      default:
         // Do nothing
         break;
   }
}

// Show the text
MessageBox.Show(text.ToString());
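
The loop above yields one character at a time, while many applications want whole lines or words. Below is a minimal sketch of that grouping step in plain C#, with no LEADTOOLS dependency so it can be run on its own. The TextChar struct is a hypothetical stand-in for the two values the loop reads from each text object (obj.Code and obj.TextProperties.IsEndOfLine); it is not an SDK type. Splitting words on whitespace is also an assumption, since it only works when the PDF stores spaces as characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a parsed text object: the character code and
// whether it is the last character on its line
public struct TextChar
{
   public char Code;
   public bool IsEndOfLine;

   public TextChar(char code, bool isEndOfLine)
   {
      Code = code;
      IsEndOfLine = isEndOfLine;
   }
}

public static class TextGrouping
{
   // Group characters into lines, ending a line at each IsEndOfLine flag
   public static List<string> ToLines(IEnumerable<TextChar> chars)
   {
      List<string> lines = new List<string>();
      StringBuilder current = new StringBuilder();

      foreach (TextChar c in chars)
      {
         current.Append(c.Code);
         if (c.IsEndOfLine)
         {
            lines.Add(current.ToString());
            current.Clear();
         }
      }

      // Keep any trailing characters that were not terminated by a flag
      if (current.Length > 0)
         lines.Add(current.ToString());

      return lines;
   }

   // Split one line into words on whitespace
   // (assumes spaces are present as characters in the extracted text)
   public static string[] ToWords(string line)
   {
      return line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
   }
}
```

To use it with the example above, one could add new TextChar(obj.Code, obj.TextProperties.IsEndOfLine) to a list inside the PDFObjectType.Text case and pass that list to ToLines once the loop finishes.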

Conclusion

LEADTOOLS provides developers with access to the world's best performing and most stable imaging libraries in an easy-to-use, high-level programming interface enabling rapid development of business-critical applications.

PDF is only one of the many technologies LEADTOOLS has to offer. For more information on our other products, be sure to visit our home page, download a free fully functioning evaluation SDK, and take advantage of our free technical support during your evaluation.

Download the Full Example

The demo from which the screenshots and code snippets were taken is available within the main LEADTOOLS evaluation. To run this example you will need the following:

Support

Need help getting this sample up and going? Contact our support team for free technical support! For pricing or licensing questions, you can contact our sales team (sales@leadtools.com).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Help desk / Support LEAD Technologies, Inc.
United States
Since 1990, LEAD has established itself as the world's leading provider of software development toolkits for document, medical, multimedia, raster and vector imaging. LEAD's flagship product, LEADTOOLS, holds the top position in every major country throughout the world and boasts a healthy, diverse customer base and strong list of corporate partners including some of the largest and most influential organizations from around the globe. For more information, contact sales@leadtools.com or support@leadtools.com.


Comments and Discussions

 
Question: Work with sdk 17.0
Member 10233263, 26-Aug-13 7:24

Answer: Re: Work with sdk 17.0
LEADTOOLS Support (sponsor), 27-Aug-13 5:29
Hello,
This is Daoud from the LEADTOOLS developer support team.
The Advanced PDF features discussed here are available in LEADTOOLS v17.5 and later. If you own an older version, send your serial number to sales@leadtools.com and ask about a discounted upgrade price.
Our toolkit is thread-safe, so if your application is multi-threaded, you should be able to use the Advanced PDF features in it without any problems.

If you want, you can download our latest free evaluation and try these features out. Our toolkit installations are designed to work independently of one another, so installing a later version should not affect the older one (and vice versa).
LEADTOOLS Support
LEAD Technologies Inc.
LEADTOOLS Imaging SDK Home Page
