Posted 11 Jul 2006

Counting PDF Pages using Regular Expressions

Explains how to count PDF pages using regular expressions in C#

Introduction

During one of my .NET projects working with Adobe PDF files, I needed to retrieve the page count of a specific file. I did not need to manipulate the PDF at all, so buying a .NET component just for this task seemed like overkill.

After a few hours of searching for an easy solution, I found that good old regular expressions might hold the answer.

Opening the PDF in Notepad, I noticed that each page in the file is marked by a specific character sequence: "/Type /Page" (with or without the space between the two words, depending on the PDF version). So all we need to do is count how many times this sequence occurs in the file.

Getting It Done!

First, we need to open the PDF file using a FileStream and read the contents as a string using a StreamReader.

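// Requires: using System.IO;
string pdfText;
using (FileStream fs = new FileStream(@"c:\a.pdf", FileMode.Open, FileAccess.Read))
using (StreamReader r = new StreamReader(fs))
    pdfText = r.ReadToEnd();  // both objects are disposed when the using block ends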
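
A PDF usually contains binary streams, and StreamReader decodes the file as UTF-8 by default. The ASCII markers we are looking for should normally survive that decoding, but if you would rather map every byte to exactly one character, a minimal sketch could look like the following. The class and method names (PdfTextLoader, LoadAsLatin1) are hypothetical and not part of the original article.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfTextLoader
{
    // Hypothetical helper: reads the raw bytes and decodes them as ISO-8859-1,
    // which maps each byte value 0-255 to the character with the same code.
    public static string LoadAsLatin1(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return Encoding.GetEncoding("ISO-8859-1").GetString(bytes);
    }

    public static void Main()
    {
        // Write a few bytes to a temporary file: "%PDF" plus two non-ASCII bytes.
        string tempPath = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(tempPath, new byte[] { 0x25, 0x50, 0x44, 0x46, 0xE2, 0xE3 });
            string text = LoadAsLatin1(tempPath);
            Console.WriteLine(text.Length); // prints 6: one character per byte
        }
        finally
        {
            File.Delete(tempPath);
        }
    }
}
```

The resulting string can be passed to the regular expression in the next step exactly like pdfText.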

Once we have the PDF text, all we need to do is create the regular expression and count the matches.

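// Requires: using System.Text.RegularExpressions; and using System.Windows.Forms;
// \s* accepts the marker with or without a space; [^s] skips "/Type /Pages" (the page tree node).
Regex rx1 = new Regex(@"/Type\s*/Page[^s]");
MatchCollection matches = rx1.Matches(pdfText);
MessageBox.Show("The PDF file has " + matches.Count.ToString() + " page(s).");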
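
To see what the pattern does and does not match, here is a small console sketch that runs it against a hand-written fragment instead of a real file. The fragment is an assumption made up for illustration (it is not a valid PDF), and PageMarkerDemo and CountPageMarkers are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageMarkerDemo
{
    // Same pattern as above: "/Type", optional whitespace, "/Page",
    // then any single character other than 's'.
    private static readonly Regex PageMarker = new Regex(@"/Type\s*/Page[^s]");

    // Hypothetical helper: counts page markers in already-loaded PDF text.
    public static int CountPageMarkers(string pdfText)
    {
        return PageMarker.Matches(pdfText).Count;
    }

    public static void Main()
    {
        // Illustrative fragment: one page tree node followed by two page objects,
        // one written with a space after /Type and one without.
        string sample =
            "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n" +
            "2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n" +
            "3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n";

        Console.WriteLine(CountPageMarkers(sample)); // prints 2
    }
}
```

The "/Type /Pages" entry is not counted because the character after "/Page" is an 's'. Note that the pattern can only see markers stored as plain text, so a file that keeps its page objects inside compressed streams may yield a lower count.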

Voila!

History

  • 11th July, 2006: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Vicente Angotti
Web Developer
United States
Vicente Angotti has been working as a software developer for more than 12 years and he is currently an IT project manager for a Federal U.S. Government Agency.
Some of his recent projects include TCP/IP communications, Asynchronous Threading, Image Manipulation and VLDB Design using .NET.

Comments and Discussions

 
  • PDF version (Marcus Deecke, 23-Jun-07 21:59)
  • Easier Way To Get PDF Page Count. (Sean5150, 6-May-07 3:19)
  • Re: Easier Way To Get PDF Page Count. (Member 2920570, 23-Jan-09 6:03)
  • Re: Easier Way To Get PDF Page Count. (Adrian Schröder, 23-Apr-15 20:45)
  • Good but incomplete (Sivrag the Conqueror, 21-Apr-07 8:07)
  • Re: Good but incomplete (Sivrag the Conqueror, 21-Apr-07 10:29)
  • word count (reza-taavoni, 7-Apr-07 0:23)
  • Clever approach (Bilgin Esme, 20-Mar-07 3:35)
  • This solution doesn't always works... (rizwan82, 29-Aug-06 5:34)
  • Re: This solution doesn't always works... (Vicente Angotti, 5-Sep-06 7:43)
  • Re: This solution doesn't always works... (rizwan82, 5-Sep-06 21:07)
  • Re: This solution doesn't always works... (Vicente Angotti, 13-Sep-06 7:49)
  • Re: This solution doesn't always works... (rizwan82, 13-Sep-06 21:37)
  • Question: What about large files? (HellfireHD, 12-Jul-06 11:23)
  • Answer: Re: What about large files? (illium, 27-Nov-06 10:45)

There are probably lots of elegant and impressive ways to do this, but if you're looking for something that will just give you a page count quickly, I suggest using iTextSharp.

For example:

...

using iTextSharp.text.pdf;

...

public static int GetPDFPageCount(string path)
{
    // open the file
    PdfReader pdf_file = new PdfReader(path);

    // read its page count
    int page_count = pdf_file.NumberOfPages;

    // close the file
    pdf_file.Close();

    // return the value
    return page_count;
}


The reason I suggest this is that, after reading your request, I tried doing it the way you suggest, using FileStream. I boiled it down several times until I had a reliable and relatively quick method for counting the pages. However, even that method crawled compared to what I just posted.

Unless you enjoy pointlessly complicated procedural programming as well as lengthy waiting periods while your program does more work than necessary, I advise not using a streamed method OR a regex method.

iTextSharp is free and can be downloaded from http://itextsharp.sourceforge.net/

_illium
  • Re: What about large files? (captainplanet0123, 1-Jun-09 2:21)
