Licence CPOL
First Posted 11 Jul 2006

Counting PDF Pages using Regular Expressions

By Vicente Angotti | 11 Jul 2006 | Article
Explains how to count PDF pages using regular expressions in C#

Introduction

During one of my .NET projects working with Adobe PDF files, I needed to retrieve just the page count of a specific file. I did not need to manipulate the PDF at all, so buying a .NET component for this task seemed excessive.

After a few hours of searching for an easy solution, I found that good old regular expressions might hold the answer.

Opening the PDF in Notepad, I noticed that each page in the file is marked by a specific character sequence: "/Type /Page" (with or without the space between the two words, depending on the PDF version). So all we need to do is count how many times this sequence occurs in the file.
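
To make the idea concrete, here is a minimal sketch that counts the marker in a small, made-up fragment instead of a real file. The class name PageMarkerDemo, the method CountMarkers, and the Sample text are all hypothetical and exist only for illustration; the pattern is the same one used later in this article.

```csharp
using System.Text.RegularExpressions;

public static class PageMarkerDemo
{
    // "/Type", optional whitespace, "/Page", then any character other
    // than 's', so that "/Type /Pages" is not counted as a page.
    private static readonly Regex PageMarker = new Regex(@"/Type\s*/Page[^s]");

    // Hypothetical fragment: one page tree node and two page objects,
    // one written with a space between the words and one without.
    public static readonly string Sample =
        "1 0 obj\n<< /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >>\nendobj\n" +
        "2 0 obj\n<< /Type /Page /Parent 1 0 R >>\nendobj\n" +
        "3 0 obj\n<</Type/Page/Parent 1 0 R>>\nendobj\n";

    // Returns how many times the page marker occurs in the given text.
    public static int CountMarkers(string pdfText)
    {
        return PageMarker.Matches(pdfText).Count;
    }
}
```

Calling PageMarkerDemo.CountMarkers(PageMarkerDemo.Sample) returns 2: both page objects are counted and the "/Pages" entry is skipped.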

Getting It Done!

First, we need to open the PDF file using a FileStream and read the contents as a string using a StreamReader.

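// Requires: using System.IO;
string pdfText;
using (FileStream fs = new FileStream(@"c:\a.pdf", FileMode.Open, FileAccess.Read))
using (StreamReader r = new StreamReader(fs))
    pdfText = r.ReadToEnd();  // both objects are disposed when the block ends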

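Once we have the PDF text, all we need to do is create the regular expression and count the matches. The trailing [^s] keeps the pattern from matching "/Type /Pages", which marks the page tree node rather than an individual page.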

// Requires: using System.Text.RegularExpressions; and using System.Windows.Forms;
Regex rx1 = new Regex(@"/Type\s*/Page[^s]");
MatchCollection matches = rx1.Matches(pdfText);
MessageBox.Show("The PDF file has " + matches.Count + " page(s).");

Voila!
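
The two steps above can be combined into one reusable method. What follows is only a sketch under stated assumptions, not a definitive implementation: the names PdfPageCounter and CountPages are hypothetical, and reading the file as ISO-8859-1 is my own substitution for the default StreamReader encoding, chosen because it maps every byte to exactly one character. It also assumes the page objects are stored uncompressed; PDF 1.5 and later can pack them into compressed object streams, in which case the markers are not visible as plain text and the count will be too low.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class PdfPageCounter
{
    // Same pattern as in the article: the trailing [^s] skips "/Pages".
    private static readonly Regex PageMarker = new Regex(@"/Type\s*/Page[^s]");

    // Hypothetical helper: reads the whole file into memory and counts
    // the page markers found in it.
    public static int CountPages(string path)
    {
        // Assumption: ISO-8859-1 decodes each byte to one character, so
        // binary data elsewhere in the file cannot disturb the ASCII markers.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        string pdfText = File.ReadAllText(path, latin1);
        return PageMarker.Matches(pdfText).Count;
    }
}
```

A call such as PdfPageCounter.CountPages(@"c:\a.pdf") returns the number of markers found. Because the whole file is loaded as a single string, very large PDFs will use a matching amount of memory.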

History

  • 11th July, 2006: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Vicente Angotti

Web Developer

United States


Vicente Angotti has been working as a software developer for more than 12 years and he is currently an IT project manager for a Federal U.S. Government Agency.
Some of his recent projects include TCP/IP communications, Asynchronous Threading, Image Manipulation and VLDB Design using .NET.

Last Updated 11 Jul 2006
Article Copyright 2006 by Vicente Angotti