Recognizing Barcodes in PDF and TIFF Documents

6 Oct 2009 | CPOL | 3 min read
Multipage business documents and reports sometimes use barcodes to aid in routing and indexing. Using the DotImage .NET PDF Reader, .NET TIFF Codec, and Barcode Recognition for .NET Add-on, it's easy to read the barcodes and use them to automate workflows or find these documents in a repository.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Multipage business documents and reports sometimes use barcodes to aid in automated routing and indexing. Using the DotImage .NET PDF Reader, .NET TIFF Codec, and Barcode Recognition for .NET Add-on, it’s easy to read the data encoded in these barcodes and use them to automate workflows or find these documents in a repository.

Reading a Page from a PDF

Generally, a page in a PDF will either be an imaged page that might have come from a scanner or instead be made up of various elements (text, drawings, images, etc.). If it’s an imaged page, we would want to extract the page from the PDF at the stored resolution. If it’s composed, we need to rasterize the page to a single image, making sure to do it at a high resolution to preserve the quality of the barcodes.

Here is a C# function that retrieves a page from a PDF at the best resolution for barcode reading:

C#
private AtalaImage GetPdfPage(Stream stream, int pageNum)
{
    AtalaImage img = null;

    // check to see if this page is an image
    using (Document pdfDoc = new Document(stream))
    {
        Page pdfPage = pdfDoc.Pages[pageNum];
        if (pdfPage.SingleImageOnly)
        {
            // if it's an image, extract it from the PDF at its
            // stored resolution
            img = pdfPage.ExtractImages()[0].Image;
        }
    }

    // if the page is not an image, rasterize it at high enough
    // resolution to read the barcodes
    if (img == null)
    {
        PdfDecoder pdfReader = new PdfDecoder();
        pdfReader.Resolution = 300;
        stream.Position = 0;
        img = pdfReader.Read(stream, pageNum, null);
    }
    return img;
}

Here is the same function in VB.NET:

VB
Private Function GetPdfPage(ByVal stream As Stream,
            ByVal pageNum As Integer) As AtalaImage
    Dim img As AtalaImage = Nothing

    ' check to see if this page is an image
    Using pdfDoc As New Document(stream)
        Dim pdfPage As Page = pdfDoc.Pages(pageNum)
        If pdfPage.SingleImageOnly Then
            ' if it's an image, extract it from the PDF at its
            ' stored resolution
            img = pdfPage.ExtractImages()(0).Image
        End If
    End Using

    ' if the page is not an image, rasterize it at high enough
    ' resolution to read the barcodes
    If img Is Nothing Then
        Dim pdfReader As New PdfDecoder()
        pdfReader.Resolution = 300
        stream.Position = 0
        img = pdfReader.Read(stream, pageNum, Nothing)
    End If
    Return img
End Function
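
To make the resolution choice concrete, the arithmetic behind rasterizing a composed page can be sketched as follows. This is a minimal, standalone illustration and not part of DotImage: the `RasterSize` class and its method names are hypothetical. It relies only on the fact that PDF page dimensions are measured in points (72 per inch), and it assumes bar widths are given in mils (thousandths of an inch).

```csharp
using System;

// Hypothetical helper (not a DotImage type): converts physical sizes
// to pixel counts for a given rasterization resolution.
public static class RasterSize
{
    // PDF user space measures pages in points; 72 points = 1 inch.
    private const double PointsPerInch = 72.0;

    // Number of pixels that a length given in points covers at the given DPI.
    public static int ToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / PointsPerInch * dpi,
                               MidpointRounding.AwayFromZero);
    }

    // Width in pixels of a bar whose width is given in mils
    // (1 mil = 1/1000 inch). Fractional on purpose: a value below 1
    // means a single bar is narrower than one pixel.
    public static double BarWidthInPixels(double mils, int dpi)
    {
        return mils * dpi / 1000.0;
    }
}
```

For example, a US Letter page (612 x 792 points) rasterized at 300 DPI comes out at `RasterSize.ToPixels(612, 300)` by `RasterSize.ToPixels(792, 300)` pixels, that is 2550 x 3300. A 10 mil bar covers 3 pixels at 300 DPI but less than one pixel at 72 DPI, which is why a low-resolution rasterization can make a barcode unreadable.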

Reading a Page from a TIFF

Reading a page from a TIFF is easier because the page is already rasterized at a specific resolution. In DotImage, we simply construct an AtalaImage object, passing the stream with the encoded image and the page number. Incidentally, this works for any raster image type, not just TIFF. This C# function detects whether a file is a PDF and, if so, calls the previous function. If it isn’t a PDF, it simply reads the page and returns it.

C#
private AtalaImage GetDocumentPage(string file, int pageNum)
{
    AtalaImage img = null;
    PdfDecoder pdfReader = new PdfDecoder();
    using (Stream stream = File.OpenRead(file))
    {
        if (pdfReader.IsValidFormat(stream))
        {
            stream.Position = 0;
            img = GetPdfPage(stream, pageNum);
        }
        else
        {
            stream.Position = 0;
            img = new AtalaImage(stream, pageNum, null);
        }
    }
    return img;
}

And here is the function in VB.NET:

VB
Private Function GetDocumentPage(ByVal fileName As String,
            ByVal pageNum As Integer) As AtalaImage

    Dim img As AtalaImage = Nothing
    Dim pdfReader As New PdfDecoder()
    Using stream As Stream = File.OpenRead(fileName)
        If pdfReader.IsValidFormat(stream) Then
            stream.Position = 0
            img = GetPdfPage(stream, pageNum)
        Else
            stream.Position = 0
            img = New AtalaImage(stream, pageNum, Nothing)
        End If
    End Using
    Return img
End Function
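
The functions above rely on PdfDecoder.IsValidFormat to tell a PDF from a raster image. How DotImage implements that check is not described here, but the general idea of format detection can be sketched by looking at the first bytes of the stream. The `FormatSniffer` class below is a hypothetical, standalone illustration: it assumes a seekable stream, checks for the `%PDF-` header at offset 0 only, and recognizes classic TIFF headers but not BigTIFF.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a DotImage type): guesses a file format
// from its leading "magic" bytes without consuming the stream.
public static class FormatSniffer
{
    private static readonly byte[] PdfMagic = { 0x25, 0x50, 0x44, 0x46, 0x2D };  // "%PDF-"
    private static readonly byte[] TiffLittleEndian = { 0x49, 0x49, 0x2A, 0x00 }; // "II*\0"
    private static readonly byte[] TiffBigEndian = { 0x4D, 0x4D, 0x00, 0x2A };    // "MM\0*"

    public static bool LooksLikePdf(Stream stream)
    {
        return StartsWith(stream, PdfMagic);
    }

    public static bool LooksLikeTiff(Stream stream)
    {
        return StartsWith(stream, TiffLittleEndian) || StartsWith(stream, TiffBigEndian);
    }

    // Reads magic.Length bytes from the current position, compares them,
    // and restores the position so the caller can decode from the start.
    private static bool StartsWith(Stream stream, byte[] magic)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("A seekable stream is required.");

        long start = stream.Position;
        byte[] buffer = new byte[magic.Length];
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0) break; // end of stream reached early
            total += read;
        }
        stream.Position = start;

        if (total < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (buffer[i] != magic[i]) return false;
        }
        return true;
    }
}
```

Because the helper restores the stream position itself, a caller would not need the explicit `stream.Position = 0` reset that the article's functions perform after the format check.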

Reading Barcodes from an Image

Once you have an AtalaImage, DotImage makes it easy to read any barcode from it. Here is the C# code:

C#
private BarCode[] ReadBarcodesFromPage(AtalaImage img)
{
    BarCodeReader reader = new BarCodeReader(img);
    ReadOpts opts = new ReadOpts();
    opts.Symbology = Symbologies.All;
    return reader.ReadBars(opts);
}

And here it is in VB.NET:

VB
Private Function ReadBarcodesFromPage(ByVal img As AtalaImage) _
             As BarCode()
    Dim reader As New BarCodeReader(img)
    Dim opts As New ReadOpts()
    opts.Symbology = Symbologies.All
    Return reader.ReadBars(opts)
End Function

The ReadOpts class lets you control barcode reading in several ways. You can specify the type (symbology) of barcode that you want to find, the direction in which it should appear on the page, and the number of barcodes you expect to find. Any of these options can speed up recognition by having the reader do less work. The code above simply tries to find any barcode in any direction.

Processing the Barcodes

Once you have the array of BarCode objects, you just need to loop through them and get the recognized text from them. Here it is in C#:

C#
// Get the page and read the barcodes; AtalaImage is disposable,
// so release it as soon as the barcodes have been read
using (AtalaImage img = GetDocumentPage(fileName, pageNum))
{
    BarCode[] bars = ReadBarcodesFromPage(img);

    // print the decoded text of each barcode
    foreach (BarCode bar in bars)
    {
        Console.WriteLine(bar.DataString);
    }
}

And VB.NET:

VB
' Get the page and read the barcodes; AtalaImage is disposable,
' so release it as soon as the barcodes have been read
Using img As AtalaImage = GetDocumentPage(fileName, pageNum)
    Dim bars As BarCode() = ReadBarcodesFromPage(img)

    ' print the decoded text of each barcode
    For Each bar As BarCode In bars
        Console.WriteLine(bar.DataString)
    Next
End Using
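
Once the text of each barcode is available, using it to find documents in a repository mostly comes down to keeping a lookup from barcode value to the pages it appeared on. The sketch below is one simple way to do that in memory. `BarcodeIndex` and `PageRef` are hypothetical names introduced for illustration, not DotImage types, and the sketch works on plain strings such as the values returned by `bar.DataString`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: identifies one page of one document.
public sealed class PageRef
{
    public PageRef(string file, int page)
    {
        File = file;
        Page = page;
    }

    public string File { get; private set; }
    public int Page { get; private set; }
}

// Hypothetical in-memory index from barcode text to the pages it was read from.
public sealed class BarcodeIndex
{
    private readonly Dictionary<string, List<PageRef>> _pages =
        new Dictionary<string, List<PageRef>>(StringComparer.Ordinal);

    // Records that the given barcode text was found on a page.
    public void Add(string barcodeText, string file, int page)
    {
        List<PageRef> refs;
        if (!_pages.TryGetValue(barcodeText, out refs))
        {
            refs = new List<PageRef>();
            _pages[barcodeText] = refs;
        }
        refs.Add(new PageRef(file, page));
    }

    // Returns every page on which the barcode text was found
    // (an empty list if it was never seen).
    public IList<PageRef> Find(string barcodeText)
    {
        List<PageRef> refs;
        return _pages.TryGetValue(barcodeText, out refs)
            ? refs.AsReadOnly()
            : (IList<PageRef>)new PageRef[0];
    }
}
```

Inside the loop shown earlier, a call such as `index.Add(bar.DataString, fileName, pageNum)` would populate the index, and `index.Find("INV-1001")` would later return the matching pages (the value `INV-1001` is made up for the example). A production repository would more likely persist this mapping in a database.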

DotImage

The DotImage .NET Imaging SDK contains everything you need to read document imaging formats and recognize the barcodes in them. Download a 30-day evaluation.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Atalasoft, Inc.
United States
Lou Franco is the Director of Engineering at Atalasoft, provider of the leading .NET Imaging SDK (DotImage) and the Document Viewer for SharePoint (Vizit).

http://atalasoft.com/products/dotimage
http://vizitsp.com

Comments and Discussions

 
[My vote of 2] Commercial
Goran Bacvarovski, 7-Oct-09 6:28
Re: [My vote of 2] Commercial
Lou Franco, 7-Oct-09 6:54
Re: [My vote of 2] Commercial
Goran Bacvarovski, 14-Oct-09 13:26
Re: [My vote of 2] Commercial
Lou Franco, 15-Oct-09 1:43
Thanks for the response. I agree that it's always possible to do these things on your own -- it comes down to time, quality, breadth, and continued maintenance. If you can meet your requirements easily by doing it yourself, then that is sometimes the right choice.

For example, if you know that you have very high quality images of barcodes, don't have to read them super fast, and only need to support one or two 1D types -- then I agree, you might be fine just doing it yourself.

Some of the advantages of using DotImage instead of doing it yourself:
- It's much faster than something you could easily put together -- there are many person-years of effort put into optimizing the code -- this is important for customers doing production scanning.
- It will read barcodes that aren't necessarily well imaged -- we have an extensive test-bed of barcodes and have spent a lot of effort making sure we read them -- for example, barcodes on faxes
- It reads a lot of different barcodes including 2D ones like DataMatrix, PDF417, and QR
- We stand behind our product with excellent support
- You also get high-quality codecs, viewers, image processing, and scanning with the product

Also, while some competitors may appear cheaper, they might be charging for runtime deployments (which is very common among the higher-end imaging SDKs). We deliver the same quality and speed, but don't charge runtime royalties for desktop deployments. On servers, we charge per server rather than for each connecting client (as our competitors do). This means that as you have more and more users, the cost of our product doesn't go up.

For example, our current (10/2009) list pricing for reading a Code 39 barcode from simple images is $1,860 (DotImage Photo + Code39) and if you want to read all 1D barcodes from a multipage TIFF (DotImage Document Imaging + All 1D Barcodes), then it's $4,440. You get a full imaging suite for that cost (process, view, and in the latter case, scan, annotate, etc). If you make about $100/hr, then you'd have to finish a Code 39 barcode reader in about 2 days and all of the 1D readers in about a week -- and you won't have our high-quality codecs, viewers and scanning components. And for desktop GUIs, there is no runtime royalty.

I'm not saying it's always cheaper to buy an SDK -- but, I think, the more you need in imaging, the better it will be to consider one.

Again, I really appreciate you taking the time to share your thoughts. It helps us understand some of the things people think about when evaluating our products, so that we can make it better and deliver enough value for the cost.
