Recognizing Barcodes in PDF and TIFF Documents





Multipage business documents and reports sometimes use barcodes to aid in automated routing and indexing. Using the DotImage .NET PDF Reader, .NET TIFF Codec, and Barcode Recognition for .NET Add-on, it’s easy to read the data encoded in these barcodes and use it to automate workflows or find the documents in a repository.
Reading a Page from a PDF
A page in a PDF is generally either an imaged page, such as one that came from a scanner, or a composed page made up of various elements (text, drawings, images, etc.). If it’s an imaged page, we want to extract the image from the PDF at its stored resolution. If it’s composed, we need to rasterize the page to a single image, at a resolution high enough to keep the barcodes readable.
Here is a C# function that retrieves a page from a PDF at the best resolution for barcode reading:
private AtalaImage GetPdfPage(Stream stream, int pageNum)
{
    AtalaImage img = null;
    // check to see if this page is an image
    using (Document pdfDoc = new Document(stream))
    {
        Page pdfPage = pdfDoc.Pages[pageNum];
        if (pdfPage.SingleImageOnly)
        {
            // if it's an image, extract it from the PDF at its
            // stored resolution
            img = pdfPage.ExtractImages()[0].Image;
        }
    }
    // if the page is not an image, rasterize it at high enough
    // resolution to read the barcodes
    if (img == null)
    {
        PdfDecoder pdfReader = new PdfDecoder();
        pdfReader.Resolution = 300;
        stream.Position = 0;
        img = pdfReader.Read(stream, pageNum, null);
    }
    return img;
}
Here is the same function in VB.NET:
Private Function GetPdfPage(ByVal stream As Stream, _
        ByVal pageNum As Integer) As AtalaImage
    Dim img As AtalaImage = Nothing
    ' check to see if this page is an image
    Using pdfDoc As New Document(stream)
        Dim pdfPage As Page = pdfDoc.Pages(pageNum)
        If pdfPage.SingleImageOnly Then
            ' if it's an image, extract it from the PDF at its
            ' stored resolution
            img = pdfPage.ExtractImages()(0).Image
        End If
    End Using
    ' if the page is not an image, rasterize it at high enough
    ' resolution to read the barcodes
    If img Is Nothing Then
        Dim pdfReader As New PdfDecoder()
        pdfReader.Resolution = 300
        stream.Position = 0
        img = pdfReader.Read(stream, pageNum, Nothing)
    End If
    Return img
End Function
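To get a feel for what the 300 DPI setting used above means in pixels, here is a small sketch of the arithmetic. It relies only on the fact that PDF page dimensions are expressed by default in points (1/72 inch); the `RasterSize` class and `PixelsAt` method are hypothetical names used for illustration and are not part of DotImage.

```csharp
using System;

public static class RasterSize
{
    // PDF user space defaults to 72 units (points) per inch.
    private const double PointsPerInch = 72.0;

    // Number of pixels along one side of a page of the given length
    // (in points) when it is rasterized at the given resolution.
    public static int PixelsAt(double points, int dpi)
    {
        return (int)Math.Round(points / PointsPerInch * dpi);
    }
}
```

For example, a US Letter page is 612 x 792 points (8.5 x 11 inches), so `RasterSize.PixelsAt(612, 300)` and `RasterSize.PixelsAt(792, 300)` give a 2550 x 3300 pixel image. Halving the resolution halves each dimension, which also halves the number of pixels available to resolve each bar.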
Reading a Page from a TIFF
Reading a page from a TIFF is easier, since it is already rasterized at a specific resolution. In DotImage, we simply need to construct an AtalaImage object, passing the stream with the encoded image and the page number. Incidentally, this works for any raster image type, not just TIFF. This C# function detects whether a file is a PDF and, if so, calls the previous function. If it isn’t a PDF, it simply reads the page and returns it.
private AtalaImage GetDocumentPage(string fileName, int pageNum)
{
    AtalaImage img = null;
    PdfDecoder pdfReader = new PdfDecoder();
    using (Stream stream = File.OpenRead(fileName))
    {
        // check whether the stream holds a PDF; rewind afterwards
        // because the check may have advanced the stream
        if (pdfReader.IsValidFormat(stream))
        {
            stream.Position = 0;
            img = GetPdfPage(stream, pageNum);
        }
        else
        {
            // any other raster format (TIFF, etc.) is read directly
            stream.Position = 0;
            img = new AtalaImage(stream, pageNum, null);
        }
    }
    return img;
}
And here is the function in VB.NET:
Private Function GetDocumentPage(ByVal fileName As String, _
        ByVal pageNum As Integer) As AtalaImage
    Dim img As AtalaImage = Nothing
    Dim pdfReader As New PdfDecoder()
    Using stream As Stream = File.OpenRead(fileName)
        If pdfReader.IsValidFormat(stream) Then
            stream.Position = 0
            img = GetPdfPage(stream, pageNum)
        Else
            stream.Position = 0
            img = New AtalaImage(stream, pageNum, Nothing)
        End If
    End Using
    Return img
End Function
Reading Barcodes from an Image
Once you have an AtalaImage, DotImage makes it easy to read any barcode from it. Here is the C# code:
private BarCode[] ReadBarcodesFromPage(AtalaImage img)
{
    BarCodeReader reader = new BarCodeReader(img);
    ReadOpts opts = new ReadOpts();
    opts.Symbology = Symbologies.All;
    return reader.ReadBars(opts);
}
And here it is in VB.NET:
Private Function ReadBarcodesFromPage(ByVal img As AtalaImage) _
        As BarCode()
    Dim reader As New BarCodeReader(img)
    Dim opts As New ReadOpts()
    opts.Symbology = Symbologies.All
    Return reader.ReadBars(opts)
End Function
The ReadOpts class lets you control barcode reading in several ways. You can specify the type (symbology) of barcode you want to find, the direction in which it should appear on the page, or the number of barcodes you expect to be there. Any of these options can speed up recognition by giving the reader less work to do. The code above simply tries to find any barcode in any direction.
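As a rough sketch of narrowing the search, the variant below looks only for a single Code 39 barcode printed left to right. It needs the DotImage SDK to compile, so treat it as illustrative: the namespaces, the `Direction` and `ScanBarsToRead` properties, and the `Symbologies.Code39` and `Directions.East` values are written from memory of the SDK and are assumptions to check against the documentation for your version. The method name `ReadSingleCode39` is made up for this example.

```csharp
using Atalasoft.Imaging;            // assumed namespace for AtalaImage
using Atalasoft.Barcoding.Reading;  // assumed namespace for the reader types

public static class NarrowedBarcodeReading
{
    // Hypothetical helper: same flow as ReadBarcodesFromPage, but with
    // options that restrict what the reader has to search for.
    public static BarCode[] ReadSingleCode39(AtalaImage img)
    {
        BarCodeReader reader = new BarCodeReader(img);
        ReadOpts opts = new ReadOpts();
        opts.Symbology = Symbologies.Code39;  // assumed member: one symbology only
        opts.Direction = Directions.East;     // assumed member: left to right only
        opts.ScanBarsToRead = 1;              // assumed property: expect one barcode
        return reader.ReadBars(opts);
    }
}
```

Whether this is worthwhile depends on how much you know about your documents in advance: if the symbology or placement varies between sources, the broader `Symbologies.All` search is the safer default.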
Processing the Barcodes
Once you have the array of BarCode objects, you just need to loop through it and read the recognized text of each barcode from its DataString property. Here it is in C#:
// Get the page and read the barcodes; the using block disposes
// the image once we are done with it
using (AtalaImage img = GetDocumentPage(fileName, pageNum))
{
    BarCode[] bars = ReadBarcodesFromPage(img);
    // print the decoded text of each barcode
    foreach (BarCode bar in bars)
    {
        Console.WriteLine(bar.DataString);
    }
}
And in VB.NET:
' Get the page and read the barcodes; the Using block disposes
' the image once we are done with it
Using img As AtalaImage = GetDocumentPage(fileName, pageNum)
    Dim bars As BarCode() = ReadBarcodesFromPage(img)
    ' print the decoded text of each barcode
    For Each bar As BarCode In bars
        Console.WriteLine(bar.DataString)
    Next
End Using
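Printing the values is only a starting point. For the indexing use mentioned at the start of the article, one option is to collect the decoded strings into a lookup from barcode value to the pages it appeared on. The following is a minimal sketch, assuming you have gathered (page number, decoded string) pairs by running the loop above over each page; `BarcodeIndex` and `Build` are hypothetical names and not part of DotImage.

```csharp
using System;
using System.Collections.Generic;

public static class BarcodeIndex
{
    // Maps each decoded barcode value to the zero-based page numbers on
    // which it was found. Each input pair is (page number, decoded string).
    public static Dictionary<string, List<int>> Build(
        IEnumerable<KeyValuePair<int, string>> pageBarcodes)
    {
        var index = new Dictionary<string, List<int>>(StringComparer.Ordinal);
        foreach (KeyValuePair<int, string> entry in pageBarcodes)
        {
            string data = entry.Value;
            if (string.IsNullOrEmpty(data))
            {
                continue; // ignore barcodes that decoded to nothing
            }

            List<int> pages;
            if (!index.TryGetValue(data, out pages))
            {
                pages = new List<int>();
                index[data] = pages;
            }

            // record each page once, even if the value repeats on it
            if (!pages.Contains(entry.Key))
            {
                pages.Add(entry.Key);
            }
        }
        return index;
    }
}
```

A repository search for a given barcode value then becomes a dictionary lookup, and the same structure can drive routing by checking which values are present in a document.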
DotImage
The DotImage .NET Imaging SDK contains everything you need to read document imaging formats and recognize the barcodes in them. Download a 30-day evaluation.