Using DotImage to Scan Documents into the Cloud
By Lou Franco
Got a stack of paper, a scanner, and an account on Scribd? Then this project is for you. With DotImage, Visual Studio, and a little bit of code, you can quickly and easily write a Scan-to-Scribd desktop application to put those documents online. Free sample code and step-by-step instructions!
C#, VS2005, VS2008, Dev
Got a stack of paper, a scanner, and an account on Scribd? Want to share those documents on your blog or with your co-workers? Then this project is for you.
With DotImage, Visual Studio, and a little bit of code, you can write a Scan-to-Scribd desktop application to put those documents online. Best of all, the upload code can be easily modified to handle any remote document repository that supports uploading documents as a service.
The first step is to build a basic scanning application with DotImage. For videos that walk through the exact steps, see the following lessons (the whole series runs less than 23 minutes):
Video Tutorial: Lesson 1 - Basic Structure of a Capture Application
Video Tutorial: Lesson 2 – Implementing Save and AutoZoom
Video Tutorial: Lesson 3 – Getting a List of Devices and Scanning
Video Tutorial: Lesson 4 – Basic Cleanup of Scanned Documents
The basic steps are simple. The form needs a tool strip combo box, tscbScanners, to list the installed scanners, and a tool strip button, tsbScan, to start a scan. On Form Load, fill in the list of installed scanners by calling this function:
```csharp
private void InitializeScannerList()
{
    tsbScan.Enabled = false;
    tscbScanners.Enabled = false;
    if (acquisition1.SystemHasTwain)
    {
        // Loop through each scanner, adding it to the list
        foreach (Device d in acquisition1.Devices)
        {
            string devName = d.Identity.ProductName;
            tscbScanners.Items.Add(devName);
            // Make sure the default scanner is selected
            if (d == acquisition1.Devices.Default)
            {
                tscbScanners.SelectedItem = devName;
            }
        }
        // If we have scanners, enable the scanning controls
        if (tscbScanners.Items.Count > 0)
        {
            tsbScan.Enabled = true;
            tscbScanners.Enabled = true;
        }
    }
}
```
When the scan button is clicked, the tsbScan Click event handler scans with the selected scanner:
```csharp
private void tsbScan_Click(object sender, EventArgs e)
{
    // If a scanner is selected, use it to scan
    Device selectedDevice = GetSelectedDevice();
    if (selectedDevice != null)
    {
        selectedDevice.Acquire();
    }
}

private Device GetSelectedDevice()
{
    // Nothing is selected (for example, when no scanners are installed)
    if (tscbScanners.SelectedItem == null)
    {
        return null;
    }

    // Look for the scanner whose name matches the selection and return it
    string selectedName = tscbScanners.SelectedItem.ToString();
    foreach (Device d in acquisition1.Devices)
    {
        if (d.Identity.ProductName == selectedName)
        {
            return d;
        }
    }
    return null;
}
```
Every time an image is scanned, the Acquisition object’s ImageAcquired event will fire. Add a handler with this code to add the image to the document viewer:
```csharp
// This function is called for each page. Add the page
// to the document viewer
private void acquisition1_ImageAcquired(object sender,
    AcquireEventArgs e)
{
    documentViewer1.Add(AtalaImage.FromBitmap(e.Image), "", "");
}
```
Once the document is scanned, we can upload to any service that accepts document uploads. Scribd (www.scribd.com) is a free document sharing website that has a web-service interface for uploading documents. There is an excellent open-source .NET library called Scribd.NET that makes interacting with the service relatively simple. You can get it here: http://www.codeplex.com/scribdnet.
Here is what you need to do to upload a document to Scribd using Scribd.NET. First, add a tool strip progress bar named tspbUploadProgress to the form to show upload progress. Then initialize the service and log in:
```csharp
// Handler used for both the LoggedIn and LoginFailed events
private EventHandler<UserEventArgs> _loggedInHandler;
static private bool _scribdInitialized = false;

// Initialize the service
private void InitializeScribd()
{
    // Replace _apiKey and _secretKey with your own Scribd keys
    Scribd.Net.Service.APIKey = _apiKey;
    Scribd.Net.Service.SecretKey = _secretKey;
    Scribd.Net.Service.EnforceSigning = true;

    // Create the login handler once, before LoginUser is called
    _loggedInHandler = new EventHandler<UserEventArgs>(User_LoggedIn);
}

void LoginUser(string user, string password)
{
    // Subscribe to events
    User.LoggedIn += _loggedInHandler;
    User.LoginFailed += _loggedInHandler;
    // Sign into the service
    User.Login(user, password);
}

// This method is called when the login succeeds or fails
void User_LoggedIn(object sender, UserEventArgs e)
{
    User.LoggedIn -= _loggedInHandler;
    User.LoginFailed -= _loggedInHandler;
    _scribdInitialized = e.Success;
}
```
Before you upload, subscribe to the events that Scribd.NET raises to report upload progress, the completed upload and save, and errors:
```csharp
private void InitializeScribdEventHandlers()
{
    Document.Uploaded +=
        new EventHandler<DocumentEventArgs>(Document_Uploaded);
    Document.Saved +=
        new EventHandler<DocumentEventArgs>(Document_Saved);
    Document.UploadProgressChanged +=
        new EventHandler<System.Net.UploadProgressChangedEventArgs>(
            Document_UploadProgressChanged);
    Service.Error += new EventHandler<ScribdEventArgs>(Service_Error);
}
```
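LoginUser subscribes the same handler to both User.LoggedIn and User.LoginFailed, and User_LoggedIn removes it again as soon as a result arrives. The sketch below illustrates why that matters, using a hypothetical FakeUser class in place of Scribd.NET's User class; FakeUser, LoginEventArgs, LoginClient, and the "any non-empty password succeeds" rule are invented for illustration and are not part of Scribd.NET. Because the events are static, a handler that is never removed would run once more for every extra subscription, and the static event would keep the subscribing object alive.

```csharp
using System;

// Hypothetical event arguments standing in for Scribd.NET's UserEventArgs
public class LoginEventArgs : EventArgs
{
    public bool Success;
    public LoginEventArgs(bool success) { Success = success; }
}

// Hypothetical stand-in for Scribd.NET's User class: static events
// that report the result of a login attempt
public static class FakeUser
{
    public static event EventHandler<LoginEventArgs> LoggedIn;
    public static event EventHandler<LoginEventArgs> LoginFailed;

    public static void Login(string user, string password)
    {
        // Arbitrary rule for the sketch: any non-empty password succeeds
        bool ok = !string.IsNullOrEmpty(password);
        EventHandler<LoginEventArgs> handler = ok ? LoggedIn : LoginFailed;
        if (handler != null)
        {
            handler(null, new LoginEventArgs(ok));
        }
    }
}

public class LoginClient
{
    public bool Initialized;
    public int Notifications;
    private readonly EventHandler<LoginEventArgs> _handler;

    public LoginClient()
    {
        // Cache the delegate so subscribe and unsubscribe use the same handler
        _handler = OnLoginResult;
    }

    public void LoginUser(string user, string password)
    {
        FakeUser.LoggedIn += _handler;
        FakeUser.LoginFailed += _handler;
        FakeUser.Login(user, password);
    }

    private void OnLoginResult(object sender, LoginEventArgs e)
    {
        // One-shot handler: remove it from both events before handling
        // the result, so a later login does not notify us twice
        FakeUser.LoggedIn -= _handler;
        FakeUser.LoginFailed -= _handler;
        Notifications++;
        Initialized = e.Success;
    }
}
```

Calling LoginUser twice on the same LoginClient produces exactly two notifications; without the two `-=` lines, the second login would invoke the handler twice.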
Scribd.NET uploads files by file name, so we save the document to disk first. Here's how to use DotImage to save it as a temporary TIFF:
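```csharp
// Save the document as a TIFF in a temporary location
// so that we can pass a path to the Scribd API
private string SaveDocumentAsTempTif()
{
    // Build a unique .tif path in the temp folder. (Appending ".tif" to
    // Path.GetTempFileName() would leave its empty .tmp file behind.)
    string tempName = Path.Combine(Path.GetTempPath(),
        Guid.NewGuid().ToString() + ".tif");
    documentViewer1.Save(tempName, new TiffEncoder());
    return tempName;
}
```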
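Path.GetTempFileName() creates an empty file on disk, so appending ".tif" to its result leaves an unused .tmp file behind. Whichever way the path is built, the TIFF itself should also be deleted once Scribd has it. A minimal sketch of tracking that file, assuming cleanup happens in the completion handlers; TempFileTracker, Create, and Cleanup are hypothetical names, not DotImage or Scribd.NET APIs.

```csharp
using System;
using System.IO;

// Hypothetical helper: remembers the temporary file created for an
// upload so it can be deleted once the upload finishes or fails
public class TempFileTracker
{
    public string CurrentPath { get; private set; }

    // Returns a unique path in the temp folder with the given extension.
    // Unlike Path.GetTempFileName(), nothing is created on disk yet.
    public string Create(string extension)
    {
        CurrentPath = Path.Combine(Path.GetTempPath(),
            Guid.NewGuid().ToString("N") + extension);
        return CurrentPath;
    }

    // Call from the completion handlers (for example Document_Saved and
    // Service_Error), not right after UploadAsync, which as an
    // asynchronous call can return before the file has been sent
    public void Cleanup()
    {
        if (CurrentPath != null && File.Exists(CurrentPath))
        {
            File.Delete(CurrentPath);
        }
        CurrentPath = null;
    }
}
```

In the form, the TIFF would be written to the path returned by Create (for example with documentViewer1.Save), and Cleanup would run once the upload has been saved or has failed.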
Here’s how you upload the file (AccessTypes is a Scribd.NET type that specifies whether the document is public or private):
```csharp
private void UploadFileToScribd(string filename, AccessTypes accessType)
{
    Scribd.Net.Document.UploadAsync(filename, accessType);
}
```
And here’s how you handle the events:
```csharp
// Called by Scribd API to report an error
void Service_Error(object sender, ScribdEventArgs e)
{
    MessageBox.Show(this, "Scribd Error: " + e.Message,
        "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
}

// Called by Scribd API to show progress
void Document_UploadProgressChanged(object sender,
    System.Net.UploadProgressChangedEventArgs e)
{
    tspbUploadProgress.Value = e.ProgressPercentage;
}

// Called by Scribd API when the document is uploaded
void Document_Uploaded(object sender, DocumentEventArgs e)
{
    if (e.Document != null)
    {
        // _title is a String that you set up before uploading
        e.Document.Title = _title;
        e.Document.Save();
    }
}

// Called by Scribd API when the document is saved
void Document_Saved(object sender, DocumentEventArgs e)
{
    // Here the document is uploaded and saved, so
    // you can update your UI to reflect this
}
```
As you can see, scanning and uploading documents takes very little code. The same basic structure could be used to upload documents into other repositories, such as Amazon’s S3, the new Microsoft Azure SQL Data Services, Google Docs, SharePoint, or any ECM system that supports the emerging CMIS standard for documents.
To get the full code and a build of this project, go to Atalasoft’s Scan Documents to Scribd Project page.
TIFFs are fine for scans, but Scribd will not index them, so you will not be able to search for the documents later. To fix that, we can OCR the document and then create a PDF with the original page image on top and the recognized text beneath it. The result looks like a scan but can be found by indexers. This is called a searchable PDF, and DotImage makes it easy to create one.
Since OCR is a time-consuming process, it’s best to do it in the background with a BackgroundWorker object. Add a BackgroundWorker named saveAsSearchablePdfBackground to the form and set its WorkerReportsProgress property to true. Then add the following code:
```csharp
// Create a searchable PDF (image with text behind it)
private void CreateSearchablePdf(string tif, string pdf)
{
    using (TesseractEngine ocrEngine = new TesseractEngine())
    {
        ocrEngine.Initialize();
        ocrEngine.PreprocessingOptions.Deskew = false;
        try
        {
            ocrEngine.DocumentProgress += new
                OcrDocumentProgressEventHandler(ocrEngine_DocumentProgress);
            ocrEngine.Translators.Add(new PdfTranslator());
            ocrEngine.Translate(new FileSystemImageSource(
                new string[] { tif }, true), "application/pdf", pdf);
        }
        finally
        {
            ocrEngine.ShutDown();
        }
    }
}

// Called by the OCR engine (on the background thread).
// Call the worker's ReportProgress so that the progress bar
// is updated on the UI thread.
void ocrEngine_DocumentProgress(object sender,
    OcrDocumentProgressEventArgs e)
{
    if (e.ProgressIsValid)
    {
        saveAsSearchablePdfBackground.ReportProgress(e.Progress);
    }
}

// Called indirectly by ReportProgress on the background worker
private void saveAsSearchablePdfBackground_ProgressChanged(
    object sender, ProgressChangedEventArgs e)
{
    tspbUploadProgress.Value = e.ProgressPercentage;
}

// Called when DoWork is complete
private void saveAsSearchablePdfBackground_RunWorkerCompleted(
    object sender, RunWorkerCompletedEventArgs e)
{
    tspbUploadProgress.Visible = false;
}
```
Upload the PDF as before (inside the RunWorkerCompleted handler if you want to do it automatically).
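The background-worker code does not show the DoWork handler or the RunWorkerAsync call that starts the work. A minimal sketch of that wiring, assuming the TIFF and PDF paths are passed in through RunWorkerAsync; FakeCreateSearchablePdf is a hypothetical stand-in for CreateSearchablePdf that only reports progress, and the wait at the end exists only because this is a console program (a form would simply return and let RunWorkerCompleted run later).

```csharp
using System;
using System.ComponentModel;
using System.Threading;

public static class SearchablePdfWorkerSketch
{
    public static bool Succeeded;
    public static string PdfPath;

    // Hypothetical stand-in for CreateSearchablePdf: reports progress the
    // way ocrEngine_DocumentProgress does, without doing any OCR
    static void FakeCreateSearchablePdf(BackgroundWorker worker, string tif, string pdf)
    {
        for (int percent = 0; percent <= 100; percent += 25)
        {
            worker.ReportProgress(percent);
        }
    }

    public static void Run(string tif, string pdf)
    {
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            BackgroundWorker worker = new BackgroundWorker();
            worker.WorkerReportsProgress = true; // otherwise ReportProgress throws

            // Runs on a thread-pool thread: do the slow work here
            worker.DoWork += (sender, e) =>
            {
                string[] paths = (string[])e.Argument;
                FakeCreateSearchablePdf((BackgroundWorker)sender, paths[0], paths[1]);
                e.Result = paths[1]; // hand the PDF path to RunWorkerCompleted
            };

            // In a Windows Forms app this runs on the UI thread, so it could
            // set tspbUploadProgress.Value directly
            worker.ProgressChanged += (sender, e) =>
            {
                Console.WriteLine("OCR progress: " + e.ProgressPercentage + "%");
            };

            worker.RunWorkerCompleted += (sender, e) =>
            {
                // e.Result throws if DoWork failed or was cancelled,
                // so check e.Error and e.Cancelled first
                Succeeded = e.Error == null && !e.Cancelled;
                if (Succeeded)
                {
                    PdfPath = (string)e.Result; // upload this file, as before
                }
                done.Set();
            };

            worker.RunWorkerAsync(new string[] { tif, pdf });
            done.WaitOne();
        }
    }
}
```

Checking e.Error before uploading keeps a failed OCR run from sending a missing or partial PDF to Scribd. Outside Windows Forms (as here) the progress callbacks run on thread-pool threads, so their console output may arrive in any order.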
Atalasoft, Inc. provides ECM imaging technology to ISVs, Systems Integrators, and Enterprises with thousands of customers, and millions of end users worldwide. Specializing in zero-footprint, AJAX-enabled web image viewing, Atalasoft provides the tools to migrate enterprise solutions from the desktop to the web. For almost a decade, Atalasoft has produced imaging technology products including DotImage – the leading imaging toolkit for .NET developers, and Vizit SP – Document Viewing and Imaging for SharePoint.
Last Updated: 13 Nov 2008. Editor: Sean Ewington
Copyright 2008 by Lou Franco. Everything else Copyright © CodeProject, 1999-2010