The Paperless Desktop
By Martin Welker
How to perform scanning, rearranging, OCR and Outlook export of documents for a paperless future - or at least a tidy desktop.
C#, .NET 1.1, Win2003, VS.NET 2003, Dev

After playing around with Microsoft's Document Imaging Library (MODI) in OCR with Microsoft Office, I decided to add some features to the original MODI application: scanning, multi-page TIFF rearrangement and Outlook export. The Outlook export lets you organize your documents in email folders. With search tools like LookOut, finding a document this way might be faster than the walk to the good old file cabinet.
The application looks very similar to the well-known MS Imaging Viewer. The most important difference is the email export, which offers convenient document storage. Technically, the application is based on three technologies: MODI, TWAIN and Outlook Interop. For each of these topics there is a link at the end of the article that can be used as a tutorial, so I won't describe their specific details. Integrating them needed some changes and enhancements, and that is what this article is about.

First of all, I chose TWAIN support instead of the modern WIA standard because I own a scanner that only supports TWAIN. Based on the very well done code from the Scanning via TWAIN article, I created the TwainControl to encapsulate the scanning functionality. The interaction with the control is done with three methods and one event.
// initialize the control with the handle of the main window
twainControl1.Init(this.Handle);
// free the TWAIN resources again
twainControl1.Release();
The TWAIN library offers a device selection dialog.

This dialog can be shown by calling the following method:
twainControl1.SelectDevice();
The start method has two parameters, bool UI and bool modal. With UI = false, you can start the process without the scanner's configuration dialog. The modal flag controls whether the scan dialog is shown modally.
twainControl1.StartScanning(UI, modal);
After the control has finished scanning, it fires a FinishScanning event. The main application is registered as a listener.
private void twainControl1_FinishScanning(object sender,
    Util.TwainLib.FinishScanningEventArgs e)
{
    if (e.scanned)
    {
        ArrayList images = twainControl1.PopImages();
        AppendScannedImages(images);
        SaveFile();
    }
}
This is the point where integration comes in. The PopImages method returns all scanned pages (more than one if you have a multi-page scanner) as a list of images. Afterwards, these images are appended to the MODI document.
If you want to open the application by pressing the scanner hardware button, you can add the application path to the Registry. The Registry key in HKEY_LOCAL_MACHINE is Software\Microsoft\Windows\CurrentVersion\StillImage\Registered Applications.
The key value should be:
[Path\]MartinsPaperlessDesktop.exe /StiDevice:%1 /StiEvent:%2
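When the application is started by the scanner button, the two placeholders arrive as ordinary command-line arguments. The following is only a sketch of how they could be picked apart; the StiLaunchInfo class and its members are hypothetical and not part of the article's download, and what exactly Windows substitutes for %1 and %2 (presumably a device identifier and an event identifier) should be checked against the STI documentation.

```csharp
using System;

// Hypothetical helper, not part of the article's source code.
public sealed class StiLaunchInfo
{
    public string Device;   // text after "/StiDevice:", or null if absent
    public string Event;    // text after "/StiEvent:", or null if absent

    // True if the command line contains a /StiDevice: argument.
    public bool LaunchedByScanner
    {
        get { return Device != null; }
    }

    // Scans the argument list for the two switches from the Registry entry.
    public static StiLaunchInfo Parse(string[] args)
    {
        const string devicePrefix = "/StiDevice:";
        const string eventPrefix = "/StiEvent:";
        StiLaunchInfo info = new StiLaunchInfo();
        foreach (string arg in args)
        {
            if (arg.StartsWith(devicePrefix, StringComparison.OrdinalIgnoreCase))
                info.Device = arg.Substring(devicePrefix.Length);
            else if (arg.StartsWith(eventPrefix, StringComparison.OrdinalIgnoreCase))
                info.Event = arg.Substring(eventPrefix.Length);
        }
        return info;
    }
}
```

In Main(string[] args), a call such as StiLaunchInfo.Parse(args).LaunchedByScanner could then decide whether to start scanning right away.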
The first version included self-written TIFF handling code, which I was not happy with. With the MODI library integrated, handling multi-page TIFF files becomes really simple and fast.
One valuable feature is the 'append' function, which appends pages to a multi-page TIFF document. This is helpful if your scanner only supports single-page scanning.
private void AppendImage(string source)
{
    if (_MODIDocument == null) return;
    try
    {
        MODI.Document document = new MODI.Document();
        document.Create(source);
        _changed = true;
        // iterate through all image pages of the source document
        for (int i = 0; i < document.Images.Count; i++)
        {
            _MODIDocument.Images.Add(document.Images[i],null);
        }
    }
    catch(Exception ee)
    {
        MessageBox.Show(ee.Message);
        SetImage("",false);
    }
}
If the pages got mixed up during scanning, you can move single pages within the document to get the order you want.
private void MoveImage(int pageNumber, bool up)
{
    if (_MODIDocument == null) return;
    MODI.Image img = (MODI.Image) _MODIDocument.Images[pageNumber];
    if (up)
    {
        if (pageNumber-1 >= 0)
        {
            MODI.Image prevImg =
                (MODI.Image) _MODIDocument.Images[pageNumber-1];
            // the Add method takes the "beforeImage" as its second argument
            _MODIDocument.Images.Add(img,prevImg);
            // the original page has now shifted to pageNumber+1
            MODI.Image removeImg =
                (MODI.Image) _MODIDocument.Images[pageNumber+1];
            _MODIDocument.Images.Remove(removeImg);
            axMiDocView1.PageNum = pageNumber-1;
        }
    }
    else
    {
        if (pageNumber+1 < axMiDocView1.NumPages)
        {
            MODI.Image nextImg = null;
            if (pageNumber+2 < _MODIDocument.Images.Count)
            {
                nextImg =
                    (MODI.Image) _MODIDocument.Images[pageNumber+2];
            }
            // if the second argument is null,
            // the page is appended at the end of the document
            _MODIDocument.Images.Add(img,nextImg);
            // the original page is still at pageNumber
            MODI.Image removeImg =
                (MODI.Image) _MODIDocument.Images[pageNumber];
            _MODIDocument.Images.Remove(removeImg);
            axMiDocView1.PageNum = pageNumber+1;
        }
    }
    // this action causes a change notification
    _changed = true;
    ShowStatus();
}
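The index arithmetic in MoveImage is easier to follow without the COM objects. The sketch below replays the same "insert a copy before the neighbour, then remove the original" idea on a plain List<string>. It is only an illustration: the PageOrder class is hypothetical, and List.Insert merely stands in for the Add(image, beforeImage) call used above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the page collection; not part of the article's code.
public static class PageOrder
{
    // Moves the page at 'index' one position up or down and returns its new
    // index. If the page is already at the respective end, nothing changes.
    public static int Move(List<string> pages, int index, bool up)
    {
        if (index < 0 || index >= pages.Count)
            throw new ArgumentOutOfRangeException("index");

        int target = up ? index - 1 : index + 1;
        if (target < 0 || target >= pages.Count) return index;

        string page = pages[index];
        if (up)
        {
            // insert a copy before the previous page ...
            pages.Insert(index - 1, page);
            // ... which shifts the original to index + 1
            pages.RemoveAt(index + 1);
        }
        else
        {
            // insert a copy before the page after next, or append at the end
            if (index + 2 < pages.Count) pages.Insert(index + 2, page);
            else pages.Add(page);
            // the original has not moved, it is still at 'index'
            pages.RemoveAt(index);
        }
        return target;
    }
}
```

Calling PageOrder.Move(pages, 2, true) on the pages A, B, C, D yields A, C, B, D and returns 1, which corresponds to the new PageNum set in MoveImage.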
One primary goal of the layout processing was to preserve the original document layout. The OCR itself is done by the MODI.Document.OCR() method. I used the document model from Document Processing Part II to get a better layout serialization than the one provided by MODI. Since the document's text is exported to an HTML-based email, only a very simple conversion to HTML is done.
private string GetDocumentText()
{
    Model.Document doc = Model.Document.CreateByMODI(_MODIDocument);
    string c = doc.GetText();
    // very simple conversion to HTML: line breaks become <br /> tags
    c = c.Replace("\r\n","<br />");
    return c;
}
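OCR text can contain characters such as < or &, which an HTML mail body would misinterpret. A slightly more defensive variant of the conversion could look like the sketch below. It assumes a runtime where System.Net.WebUtility is available (.NET Framework 4.0 or later, so not the .NET 1.1 the article targets), and the HtmlText class is a hypothetical helper rather than part of the original code.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the article's source code.
public static class HtmlText
{
    // Escapes HTML special characters first, then turns line breaks
    // into <br /> tags so the layout of the plain text is kept.
    public static string ToHtml(string text)
    {
        if (text == null) return "";
        string encoded = WebUtility.HtmlEncode(text);
        // handle Windows line endings first, then any remaining bare \n
        return encoded.Replace("\r\n", "<br />").Replace("\n", "<br />");
    }
}
```

The order matters: encoding after inserting the tags would escape the <br /> tags themselves.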
Now most of the work is done, and the export itself is straightforward coding. The source is placed in the DocumentMailer class. The constructor opens a connection to MS Outlook.
private Microsoft.Office.Interop.Outlook.Application oApp;
private Microsoft.Office.Interop.Outlook._NameSpace oNameSpace;
private Microsoft.Office.Interop.Outlook.MAPIFolder oOutboxFolder;
public DocumentMailer()
{
    oApp = new Outlook.Application();
    oNameSpace = oApp.GetNamespace("MAPI");
    oNameSpace.Logon(null,null,true,true);
    oOutboxFolder =
        oNameSpace.GetDefaultFolder(OlDefaultFolders.olFolderOutbox);
}
With an opened Outlook connection, the AddToOutBox method does all the work.
Outlook._MailItem oMailItem =
(Outlook._MailItem)oApp.CreateItem(Outlook.OlItemType.olMailItem);
oMailItem.To = toValue;
oMailItem.BodyFormat = Outlook.OlBodyFormat.olFormatHTML;
oMailItem.Subject = subjectValue;
oMailItem.HTMLBody = bodyValue;
oMailItem.SaveSentMessageFolder = oOutboxFolder;
//.. attachments
oMailItem.Save();
oMailItem.Display(null);
To avoid having to select our personal settings every time, I added a small configuration class. The settings are held in an instance of Configuration, which is loaded during initialization.

To have a short look at serialization and deserialization, here is the code for loading:
public static Configuration LoadFromFile(string path)
{
    IFormatter formatter = new BinaryFormatter();
    // 'using' closes the stream even if deserialization throws
    using (Stream streamS =
        new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        return (Configuration) formatter.Deserialize(streamS);
    }
}
...and for saving:
public bool SaveToFile(string path)
{
    IFormatter formatter = new BinaryFormatter();
    // 'using' closes the stream even if serialization throws
    using (Stream stream =
        new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
    {
        formatter.Serialize(stream, this);
    }
    return true;
}
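BinaryFormatter is marked obsolete in current .NET versions, so if the code is ported, a text-based format may be the safer choice. The following sketch shows the same load/save pair with XmlSerializer. The SampleConfiguration class and its two fields are invented for illustration, since the article does not show the fields of the real Configuration class.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the article's Configuration class;
// the field names are assumptions made for this example.
public class SampleConfiguration
{
    public string DefaultRecipient = "";
    public bool ShowScannerUI = true;

    // Reads the settings from an XML file written by SaveToFile.
    public static SampleConfiguration LoadFromFile(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SampleConfiguration));
        using (Stream stream =
            new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            return (SampleConfiguration) serializer.Deserialize(stream);
        }
    }

    // Writes the public fields of this instance as XML.
    public void SaveToFile(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SampleConfiguration));
        using (Stream stream =
            new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            serializer.Serialize(stream, this);
        }
    }
}
```

A side effect of this choice is that the settings file becomes human-readable and can be edited by hand.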
After all this technical stuff, we can afford a little distraction. By accident, I designed a haunting, process-oriented adventure.
Players
Game instructions:
Sounds too basic to be a five star adventure? Well, you are probably right, but it works for me. Mostly.
Obviously, the ultimate goal of a paperless desktop is not achieved by software alone. The answer seems to lie in the process. Bringing peace into your desktop's chaos needs discipline from the moment you open the post box. Feel free to share your own experiences in the article's discussion board. That's all for the moment; thanks for reading - and of course, thanks for voting too.
Last Updated: 5 Apr 2007. Editor: Sean Ewington.
Copyright 2005 by Martin Welker. Everything else Copyright © CodeProject, 1999-2009.