OCR With MODI in Visual C++

Jan 24, 2007

An article on how to use Microsoft Office Document Imaging Library (MODI) for OCR in Visual C++

[Figure: MODI VC Demo]

Introduction

The Microsoft Office Document Imaging Library (MODI), which comes with the Office 2003 package, lets us easily integrate OCR functionality into our own applications. Although there is a good C# sample, "OCR with Microsoft® Office", posted on this web site, I needed something in C++. After searching the Internet and the Microsoft web site, I could not find anything good on MODI's OCR for Visual C++, so I decided to dig into it and write this sample demo program to show the basics of MODI's OCR feature. I believe some people may be interested in this program, so I am posting it on the CodeProject web site to share it.

Project Background

This project was first started in Visual C++ 6.0 and then updated to Visual Studio .NET 2003, and I have included both project files in the demo program. To run it in Visual C++ 6.0, open MODIVCDemo.dsp manually.

Build Project and Use Code

Add the MODI ActiveX control to the project

In Visual C++ 6.0, click "Project->Add To Project->Components and Controls->Registered ActiveX Control" and select the MODI ActiveX control, as shown below.

[Figure: MODI ActiveX control]

Map the ActiveX control into the project

[Figure: MODI ActiveX control mapping]

Once the MODI ActiveX control is mapped into the project, all of its wrapper classes are added to the project automatically.

How to run OCR in Visual C++

The following sample code shows how to use MODI for OCR.

BOOL CMODIVC6Dlg::bReadOCRByMODIAXCtrl(CString csFilePath, 
                                       CString &csText)
{
   // csFilePath is not used here: the image is expected to be loaded
   // into the MODI ActiveX control already.
   csText.Empty();

   // Ask the control for its current document (NULL if there is none)
   IUnknown *pVal = (IUnknown *) m_MIDOCtrl.GetDocument();
   if ( pVal == NULL )
      return FALSE;

   // Already has an image in it, so just get the IDocument interface
   IDocument *pDoc = NULL;
   HRESULT hr = pVal->QueryInterface(IID_IDocument, (void**) &pDoc);
   pVal->Release();
   if ( FAILED(hr) || pDoc == NULL )
      return FALSE;

   BOOL bRet = FALSE;
   IImages *pImages = NULL;

   // Run OCR on the document, then walk images -> layout -> words
   hr = pDoc->OCR(miLANG_SYSDEFAULT, 1, 1);
   if ( SUCCEEDED(hr) )
      hr = pDoc->get_Images(&pImages);

   if ( SUCCEEDED(hr) && pImages != NULL )
   {
      bRet = TRUE;
      long iImageCount = 0;
      pImages->get_Count(&iImageCount);

      for ( long img = 0; img < iImageCount; img++ )
      {
         IImage *pImage = NULL;
         ILayout *pLayout = NULL;
         IWords *pWords = NULL;
         long numWord = 0;

         if ( SUCCEEDED(pImages->get_Item(img, (IDispatch**) &pImage)) && pImage != NULL )
            pImage->get_Layout(&pLayout);
         if ( pLayout != NULL )
            pLayout->get_Words(&pWords);
         if ( pWords != NULL )
            pWords->get_Count(&numWord);

         for ( long i = 0; i < numWord; i++ )
         {
            IWord *pWord = NULL;
            if ( FAILED(pWords->get_Item(i, (IDispatch**) &pWord)) || pWord == NULL )
               continue;

            BSTR result = NULL;
            if ( SUCCEEDED(pWord->get_Text(&result)) && result != NULL )
            {
               // CString converts the wide BSTR itself, so no fixed-size buffer is needed
               csText += CString(result);
               csText += " ";
               ::SysFreeString(result);
            }
            pWord->Release();   // release every word, not only the last one
         }

         // Release the per-image objects before moving to the next image
         if ( pWords != NULL )  pWords->Release();
         if ( pLayout != NULL ) pLayout->Release();
         if ( pImage != NULL )  pImage->Release();
      }
      pImages->Release();
   }

   pDoc->Close(0);
   pDoc->Release();
   return bRet;
}
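
The sample above has to call Release() on every interface pointer it obtains, including one per recognized word. As a rough sketch of one way to make that automatic, the block below uses a small scoped wrapper whose destructor calls Release(). MockWord, getItem, ScopedRelease and collectWords are hypothetical names made up for this illustration; MockWord only imitates the AddRef/Release part of a COM interface so the sketch compiles without Office installed, and it assumes a C++11 compiler rather than Visual C++ 6.0.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a COM interface such as MODI's IWord.
// It only mimics reference counting; it is not the real MODI API.
struct MockWord {
    static int live;              // objects created but not yet destroyed
    std::string text;
    unsigned long refs;

    explicit MockWord(const std::string& t) : text(t), refs(1) { ++live; }

    unsigned long AddRef() { return ++refs; }
    unsigned long Release() {
        unsigned long left = --refs;
        if (left == 0) { --live; delete this; }
        return left;
    }
};
int MockWord::live = 0;

// Scoped owner: calls Release() exactly once when it goes out of scope.
template <typename T>
class ScopedRelease {
public:
    ScopedRelease() : p_(nullptr) {}
    ~ScopedRelease() { if (p_) p_->Release(); }
    ScopedRelease(const ScopedRelease&) = delete;
    ScopedRelease& operator=(const ScopedRelease&) = delete;

    T** out() { return &p_; }             // for get_Item(i, &ptr)-style out-params
    T* operator->() const { return p_; }

private:
    T* p_;
};

// Hypothetical stand-in for a get_Item call: hands out a new reference.
bool getItem(long index, MockWord** out) {
    *out = new MockWord("word" + std::to_string(index));
    return true;
}

// Joins the text of `count` mock words, each followed by a space.
std::string collectWords(long count) {
    std::string text;
    for (long i = 0; i < count; ++i) {
        ScopedRelease<MockWord> word;
        if (!getItem(i, word.out()))
            continue;
        text += word->text;
        text += " ";
    }   // the wrapper releases its word at the end of each iteration
    return text;
}
```

With this pattern, an early return or continue cannot skip a Release() call. In a real MFC/ATL project, ATL's CComPtr plays the same role, so a hand-written wrapper like this one is probably unnecessary there.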

That's it!

Version History

Version 1: No ActiveX control in the dialog; uses bReadOCRByMODI(...)

Version 2: Adds the ActiveX control to the dialog; uses bReadOCRByMODIAXCtrl(...)

Version 2 is the first demo program posted on CodeProject.