Click here to Skip to main content
13,190,404 members (50,347 online)
Click here to Skip to main content
Add your own
alternative version


12 bookmarked
Posted 8 Nov 2005

Full text search with Oracle Text

, 8 Nov 2005
Rate this:
Please Sign up or sign in to vote.
How to do full text search with Oracle Text

Sample Image - ora_fulltext.gif


Have you ever had a problem like this: I need all documents on my hard drive containing the words Oracle and SQL Server but not MS Access. Do you have Oracle 10g and about 2 hours time? Yes! Then I will show you an easy way to index your documents using Oracle Text. As I am a fan of low tech solutions, everything you need is a text editor and Oracle. We create this solution in three steps:

  • Create a table, index, and Stored Procedure using SQL*Plus
  • Import the documents with VBScript
  • Create a user interface (GUI) for searching the documents

The Oracle part

To start with the Oracle objects, open SQL*Plus. First, we create a new table my_doc with file_name, upload_date, and content as main fields:

   id NUMBER,
   file_name VARCHAR2(255),
   upload_date VARCHAR2(10),
   filesize VARCHAR2(20),
   mime VARCHAR2(50),
   content BFILE,

The field content is defined as type BFILE. This means the content of the document is not stored in Oracle. This is perfectly OK, as we only need the index. The next thing we will do is: define a directory that will be used as the root folder for the documents that we want to index. Grant read access for this directory to user ctxsys.

CREATE OR REPLACE DIRECTORY documents AS 'C:\_work\datastore\';
GRANT READ ON DIRECTORY documents TO ctxsys;

Next, we create an index of type CTXSYS.CONTEXT:


Finally, we create the Stored Procedure put_file:

  p_file_name  IN  my_doc.file_name%TYPE,
  p_upload_date IN my_doc.upload_date%TYPE,
  p_filesize IN my_doc.filesize%TYPE,
  p_mime IN my_doc.mime%TYPE
) AS
   index_name varchar2(20) := 'my_doc_idx';
   INSERT INTO my_doc (id, file_name, upload_date,
   filesize, mime, content)
   VALUES (my_doc_seq.NEXTVAL, p_file_name, p_upload_date,
   p_filesize, p_mime,

That's it - we are finished creating the Oracle objects. Now we can test the solution with SQL*Plus:

exec put_file('test.doc', '10.10.2005', '100', 'doc');

When you execute the above statement, be sure that the file test.doc exists in the root folder of the indexed directory. You should also check for errors in the table ctx_user_index_errors using the following statement:

SELECT err_index_name, err_timestamp,err_text FROM 
       ctx_user_index_errors ORDER BY err_timestamp;

When you have no errors in this table, continue with the VBScript to insert more documents into the table and the index. To check the contents of the index table, you can use the script index_stats.sql that is also in the zipped source files.

Importing the data

Next, we will create a simple VBScript file to import all the documents in a certain folder. First, we open the database connection and get access to the root directory. We insert each file from the directory into the table my_doc using the procedure put_file. In the end, we close the database connection and cleanup all used objects. Here is the complete source of the VBScript file:

Dim strFolder: strFolder = "C:\_work\datastore\" ' index documents from this directory
' open database connection
Dim strCon: strCon = _
    "Driver={Microsoft ODBC for Oracle}; CONNECTSTRING=ORCL; uid=system;pwd=man;" 
Dim oCon: Set oCon = WScript.CreateObject("ADODB.Connection")
oCon.Open strCon
' insert record for each file
Dim oFs: Set oFs = WScript.CreateObject("Scripting.FileSystemObject")
Dim oFolder: Set oFolder = oFs.GetFolder(strFolder)
For Each oFile In oFolder.Files
 WScript.Echo oFile.Name
 Dim strCmd:
 strCmd = "put_file('" & _
   oFile.Name & "','" & _
   Date() & "','" & _ 
   oFile.Size & "','" & _ 
   oFs.GetExtensionName(oFile) & "')"
' cleanup
Set oFolder = Nothing
Set oFs = Nothing
Set oCon = Nothing

That is all we need to import the data. I made the code as simple as possible. You should walk down the directory structure to import an entire tree of documents. You should also probably check for errors - reading the table ctx_user_index_errors.

The user interface

As I promised in the introduction, we only need Notepad, even for the user interface. So we use an HTA as the user interface for searching in documents. To do this, we create a file called search.hta. An HTA uses the GUI-objects from Internet Explorer, but does not need a web server (like IIS) to be displayed. First, we define the head-section:

<title>Indexed files</title>






Please visit the Microsoft website as the starting point for more information about HTA. Next, we need a script section to perform the search and display the results within the HTA page.

Sub StartQuery
 Dim strCon: strCon = "Driver={Microsoft ODBC for Oracle}; " & _ 
                      "CONNECTSTRING=ORCL; uid=system;pwd=man;"
 Dim oCon: Set oCon = CreateObject("ADODB.Connection")
 Dim oRs: Set oRs = CreateObject("ADODB.Recordset")
 oCon.Open strCon
 Dim strSql, strRes
 strSql = "SELECT SCORE(1), file_name, filesize FROM my_doc WHERE CONTAINS(content," & _
          Search.Value & ", 1) > 0 ORDER BY SCORE(1) DESC"
 strRes = "<table border=1 cellpadding=10 cellspacing=0><tr><th>found" & _ 
 Set oRs = oCon.Execute(strSql)
 While Not oRs.EOF
  strRes = strRes & "<tr><td>" & _
    oRs.Fields(0).Value & "</td><td>" & _
    oRs.Fields(1).Value & "</td><td>" & _
    oRs.Fields(2).Value & "</td><td><a href='datastore/" & _
    oRs.Fields(1).Value & "' target=_blank>Open in new windows</a>" & _
 strRes = strRes & "</table>"
 DataArea.InnerHtml = strRes
 Set oCon = Nothing
End Sub

Here are some points of interest in the above code:

strSql = "SELECT SCORE(1), file_name, filesize FROM my_doc " & _
    WHERE CONTAINS(content," & Search.Value & ", 1) > 0 " & _

This is a query performing a full text search on the content of the files. For example: the where condition CONTAINS(content, 'Perl', 1) selects only documents that contain the string Perl and gets the number of hits in the document. The variable Search.Value is the value of a text box defined later in the body section of the HTA.

DataArea.InnerHtml = strRes

When we are finished formatting the results using HTML tables, we assign the table to 'DataArea.InnerHtml'. The object DataArea is also defined in the body section of the HTA. Here, we see the body section (I removed some formatting code like tables and pictures) to keep things easy:

<input type="text" name="Search" size="60">
<input type="button" value="Start Query" 

      name="startQuery" onClick="StartQuery"><p> 
<span id = "DataArea"></span>

In the body section, we define a text area named Search. The content of Search.Value acts as the input for the Select statement. Next, we define a button with caption Start Query with an event handler that calls the subroutine StartQuery as defined in the script section. Lastly, we need a DataArea. We assign the formatted output of the query to 'DataArea.InnerHtml'.

Now we are done - you can start the search interface with a double click from Explorer. See some examples of valid searches in the GUI. For more information on searching in documents, have a look at:

Points of interest

As you can see, it is very simple to create an index for your office documents using Oracle Text. But in this solution, we have only a snapshot of the document contents. What happens when you change a document? The index for this document will no longer be valid. So we should take care of updating the index when a document has changed. I already have an idea of how to solve this problem.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Web Developer
Germany Germany
Florian works as consultant for change- and configuration management for about 7 years. In this environment he is often forced to work with unix, perl and shell scripts.

For more information about change- and configuration management (espacially Serena Dimensions) visit:

For video tutorials about, ajax, gridviews, ... (in german) visit:

You may also be interested in...


Comments and Discussions

Questionconnection error in search.hta Pin
mikcutu21-Feb-17 23:01
membermikcutu21-Feb-17 23:01 
GeneralQuery doesn't work Pin
milind.pangarkar29-Mar-10 23:35
membermilind.pangarkar29-Mar-10 23:35 
GeneralOracle Free Text Search Pin
CharlieRedondo7-Jan-09 5:53
memberCharlieRedondo7-Jan-09 5:53 
GeneralRe: Oracle Free Text Search Pin
fstrahberger9-Jan-09 11:05
memberfstrahberger9-Jan-09 11:05 

I don't have currently an orcale instance running so here some untested example:

CREATE TABLE tblTest(id NUMBER,content VARCHAR2(50));
INSERT INTO tblTest(id,content) VALUES (1,'Sports');
INSERT INTO tblTest(id,content) VALUES (2,'support');
INSERT INTO tblTest(id,content) VALUES (3,'xxx Port xxx');
INSERT INTO tblTest(id,content) VALUES (4,'line with portable');

SELECT id,content FROM tblTest WHERE CONTAINS(content,'(%port%)',1)>;0;

this should return:
3,xxx Port xxx
4,line with portable

About case insensitive query i found this quote from an oracle document

3.2.7 Case-Sensitive Indexing and Querying
By default, all text tokens are converted to uppercase and then indexed. This results in case-insensitive queries. For example, separate queries on each of the three words cat, CAT, and Cat all return the same documents.

You can change the default and have the index record tokens as they appear in the text. When you create a case-sensitive index, you must specify your queries with exact case to match documents. For example, if a document contains Cat, you must specify your query as Cat to match this document. Specifying cat or CAT does not return the document.

Hope this helps

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web04 | 2.8.171016.2 | Last Updated 8 Nov 2005
Article Copyright 2005 by fstrahberger
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid