Click here to Skip to main content
15,881,803 members
Articles / Desktop Programming / MFC
Article

Things NOT to do with CRecordSet (or any other database code)

Rate me:
Please Sign up or sign in to vote.
4.62/5 (36 votes)
8 Jan 2004CPOL7 min read 130.1K   35   29
Some general pitfalls noticed when using relational databases.

Introduction

A few years ago I joined a small software development company. I was assigned responsibility for a relatively new product of theirs that had been created in a hell of a rush and that had significant performance problems.

The product in question (no, I'm not going to name either the company or the product), was a database application split into a couple of components. One component ran as a service and populated a relational database from external data. The other component was a client used to search the database and display records.

The title of this article implies it's about the CRecordSet class. Actually it's not - it's more about things to avoid in any code dealing with relational databases. I simply happened to encounter them in code that used CRecordSet. I'm also not a database expert - I just use em when I can't avoid em :)

The first problem

was in the service. The external data source was files that were created in a directory being watched by the service. A new file appears, the service notices and kicks into action, and extracts data which is added to the database.

The code that did the database insert looked something like this (in pseudocode).

C++
CRecordSet rs;

rs.Open(parameters);
rs.AddNew();

//    set recordset data members for new record from data source
.
.
.
rs.Update();
Pretty straightforward code that follows the MSDN samples. It works fine with a small database. But give it a large database and performance degrades so badly that it can take upwards of an hour to complete the rs.AddNew(); call. In the case I'm writing about, we had a customer who had a database containing 3 million records and they were measuring 40 minutes to add one record. Why?

Let's look at CRecordSet as created by the ClassWizard in VC 6 a little more closely. You run the ClassWizard to add a new CRecordSet derived class to your application which in turn creates a new class definition/implementation file. The class in turn defines a default SQL query string which is used (unless overridden) when the object instance is opened. A dummy class I just created using the NorthWind database sample and the Categories table returns this default SQL query.

C++
CString CDummySet::GetDefaultSQL()
{
    return _T("[Categories]");
}
Single stepping through the code (including MFC classes) revealed that opening the recordset returned a single record from the database. But when the rs.AddNew() line was executed the entire database was copied into a temporary file on the local machine. Read that again. The ENTIRE database was copied, record by record, into a temporary file on the local machine.

Man, those guys at Microsoft must have been on crack when they wrote that code.

Well no, they weren't. Let's think a little about this. We've opened a database table with a query that will return all data in the table. We've then tried to add a new record. Unless they know, unequivocally, that there is a server at the other end of the connection that will understand a 'move to the end of the database and add a record' command they have no choice. Since the CRecordSet class supports any ODBC database it can't make that assumption. If the database at the other end of the connection is an Access database there's no server - thus there is no way to issue a command to a database server to append a record. You're left with the hard choice to find the end of the database yourself. Hence the copy.

One might argue that they should have added code to determine if there's a server at the other end. Maybe. But there is a better solution.

The solution to the first problem

was to write a function that dynamically builds an INSERT SQL statement. The syntax is pretty simple and the update command can be issued to a CRecordSet instance using the parent CDataBase::ExecuteSQL() function. I was able to cut the insert time on a 3 million record database from 40 minutes to rather less than a tenth of a second.

First solution revisited

Part of the problem with using the default SQL query string is that it returns all records. The AddNew() function then iterated over each record to find the end of the recordset. You could try to solve this by changing the default SQL query string to one that would select no records whatsoever, in which case the iteration over the recordset should be very fast. Well maybe. Remember that you don't know if you have a server at the other end of the connection. If you do, and the table you're doing an update to has indexes and if your SQL query string (which returns no records) uses those indexes then this is true. You're also assuming enough knowledge about the contents of the database to be able to issue a query that will return no records. That's a pretty big assumption. It's better, in my opinion, to dynamically build a INSERT SQL statement because that decision involves NO assumptions about the database apart from knowledge of field data.

The second problem

was in the search client. It's related to the problem in the service but the solution is different. The product manager had decided (not unreasonably) that when a search of the database was performed the GUI should show in the status line the number of records that had been found.

The piece of code I'm about to discuss was written by a friend of mine (he's had veto rights on this part of the article) so I'm being careful :)

Management had mandated that the GUI show how many records had matched the query. One creates a CRecordSet with filtering criteria as specified by the user and runs the query. Then one looks at the class definition for CRecordSet and notices a function called CRecordset::GetRecordCount(). Aha! Call it and it returns some value. Plug that into the GUI update code and all looks rosy except that it's wrong. QA notice and it's back to the drawing board. A closer read of the MSDN docs is called for. My friend notices this caution.

Caution The record count is maintained as a "high water mark" "the highest-numbered record yet seen as the user moves through the records. The total number of records is only known after the user has moved beyond the last record. For performance reasons, the count is not updated when you call MoveLast. To count the records yourself, call MoveNext repeatedly until IsEOF returns nonzero. Adding a record via CRecordset:AddNew and Update increases the count; deleting a record via CRecordset::Delete decreases the count.

So my friend heeds the advice in MSDN and writes a loop to count the records, using MoveNext() until IsEOF() returns nonzero.

It all works in his 100 record test database. And it works in QA's 1000 record database and no one notices it takes time to run. When our 3 million record customer runs it it takes 40 minutes. And we wonder why they are disappointed? The MoveNext on our CRecordSet instance causes the contents of that record to be copied (behind our back) to a temporary file on local storage.

Man, even if the Microsoft guys weren't on crack the first time surely they must be this time?

Nope :) And for the same reason. They cannot assume a server at the other end. And, as I discovered, it can take significant time to navigate to the end of the recordset.

The solution to the second problem

was to create a custom CRecordSet class that executes a SELECT COUNT(*) FROM table WHERE search_criteria SQL statement on another thread which returns the number of records that matches the search criteria. It still takes time (I typically saw 15 seconds on a remote Access database and 10 seconds on a SQL Server database) but that's a damn sight faster than the other approach.

The second problem revisited

This is also the first problem revisited :)

I found that both initial solutions ((ie) the wrong solutions) failed intermittently. The reason was that if one is querying a 3 million record database one had better have sufficient disk space to accomodate the entire database (remember that if one is using the wrong solution the entire database is copied to your local machine)).

Bottom line?

I cannot fault Microsoft in any of this. In the end our performance problems occurred because everyone assumed that a test on a 1000 record database would scale up to a 3 million record database and no one bothered to validate that assumption.

History

January 9, 2004 - Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
I've been programming for 35 years - started in machine language on the National Semiconductor SC/MP chip, moved via the 8080 to the Z80 - graduated through HP Rocky Mountain Basic and HPL - then to C and C++ and now C#.

I used (30 or so years ago when I worked for Hewlett Packard) to repair HP Oscilloscopes and Spectrum Analysers - for a while there I was the one repairing DC to daylight SpecAns in the Asia Pacific area.

Afterward I was the fourth team member added to the Australia Post EPOS project at Unisys Australia. We grew to become an A$400 million project. I wrote a few device drivers for the project under Microsoft OS/2 v 1.3 - did hardware qualification and was part of the rollout team dealing directly with the customer.

Born and bred in Melbourne Australia, now living in Scottsdale Arizona USA, became a US Citizen on September 29th, 2006.

I work for a medical insurance broker, learning how to create ASP.NET websites in VB.Net and C#. It's all good.

Oh, I'm also a Kentucky Colonel. http://www.kycolonels.org

Comments and Discussions

 
QuestionAdd Item in large tables Pin
Angel Brisighelli24-Dec-11 7:29
Angel Brisighelli24-Dec-11 7:29 
GeneralCRecordset and Sybase database Pin
ahmed_magdy9-Dec-06 5:15
ahmed_magdy9-Dec-06 5:15 
NewsCounting Class for CRecordset Pin
Paul S Ganney4-Apr-06 23:23
Paul S Ganney4-Apr-06 23:23 
GeneralRe: Counting Class for CRecordset Pin
Member 316730618-Jul-09 12:39
Member 316730618-Jul-09 12:39 
GeneralRe: Counting Class for CRecordset Pin
Paul S Ganney19-Jul-09 23:55
Paul S Ganney19-Jul-09 23:55 
GeneralRe: Counting Class for CRecordset Pin
Member 316730629-Jul-09 11:21
Member 316730629-Jul-09 11:21 
GeneralExecution time isn't the only symptom Pin
Yvan Rodrigues3-Aug-05 7:09
professionalYvan Rodrigues3-Aug-05 7:09 
GeneralCRecordset::Update() problem for MFC and no select for public role for sysdatabases Pin
Katarzyna21-Jun-05 5:49
Katarzyna21-Jun-05 5:49 
GeneralJust a little of my experience... Pin
S Douglas2-Jan-05 21:24
professionalS Douglas2-Jan-05 21:24 
I would like to share my experience with working with databases. I use ADO to connect to Access database (much to my chagrin) at work. Good article its always interesting to see what other have done to overcome problems I have faced as well.

I have found that creating a “dummy” recordset like so many books & others suggest causes more problems that is solves.

1: Like the author suggested if you don’t filter out all of the records from the table your potentially coping every record to the local computer, unless you use server side recordsets (which can only be done with SQL server and not an Access dB).
2: When doing this you always have to check to make sure a valid recordset was created. It is possible that an error occurred opening the recordset, attempting to add new record would naturally fail then.
3: You must check to make sure a read only recordset was not created. If a read-only recordset was created then you can not write to the recordset.
4: Logically to me it doesn’t make a whole lot of sense, your making a connection to the dB, creating a new recordset, populating it (even if no records are retrieved), then moving to a new record, finally adding a new record. That’s a lot of work.

What the author suggests is one of the simplest solutions, create an INSERT query, doing so only requires you have a valid read / write connection to the dB. You don’t have to worry about making sure that no error occurred with the recordset because there is not one. The only debugging (if there are problems) is with the INSERT query. Additionally if you need to add many new records at a time it’s a simple matter of looping through the data and inserting. Finally the INSERT query offers a lot of flexibility, for example I have a script that uses a complex INSERT query that adds new records from the “live” dB to a local copy where I do testing.

As for the problems with the record count, it’s all in the cursor (well using ADO anyway, I don’t use Crecordset, even when using VC++). Each cursor has different properties. Depending on the level of performance / efficiency you desire will dictate the type of cursor you need, below is a snippet from the MSDN defining the four different cursor types.

Static cursor. This is the one to use for generating reports or finding data. Additions, changes, or deletions by other users are not visible. adOpenStatic

Forward-only cursor. This is the default. It is identical to the Static except that you can only scroll forward. The fastest cursor this side of the Pecos Mountains. adOpenForwardOnly

Dynamic cursor. Additions and deletions by others are visible. All movement is supported. But some providers don't support this cursor type. adOpenDynamic

Keyset-driven cursor. This is similar to a Dynamic cursor except you can't see records others add. If another user deletes a record, it is inaccessible from your recordset. adOpenKeyset


Now if I could only find an article concerning creating (its structure and working with multiple views and such) a large-scale MDI app that uses ADO. I suppose all good things in time, eh?




Social Engineering Specialist.

Because there is no patch for human stupidity.
GeneralRe: Just a little of my experience... Pin
DonTb16-Mar-06 0:00
DonTb16-Mar-06 0:00 
GeneralRe: Just a little of my experience... Pin
S Douglas16-Mar-06 4:49
professionalS Douglas16-Mar-06 4:49 
GeneralRe: Just a little of my experience... Pin
Jerry Jeremiah20-Jul-06 23:51
Jerry Jeremiah20-Jul-06 23:51 
GeneralNeed your help...Please Pin
vivadot18-Oct-04 15:17
vivadot18-Oct-04 15:17 
GeneralNice article Pin
Ezz Khayyat14-Jan-04 0:15
professionalEzz Khayyat14-Jan-04 0:15 
GeneralUnitialized CTime, AddNew and Update( ) Pin
Jazee13-Jan-04 11:16
Jazee13-Jan-04 11:16 
GeneralRe: Unitialized CTime, AddNew and Update( ) Pin
Rob Manderson13-Jan-04 12:16
protectorRob Manderson13-Jan-04 12:16 
GeneralRe: Unitialized CTime, AddNew and Update( ) Pin
BOABOA22-Jan-04 5:13
BOABOA22-Jan-04 5:13 
GeneralRe: Unitialized CTime, AddNew and Update( ) Pin
Rick Crone7-Feb-06 5:18
Rick Crone7-Feb-06 5:18 
Questiondynaset? Pin
Rick Crone13-Jan-04 10:40
Rick Crone13-Jan-04 10:40 
AnswerRe: dynaset? Pin
Rob Manderson13-Jan-04 12:12
protectorRob Manderson13-Jan-04 12:12 
GeneralEmpty Recordset Pin
Neil Sl12-Jan-04 20:48
Neil Sl12-Jan-04 20:48 
GeneralRe: Empty Recordset Pin
Rob Manderson12-Jan-04 22:04
protectorRob Manderson12-Jan-04 22:04 
GeneralRe: Empty Recordset Pin
GWSyZyGy13-Jan-04 11:37
GWSyZyGy13-Jan-04 11:37 
GeneralRe: Empty Recordset Pin
Rob Manderson13-Jan-04 12:15
protectorRob Manderson13-Jan-04 12:15 
GeneralNicely Done, Rob Pin
Roger Wright12-Jan-04 7:40
professionalRoger Wright12-Jan-04 7:40 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.