65.9K
CodeProject is changing. Read more.
Home

MsAccess MdbTools with MFC and .NET

starIconstarIconstarIconstarIcon
emptyStarIcon
starIcon

4.82/5 (9 votes)

Nov 14, 2011

LGPL3

10 min read

viewsIcon

71676

downloadIcon

10010

Viewer of MsAccess databases directly from MFC and .NET - Repair corrupt databases

Imagen Imagen

Introduction

I used to have problems with some Access 97 databases that went corrupt. Also I had instalation problems with the JET Engine for MsAccess. I found MdbTools, and could repair databases or at least get data out. Then I ported MdbTools to MFC and .NET (only read, not write support). I updated the sources to the latest version of mdbtools.

The MsAccess file structure

Pages

All the info of the mdb file is organized as pages. A page is a region in the file of 2k (Jet3) and 4k (Jet4).
So, the file len of MsAccess databases is a multiple of 2,048 (Jet3) or 4,096 (Jet4). For example, the Nwind.mdb file sample contains 1,548,288 / 2048 = 756 pages.
All pages are stored secuentially, and, as they are all the same size, page n starts at n * 2048 + 1 or n * 4096 + 1 in the file (considering n starting at 0).

For example, for a Jet3 database containing 5 pages (including 0 which is the Database Definition page):

Positions differ from Jet3 to Jet4, so all tables will be referred to Jet3.
Page Number Starts at position (dec) Size (dec)
0 0 2048
1 2048 2048
2 4096 2048
3 6144 2048
4 8192 2048

Page 0 - Database Definition Page

There is not much information about this page. What is known:

From (hex) To (hex) Len (dec) Format Description
0x0 0x0 1 Byte Page type - For the database definition page type is 0
0x1 0x13 19 Unknown
0x14 0x17 4 LE-Int Jet version of the database (0 for Jet3, 1 for Jet4, 2 for Jet5, 3 for Access 2010
0x18 0x3D 38 Unknown
0x3E 0x41 4 LE-Int RC4 Key used to encrypt pages (if the database is encrypted)
0x42 0x50 14 Byte Password (masked)
0x51 0x7FF 1967 Unknown

LE-Int: Refers to Little Endian Int which means that the number is stored from the least significant bit to the most significant bit (LSB-MSB). For example, the number 0xA1B2C3D4 is stored as: D4 C3 B2 A1.

The first byte of each page identifies the page type:
  • 0x00 Database definition page. (Always page 0)
  • 0x01 Data page
  • 0x02 Table definition
  • 0x03 Intermediate Index pages
  • 0x04 Leaf Index pages
  • 0x05 Page Usage Bitmaps (extended page usage)

Page 1

Keeps track of allocated pages in the database.

Page 2 - Table definition page for MSysObjects

MsAccess databases include system tables that contain information about the database itself. They all start with the prefix MSys.
Page 2 contains the Table definition (Page type 0x02) of the table MSysObjects. This table contains all the objects names and (more important), the page number location of them.

The structure of a Table Definition page is:

From (hex) To (hex) Len (dec) Format Description
0x0 0x0 1 Byte Page type - For the table definition page type is 2
0x1 0x7 8 Unknown
0x8 0x9 2 LE-Int Page len
0xC 0xF 4 LE-Int Number of rows
0x17 0x18 2 LE-Int Number of variable columns
0x19 0x1A 2 LE-Int Number of columns
0x1B 0x1E 4 LE-Int Number of indexes
0x1F 0x22 4 LE-Int Number of index entries
0x23 0x26 4 Pages usage bitmask
0x27 0x2A 4 LE-Int Pages with free space bitmask

Column definitions start at 0x2B + (Number of index entries * 8). Column properties go first. Then go column names (in the same order that properties).
Column properties have a fixed size of 18 bytes.

Assuming "Number of index entries" = 2 (for MSysObjects), column properties start at 0x3B. The number of columns was retrieved before, so you must read 18 * numcols bytes.

From (hex) To (hex) Len (dec) Format Description
0x3B 0x3B 1 Byte Column type
0x3C 0x3C 1 Byte Column Number
0x3D 0x3E 2 LE-Int Variable col number
0x3F 0x40 2 LE-Int Row col number
0x41 0x45 5 Unknown
0x46 0x46 1 Byte Precision
0x47 0x47 1 Byte Scale
0x48 0x48 1 Byte Flags. Include isfixed, is long autonumeric, uuid auto
0x49 0x4A 2 LE-Int Column fixed offset
0x4B 0x4C 2 LE-Int Column size offset

After the properties, the name of the columns appear. The first byte contains the size and then you read that amount of bytes (the name). So, column names can´t be bigger than 256 chars
Column names are stored in compressed unicode.

For example, for MSysObjects there are 17 columns, so 0x3B adds to 17 * 18 = 16D The first column is Id (len 2)

From (hex) To (hex) Len (dec) Format Description
0x16D 0x16D 1 Byte Column name size
0x16E 0x16F 2 Unicode Text Column Name

After the columns, the indexes definition follow.

Data allocation

Data is stored in data pages (page type 1) with the structure:

From (hex) To (hex) Len (dec) Format Description
0x0 0x0 1 Byte Page Type (1 for this page)
0x1 0x1 1 Byte Unknown
0x2 0x3 2 LE-Int Free space in this page
0x4 0x7 4 LE-Int Page pointer to page definition
0x8 0x9 2 LE-Int Number of records

After that information, there are offsets of 2 bytes to records (there are 2 * num_records bytes of this items). After that, records start.
To know record len you substract next offset to previous offset.

In the table definition there is a field called Pages usage bitmask:
The 3 first bytes of that item are the page and the last byte the row of the page usage bitmap.
For example, in the sample database, the bitmap is located in Page 6, row 0.

Bitmap data is 133 bytes long and contain:
From (hex) To (hex) Len (dec) Format Description
0x0 0x0 1 Byte Map type (0 or 1)
0x1 0x4 4 Byte Base page for which this map applies
0x5 0x85 128 Byte Each bit contains allocation of this table (boolean). Allocation can be set to 128 pages after base page

So, this basic allocation cannot hold more than 128 pages. As pages are of size 2048, that gives: 128 * 8 * 2048 = 2Mb. For bigger databases the field Pages usage bitmask points to a record containing an map identifier (1) of 1 byte and secuentially pointers of 4 bytes to pages of type 5 that contain bitmaps.

The code

Solution structure

The solution contains 5 projects:
  • MdbToolsLib: Contains the GLib (C Libraries) and libmdb (Access). It also contains classes to interface with projects MFCMdbTools and NETMdbTools. It uses STL.
  • MFCMdbTools: The idea of this Dll is to act as a layer between MFC and MdbToolsLib. Most of the code is CString conversion
  • NETMdbTools: The idea of this Dll is to act as a layer between .NET and MdbToolsLib. It uses C++/CLI.
  • MFCMdbToolsTestApp: MFC test app that uses MFCMdbTools.
  • NETMdbToolsTestApp: C# test app that uses NETMdbTools. It includes export functionality.

Naming conventions

There are 3 projects which share the same functionality (MdbToolsLib, MFCMdbTools, NETMdbTools) for different libraries (C/C++, MFC, .NET).
So, there are 3 prefixes, one for each library:
  • MdbToolsLib: The prefix is MdbLib. For example, MdbLibDatabase
  • MFCMdbTools: The prefix is MFCMdb. For example, MFCMdbDatabase
  • NETMdbTools: The prefix is NETMdb. For example, NETMdbDatabase

MdbToolsLib

Class MdbLibDatabase

CMdbLibDatabase() Initializes the MdbTools handler as 0 so that you can check if a database is loaded.
void* mdb; Private variable that contains the MdbHandle (struct) of the database. It is declared as void so that you don't need the MdbTools.h header.
int Open(const char* strname); Opens the database and assigns Mdbhandle. Returns a 0 for error, other for success.
void Close(); If there is an open database, it closes all tables handles and closes the Mdbhandle database.
void LoadTables(); Loads the database catalog and creates CMdbLibTable objects for each one in the tables array.
map<char*, CMdbLibTable, cmp_str> tables; This is the array of tables. cmp_str is a comparison to find items sorted by char*

Class MdbLibTable

CMdbLibTable() Initializes the MdbTableDef object (tdef) and the name (char*) as 0.
char* Name; Contains the name of the table. It is allocated and assigned in CMdbLibDatabase::LoadTables() and deleted in ClearTableColumns()
map<char*, CMdbLibColumn, cmp_str> columns; This is the array with the columns of the table. It is loaded in LoadTableColumns()
map<char*, CMdbLibValue, cmp_str> cvalues; This is the array with the values for iterating the table rows. It is loaded and allocated in LoadTableColumns() but values are assigned in NextItem()
void LoadTableColumns(); Loads columns and values (columns and cvalues arrays). Allocates memory for both. Binds cvalue data to col so that it is assigned when row changes
void ClearTableColumns(); Frees the MdbTableDef tdef variable, and all cvalues allocated space for row data.
int NextItem(); Executes the mdb_fetch_row and returns if eof. Values are assigned inside mdb_fetch_row by the bind_ptr.
void MoveFirst(); Executes the mdb_rewind_table so that record cursor moves to the first row.
void* mdb; MdbHandle variable of the database. Declared as void to avoid Mdbtools.h dependency.
void* tdef; This is the MdbTools handle for the table. It is assigned in LoadTableColumns().
BOOL IsSystem; If the table is system table (name starts with MSys).
void* GetEntryByName(char* name); Finds the entry in the catalog by table name. This entry is neccesary to perform the mdb_read_table.

Class MdbLibColumn

enum MdbLibColType Contain the column type constants. It mimics MdbTools column type constants. They exist to provide independence from MdbTools.h.
char* Name; Column name. It is allocated and assigned in CMdbLibTable::LoadTableColumns().
MdbLibColType Type; Column type.
int Size; Column fiexd size.
int IsInt(); Boolean used to group all integer types for formatting porpuses.
int IsNumDec(); Boolean used to group all float types for formatting porpuses.
int IsBinary(); Boolean used to group all binary (memo, OLE, etc) types for formatting porpuses.
int IsDate(); Returns if the column type is date (for formatting porpuses).

Class MdbLibValue

char* value; This is the buffer where the column value is stored when iterating rows. It is assigned to bind_ptr in CMdbLibTable::LoadTableColumns().
int len; This is real len of the data stored in value. It is assigned to len_ptr in CMdbLibTable::LoadTableColumns().
int GetIntValue(); Returns the value stored in "value" variable as int.
double GetDoubleValue(); Returns the value stored in "value" variable as double. To do so it eliminates the exponent and multiplies accordingly.
void GetTimeValue(SYSTEMTIME* st); Returns the date in SYSTEMTIME format.

Libraries

The original MdbTools uses GLib, so, there are dependencies and libglib is needed to run the application.
I had the idea of replacing GLib with STL and native windows functions, but then, it would be a lot of work and besides, when MdbTools sourcecode improves, I would have a difficult time upgrading.

The project is in VS 2010 and uses the latest MFC and .NET 4 but it can be easily ported the older versions of Visual Studio.
To compile MFC and .NET apps, first compile GLib and copy the .lib in the glib folder of the app.

History

  • 2012-01-13: Improve article with some MsAccess format detail.
  • 2011-12-20:

    • Made a common .lib interface (including GLib) for both .NET and MFC and put it all into one single project.
    • Added support for Binary fields (OLE). Improved support for numbers and dates
    • In the .NET app added export to DAO database to repair corrupt databases
  • 2011-11-21. GLib library now links into dlls.
  • 2011-11-14. First version that uses glib dlls.

Licence and sourcecode

The licence for this project is LGPL which basically means that you can use the library in any project (even commercial apps) but if you make modifications to the library itself you must publish them. The sources were taken from https://github.com/brianb/mdbtools.

To do

  • Add support for other database objects (queries, modules, forms, etc).
  • Add ATL/COM interface for MdbTools.
  • Add write support.

Acknowledgements

To the authors of mdbtools: Brian Bruns, Karl Nyberg, Georg Bauer, Carl Seutter, Trevor Harrison, Brent Johnson, Tim Nelson, David Mansfield, Jeff Smith, Steve Langasek, Rene Engelhard, Vincent Fourmond, Tim Retout, Nirgal Vourgere