Balanced Multiway Trees

JorgeLuisOrejel

4.86/5 (11 votes)

Oct 15, 2014

CPOL

27 min read

24333

Balanced Multiway Trees

NOTE: This article is Chapter 8 from my unpublished textbook Applied Algorithms and Data Structures.

Top-Down 2-3-4 Trees

The following discussion is adapted from Sedgewick, R Algorithms in C. Reading, Massachusetts: Addison-Wesley, 1990, pp. 215-220. In 2-3-4 trees, nodes can hold more than one key. Nodes with two keys (and three children) are called 3-nodes, whereas nodes with 3 keys (and four children) are called 4-nodes. The structure of 2-3-4 nodes is illustrated below.

Ordinary binary trees are a special case of 2-3-4 trees, consisting exclusively of 2-nodes containing one key and having two children.

The keys stored in the sub-trees of 3-4 nodes, and those stored in the nodes themselves, satisfy the following orderings.

Orderings for 3-nodes	Orderings for 4-nodes

As an illustration of the kinds of transformations that can be performed on the nodes of 2-3-4 trees during insertion operations, consider the initial tree shown on the right.
If we insert the key "X", then the 2-node containing "S" can be turned into a 3-node to store the new key in its proper place. This is an example of augmenting a node. Similarly, a 3-node can be transformed into a 4-node.
It is not difficult to imagine a situation where we end up having to insert a new key into a 4-node. For example, the key "G" would go into the middle 4-node of the example tree. A simple-minded solution would be to let the height of the tree increase.
The problem with this approach is that the tree becomes unbalanced. A better solution is to spread the tree horizontally. First, split the 4-node into two 2-nodes containing the left and right keys, and transfer the middle key to the node’s parent.
Then, insert the new key into one of the 2-nodes produced. The second approach also presents a problem. What if it is necessary to split a 4-node whose parent is also a 4-node? In that case, split the parent and, if necessary, the grandparent, and so on all the way up to the root.

To avoid the need to split multiple nodes after an insertion, we can follow a top-down approach. During insertion, make sure that the parent of any node visited will not be a 4-node. This is accomplished by splitting any 4-node visited on the way down the tree. In addition, whenever the root of the tree becomes a 4-node, split it into three 2-nodes. This is simpler than waiting until the next insertion to do the split because we need not worry abot the parent of the root. Splitting the root is the only operation that increases by one the height of the tree. Two of the possible transformation rules for 2-3-4 trees are illustrated below.

As an example of the top-down approach to build 2-3-4 trees, consider the insertion of the sequence of keys ABCDEFGHIJKLM into an initially empty tree. (Note that such a sequence of keys would lead to a right-skewed binary-search tree or "vine".) The following diagrams illustrate the insertion of keys "A" through "H". The reader is encouraged to finish the tree by inserting the remaining keys.

Properties of Top-Down 2-3-4 Trees

Top-Down 2-3-4 trees have three important properties. First, the transformation operations performed during insertions yield perfectly-balanced trees. Second, searches in N-node 2-3-4 trees never visit more than (log)_2(N)+1 nodes because the distance from the root to every leaf is the same, and splitting the root increases the height by one. Hence, if all nodes are 2-nodes the result holds since the tree is a perfect binary tree. If there are 3-nodes and 4-nodes, the height can only be lower. Third, insertions into N-node 2-3-4 trees require fewer than (log)_2(N)+1 node splits in the worst case, and seem to require less than one node split on the average. Empirical studies consistently show that very few splits occur. (This will be confirmed later by sample executions.)

Representation of 2-3-4 Nodes

In order to implement search, insertion, and deletion operations on 2-3-4 trees, we must design a suitable data structure to represent the nodes. One possibility is to use unions in C or variant records in Pascal. For instance, assume that DataType denotes a suitable type for the keys stored in 2-3-4 trees. We could define a Pascal variant record as follows.

type
     nRange = 2 .. 4;

     a234ptr = ^ a234rec;
     a234rec = record
                 d1 : DataType;
                 left, right : a234ptr;
                 case n : nRange of
                      3 : ( d23 : DataType;
                            mid : a234ptr );
                      4 : ( d24, d3 : DataType;
                            midL, midR : a234ptr )
               end;

Alternatively, we could define at the outset the largest possible Pascal record (or C/C++ structure), and enforce its proper manipulation. For example, in C/C++ we could define

typedef struct a234struct a234str;
typedef a234str * a234ptr;

struct a234struct {
        int n;
   DataType d1, d2, d3;       // n-1 valid
    a234ptr p1, p2, p3, p4;   // n valid
};

The above two possibilities for defining 2-3-4 nodes would cause the implementation of operations on 2-3-4 trees to be quite involved, because the data and pointer fields must be individually manipulated by name. (See, for example, Aho, A.V., J.E. Hopcroft, and J.D. Ullman Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley, 1983, pp. 176-178, which presents a two-page implementation of the insertion operation on 2-3 trees!) Such implementation is based on explicit naming of fields in a Pascal variant record.) In order to motivate our implementation, it is worth quoting what has been said on the issue of implementing 2-3-4 trees:

It seems that Sedgewick’s opinion explains why 2-3-4 trees are usually implemented in terms of binary red-black trees. (See, for example, Cormen T.H., C.E. Leiserson, R.L. Rivest, and C. Stein Introduction to Algorithms. Cambridge, Massachusetts: The MIT Press, Second Edition, 2001, Chapter 13.)

It turns out that 2-3-4 trees can be represented in a simple and uniform way, without resorting to the "coloring" trick used with red-black trees. The data and pointer fields of 2-3-4 trees can be implemented by means of arrays, which allow both systematic and efficient access, as shown below.

typedef struct a234struct a234str;
typedef a234str * a234ptr;

struct a234struct {
        int n;
   DataType d[3];   // 0 .. n-2 data elements valid
    a234ptr p[4];   // 0 .. n-1 pointer elements valid
};

The above definitions are in the spirit of Wirth’s Pascal implementation of insertion, deletion, and search operations on B-trees. (Wirth, N. Algorithms + Data Structures = Programs. Englewood Cliffs, N.J.: Prentice-Hall, 1976, pp. 252-257.) We will generalize this representation in order to implement generic, balanced, multiway trees (MWTs) of order m, where m is the number of pointers stored in a node. We will use generic pointers for the data items stored in the nodes.

In general, for implementation purposes, an n-node of a multiway tree (an MWT) contains n data items (or keys), and n+1 pointers. The following declarations define MWTs that can contain from 3-nodes to 6-nodes. The order of the MWT equals the size of the maximum node plus one.

#define N 2 // minimum mumber of data items in a node
#define M 5 // maximum number of data items in a node (M >= N + 1)

typedef struct mwtStruct mwtStr;
typedef mwtStr * mwtPtr;

typedef char * genPtr;  // Generic pointer (simply,, a pointer to a byte)

typedef struct InfoPtrStr { // Substructure of mwtStr
   genPtr info;             // Pointer to information object
   mwtPtr ptr;              // Pointer to mwt
};

typedef struct mwtStruct {
          int m;            // Number of info pointers: N <= m <= M
                            // -> Number of mwt pointers == m + 1
   InfoPtrStr e[ M + 1 ];   // e[ 0 ].info not used
};

Given these definitions, an object stored at position i (actually pointed to by InfoPtrStr e[ i ].info) of a node is "bracketed" by the pointers InfoPtrStr e[ i-1 ].ptr and e[ i+1 ].ptr. The valid range for i is from 1 to M. The allocation of structures for MWTs is accomplished by means of the following two functions.

In practical applications, the heap space used to allocate dynamic structures might become exhausted. This is the reason for the implementation and use of function AllocAbort. If that function is ever called, the program terminates issuing an appropriate message.

The implementation of operations on MWTs is not a trivial task. Therefore, I will follow a wishful-thinking, top-down approach to implement a generic MWT abstract data type (ADT). We start off with the skeleton of a class definition, showing the data and function members that immediately come to mind from past experience with other ADTs. In addition to the constructor and the destructor, the obvious operations on MWTs are insert, remove, search, print, and destroy. Observe that the public functions defined to perform these operations hide from the user the pointer to the root of the tree. Observe also that an MWT is destroyed only if both the OKtoDelInfo flag is non-zero, and the function pointer DelFn is not bound to NULL.

enum Relval { LT, EQ, GT };

class gMWtree { // Generic MWT
   protected:
      mwtPtr root;                         // Pointer to root of MWT
         int Mdiv2,                        // M >> 1 if M is even
                                           // (M >> 1)-1 if M is odd
             PrintFlag,                    // General-purpose flag to
                                           //   element-print function
             OKtoDelInfo;                  // Flag to enable deletion of
                                           //   *info objects
        void (*PrnFn)( genPtr, int );      // Pointer to info-print function
      RelVal (*CmpFn)( genPtr, genPtr );   // Pointer to info-comparison function
        void (*DelFn)( genPtr );           // Pointer to info-delete function
      genPtr (*CoyFn)( genPtr, genPtr );   // Pointer to info-copy function

             ...   // Protected function members to be defined during
                   //   the top-down design

   public:
             gMWtree();
             ~gMWtree()
             {
                if ( OKtoDelInfo && (DelFn != NULL) )
                   destroy();
             }
        void destroy() { DeleteAll( root ); }
      genPtr search( genPtr gp ) { return prSearch( gp, root ); }
      genPtr insert( genPtr );
        void print() { PrintTree( root ); }
      genPtr remove( genPtr );
};

The constructor initializes all the data members to define a truly abstract class. The data member Mdiv2 defines the "mid point" of an m-node that must be split after becoming full.

gMWtree::gMWtree()
{
   root = NULL;
   PrnFn = NULL;     CmpFn = NULL;
   DelFn = NULL;     CpyFn = NULL;
   OKtoDelInfo = 1;  PrintFlag = 0;
   Mdiv2 = M >> 1;
   if ( IsOdd( M ) )
      ++Mdiv2;
   printf( "M div 2 == %d\n", Mdiv2 );
}

Function IsOdd is simply defined as

int IsOdd( int i ) 
{ 
   return i & 0x1;
} .

The deletion of all the elements of an MWT is done recursively. Observe that function DeleteAll (protected) performs an in order traversal of the tree. Because of the call free( t ) at the end of the function, the second recursive call is not tail-recursive. Hence it cannot be replaced by a goto statement to a label before the if statement.

void gMWtree::DeleteAll( mwtPtr t )
{
   int i;

   // OKtoDelInfo && (DelFn != NULL)
   if ( t != NULL ) {
      DeleteAll( t->e[ 0 ].ptr );
      for ( i = 1; i <= t->m; ++i ) {
         (*DelFn)( t->e[ i ].info );
         DeleteAll( t->e[ i ].ptr );
      }
      free( t );
   }
}

Search, Insertion, and Print Operations on Multiway Trees

Assuming that we have built an MWT, searching the tree is the first operation that comes to mind. In the class definition, the public function search initiates the search at the root of the tree by calling the protected function prSearch.

Argument gp to function prSearch points to an object containing partial information, that is, search key(s). If an object matching the keys is found in the tree, the function returns a pointer to such object; otherwise it returns NULL. Since MWTs are recursive structures, it is natural to write this function recursively, as indicated by the assertion before the two statements in the last else clause. Such a call would be tail-recursive. These statements and the label before the first if eliminate the tail recursion.

genPtr gMWtree::prSearch( genPtr gp, mwtPtr t )
{
   int i;   RelVal cmp;

l0:
   if ( t == NULL )
      return NULL;
   else { // t != NULL
      i = FindPlace( gp, t, 0, &cmp );
      // i <= t->m  ||  cmp != LT
      if ( cmp == EQ )
         return t->e[ i+1 ].info;
      else { // i == t->m || cmp == GT
         // return prSearch( gp, t->e[ i ].ptr );
         t = t->e[ i ].ptr;
         goto l0;
      }
   }
}

Function prSearch relies on function FindPlace (protected) in order to find the place of the object sought in the multiway tree. It receives a pointer to the search object (gp), a pointer to a node (t), the index to start the search within the node ( i), and a reference argument to return the result of the comparison function (cmp). Since MWTs are extensions of binary-search trees, the search continues as long as the object sought for is less than the objects stored in the node. The function returns the index of the pointer to the left of the object within the node that is greater than or equal to the search object. (In my first implementation of MWTs as 2-3-4 trees, the while loop in function FindPlace was part of function prSearch. Pretty quickly, I realized that such a loop would have to be used during insertions, and hence I turned it into a function.)

int gMWtree::FindPlace( genPtr gp, mwtPtr t, int i, RelVal *cmp )
{
   // t != NULL
   // && (*CmpFn)( t->e[ j ].info, gp ) == LT  for  1 <= j < i
   while ( i < t->m
           &&
           (*cmp = (*CmpFn)( t->e[ i+1 ].info, gp )) == LT )
      ++i;
   // i <= t->m || *cmp != LT
   return i;
}

The next operation that comes to mind is the insertion of an object into an MWT. If the tree is empty, function insert creates a 1-node (1 key, 2 pointers), and returns NULL as an indication of the insertion of a new object.

Following the discussion on how to perform insertions in order to maintain a balanced tree, if the tree is not empty, function prInsert (protected) is called to perform the insertion. After the insertion, a full root node is split in preparation for the next insertion. Such a split is the only operation that makes the height of an MWT to increase by one.

int gMWtree::FindPlace( genPtr gp, mwtPtr t, int i, RelVal *cmp )
{
   // t != NULL
   // && (*CmpFn)( t->e[ j ].info, gp ) == LT  for  1 <= j < i
   while ( i < t->m
           &&
           (*cmp = (*CmpFn)( t->e[ i+1 ].info, gp )) == LT )
      ++i;
   // i <= t->m || *cmp != LT
   return i;
}

The creation of a 1-node is accomplished by the protected functions Mk1node and Mk0node, defined as follows. (For self-tracing, function Mk1node prints a message and the object stored in the new 1-node.)

mwtPtr gMWtree::Mk1node( genPtr gp, 
                         mwtPtr p0, mwtPtr p1 )
{
   mwtPtr q;

   printf( "Mk1node: " ); 
   (*PrnFn)( gp, PrintFlag ); NL();
   q = Mk0node();
   q->m = 1;
   q->e[ 0 ].info = NULL; // unused field
   q->e[ 0 ].ptr = p0;
   q->e[ 1 ].info = gp;
   q->e[ 1 ].ptr = p1;
   return q;
}

mwtPtr gMWtree::Mk0node()
{
   mwtPtr p = AllocMWTstr();

   p->m = 0;
   return p;
}


void NL() 
{ 
   printf( "\n" ); 
}

Function prInsert (protected) below does all the heavy-duty work of inserting a new object into a multiway tree. The basic algorithm for its implementation is an inorder traversal of the tree, looking for the insertion place. Naturally, this function would also be defined recursively, with tail-recursion removed. Observe the use of function FindPlace in much the same way it was used by function prSearch.

The protected function FullNode, called by both search and prSearch, simply determines whether or not a node of an MWT has been filled completely with m objects (indexed from 1 to m).

int FullNode( mwtPtr t ) { return t != NULL && t->m == M; }

We turn now to the operations of splitting a full root node, splitting a full child node, and augmenting a leaf node.

A full root node contains M objects and M+1 pointers. It is split by creating two new nodes at p0 and p1, storing the objects and pointers in the range 0 to Mdiv2-1 in the node at p0, storing the objects and pointers in the range Mdiv2+1 to M in the node at p1, storing the Mdiv2 object in the root node, and making the root node a 2-node with children at p0 and p1.

This algorithmic description is implemented by the protected function SplitRoot:

void gMWtree::SplitRoot()
{
   mwtPtr p0, p1;

   // FullNode( root )
   ReportSplit( "root", root );
   split( root, &p0, &p1 );
   root->e[ 1 ].info = root->e[ Mdiv2 ].info;
   root->e[ 0 ].ptr = p0;
   root->e[ 1 ].ptr = p1;
   root->m = 1;
}

To do its job, SplitRoot relies on the protected function split, which creates the two new nodes, and calls the protected function CopyRange twice to copy two ranges of objects and pointers from the node at q into the nodes at *ap0 and *ap1. The justification for defining function split is that we can use it again to split a full child node.

void gMWtree::split( mwtPtr q, 
                     mwtPtr *ap0, 
                     mwtPtr *ap1 )
{
   mwtPtr p0, p1;

   // FullNode( q );
   *ap0 = p0 = Mk0node();
   *ap1 = p1 = Mk0node();
   CopyRange( p0, 1, q, 1, Mdiv2-1 );
   CopyRange( p1, 1, q, Mdiv2+1, M );
}

void gMWtree::CopyRange( mwtPtr dst, 
                            int k, 
                         mwtPtr src, 
                            int i, 
                            int j )
{
   int s = i;

   dst->e[ k-1 ].ptr = src->e[ i-1 ].ptr;
   while ( s <= j )
      dst->e[ k++ ] = src->e[ s++ ];
   dst->m += j - i + 1;
}

A graphical description of function CopyRange is shown below.

Copy objects and pointers in the ranges i to j, and i-1 to j, respectively, of the node at src, starting at the pointer position k-1 of the node at dst, and update its size.

The justification for defining function CopyRange is obviously the fact that it is used twice.

The protected function ReportSplit is called by split for self-tracing purposes, and is implemented as follows.

void gMWtree::ReportSplit( char *str, mwtPtr p )
{
   printf( "split %s ", str );
   PrintNode( "", p, 1 );
}

Augmenting a node involves adding one object and two pointers to nodes. To augment a node pointed to by, say t, at the pointer position i, we must first move one position to the right all the objects and pointers to nodes starting at i+1 and i, respectively.

After the movement, the object is stored in the node at position i+1, and the pointers to the two nodes, say tL and tR, are stored at pointer positions i and i+1, respectively.

This algorithmic description can be implemented in such a way that the movement of objects and pointers takes place from the end of the node toward pointer position i and object position i+1, as follows.

void gMWtree::augment   // protected
              ( mwtPtr t, genPtr gp, mwtPtr tL, mwtPtr tR, int i )
{
   int j;

   PrintNode( "augmenting", t, 0 );
   printf( " with " ); (*PrnFn)( gp, PrintFlag );
   ++(t->m);
   printf( " at i == %d\n", i );
   for ( j = t->m-1; j > i; --j )
      t->e[ j+1 ] = t->e[ j ];
   t->e[ i+1 ].info = gp;
   t->e[ i ].ptr = tL;
   t->e[ i+1 ].ptr = tR;
}

To split a full child node at, say tI (corresponding to pointer position i from its parent node at t), simply use function split on the child, to obtain two nodes at tL and tR, and then augment the parent node at position i with the "middle" object from the child, and the pointers to the new nodes. Behold the simplicity of the implementation!

void gMWtree::SplitChild( mwtPtr t, mwtPtr tI, int i )
{
   mwtPtr tL, tR;

   ReportSplit( "child", tI );
   split( tI, &tL, &tR );
   augment( t, tI->e[ Mdiv2 ].info, tL, tR, i );
   free( tI );
}

After the split, the child node is no longer needed, and it is returned to the heap space. Function SplitChild should be protected.

Observe that the augmentation of the parent node in function SplitChild is justified because such a node was found not to be full during the search for the insertion place. This is the rationale behind splitting a full root node after an insertion, and splitting a full child on the way down the tree during the insertion.

To print an MWT, it is useful to implement a level-by-level traversal. The task at hand is to print all the objects stored in a node, and to traverse all the children of each node. The implementation is pretty much straightforward. (The reader may skip the use of xorQueue, and write a class to implement ordinary linked queues to contain pointers to MWT nodes as the base class for mwtPtrQ.)

// Queue of multi-way tree pointers
//  based on the implementaton of xor-coded queues
class mwtPtrQ : public xorQueue {
};

void gMWtree::PrintTree( mwtPtr t )
{
   mwtPtrQ q;

   NL();
   q.enqueue( (genPtr)t );
   LevelByLevelPrint( 0, &q );
   NL();
}

void gMWtree::LevelByLevelPrint( int level, mwtPtrQ *pq0 )
{
       int i;
   mwtPtrQ q1;
        mwtPtr p, c;

   if ( !pq0->IsEmpty() ) {
      printf( "level %2d: ", level );
      while ( !pq0->IsEmpty() ) {
         p = (mwtPtr)(pq0->dequeue());
         if ( p != NULL ) {
            PrintNode( "", p, 0 );
            for ( i = 0; i <= p->m; ++i )
               if ( (c = p->e[ i ].ptr) != NULL )
                  q1.enqueue( (genPtr)c );
         }
         else // p == NULL
            printf( "_ " );
      }
      NL(); NL(); NL();
      LevelByLevelPrint( level+1, &q1 );
   }
}

void gMWtree::PrintNode( char *str, mwtPtr p, int nl )
{
   if ( *str )
      printf( "%s node ", str );
   if ( p == NULL )
      printf( "/" );
   else { // p != NULL
      printf( "<" );
      for ( int i = 1; i < p->m; ++i ) {
         (*PrnFn)( p->e[ i ].info, PrintFlag );
         printf( "," );
      }
      (*PrnFn)( p->e[ p->m ].info, PrintFlag );
      printf( ">  " );
   }
   if ( nl )
      NL();
}

Illustration of a Search Operation on Multiway Trees

Consider the following detailed drawing of a multiway tree and a search object.

The info fields of the tree nodes point to structures containing a pointer to a search key (such as education and multiprocessor), and a pointer to some other information related to the key. Suppose that we define these structures, and a specific class as derived from the gMWtree class,

typedef struct {
   genPtr key;
   genPtr data;
} DataStr;
typedef DataStr * DataPtr;

class DataTree : public gMWTree { ... };

whose constructor binds the PrnFn and CmpFn function pointers to appropriate functions. For instance, the function to compare keys might be as follows.

RelVal CmpKeys( genPtr gp1, genPtr gp2 )
{
   DataPtr dp1 = (DataPtr)gp1, dp2 = (DataPtr)gp2;

    string s1 = (string)(dp1->key), s2 = (string)(dp2->key);

       int i = strcmp( s1, s2 );

   return i < 0 ? LT : i == 0 ? EQ : GT;
}

To the left of the tree, a similar structure containing only a search key is pointed to by the argument gp to function gMWtree::search. Beneath the left-most node of the tree, the code for function gMWtree::prSearch is written with the call to FindPlace replaced by its implementation (the while loop, in which I mistakenly re-named the m data member as n). The initial value of argument t to prSearch is a copy of the root pointer. So, we are searching for a node in the tree containing trie as its search key. The following table shows an execution trace of prSearch.

t	t->n	i		t->e[ i+1 ].info	cmp
root node	2	0		object with key `education`	`LT`
		1		object with key `multiprocessor`	`LT`
		2	exit the `while` loop
right-most node of level 2	2	0		object with key `program`	`LT`
		1		object with key `scatter`	`LT`
		2	exit the `while` loop
only node shown at level 3	5	0		object with key `text`	`LT`
		1		object with key `traversal`	`LT`
		2		object with key `trie`	`EQ`

Upon finding an object with the same key, prSearch returns a pointer to it.

Illustration of Insertion and Print Operations on Multiway Trees

Suppose that an appropriate class has been defined as derived from the gMWTree class to create an MWT of order 6 (N == 2, M == 5) to store integer numbers. The following table illustrates a sequence of insertion and print operations on the tree. (I edited the output of function gMWtree::print, and added by hand the lines linking nodes.) The table should be read row-by-row.

Removal Operation on Multiway Trees

The removal of elements from MWTs is seemingly a very difficult operation because of the requirement that these trees must be perfectly balanced. The insertion algorithm guarantees a balanced tree, and we must make the removal give the same guarantee. We wish we had a means of implementing self-adjusting MWTs similar to the one shown below. As was done in the case of binary trees, we proceed in a top-down fashion by analyzing increasingly more complicated cases. I will use the integer MWT just built to illustrate the various removal scenarios. (I must point out that most of the textbooks on advanced data structures do not address the problem of removals (or deletions) from MWTs. Those that address it do so in a cursory fashion and only in graphical form. All the code presented here to manipulate MWTs, especially the one to remove elements, is entirely my own. It is yet one more example of the power of top-down, and bottom-up structured programming.)

A Self-Adjusting Search Tree.

(Published on the occasion of Robert E. Tarjan’s Turing Award Lecture "Algorithm Design." Communications of the ACM Vol. 30, No. 3, March 1987.)

The top-level function to remove objects from an MWT is given below. The argument gp0 to remove is a generic pointer to a structure containing partial information (the search key[s]). This function simply initiates the inorder traversal of the MWT, which is done by the protected function prRemove. The arguments to the latter function are: a pointer to the search object, a pointer (initially NULL) to a parent node, and pointer (initially root) to the child node. (The function has an optional fourth argument described later) within the parent node to access the pointer to the child. If prRemove returns NULL, there is no object in the tree whose key(s) match the search key(s). In the case of a successful removal, if the root node has become empty, it is returned to the heap. Function remove returns whatever it got back from prRemove.

genPtr gMWtree::remove( genPtr gp0 )
{
   genPtr gp1;

   printf( "removing key " );
   (*PrnFn)( gp0, PrintFlag );
   NL();
   gp1 = prRemove( gp0, NULL, root );
   if ( gp1 == NULL )
      printf( " (not present)\n" );
   if ( root->m == 0 ) {
      free( root );
      root = NULL;
   }
   return gp1;
}

Due to the complexity of removal operations (and to illustrate bottom-up software design), function prRemove will be built from the ground up. First, we organize the inorder traversal of the MWT, to search for the element to be removed, following the code of functions prSearch and prInsert.

Removal From Leaf Nodes

Given the integer MWT tree obtained with the last insertion (shown below), suppose that we remove number 35.

This number is in the 4-node <32,35,38> pointed to by tC. Its parent node, pointed to by tP, is the 3-node <30,40>.

This is the simplest kind of removal operation. It involves reducing the node at tC, that is, shifting number 38 to replace 35, and turning the 4-node into a 3-node. As long as a node has at least the minimum number of elements (N), nothing else must be done. On the other hand, if a reduced node underflows, then it must be restored using an element from its parent node at tP. We anticipate that this operation may have to be performed several times. These considerations lead to the second version of function prRemove.

Second Version of prRemove

genPtr gMWtree::prRemove( genPtr gp0, 
                          mwtPtr tP, 
                          mwtPtr tC, 
                             int j )
//
// tP : parent;  tC : child;
//
// tP== NULL || ( j >= 0 && tC == tP->e[ j ].ptr)
//
{
   int i;  RelVal cmp;  genPtr gp1;  mwtPtr tI;

   if ( tC == NULL )
      return NULL;
   else { // tC != NULL
      i = FindPlace( gp0, tC, 0, &cmp );
      if ( cmp != EQ )
         gp1 = prRemove( gp0, 
                         tC, 
                         tC->e[ i ].ptr, 
                         i );
      else { // cmp == EQ

         gp1 = tC->e[ i+1 ].info;
         tI = tC->e[ i ].ptr;

         if ( tI == NULL ) 
            // removal from leaf node tC
            reduce( tC, i+1 );

         else // tI != NULL
            // removal from internal node tC
            // must be done here
      }

      if ( tP != NULL && UnderflowNode( tC ) )
         RestoreChild( tP, tC, j );
      return gp1;
   }
}

When the comparison returned by FindPlace is EQ, removal takes place from the node at tC.
If the node at tC is a leaf node then it must be reduced.
After the removal, if the node at tP exists (parent node), and the node at tC (child node) underflowed, then the child must be restored with an element from the parent at index position j+1.

The reduction of the node pointed to by t at object position i involves shifting to the left by one position all the objects and pointers to the right of the object being removed. The node size is decremented by one.

void gMWtree::reduce( mwtPtr t, int i )
{
   PrintNode( "reducing", t, 0 );
   printf( " at i == %d\n", i );
   for ( int j = i+1; j <= t->m; ++j )
      t->e[ j-1 ] = t->e[ j ];
   --(t->m);
}

The node at t has underflowed if its size is less than the minimum size allowed.

int UnderflowNode( mwtPtr t )
{ 
   return t != NULL && t->m < N; 
}

The following figure illustrates three removals from leaf nodes of the integer MWT. The resulting leaf nodes after the removals are not underflow nodes.

Removal of elements from leaf nodes of the integer MWT without causing them to underflow

Restoration of an Underflowed Child Leaf Node, Case 1: Non-minimal Siblings

In order to figure out how to restore an underflow child node, consider again the original integer MWT (shown below), and suppose that we remove elements 32 and 38 in that order.

As shown in the following figure, these removals cause the leaf 4-node <32,35,38> to be reduced to the 3-node <35,38>, and then to the 2-node <35>, which becomes and underflow node.

As shown by the dotted arrows in the last tree, to restore the underflow 2-node <35> we can move number 40 from its parent node to make the 3-node <35,40>, and replace 40 in the parent node with the left-most number (42) from the nodes right sibling. The resulting balanced tree is as follows.

There is, of course, the symmetric case in which the borrowed number is the right-most number from the underflow node’s left sibling, also performing the rotation of numbers through the parent node. The following figures illustrate this situation.

Tree resulting after the removal of numbers 46, 35, and 20, in that order.
Tree resulting after the removal of number 13.The dotted arrows indicate the movement of 10 from the parent node to the underflow node <15>, and the replacement of 10 with the right-most number of the left sibling.
Balanced tree after the restoration of the node.

It would seem that these two symmetric restoration operations for an underflow node are justified only if the following conditions hold

The parent of the underflow node is at least a 3-node.
Either the right or the left sibling of the underflow node is at least a 4-node.

As will become apparent later, the first condition it is not necessary at all. If it were, we could easily define a function Is3node to test the parent node, and modify version 2 of function prRemove as follows

if ( Is3node( tP ) && UnderflowNode( tC ) )
   RestoreChild( tP, tC, j );

Finally, when an underflow node has two siblings and both of them are at least 4-nodes, then it is advisable to borrow from the largest. We are now ready to start writing RestoreChild.

RestoreChild

void gMWtree::RestoreChild( mwtPtr tP, 
                            mwtPtr tC, 
                               int i )
{
//
// tP : parent ; tC : child ;  
// tC == tP->e[ i ].ptr
//
   // pointers to tC's left/right siblings
   mwtPtr tLs, tRs;
   // flags indicating minimal siblings
      int minLs, minRs,
   // number of items in siblings
          mLs, mRs;

   PrintNode( "underflow", tC, 1 );
   print();

   GetSiblings( tP, i, &tLs, &tRs );
   minLs = MinimalNode( tLs );
   minRs = MinimalNode( tRs );

   if ( minLs && minRs )

      // to be defined later

   else { // !minLs || !minRs
      mLs = tLs->m;
      mRs = tRs->m;

      if ( minLs || mRs > mLs )
         MoveFromRight( tP, tRs, tC, i );

      else // !minLs && mRs <= mLs
         MoveFromLeft( tP, tLs, tC, i );
   }
}

The function defines variables to point to potentially two siblings, for the underflow child node at tC has at least one. There are two flags to know if the siblings are minimal (i.e. cannot give away anything), and two variables for the sizes of the siblings.
Get pointers to the sibling(s), and determine whether it(they) is(are) minimal.
If both siblings are minimal, we must figure out something else to do.
Otherwise, there is at least one non-minimal sibling.
If the left sibling is minimal, or if the right sibling is larger, then move an item from the right sibling.
Otherwise, the left sibling is non-minimal or larger than the right. Move an item from the left sibling.

Thus far, function RestoreChild relies on the protected functions GetSiblings, MinimalNode, MoveFromRight, and MoveFromLeft. We have all the information we need to implement them.

If the pointer t is NULL, it does not point to a node, and the non-existing node is minimal. Otherwise, the node at t is minimal if it contains fewer than N+1 info items.

int gMWT::MinimalNode( mwtPtr t )
{ 
   return t == NULL || t->m < N+1; 
}

To find the siblings of a node, we need a pointer to its parent (tP). If the child whose siblings we want is at tP->e[ i ].ptr, its potential siblings would be at positions i – 1 and i + 1 of the array.

void gMWtree::GetSiblings( mwtPtr tP, int i, mwtPtr *aLs, mwtPtr *aRs )
{
//
// Assign pointers to the siblings of node at tP->e[ i ].ptr
// or assign NULL for a non-existing sibling
//
   *aLs = *aRs = NULL;          // Assume there are no siblings
   if ( i < tP->m )             // Right sibling exists
      *aRs = tP->e[ i+1 ].ptr;
   if ( i > 0 )                 // Left sibling exists
      *aLs = tP->e[ i-1 ].ptr;
}

Functions MoveFromRight and MoveFromLeft must implement the two rotation operations described above with the example integer MWT. We can implement them using functions augment, reduce, and some additional code. Observe carefully their input assertions.

void gMWtree::MoveFromRight
              ( mwtPtr tP, mwtPtr tRs, mwtPtr tC, int i )
{
// tC == tP->e[ i ].ptr && UnderflowNode( tC )
// &&
// tRs == tP->e[ i+1 ].ptr && !MinimalNode( tRs )

   ReportMoveThrough( tC, tP, tRs );

   augment( tC, tP->e[ i+1 ].info,        // Augment the node at tC with
                tC->e[ tC->m ].ptr,       // the info from the parent node
                tRs->e[ 0 ].ptr,          // at position i + 1, and its
                tC->m );                  // surrounding pointers. Place
                                          // the info at the end.

   tP->e[ i+1 ].info = tRs->e[ 1 ].info;  // At the parent node, replace the
                                          // info given to the node at tC
                                          // with the left-most info from the
                                          // right sibling at tRs.

   reduce( tRs, 1 );                      // Reduce the right sibling at
                                          // position 1.
}

void gMWtree::MoveFromLeft
              ( mwtPtr tP, mwtPtr tLs, mwtPtr tC, int i )
{
// tC == tP->e[ i ].ptr && UnderflowNode( tC )
// &&
// tLs == tP->e[ i-1 ].ptr && !MinimalNode( tLs )

   int mLs = tLs->m;

   ReportMoveThrough( tC, tP, tLs );

   augment( tC, tP->e[ i ].info,          // Augment the node at tC with
                tLs->e[ tLs->m ].ptr,     // the info from the parent node
                tC->e[ 0 ].ptr,           // at position i, and its
                0 );                      // surounding pointers. Place the
                                          // at the beginning.

   tP->e[ i ].info = tLs->e[ mLs ].info;  // At the parent node, replace the
                                          // info given to the node at tC
                                          // with the right-most info from the
                                          // left sibling at tLs.

   reduce( tLs, mLs );                    // Reduce the left sibling at
                                          // position 1.
}

Function ReportMoveThrough is called by MoveFromLeft and MoveFromRight for self-tracing purposes. Its implementation, and the actual output produced for the two restoration examples are shown below.

void gMWtree::ReportMoveThrough( mwtPtr dst, mwtPtr thr, mwtPtr src )
{
   PrintNode( "moving from node ", src, 0 );
   PrintNode( "   through node ", thr, 0 );
   PrintNode( "   to node ", dst, 1 );
}

The reader should appreciate the simplicity of the implementation of functions MoveFromLeft and MoveFromRight. Ignoring the comments, each of the functions consists of three instructions, two of which are a call to function augment (designed to perform insertions!) and a call to function reduce. (As the reader may have noticed, I do not put many comments in my code because I believe that code should be self-documenting. The comments that I added, after the fact, to functions MoveFromLeft and MoveFromRight are just to repeat the steps of the restoration process in each case.)

Restoration of an Underflowed Child Leaf Node, Case 2: Minimal Siblings

The first version of RestoreChild is incomplete because it does nothing when the siblings of an underflowed node are minimal. Again, to determine what to do we must study a concrete case. Consider the figure on the right, showing a tree, and the result after removing number 38.

As indicated by the dashed arrows, to restore the tree, we do the following:

Reduce the parent node to send 27 down to the left (and only) sibling of the underflow node.
Move all the information (data and tree pointers) from the underflow node to its sibling.
Return the underflow node to the heap.

This operation constitutes a merge of the underflow node with its left sibling. Obviously, there is the symmetric situation of merging an underflow node with its right sibling. We are now in the position to complete function RestoreChild.

Third (and definitive) version of RestoreChild

Both siblings are minimal nodes.

If the left sibling exists, merge the underflow node at the end of its sibling.

Otherwise, merge the underflow node at the beginning of its right sibling.

void gMWtree::RestoreChild( mwtPtr tP, 
                            mwtPtr tC, 
                               int i )
{
//
// tP : parent ; tC : child ;  
// tC == tP->e[ i ].ptr
//
   // pointers to tC's left/right siblings
   mwtPtr tLs, tRs;
   // flags indicating minimal siblings
      int minLs, minRs,
   // number of items in siblings
          mLs, mRs;

   PrintNode( "underflow", tC, 1 );
   print();
   GetSiblings( tP, i, &tLs, &tRs );
   minLs = MinimalNode( tLs );
   minRs = MinimalNode( tRs );

   if ( minLs && minRs )

      if ( tLs != NULL )
         MergeLeft( tP, tLs, tC, i );

      else // tLs == NULL
         MergeRight( tP, tRs, tC, i );

   else { // !minLs || !minRs
      mLs = tLs->m;
      mRs = tRs->m;
      if ( minLs || mRs > mLs )
         MoveFromRight( tP, tRs, tC, i );
      else // !minLs && mRs <= mLs
         MoveFromLeft( tP, tLs, tC, i );
   }
}

The implementation of functions MergeLeft and MergeRight requires a little more work than what was done in functions MoveFromLeft and MoveFromRight, but the basic idea is the same: every time we move a data pointer, we also move its enclosing tree pointers. Again, pay close attention to the input assertions of the functions.

Function MergeLeft can be implemented in terms of reduce (on the parent node) and CopyRange (to put elements at the end of the left sibling), plus some additional code:

void gMWtree::MergeLeft( mwtPtr tP, mwtPtr tLs, mwtPtr tC, int i )
{
// tP : parent; tC : child; tLs : tC's left sibling
//
// tLs == tP->e[ i-1 ].ptr && tC == tP->e[ i ].ptr
// &&
// UnderflowNode( tC ) && MinimalNode( tLs )

   ++(tLs->m);
   tLs->e[ tLs->m ].info = tP->e[ i ].info;
   reduce( tP, i );
   ReportMoveAll( tLs, tC );
   CopyRange( tLs, tLs->m+1, tC, 1, tC->m );
   CheckEmptyRoot( tP, tLs );
   free( tC );
}

Function MergeRight can also use reduce on the parent node, but must resort to function augment to put elements, one by one, at the beginning of the right sibling. The function also has some additional code.

void gMWtree::MergeRight( mwtPtr tP, mwtPtr tRs, mwtPtr tC, int i )
{
// tP : parent; tC : child; tRs : tC's right sibling
//
// tRs == tP->e[ i+1 ].ptr && tC == tP->e[ i ].ptr
// &&
// UnderflowNode( tC ) && MinimalNode( tRs )

   augment( tRs, tP->e[ i+1 ].info, NULL, tRs->e[ 0 ].ptr, 0 );
   reduce( tP, i );
   ReportMoveAll( tRs, tC );
   for ( int k = tC->m; k > 0; --k )
      augment( tRs, tC->e[ k ].info, NULL, tC->e[ k ].ptr, 0 );
   tRs->e[ 0 ].ptr = tC->e[ 0 ].ptr;
   CheckEmptyRoot( tP, tRs );
   free( tC );
}

Functions MergeLeft and MergeRight call function ReportMoveAll, for self-tracing purposes,

void gMWtree::ReportMoveAll( mwtPtr dst, mwtPtr src )
{
   PrintNode( "moving into", dst, 0 );
   PrintNode( " all {info, ptr} from", src, 1 );
}

and function CheckEmptyRoot to replace the root of the MWT with a new root when it becomes empty because of compression of the tree during rebalancing.

void gMWtree::CheckEmptyRoot( mwtPtr tP, mwtPtr NewRoot )
{
   if ( tP == root && tP->m == 0 ) {
      root = NewRoot;
      free( tP );
   }
}

The following figure illustrates the complete process of removal of number 38 from the example tree, and the rebalancing of the tree.

Before dealing with removal of elements from internal nodes of a MWT, I will present a beautiful example of the operation of functions MergeLeft and MergeRight. Consider the following tree, and the result after removal of number 22.

At the point where the 3-node <22, 24> underflows, function prRemove, has called itself recursively once, and the last tree shows pointer tP at the 3-node <10,18> and pointer tC at the underflow 2-node <24>. The the test tP != NULL && UnderflowNode( tC ) is true, and prRemove calls RestoreChild. The underflow node has only a left sibling, pointed to by tLs, and it is a minimal node. Therefore, function MergeLeft is called to perform the merge indicated by the dashed arrows. The following figure shows the tree after the merge.

After the merge that produced the top tree in the figure, function prRemove returns from the recursive call. (For convenience, the relevant code is shown below with the recursive call in bold.)

The reader should observe how remarkable this behavior is. Function RestoreChild was designed to restore underflowed leaf nodes and yet, due to the recursive traversal of the tree by prRemove, it works just as well with internal nodes! This is the reason for not demanding the node at tP to be at least a 3-node when calling RestoreChild. The test tP != NULL && UnderflowNode( tC ) simply asks whether the node at tP is not NULL before asking whether the node at tC is underflowed bcause such value is assigned to tP only when tC is pointing to the root of the whole tree on the first call to prRemove. As a consequence, the root node of an MWT is the only one allowed to remain underflowed as a 2-node.

The second thing the reader should observe is the power of recursive solutions to problems that are inherently recursive, to wit, the traversal of a recursive structure. Finally, because of the possibility of having to restore an underflowed node at the end of prRemove, the recursive call shown in bold above is not tail-recursive. Even though I consider myself a seasoned programmer, I cannot begin to imagine how to implement the removal operation in an iterative fashion without using an explicit stack.

Removals From Internal Nodes of Multiway Trees

To complete the definition of function prRemove, we consider now the removal of elements from internal nodes. Again, in order to determine what to do we study some concrete situations.

In general, the removal of an element from an internal node involves replacing it with either its successor or its predecessor, as we did when removing an internal node of a BST in Chapter 6. Using wishful thinking, the third (and definitive) version of prSearch is shown below, with the added code in bold.

genPtr gMWtree::prRemove( genPtr gp0, 
                          mwtPtr tP, 
                          mwtPtr tC, 
                             int j )
//
// tP : parent;  tC : child;
//
// tP== NULL || ( j >= 0 && tC == tP->e[ j ].ptr)
//
{
   int i;  RelVal cmp;  genPtr gp1;  mwtPtr tI;

   if ( tC == NULL )
      return NULL;
   else { // tC != NULL
      i = FindPlace( gp0, tC, 0, &cmp );
      if ( cmp != EQ )
         gp1 = prRemove( gp0, tC, tC->e[ i ].ptr, i );
      else { // cmp == EQ
         gp1 = tC->e[ i+1 ].info;
         tI = tC->e[ i ].ptr;
         if ( tI == NULL ) 
            // removal from leaf node tC
            reduce( tC, i+1 );
         else // tI != NULL
            // removal from internal node tC
            ReplaceWithPredOrSucc( tC, tI, i );
      }
      if ( tP != NULL && UnderflowNode( tC ) )
         RestoreChild( tP, tC, j );
      return gp1;
   }
}

The reader may be wondering what we must do when, after performing the replacement in the internal node and the reduction of a child node, the latter underflows. Well, we already have taken care of that! We now turn to the implementation of function ReplaceWithPredOrSucc.

Following the logic behind the removal of an internal node in a BST, ReplaceWithPredOrSucc must find the nodes containing the predecessor and the successor of the element to be removed. To minimize the amount of work, the replacement element should be taken from the largest node, if both of them exist. Instead of calling reduce on the node that "lost" an element, we simply call prRemove to remove such an element

void gMWtree::ReplaceWithPredOrSucc( mwtPtr tC, mwtPtr tI, int i )
{
// tI == tC->e[ i ].ptr
// replace tC->e[ i+1 ].info with predecessor or successor
// (get replacement info from largest node)
//

   mwtPtr tIp1, tPred, tSucc;
      int mPred, mSucc;  
   genPtr gp;

   tIp1 = tC->e[ i+1 ].ptr;
   tPred = PredecessorNode( tI );
   tSucc = SuccessorNode( tIp1 );
   mPred = tPred->m;
   mSucc = tSucc->m;
   if ( CopyFn == NULL ) {
      printf( "ERROR: NULL CopyFn pointer\n" );
      exit( 0 );
   }
   if ( mPred > mSucc ) {
      gp = (*CopyFn)( tC->e[ i+1 ].info, tPred->e[ mPred ].info );
      ShowReplacement( "predecessor", gp );
      prRemove( gp, tC, tI, i );
   }
   else { // mPred <= mSucc
      gp = (*CopyFn)( tC->e[ i+1 ].info, tSucc->e[ 1 ].info );
      ShowReplacement( "successor", gp );
      prRemove( gp, tC, tIp1, i+1 );
   }
}

The functions to find the node containing the predecessor and the node containing the successor of the element being removed are generalizations of standard functions to find the predecessor and successor of a node in binry search trees, and are as follows.

mwtPtr gMWtree::PredecessorNode( mwtPtr t )
{
   mwtPtr tM;

   // t != NULL
   while ( (tM = t->e[ t->m ].ptr) != NULL )
      t = tM;
   return t;
}

mwtPtr gMWtree::SuccessorNode( mwtPtr t )
{
   mwtPtr t0;

   // t != NULL
   while ( (t0 = t->e[ 0 ].ptr) != NULL )
      t = t0;
   return t;
}

Finally, a function to show the replacement (used for self-tracing purposes) is as follows.

void gMWtree::ShowReplacement( char *str, genPtr gp )
{
   printf( "replacing with %s: ", str );
   (*PrnFn)( gp, PrintFlag );
   NL();
}

As an illustration of the execution of prRemove and its auxiliary functions, the following figure shows an initial tree and the actions that take place with the removal of some of its elements. The reader is encouraged to determine, from the messages issued, which function(s) is(are) responsible for each of the actions.

Examples of removal operations on an MWT

This concludes the top-down design and implementation of operations on balanced MWTs.