## Introduction

After the Red-Black Tree discussion, this article will implement another variant of the self-balancing binary search tree: the AVL Tree.

## Project Setup

Following the same style and assumptions as the other articles in the Build the Forest series, the implementation assumes Python 3.9 or newer. This article adds two modules to our project: *avl_tree.py* for the AVL tree implementation and *test_avl_tree.py* for its unit tests. After adding these two files, our project layout becomes the following:

```text
forest-python
├── forest
│   ├── __init__.py
│   ├── binary_trees
│   │   ├── __init__.py
│   │   ├── avl_tree.py
│   │   ├── binary_search_tree.py
│   │   ├── double_threaded_binary_tree.py
│   │   ├── red_black_tree.py
│   │   ├── single_threaded_binary_trees.py
│   │   └── traversal.py
│   └── tree_exceptions.py
└── tests
    ├── __init__.py
    ├── conftest.py
    ├── test_avl_tree.py
    ├── test_binary_search_tree.py
    ├── test_double_threaded_binary_tree.py
    ├── test_red_black_tree.py
    ├── test_single_threaded_binary_trees.py
    └── test_traversal.py
```

(The complete code is available at forest-python.)

## What is an AVL Tree?

An AVL tree (named after the inventors Adelson-Velsky and Landis) is a self-balancing binary search tree. In addition to the binary-search-tree-property, an AVL tree maintains the AVL-tree-property to keep it balanced:

- For every node in an AVL tree, the heights of its left subtree and its right subtree differ by at most one.

The property can be quantified by a node's **balance factor**, which can be written as the following formula:

`balance_factor(node) = height(node.left) - height(node.right)`

If a node’s balance factor > 0, we call it **left-heavy**. If a node’s balance factor < 0, we call it **right-heavy**. If a node’s balance factor = 0, it is called **balanced**.

(Note that some people define the balance factor as the height of its right subtree – the height of its left subtree. In this case, left-heavy becomes a node’s balance factor < 0, whereas right-heavy happens when a node’s balance factor > 0. However, no matter which definition is used, the concept of the AVL tree is the same.)
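Under the left-minus-right definition used here, the balance factor computation can be sketched as follows. This is an illustrative, self-contained sketch (with a pared-down `Node`), not the exact `_get_balance_factor` body from *avl_tree.py*; it relies on the convention, explained below, that an empty subtree (`None`) has height `-1`:

```python
from typing import Optional

class Node:
    """Minimal stand-in for the AVL tree node: only the fields needed here."""
    def __init__(self, height: int = 0,
                 left: Optional["Node"] = None,
                 right: Optional["Node"] = None) -> None:
        self.height = height
        self.left = left
        self.right = right

def get_height(node: Optional[Node]) -> int:
    """By convention, an empty subtree (None) has height -1."""
    return node.height if node else -1

def get_balance_factor(node: Node) -> int:
    """Balance factor = height of left subtree - height of right subtree."""
    return get_height(node.left) - get_height(node.right)

# A node whose left child is a leaf and whose right child is empty:
# BF = 0 - (-1) = 1, i.e., left-heavy but still within the AVL limit.
root = Node(height=1, left=Node(height=0))
print(get_balance_factor(root))  # 1
```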

A typical AVL tree can be visualized in the picture below:

In the above picture, BF indicates a node’s balance factor. The numbers under the nodes are the heights of the nodes. If a node is empty, i.e., `None`, its height is `-1`. This convention makes it easier to calculate a node’s balance factor.

## Build AVL Tree

This section will walk through the implementation of the AVL tree with some thoughts behind the implementation choices.

### Node

Instead of computing a node’s height every time we need it, we store the height in each node. Therefore, the node’s structure has one more field than the binary search tree node.

Storing the height saves compute time, so we do not need to calculate the height every time we check the balance factor. However, it comes with a cost — we need to keep the height up to date when the AVL tree is modified, such as when inserting or deleting a node. More details about the height update will come in the insertion and deletion sections.

Like the other binary tree nodes, we utilize the dataclass to define the AVL tree node.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    """AVL Tree node definition."""

    key: Any
    data: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    height: int = 0
```

### Class Overview

Like other types of binary trees in the Build the Forest project, the AVL tree class has similar functionalities.

```python
class AVLTree:
    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def __repr__(self) -> str:
        """Provide the tree representation to visualize its layout."""
        if self.root:
            return (
                f"{type(self)}, root={self.root}, "
                f"tree_height={str(self.get_height(self.root))}"
            )
        return "empty tree"

    def search(self, key: Any) -> Optional[Node]:
        …

    def insert(self, key: Any, data: Any) -> None:
        …

    def delete(self, key: Any) -> None:
        …

    @staticmethod
    def get_leftmost(node: Node) -> Node:
        …

    @staticmethod
    def get_rightmost(node: Node) -> Node:
        …

    @staticmethod
    def get_successor(node: Node) -> Optional[Node]:
        …

    @staticmethod
    def get_predecessor(node: Node) -> Optional[Node]:
        …

    @staticmethod
    def get_height(node: Optional[Node]) -> int:
        …

    def _get_balance_factor(self, node: Optional[Node]) -> int:
        …

    def _left_rotate(self, node_x: Node) -> None:
        …

    def _right_rotate(self, node_x: Node) -> None:
        …

    def _insert_fixup(self, new_node: Node) -> None:
        …

    def _transplant(self, deleting_node: Node, replacing_node: Optional[Node]) -> None:
        …

    def _delete_no_child(self, deleting_node: Node) -> None:
        …

    def _delete_one_child(self, deleting_node: Node) -> None:
        …

    def _delete_fixup(self, fixing_node: Node) -> None:
        …
```

We can implement most of the AVL tree functions exactly as in the regular binary search tree, such as search and most of the auxiliary functions. We can also use the traversal functions from Binary Tree Traversal to traverse an AVL tree.

Insert and delete are two operations that may result in an AVL tree imbalance. Therefore, the `AVLTree` class has methods that help keep the tree balanced, including `_left_rotate()`, `_right_rotate()`, `_insert_fixup()`, and `_delete_fixup()`. Since these helper methods exist mainly to keep an AVL tree balanced, we define them as private functions, transparent to the client code.

### Rotations

The method to restore the violated AVL-tree-property after insertion or deletion is rotation (similar to the Red-Black Tree: Rotations).

The picture below demonstrates the four cases in which the AVL-tree-property could be broken: left-left, left-right, right-left, and right-right.

When we deal with an unbalanced AVL tree, we always start from the bottommost unbalanced node (i.e., a node whose *BF > 1* or *BF < -1*). For example, node 23 in the picture above is the bottommost unbalanced node. We then check the balance factor of its **child with the higher height**. If the bottommost unbalanced node is left-heavy (i.e., BF > 0) and its child with the higher height is also left-heavy, it is the left-left case (the leftmost case in the picture above). If the child with the higher height is right-heavy (i.e., BF < 0), it is the left-right case (the second leftmost case in the picture above). The other cases in the picture (right-left and right-right) are symmetric to the left-right and left-left cases.

The following subsections show how rotations rebalance the imbalance cases.

#### Right Rotation (Left-Left Case)

For the left-left case, we perform the right rotation on the bottommost unbalanced node (node 23 in this case). After the rotation, node 17 becomes the parent of node 23. Also, the height and balance factor of both node 17 and node 23 changed after the rotation.

#### Left-Right Rotation (Left-Right Case)

If the bottommost unbalanced node is left-heavy, but its child with the higher height is right-heavy, we first perform the left rotation on the child (node 17 in this case) and then perform the right rotation on the bottommost unbalanced node (node 23).

Note that only the nodes involved in the rotation change their heights and balance factors. For example, after the left rotation, node 17 and node 21 change their height and balance factor (highlighted in yellow). After the right rotation, only node 21 and node 23 change their height and balance factor (highlighted in green).

#### Right-Left Rotation (Right-Left Case)

The right-left case is symmetric to the left-right case. Therefore, we perform the right rotation first and the left rotation after.

#### Left Rotation (Right-Right Case)

The right-right case is symmetric to the left-left case. So we can perform the left rotation to make it balanced.
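The article does not list the rotation bodies, but a left rotation can be sketched as below. This is a simplified, self-contained sketch (the `Tree` container and the standalone `get_height` helper are illustrative stand-ins), not the exact `_left_rotate` method from *avl_tree.py*; the real method lives on the `AVLTree` class:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    key: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    height: int = 0

def get_height(node: Optional[Node]) -> int:
    return node.height if node else -1  # an empty subtree has height -1

class Tree:
    """Minimal container so a rotation can re-root the whole tree."""
    def __init__(self, root: Optional[Node] = None) -> None:
        self.root = root

    def left_rotate(self, node_x: Node) -> None:
        node_y = node_x.right  # node_y replaces node_x as the subtree root
        if node_y is None:
            return  # nothing to rotate
        # Turn node_y's left subtree into node_x's right subtree.
        node_x.right = node_y.left
        if node_y.left:
            node_y.left.parent = node_x
        # Link node_x's parent to node_y.
        node_y.parent = node_x.parent
        if node_x.parent is None:
            self.root = node_y
        elif node_x == node_x.parent.left:
            node_x.parent.left = node_y
        else:
            node_x.parent.right = node_y
        # Put node_x on node_y's left.
        node_y.left = node_x
        node_x.parent = node_y
        # Only the two rotated nodes need new heights, updated bottom-up.
        node_x.height = 1 + max(get_height(node_x.left), get_height(node_x.right))
        node_y.height = 1 + max(get_height(node_y.left), get_height(node_y.right))

# Right-right case: the chain 1 -> 2 -> 3 rebalances around 2.
a, b, c = Node(key=1, height=2), Node(key=2, height=1), Node(key=3)
a.right, b.parent = b, a
b.right, c.parent = c, b
tree = Tree(root=a)
tree.left_rotate(a)
print(tree.root.key)  # 2
```

A right rotation is the mirror image: swap every `left` and `right` in the body above.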

#### Summary

The table below summarizes the unbalanced cases and their solutions.

| Case | Solution |
| --- | --- |
| Left-Left | Right rotation on the unbalanced node |
| Left-Right | Left rotation on the left child, then right rotation on the unbalanced node |
| Right-Left | Right rotation on the right child, then left rotation on the unbalanced node |
| Right-Right | Left rotation on the unbalanced node |

### Insert

Insertion into an AVL tree has one substantial side effect: it updates heights. When we insert a node into an AVL tree, the new node may change the tree height and violate the AVL-tree-property. When that happens, we perform specific rotations to rebalance the tree.

#### Height Update

Before we implement the `insert` function, we need to understand how the insertion changes heights. First, note that the new node must become a leaf node after the insertion. Therefore, the new node’s height is 0, and its balance factor is also 0. Besides, if the new node’s parent already had a child before the insertion, the parent’s height and the overall tree’s height remain the same: no height changes, no AVL-tree-property violation.

Second, the height change only happens when the new node’s parent did not have a child before the insertion. In this case, we update the new node’s ancestors’ height all the way to the root (like the picture below), and only the new node’s ancestors have a height update. When the height changes, it means potential AVL-tree-property violations may happen.

(The picture above does not violate the AVL-tree-property. However, we will handle the case that an AVL tree becomes imbalanced after the insertion in the following sections.)

The insert algorithm is modified from the regular binary search tree insert.

- Insert the new node with the height 0 in the same way as the binary search tree insertion: find the proper location (i.e., the parent of the new node) to insert the new node by walking through the tree from the root and comparing the new node’s key with each node’s key along the way.
- Update the heights and check for AVL-tree-property violations by tracing back from the new node to the root. While traveling back to the root, update each node’s height on the way if necessary. If we find an unbalanced node, perform the appropriate rotation(s) to balance it. After the rotation, the insertion concludes. If no unbalanced node is found, the insertion completes after reaching the root and updating its height.

```python
def insert(self, key: Any, data: Any) -> None:
    new_node = Node(key=key, data=data)
    parent: Optional[Node] = None
    current: Optional[Node] = self.root
    while current:
        parent = current
        if new_node.key < current.key:
            current = current.left
        elif new_node.key > current.key:
            current = current.right
        else:
            raise tree_exceptions.DuplicateKeyError(key=new_node.key)
    new_node.parent = parent
    # If the tree is empty, set the new node to be the root.
    if parent is None:
        self.root = new_node
    else:
        if new_node.key < parent.key:
            parent.left = new_node
        else:
            parent.right = new_node

        # After the insertion, fix the broken AVL-tree-property.
        # If the parent has two children after inserting the new node,
        # it means the parent had one child before the insertion.
        # In this case, neither the AVL-tree-property breaks nor
        # a height update is required.
        if not (parent.left and parent.right):
            self._insert_fixup(new_node)
```

#### Fixup

As the rotations section mentioned, there are four potential unbalanced cases, and we perform specific rotations to restore the AVL-tree-property. After we fix the AVL-tree-property, the AVL tree becomes balanced, and there is no need to keep tracking the ancestors’ balance factors. The following subsections describe the fixup for each case.

##### Left-Left Case

In the picture above, we add node 19. After the insertion, we check the balance factors of node 19’s ancestors, starting from node 17 and going up. We then find that node 37 is unbalanced and left-heavy. We also need to check its child with the higher height to determine which rotation to perform. In the insertion case, the child with the higher height must lie on the path that contains the new node, i.e., node 23 in this case, since node 23’s balance factor is 1 after the insertion. So we identify this as a left-left case.

So we perform the right rotation on node 37. After the right rotation, node 23 becomes the new root, and node 37 becomes node 23’s right child. Note that only the nodes involved in the rotation have their heights and balance factors changed; in this case, only node 23 and node 37.

**Why There Is No Need to Keep Checking the Ancestors’ Balance Factor After the Rotation?**

The height of node 37 was 2, and node 23’s height was 1 before the insertion. After the insertion, node 37’s height became 3, and node 23’s height became 2. Then we performed the rotation. After the rotation, node 23 took node 37’s original position, node 23’s height became 2, and node 37’s height became 1. Therefore, the height at the same position remains the same, i.e., the root (node 37) had height 2 before the insertion; the root (node 23) still has height 2 after the rotation. In other words, if node 37 had a parent (say *x*) before the rotation, then after the rotation, *x*’s left child becomes node 23, but *x*’s height is not affected. So, we don’t need to keep checking *x*’s ancestors’ heights after the rotation.

##### Left-Right Case

After we identify this is a left-right case by checking the balance factor of the bottommost unbalanced node (node 37) and its child (node 23) with higher height, we perform the left rotation on node 23, so it becomes a left-left case. Then, we perform the right rotation on the unbalanced node (node 37). After that, we restore the violated AVL-tree-property.

##### Right-Left Case

This case is symmetric to the left-right case, so we perform the right rotation on the unbalanced node’s (node 37) right child (node 43), making it a right-right case. Then, we perform the left rotation on the unbalanced node (node 37). After that, we fix the violated AVL-tree-property.

##### Right-Right Case

The right-right case is symmetric to the left-left case, so we can restore its AVL-tree property by performing the left rotation.

After the fixup analysis, we can implement the `_insert_fixup` function like the following. Note that we always update a node’s height while we walk up the tree and before the rotation.

```python
def _insert_fixup(self, new_node: Node) -> None:
    parent = new_node.parent
    while parent:
        parent.height = 1 + max(
            self.get_height(parent.left), self.get_height(parent.right)
        )
        grandparent = parent.parent
        if grandparent:
            # grandparent is unbalanced
            if self._get_balance_factor(grandparent) > 1:
                # Case Left-Left
                if self._get_balance_factor(parent) >= 0:
                    self._right_rotate(grandparent)
                # Case Left-Right
                elif self._get_balance_factor(parent) < 0:
                    self._left_rotate(parent)
                    self._right_rotate(grandparent)
                # Since the fixup does not affect the ancestor of the
                # unbalanced node, exit the loop to complete the fixup process.
                break
            elif self._get_balance_factor(grandparent) < -1:
                # Case Right-Right
                if self._get_balance_factor(parent) <= 0:
                    self._left_rotate(grandparent)
                # Case Right-Left
                elif self._get_balance_factor(parent) > 0:
                    self._right_rotate(parent)
                    self._left_rotate(grandparent)
                # Since the fixup does not affect the ancestor of the
                # unbalanced node, exit the loop to complete the fixup process.
                break
        parent = parent.parent
```

### Search

The search function is identical to the Binary Search Tree: Search.
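For completeness, the standard iterative binary-search-tree lookup can be sketched as follows. This is an illustrative, self-contained sketch (with a pared-down `Node`), not the exact `search` method from the repository:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    """Minimal stand-in for the AVL tree node: only the fields needed here."""
    key: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def search(root: Optional[Node], key: Any) -> Optional[Node]:
    """Walk down from the root, going left or right by key comparison."""
    current = root
    while current:
        if key < current.key:
            current = current.left
        elif key > current.key:
            current = current.right
        else:
            return current  # key found
    return None  # key not in the tree

#     4
#    / \
#   2   7
root = Node(key=4, left=Node(key=2), right=Node(key=7))
found = search(root, 7)
print(found.key if found else None)  # 7
```

Because the AVL-tree-property keeps the tree height at O(lg n), this walk visits at most O(lg n) nodes.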

### Delete

The basic idea of the AVL tree deletion is similar to the regular binary search tree. There are three cases for the node to be deleted: no child, only one child, and two children. We also use the same `transplant` method to replace the subtree rooted at the `deleting_node` with the subtree rooted at the `replacing_node`.

**Transplant**

The `transplant` method is identical to the Binary Search Tree: Delete.

```python
def _transplant(self, deleting_node: Node, replacing_node: Optional[Node]) -> None:
    if deleting_node.parent is None:
        self.root = replacing_node
    elif deleting_node == deleting_node.parent.left:
        deleting_node.parent.left = replacing_node
    else:
        deleting_node.parent.right = replacing_node

    if replacing_node:
        replacing_node.parent = deleting_node.parent
```

Similar to the insertion, the AVL tree deletion may update the tree height and potentially violate the AVL-tree-property, so we need to check if the AVL tree becomes unbalanced and fix it after the deletion.

The overall delete procedure is like a regular binary search tree with some modifications.

1. Find the node to be deleted (`deleting_node`).
2. If the `deleting_node` has no child, use the `transplant` method to replace the `deleting_node` with `None`. Then, perform the fixup operation.
3. If the `deleting_node` has only one child, use the `transplant` method to replace the `deleting_node` with its only child. Then, perform the fixup operation.
4. If the `deleting_node` has two children, find the `deleting_node`’s successor as the `replacing_node`. Then, replace the `deleting_node`’s key and data with the `replacing_node`’s key and data, so the `deleting_node` is replaced by the `replacing_node` but keeps its original balance factor and height (i.e., no height and balance factor change means no AVL-tree-property violation). After that, delete the `replacing_node`, which falls into either step 2 (no child) or step 3 (only one child).

To make the implementation clear, we define the `_delete_no_child` method for the case where the node to be deleted has no child, and the `_delete_one_child` method for the case where it has only one child. If the node to be deleted has two children, we reuse `_delete_no_child` and `_delete_one_child` accordingly. Furthermore, since the two-children case is reduced to `_delete_no_child` and `_delete_one_child`, only these two methods need to call the `_delete_fixup` function to fix unbalanced nodes. Thus, we can implement the `delete`, `_delete_no_child`, and `_delete_one_child` functions as the following:

```python
def delete(self, key: Any) -> None:
    if self.root and (deleting_node := self.search(key=key)):
        # Case: no child
        if (deleting_node.left is None) and (deleting_node.right is None):
            self._delete_no_child(deleting_node=deleting_node)
        # Case: two children
        elif deleting_node.left and deleting_node.right:
            replacing_node = self.get_leftmost(node=deleting_node.right)
            # Replace the deleting node with the replacing node,
            # but keep the replacing node in place.
            deleting_node.key = replacing_node.key
            deleting_node.data = replacing_node.data
            if replacing_node.right:  # The replacing node cannot have a left child.
                self._delete_one_child(deleting_node=replacing_node)
            else:
                self._delete_no_child(deleting_node=replacing_node)
        # Case: one child
        else:
            self._delete_one_child(deleting_node=deleting_node)

def _delete_no_child(self, deleting_node: Node) -> None:
    parent = deleting_node.parent
    self._transplant(deleting_node=deleting_node, replacing_node=None)
    if parent:
        self._delete_fixup(fixing_node=parent)

def _delete_one_child(self, deleting_node: Node) -> None:
    parent = deleting_node.parent
    replacing_node = (
        deleting_node.right if deleting_node.right else deleting_node.left
    )
    self._transplant(deleting_node=deleting_node, replacing_node=replacing_node)
    if parent:
        self._delete_fixup(fixing_node=parent)
```

#### Fixup

We know the delete operation may change heights and violate the AVL-tree-property. Like the insertion, there are four potential unbalanced cases, and we perform specific rotations to restore the AVL-tree-property. We also start the fixup procedure from the bottommost unbalanced node, which must be one of the ancestors of the node to be deleted. However, unlike the insertion fixup, the imbalance may propagate above the node we rotate. Therefore, after we rebalance the bottommost unbalanced node, we need to check its parent; if the parent becomes unbalanced, fix it as well. Repeat this process until we reach the root and the root is also balanced.

##### No Child

As we mentioned in the rotations section, four cases could violate the AVL-tree-property. The following picture shows these four cases and how to restore their balance when the node to be deleted has no child.

##### One Child

The following two pictures demonstrate the four cases.

**The unbalanced node is left-heavy.**

**The unbalanced node is right-heavy.**

##### Two Children

Like other binary trees’ deletion, we can see the two-children case as two sub-cases: the `replacing_node` is the `deleting_node`’s direct child, or the `replacing_node` is the leftmost node of the `deleting_node`’s right subtree. In either case, we can reduce the two-children case to either the no-child case or the one-child case by replacing the `deleting_node` with the `replacing_node` while keeping the original height and balance factor. By doing that, we do not change any heights or balance factors. After that, we delete the `replacing_node`, and it becomes either the no-child case or the one-child case. The following pictures show how the two-children deletion works and how to fix its imbalance.

**The replacing_node is the deleting_node’s direct child.**

**The replacing_node is the deleting_node’s leftmost node.**

The implementation of the complete fixup operation is the following. Similar to the insertion, we update each node’s height while walking up the tree.

```python
def _delete_fixup(self, fixing_node: Node) -> None:
    while fixing_node:
        fixing_node.height = 1 + max(
            self.get_height(fixing_node.left), self.get_height(fixing_node.right)
        )

        if self._get_balance_factor(fixing_node) > 1:
            # Case Left-Left
            if self._get_balance_factor(fixing_node.left) >= 0:
                self._right_rotate(fixing_node)
            # Case Left-Right
            elif self._get_balance_factor(fixing_node.left) < 0:
                # The fixing node's left child cannot be empty.
                self._left_rotate(fixing_node.left)
                self._right_rotate(fixing_node)
        elif self._get_balance_factor(fixing_node) < -1:
            # Case Right-Right
            if self._get_balance_factor(fixing_node.right) <= 0:
                self._left_rotate(fixing_node)
            # Case Right-Left
            elif self._get_balance_factor(fixing_node.right) > 0:
                # The fixing node's right child cannot be empty.
                self._right_rotate(fixing_node.right)
                self._left_rotate(fixing_node)

        fixing_node = fixing_node.parent
```

### Auxiliary Functions

The auxiliary functions, such as getting the leftmost node and getting the successor of a node, are identical to the Binary Search Tree: Auxiliary Functions, and the implementation is available at the GitHub repository. The only auxiliary function that differs from the binary search tree is the function to get the height.
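As a reminder, the leftmost-node and successor lookups can be sketched as follows. This is an illustrative, self-contained sketch (with a pared-down `Node` and free functions), not the exact static methods from the repository:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    """Minimal stand-in for the AVL tree node: only the fields needed here."""
    key: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None

def get_leftmost(node: Node) -> Node:
    """Keep following left children; the last one is the minimum of the subtree."""
    while node.left:
        node = node.left
    return node

def get_successor(node: Node) -> Optional[Node]:
    """In-order successor: the leftmost node of the right subtree if it exists;
    otherwise, the lowest ancestor whose left subtree contains the node."""
    if node.right:
        return get_leftmost(node.right)
    parent = node.parent
    while parent and node == parent.right:
        node = parent
        parent = parent.parent
    return parent

#     4
#    / \
#   2   7
root = Node(key=4)
root.left = Node(key=2, parent=root)
root.right = Node(key=7, parent=root)
print(get_leftmost(root).key)        # 2
print(get_successor(root.left).key)  # 4
```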

#### Get the Height

Since each node stores its height, getting a node’s height becomes very straightforward: just return the stored height.

```python
@staticmethod
def get_height(node: Optional[Node]) -> int:
    if node:
        return node.height
    # None has height -1.
    return -1
```

### Traversal

Although the AVL tree node has one more field (i.e., *height*) than the normal binary search tree node, we can still use the exact implementation from Binary Tree Traversal to traverse an AVL tree. The only modification we need to make is to add the AVL tree as a supported type.

```python
# Alias for the supported node types. For type checking.
SupportedNode = Union[None, binary_search_tree.Node, avl_tree.Node]

SupportedTree = Union[binary_search_tree.BinarySearchTree, avl_tree.AVLTree]
"""Alias for the supported tree types. For type checking."""
```

(See traversal.py for the complete source code.)

The supported type is for type checking, as we discussed at Binary Tree Traversal: Function Interface.

## Test

As always, we should unit test our code as much as possible. Check test_avl_tree.py for the complete unit tests.

## Analysis

An AVL tree is a self-balancing binary search tree with height O(lg n), where n is the number of nodes (this can be proved by leveraging the Fibonacci numbers; see AVL Tree Wikipedia for more detail). Therefore, the time complexity of an AVL tree can be summarized in the table below:

| Operation | Time Complexity |
| --- | --- |
| Search | O(lg n) |
| Insert | O(lg n) |
| Delete | O(lg n) |

## Example

Like the red-black tree, the AVL tree is widely used in software programs because of its self-balancing ability. As an example, this section uses the AVL tree we implemented here to build a key-value `Map`.

```python
from typing import Any, Optional

from forest.binary_trees import avl_tree
from forest.binary_trees import traversal


class Map:
    """Key-value Map implemented using AVL Tree."""

    def __init__(self) -> None:
        self._avlt = avl_tree.AVLTree()

    def __setitem__(self, key: Any, value: Any) -> None:
        """Insert (key, value) item into the map."""
        self._avlt.insert(key=key, data=value)

    def __getitem__(self, key: Any) -> Optional[Any]:
        """Get the data by the given key."""
        node = self._avlt.search(key=key)
        if node:
            return node.data
        return None

    def __delitem__(self, key: Any) -> None:
        """Remove a (key, value) pair from the map."""
        self._avlt.delete(key=key)

    def __iter__(self) -> traversal.Pairs:
        """Iterate the data in the map."""
        return traversal.inorder_traverse(tree=self._avlt)

    @property
    def empty(self) -> bool:
        """Return `True` if the map is empty; `False` otherwise."""
        return self._avlt.empty


if __name__ == "__main__":
    # Initialize the Map instance.
    contacts = Map()

    # Add some items.
    contacts["Mark"] = "mark@email.com"
    contacts["John"] = "john@email.com"
    contacts["Luke"] = "luke@email.com"

    # Retrieve an email.
    print(contacts["Mark"])

    # Delete one item.
    del contacts["John"]

    # Check the deleted item.
    print(contacts["John"])  # This will print None.

    # Iterate the items.
    for contact in contacts:
        print(contact)
```

(The complete example is available at avlt_map.py.)

## Summary

Like the red-black tree, the AVL tree introduces some complexity in its insertion and deletion operations to keep itself balanced, but its self-balancing ability guarantees O(lg n) time complexity for the basic operations, better than the regular binary search tree’s O(n) worst case.

(Originally published at Shun's Vineyard on June 6, 2021.)

## History

- 7th June, 2021: Initial version