#### This article is part of a series of articles:

Multiple Precision Arithmetic: A recreational project:

- Part #1, The Sum
- Part #2, The Subtraction
- Part #3, The Multiplication
- Part #4, The Division
- Part #5, Using the Code
- Github Repo

## Definition

Subtraction is one of the basic arithmetic operations, defined as the inverse of the sum.

**Definition of subtraction **(as inverse of the sum): to subtract B from A means to remove B items from A, the result S is such that B + S = A

**Alternative definition**: subtraction can be defined as a countdown operation:

9 – 5 = *start countdown from 9 for 5 times* : 8, 7, 6, 5, 4 the result is 4

**Careful**: the subtraction is not commutative: 9 – 5 not equals to 5 – 9

On natural numbers, 5 – 9 is not defined, so the subtraction is defined only for the couples (A, B) where A ∊ ℕ , B ∊ ℕ and A >= B

*This article is only about the subtraction in **ℕ** + 0*

**Note**: The computer uses **two complement notation** to represent integers having fixed size, we will not because I am using a different number representation.

## Background

### Simple Subtraction

In base 10 (*same applies to any other base*), to compute the result of **A** - **B**, having **A** and **B** single digit numbers, working manually with pencil and paper or using an abacus, you need to count backward from 9 to 0: remember you can only apply the simple subtraction if and only if A >= B.

### Simple Subtraction Extended

The limitation by which you can subtract two single digits A and B if and only if A >= B is a limit we need to remove to build our algorithm. We’d introduce the concept of borrow, as the CPU reaches 0, it just overflows and continues counting from 2^{register_size} -1 : we can do the same, example:

5 - 9 = start countdown from 5 for 9 times: 4,3,2,1,0, (overflow, mark on paper the borrow, resume counting from 9) ,8,7,6 : the result is 6 and we had a borrow.

This is useful to apply the long subtraction that we are going to introduce in a minute.

### An Interesting Property of the Simple Subtraction

An interesting property of the simple subtraction, is that, regardless of the base ** b**, having

**A**,

**B**single digits numbers of base

**, when**

*b***B**>

**A**then

**A**–

**B**is going to:

- have a borrow
- the result will always be 1 digit
- the result digit will be greater than the first operand (the proof is similar concept to the proof for addition in the previous chapter).

### Long Subtraction

If numbers **A** or **B** have more than 1 digit, we need an algorithm that allows us to complete the operation without having to do an expensive countdown, *imagine you need to do 1000000 – 999999 and you do not want to count down so long*, we will apply the countdown method on base 10 by just applying a ** simple subtraction extended** to each pair of digits

**A**–

_{i}**B**at each iteration if we have an overflow we borrow a unit from the next

_{i:}**A**digit, we just need to mark on the paper that we borrowed one on the previous digit, then at successive iteration, we will need to account for it.

_{(i+1)}### Long Subtraction at the Digital Computer (same as base 10 but using base 2^^{integer_register_size})

As we can handle decimal digits, the CPU can handle 2^^{integer_register_size} digits. The CPU could have a borrow flag and a *subtract with borrow* hardware instruction to simplify borrow detection.

Note that borrow flag is not always available on high level languages, by the way borrow flag can be emulated by tweaking the property we discussed a little earlier.

## The Algorithm

The algorithm here proposed is a high-level overview of the actual algorithm, the idea is to split the long subtraction of **A**-**B** into a sequence of simple subtractions **A _{n}**-

**B**, starting with the least significant digits of both numbers, we also need to propagate the borrow (that can only be 1 or zero) at each iteration.

_{n}Here is the algorithm:

You should have noted the “eventually remove leading zeroes” part, that is what makes subtraction more complex than sum, we don’t know in advance how many digits the result will have*. One could loop the result starting at sizeof(A)-1 and decreasing the index until we found the first non-zero number. Another way could be, at each iteration on the above algorithm, to mark position as soon as we append a non-zero value to result.*

## Sample Implementation

Here, I paste my C “reference” version, I made some assumptions on local machine hardware registry, the idea is to have a *config* file which sets up things for the actual algorithm and have a basic portable implementation.

They are 40- lines of code.

**Side note**: C cannot read nor use the borrow flag register, so we detect borrow using branches.

/* Caller must check: * R should have MAX(ASize, BSize) space, will not check bounds * ASize must be > BSize or ASize can be == BSize if A > B * Most significant digit of A is not zero Return value is number of significant digits in result */ numsize_t LongSub(reg_t* A, numsize_t ASize, reg_t * B, numsize_t BSize, reg_t* R) { register numsize_t i; int borrow = (int)(i = 0); numsize_t msd = _R(-1); //keep track of last most significant digit while (BSize > i) { register reg_t a = A[i]; register reg_t b = B[i]; R[i] = a - b - borrow; borrow = (a < b) | (borrow & (b==a)); /* that's an hack, tests will say if it work*/ ++i; } while (ASize > i) { R[i] = A[i] - borrow; if (R[i] != 0) msd = i; borrow = R[i] > A[i] ? 1 : 0; ++i; if (borrow == 0) break; } if (ASize > i) msd = ASize - 1; while (ASize > i) { R[i] = A[i]; ++i; } return msd+1; }

## Aftermath

After some testing, I can say that the above implementation works just nice. I also created 2 more x64 assembly versions to compare “reference” implement to them, when compiled release mode.

*I don’t know whether the C compiler could penalize assembly function calls because it’s unable to optimize extern calls, so these numbers must be taken with a pinch of salt, anyway being the test working on reproducible-pseudo-random 512KB number pairs, call time should be negligible in either cases.*

Relative timing of three versions of the same algorithm.

*Above, you see results on i7 3ghz of 2011, compiled using VS 2017 C++.*

*Above, you see results on i9-10900 compiled using VS 2022 C++, as you may note, fastest ASM version is 2x faster (compared to previous hardware+compiler setup) and C version has the same speed of assembler version despite it is indirectly detecting the borrow while ASM version is using hardware borrow flag, that indicates I am outskilled by the vc2022 compiler and, generally speaking, I feel some confirmation bias towards who says "you need to be quite an expert in order to beat a C compiler". *

## History

- 2017: Started writing
- 4
^{th}February, 2019: Almost ready to publish - 11
^{th}June, 2022: Published online