Click here to Skip to main content
14,574,641 members

Maths in IT #1: Basic set theory

Rate this:
4.95 (25 votes)
Please Sign up or sign in to vote.
4.95 (25 votes)
4 Nov 2015CPOL
A simple introduction to set theory.

Welcome back everyone. Today I have a little something different for you. No new language or framework, but something that’s been around for millennia: maths.
When asking programmers about maths you’ll find two kinds of people, those who say you don’t need maths to be a good programmer and those who say maths is essential. Personally I think both are true. For some applications and industries you really don’t need advanced maths, but go into robotics, machine learning, statistics, or that kind of thing and you’re going to need maths, lots of it. And whether you need it or not, computers, programming languages and databases all wouldn’t exist without maths.
For now, let’s put it this way: knowing a thing or two about maths gives you an edge as a programmer!

Maths is everywhere: physics, chemistry, biology, economy and, yes, even in the arts! For this series I’ll focus on the maths we need in IT. I don’t know how much entries it’s going to have or what I’ll be discussing (and what I won’t be discussing), but I’ll be sure to keep it somewhat practical. No prior maths knowledge is assumed. Don’t worry, I’ll still post about code once in a while too!

  1. Maths in IT #1: Basic set theory
  2. Maths in IT #2: Venn diagrams [CodeProject]
  3. Maths in IT #3: Algebra of sets [CodeProject]
  4. Coming soon…


Do I need to tell you why we should study collections? You probably use them in your code every day in the form of arrays, database tables, lists or hash tables. What you probably didn’t know is that collections have a lot of mathematical theory! Since collections are very important in both maths as in programming I’m going to start this series here. More specifically we’re going to talk about sets.

A set is an unordered collection containing only distinct values.

Let’s look at an example of a set in real life. Do you collect anything? Maybe you collect old records, each record in your collection can be uniquely identified and doubles are for trading or selling. Also, no matter in which order you put the records on the shelf, it’s still the same collection of records. So a record collection is really a set.

Likewise we can have a collection of paintings or stamps. Another collection is an alphabet, for example the English, Russian or Greek alphabet. Using these alphabets we can construct a language. We have natural languages (like the ones I just mentioned) and formal languages. Programming languages like C#, Java, C or Haskell are examples of formal languages.

A collection in maths is usually indicated by a single suggestive capital letter, such as R for records, S for stamps or A for alphabet. If we have more than one collection we can use index notations to uniquely identify them: A_{1}, A_{2}, A_{3}

The notation for a single collection containing elements a, b and c is \{a, b, c\}. This is called the explicit definition. We can now declare a collection A as A = \{a, b, c\}.
And of course we can have a collection of collections: C = \{\{a, b\}, \{c, d\}, \{a, c\}\} . Notice that collection \{a, c\} is unique even though a and c are already elements in other collections.
The following set of sets isn’t valid: \{\{a, b\}, \{b, a\}\}. Because the order of elements in sets is ignored this set contains the same set twice, but sets also must have distinct values.
Now let’s say we have English alphabet E, Russian alphabet R and Greek alphabet G. The collection of alphabets is written as \{E, R, G\}.

Now we want to indicate that a is an elements of E. We do this using the following syntax: a \in E.
To indicate that \lambda (the Greek letter lambda) is not an element of E we use notation: \lambda \notin E.

We can compare collections. Collections are considered to be equal when they contain exactly the same elements, no more and no less. \{a, b, c\} = \{a, b, c\}, \{a, b, c\} = \{c, b, a\} (remember, order is ignored), \{a, b, c\} \neq \{d, e, f\} and \{a, b, c\} \neq \{a, \{b\}, c\} (the collection \{b\} does not equal b).

So far we’ve seen only explicitly defined collections. We can also implicitly define collections. This is especially useful when a collection contains too many elements to write down. For example a collection containing all countries on Earth. The notation for such a collection is as follows: \{x | \text{x is a country on Earth}\}. You should read that as “the collection consisting of all (objects) x for which x is a country on Earth”.
Generally we can say that an implicitly defined collection is in the form of \{x | P(x)\} where, in this example, P(x) is the statement that x is a country on Earth. Actually P(x) is a function taking parameter x and returning whether x is or isn’t a part of the set. We’ll look at functions in a later blog post.

To indicate how many elements are in any given (finite) collection we can use notation |A|. So |\{a, b, c\}| = 3 and |E| = 26 (where E is the English alphabet). Of course this is only possible when our collection is finite (we can count the elements).

If a collection is empty (it contains no elements) or some collection A = \{\} we can use a special symbol \emptyset. And of course |\emptyset| = 0 (an empty set has 0 elements).

When we allow the same element to appear in a collection more than once and we start taking the order of elements into consideration we’re speaking of a row. The syntax for a row is simply putting the elements next to each other. For example, using the set \{a, b, c\} we can make the rows a, ab, abc, baac, caab, but not baad (because d is not in the set).

Infinite collections

So far we’ve looked at finite collections. Let’s look at some infinite collections now. Consider the collection of all numbers: 1, 2, 3, 4… In theory we can always add 1 to any number, so we can never stop counting. A few of these collections are so important that they got their own symbol.
First we have the natural numbers: \mathbb{N} = \{0, 1, 2, 3, ...\}.
If we add negatives to the collection we get all whole numbers, or integers: \mathbb{Z} = \{..., -3, -2, -1, 0, 1, 2, 3, ...\}.
If we also allow fractions, like \frac{1}{2} (a half) or \frac{1}{4} (a quarter) then we have the collection of rational numbers \mathbb{Q}.
Not all numbers can be expressed in fractions, for example pi (\pi, the surface of a circle with radius 1) or \sqrt{2}. When we want to include those numbers we get the collection of real numbers \mathbb{R}.

Now suppose we want all positive natural numbers, so 1, 2, 3… (excluding 0). We can indicate this with \mathbb{N}^+. Likewise, all negative numbers -1, -2, -3… can be indicated using \mathbb{Z}^-. And of course we can use \mathbb{Q}^+, \mathbb{Q}^-, \mathbb{R}^+ and \mathbb{R}^- to indicate positive or negative fractions and real numbers too.

And we can now define new infinite collections using implicit definitions. The collection of all even numbers, for example, can be defined as follows: E = \{ x | x \in \mathbb{Z} \text{ and x is even}\}.
So here E is a collection consisting of all x for which x is an element in \mathbb{Z} (all integers) and x is even.


If a collection A contains elements that are all elements of another collection B we say that A is a subset of B. For example A = \{a, b, c\} is a subset of B = \{x | \text{x is a letter in the English alphabet}\} because a, b and c are all letters in the English alphabet.

More formally we can say that if A and B are both collections and for every x \in A applies x \in B then A is a subset of B.

We can write this as A \subset B. Since, in the previous example, A does not contain all letters in the English alphabet B we can also say that B is not a subset of A, this is written as B \not\subset A.

Given the above definition we can also say that A \subset A or A is a subset of itself. The empty collection \emptyset is a subset of all collections (including itself).

When a collection A \subset B, but A \neq B then we say that A is a proper subset of B. We may write this as A \nsubseteq B.
Sometimes you may see the notation A \subseteq B to indicate that A is a subset of B that may or may not be equal to B.
I’ll simply use A \subset B throughout the series).

When A \subset B and B \subset A then A = B.

In a similar manner we can say that B is a superset of A if A is a subset of B. This is notated as B \supset A (it’s the subset-symbol reversed). And of course we can say B \supseteq A to indicate that B is a superset of A that may or may not be equal to A. For a proper superset we may use A \nsupseteq B notation.

And when A \supset B and B \supset A then A = B.

Some code

I promised I’d keep this series somewhat practical. So let’s look at some code. Consider the following C# sample using your favorite .NET collection class.

List<int> list = new List<int>();
int thirdItem = list[2]; // 0-based index.

The list now contains the items 1, 2, 3 and 1 again. We can also get the item at the nth position, which is only useful when we know the order of the elements within the list. Based on that we can conclude that the List class in .NET is not a set. If we want a set in .NET we can use the HashSet<T> class instead.

HashSet<int> set = new HashSet<int>();
if (!set.Add(1))
    Console.WriteLine("Item 1 couldn't be added.");
//int secondItem = set[1]; // Doesn't compile.

Because a set only contains unique items a hash of each item can be generated which means that lookup time for sets is much faster than that of lists (especially when the size of the list grows). The HashSet<T> can be compared to the Dictionary<TKey, TValue> class, but without values.

Unfortunately C# doesn’t know list generators. Working with infinite collections isn’t very do-able in C# either. Let’s say you’d like to get all positive even numbers, E^+ = \{ x | x \in \mathbb{N}^+ \text{ and x is even}\}. We have a few problems. First \mathbb{N} isn’t available in C#. We could take Int32.MaxValue (but that threw an OutOfMemoryException). So let’s just take all positive evens smaller than or equal to a million.

IEnumerable evens = from x in Enumerable.Range(1, 1000000)
                         where x % 2 == 0
                         select x;
List evaluatedEvens = evens.ToList();

Something like that is really the closest we can get in C# without going through a lot of trouble.

And at this point I stand corrected. Paulo Zemek pointed out that it’s actually pretty easy to work with infinite collections by utilizing the yield keyword. The next (Console Application) example illustrates this (although int will eventually overflow…).

static void Main(string[] args)
    foreach (int i in Evens().Take(10))

static IEnumerable Evens()
    return XToInfinity(1).Where(i => i % 2 == 0);

static IEnumerable XToInfinity(int start)
    int current = start;
    while (true)
        yield return current;

That’s still a lot of typing though…

Let’s take another language that solves these kinds of problems a little better, a language that is a little closer to actual maths, Haskell.

evens = [ x | x <- [1..], x `mod` 2 == 0]

And there you have it. Notice that the code is actually pretty close to the mathematical notation. Take each x where x is an element of [1..] (one to infinity) and where x is even.
We can easily take the first 10 items of this infinite collection without crashing the program or encountering out of memory exceptions.

*Main> take 10 evens

This isn’t a blog on Haskell, so I can’t really expand on what it does, but I can say that Haskell is a “lazy” language, which means it doesn’t evaluate the items in the list until you actually need them. In this case it will only evaluate the first ten items of evens, keeping it from hogging our memory and CPU.

That’s it for now. We’ve had a very basic introduction to sets, but if you’re not familiar with this stuff it gets hard pretty quickly. So let’s take it nice and easy. Next time we’re going to combine collections and take a look at the Venn diagram.

Hope to see you next time!

The post Maths in IT #1: Basic set theory appeared first on Sander's bits.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Sander Rossel
Software Developer (Senior)
Netherlands Netherlands
Sander Rossel is a Microsoft certified professional developer with experience and expertise in .NET and .NET Core (C#, ASP.NET, and Entity Framework), SQL Server, Azure, Azure DevOps, JavaScript, and other technologies.
He has an interest in various technologies including, but not limited to, Cloud, NoSQL, Continuous Integration and Deployment, Functional Programming, and software quality in general.
In his spare time he writes articles for MSDN, CodeProject, and his own blog, as well as books about Object Oriented Programming, Databases, and Azure.

You can't miss his must read book Continuous Integration, Delivery, and Deployment!

Also make sure to check his FREE +/- 100 page eBooks Object-Oriented Programming in C# Succinctly and SQL Server for C# Developers Succinctly.

He wrote a JavaScript LINQ library, arrgh.js (works in IE8+, Edge, Firefox, Chrome, and probably everything else).

Check out his prize-winning articles here on CodeProject and his blogs over at as well!

Comments and Discussions

GeneralMy vote of 5 Pin
Pratik Bhuva5-Dec-15 2:32
professionalPratik Bhuva5-Dec-15 2:32 
GeneralRe: My vote of 5 Pin
Sander Rossel6-Dec-15 9:05
professionalSander Rossel6-Dec-15 9:05 
GeneralMy vote of 5 Pin
Afzaal Ahmad Zeeshan4-Dec-15 10:56
mvaAfzaal Ahmad Zeeshan4-Dec-15 10:56 
GeneralRe: My vote of 5 Pin
Sander Rossel4-Dec-15 11:58
professionalSander Rossel4-Dec-15 11:58 
GeneralRe: My vote of 5 Pin
Afzaal Ahmad Zeeshan4-Dec-15 12:03
mvaAfzaal Ahmad Zeeshan4-Dec-15 12:03 
PraiseVery good start Pin
Greg Russell6-Nov-15 3:28
professionalGreg Russell6-Nov-15 3:28 
GeneralRe: Very good start Pin
Sander Rossel6-Nov-15 4:55
professionalSander Rossel6-Nov-15 4:55 
GeneralRe: Very good start Pin
Greg Russell8-Nov-15 21:50
professionalGreg Russell8-Nov-15 21:50 
PraiseWell written! Just one small thing.... Pin
spoljarecDamir5-Nov-15 20:39
MemberspoljarecDamir5-Nov-15 20:39 
GeneralRe: Well written! Just one small thing.... Pin
Sander Rossel6-Nov-15 5:22
professionalSander Rossel6-Nov-15 5:22 
PraiseCute Pin
jediYL5-Nov-15 9:38
professionaljediYL5-Nov-15 9:38 
GeneralRe: Cute Pin
Sander Rossel5-Nov-15 12:03
professionalSander Rossel5-Nov-15 12:03 
GeneralMy vote of 5 Pin
Paulo Zemek4-Nov-15 7:17
professionalPaulo Zemek4-Nov-15 7:17 
GeneralRe: My vote of 5 Pin
Sander Rossel4-Nov-15 7:47
professionalSander Rossel4-Nov-15 7:47 
QuestionGrammar Pin
Kevin Lockerby4-Nov-15 2:26
professionalKevin Lockerby4-Nov-15 2:26 
AnswerRe: Grammar Pin
Sander Rossel4-Nov-15 2:49
professionalSander Rossel4-Nov-15 2:49 
GeneralRe: Grammar Pin
CAReed7-Nov-15 3:30
professionalCAReed7-Nov-15 3:30 
GeneralRe: Grammar Pin
Sander Rossel7-Nov-15 4:27
professionalSander Rossel7-Nov-15 4:27 
QuestionWorking with infinite collections is doable Pin
Paulo Zemek3-Nov-15 11:36
professionalPaulo Zemek3-Nov-15 11:36 
You wrote "Working with infinite collections isn’t very do-able in C# either".
Actually, working with infinite collections is very doable in C#. The problems are methods like ToList() which will force all of them to be allocated in memory.

IEnumerable (of T), yield return and many LINQ methods are perfectly capable of working with infinite collections.

For example:
public IEnumerable<int> GenerateInfiniteNumbers()
  var random = new Random();
    yield return random.Next();

And you can easily work with such a method using things like Where(), Skip(), Take() etc.
Obviously not as performant as a set, yet completely doable.
AnswerRe: Working with infinite collections is doable Pin
Sander Rossel3-Nov-15 20:21
professionalSander Rossel3-Nov-15 20:21 
GeneralRe: Working with infinite collections is doable Pin
Paulo Zemek4-Nov-15 7:15
professionalPaulo Zemek4-Nov-15 7:15 
GeneralRe: Working with infinite collections is doable Pin
Sander Rossel4-Nov-15 7:47
professionalSander Rossel4-Nov-15 7:47 
GeneralRe: Working with infinite collections is doable Pin
Paulo Zemek4-Nov-15 8:07
professionalPaulo Zemek4-Nov-15 8:07 
GeneralRe: Working with infinite collections is doable Pin
Sander Rossel4-Nov-15 11:06
professionalSander Rossel4-Nov-15 11:06 
AnswerRe: Working with infinite collections is doable Pin
Sander Rossel4-Nov-15 6:54
professionalSander Rossel4-Nov-15 6:54 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Technical Blog
Posted 3 Nov 2015


19 bookmarked