In this post, you will find a simple introduction to set theory.

In this post, I have a little something different for you. No new language or framework, but something that’s been around for millennia: mathematics.

When asking programmers about math, you’ll find two kinds of people, those who say you don’t need math to be a good programmer and those who say math is essential. Personally, I think both are true. For some applications and industries, you really don’t need advanced maths, but go into robotics, machine learning, statistics, or that kind of thing and you’re going to need maths, lots of it. And whether you need it or not, computers, programming languages and databases all wouldn’t exist without math.

For now, let’s put it this way: knowing a thing or two about math gives you an edge as a programmer!

Math is everywhere: physics, chemistry, biology, economy and, yes, even in the arts! For this series, I’ll focus on the math we need in IT. I don’t know how many entries it’s going to have or what I’ll be discussing (and what I won’t be discussing), but I’ll be sure to keep it somewhat practical. No prior math knowledge is assumed. Don’t worry, I’ll still post about code once in a while too!

- Maths in IT #1: Basic set theory
- Maths in IT #2: Venn diagrams [CodeProject]
- Maths in IT #3: Algebra of sets [CodeProject]
- Coming soon…

## Collections

Do I need to tell you why we should study collections? You probably use them in your code every day in the form of arrays, database tables, lists or hash tables. What you probably didn’t know is that collections have a lot of mathematical theory! Since collections are very important in both math as in programming, I’m going to start this series here. More specifically, we’re going to talk about sets.

A *set* is an *unordered* collection containing only *distinct* values.

Let’s look at an example of a set in real life. Do you collect anything? Maybe you collect old records, each record in your collection can be uniquely identified and doubles are for trading or selling. Also, no matter in which order you put the records on the shelf, it’s still the same collection of records. So a record collection is really a set.

Likewise, we can have a collection of paintings or stamps. Another collection is an *alphabet*, for example, the English, Russian or Greek alphabet. Using these alphabets, we can construct a *language*. We have natural languages (like the ones I just mentioned) and formal languages. Programming languages like C#, Java, C or Haskell are examples of formal languages.

A collection in math is usually indicated by a single suggestive capital letter, such as for records, for stamps or for alphabet. If we have more than one collection, we can use index notations to uniquely identify them: …

The notation for a single collection containing elements , and is . This is called the explicit definition. We can now declare a collection as .

And of course, we can have a collection of collections: . Notice that collection is unique even though and are already elements in other collections.

The following set of sets isn’t valid: . Because the order of elements in sets is ignored, this set contains the same set twice, but sets also must have distinct values.

Now let’s say we have English alphabet , Russian alphabet and Greek alphabet . The collection of alphabets is written as .

Now we want to indicate that is an element of . We do this using the following syntax: .

To indicate that (the Greek letter lambda) is not an element of , we use notation: .

We can compare collections. Collections are considered to be equal when they contain exactly the same elements, no more and no less. , (remember, order is ignored), and (the collection does not equal ).

So far, we’ve seen only explicitly defined collections. We can also implicitly define collections. This is especially useful when a collection contains too many elements to write down. For example, a collection containing all countries on Earth. The notation for such a collection is as follows: . You should read that as “the collection consisting of all (objects) x for which x is a country on Earth”.

Generally, we can say that an implicitly defined collection is in the form of where, in this example, is the statement that is a country on Earth. Actually is a function taking parameter x and returning whether x is or isn’t a part of the set. We’ll look at functions in a later blog post.

To indicate how many elements are in any given (finite) collection, we can use notation . So and (where E is the English alphabet). Of course, this is only possible when our collection is finite (we can count the elements).

If a collection is empty (it contains no elements) or some collection we can use a special symbol . And of course (an empty set has elements).

When we allow the same element to appear in a collection more than once and we start taking the order of elements into consideration, we’re speaking of a *row*. The syntax for a row is simply putting the elements next to each other. For example, using the set , we can make the rows *a*, *ab*, *abc*, *baac*, *caab*, but not *baad* (because *d* is not in the set).

## Infinite Collections

So far, we’ve looked at finite collections. Let’s look at some infinite collections now. Consider the collection of all numbers: 1, 2, 3, 4… In theory, we can always add 1 to any number, so we can never stop counting. A few of these collections are so important that they got their own symbol.

First, we have the natural numbers: .

If we add negatives to the collection, we get all whole numbers, or integers: .

If we also allow fractions, like (a half) or (a quarter), then we have the collection of rational numbers .

Not all numbers can be expressed in fractions, for example pi (, the surface of a circle with radius 1) or . When we want to include those numbers, we get the collection of real numbers .

Now suppose we want all positive natural numbers, so 1, 2, 3… (excluding 0). We can indicate this with . Likewise, all negative numbers -1, -2, -3… can be indicated using . And of course we can use , , and to indicate positive or negative fractions and real numbers too.

And we can now define new infinite collections using implicit definitions. The collection of all even numbers, for example, can be defined as follows: .

So here is a collection consisting of all x for which x is an element in (all integers) and x is even.

## Subsets

If a collection A contains elements that are all elements of another collection B, we say that A is a *subset* of B. For example, is a subset of because a, b and c are all letters in the English alphabet.

More formally, we can say that if A and B are both collections and for every applies , then A is a subset of B.

We can write this as . Since, in the previous example, A does not contain all letters in the English alphabet B, we can also say that B is not a subset of A, this is written as .

Given the above definition, we can also say that or A is a subset of itself. The empty collection is a subset of all collections (including itself).

When a collection , but , then we say that A is a *proper subset* of B. We may write this as .

Sometimes you may see the notation to indicate that A is a subset of B that may or may not be equal to B.

I’ll simply use throughout the series.

When and , then .

In a similar manner, we can say that B is a superset of A if A is a subset of B. This is notated as (it’s the subset-symbol reversed). And, of course, we can say to indicate that B is a superset of A that may or may not be equal to A. For a proper superset, we may use notation.

And when and , then .

## Some Code

I promised I’d keep this series somewhat practical. So let’s look at some code. Consider the following C# sample using your favorite .NET collection class.

List<int> list = new List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
list.Add(1);
int thirdItem = list[2];

The list now contains the items 1, 2, 3 and 1 again. We can also get the item at the n^{th} position, which is only useful when we know the order of the elements within the list. Based on that, we can conclude that the List class in .NET is not a set. If we want a set in .NET, we can use the HashSet<T> class instead.

HashSet<int> set = new HashSet<int>();
set.Add(1);
set.Add(2);
set.Add(3);
if (!set.Add(1))
{
Console.WriteLine("Item 1 couldn't be added.");
}

Because a set only contains unique items, a hash of each item can be generated which means that lookup time for sets is much faster than that of lists (especially when the size of the list grows). The `HashSet<T>`

can be compared to the Dictionary<TKey, TValue> class, but without values.

Unfortunately, C# doesn’t know list generators. Working with infinite collections isn’t very do-able in C# either. Let’s say you’d like to get all positive even numbers, . We have a few problems. First isn’t available in C#. We could take Int32.MaxValue (but that threw an OutOfMemoryException). So let’s just take all positive evens smaller than or equal to a million.

IEnumerable evens = from x in Enumerable.Range(1, 1000000)
where x % 2 == 0
select x;
List evaluatedEvens = evens.ToList();

Something like that is really the closest we can get in C# without going through a lot of trouble.

And at this point, I stand corrected. Paulo Zemek pointed out that it’s actually pretty easy to work with infinite collections by utilizing the yield keyword. The next (Console Application) example illustrates this (although `int`

will eventually overflow…).

static void Main(string[] args)
{
foreach (int i in Evens().Take(10))
{
Console.WriteLine(i);
}
Console.ReadKey();
}
static IEnumerable Evens()
{
return XToInfinity(1).Where(i => i % 2 == 0);
}
static IEnumerable XToInfinity(int start)
{
int current = start;
while (true)
{
yield return current;
current++;
}
}

That’s still a lot of typing though…

Let’s take another language that solves these kinds of problems a little better, a language that is a little closer to actual maths, Haskell.

evens = [ x | x <- [1..], x `mod` 2 == 0]

And there you have it. Notice that the code is actually pretty close to the mathematical notation. Take each x where x is an element of [1..] (one to infinity) and where x is even.

We can easily take the first 10 items of this infinite collection without crashing the program or encountering out of memory exceptions.

*Main> take 10 evens
[2,4,6,8,10,12,14,16,18,20]

This isn’t a blog on Haskell, so I can’t really expand on what it does, but I can say that Haskell is a “lazy” language, which means it doesn’t evaluate the items in the list until you actually need them. In this case, it will only evaluate the first ten items of *evens*, keeping it from hogging our memory and CPU.

That’s it for now. We’ve had a very basic introduction to sets, but if you’re not familiar with this stuff, it gets hard pretty quickly. So let’s take it nice and easy. Next time, we’re going to combine collections and take a look at the Venn diagram.

Hope to see you next time!

The post Maths in IT #1: Basic set theory appeared first on Sander's bits.