
Introduction
Code that generates the Mandelbrot set is a favorite target of obfustactionists (
obfuscate: v. to make obscure). The main reasons for this are probably: 1) images of the Mandelbrot set are familar to most people, even if they don't know anything about computers, mathematics or fractals, and 2) the code required to generate the images is very simple.
The Mandelbrot Set
First, a quick introduction to the Mandelbrot Set and the theory behind it. The Mandelbrot Set is a set of complex numbers that exhibit an interesting behavior when run through a simple formula. When this formula is applied repeatedly (with the result from one calculation used as the input to the next calculation), numbers within the Mandelbrot set will, no matter how many times you apply the formula, not get farther than a certain distance from 0,0i (the origin of the complex plane). Numbers not in the Set will, after some number of iterations, exceed that magic distance. The numbers that "escape" are not in the Set, but they are the interesting values when making images. The basic technique is to use the number of iterations required to make the number escape as an index into a color table and to color the values inside the set as plain black.
The formula
z
i = z
2 + z
0
In the formula, z
0 is the initial value and z
i is the value at iteration
i. The "escape" radius is 2: if z
i ever exceeds a distance of 2 from the origin, that point is considered to be outside the set. Don't let the complex numbers scare you. The math required to do this is very simple and you don't even need to understand it to use it.
How to make a picture
First you decide on an area of the complex plane that you want to investigate. Pick a rectangular area of the plane, say from (-2.3, -1i) to (.65, .95i). Then decide on the number of points in that area that you want to evaluate: essentially, the size of the output image. Then, for each point in the output image, find the corresponding point on the imaginary plane and run it through the formula until the value escapes, or you decide you've done enough iterations. Then when you've stopped calculating, you set the output pixel to a color based on the number of iterations it took the value to escape (the final
i), if it escaped at all.
The code we'll start with
#include "stdafx.h"
#include <conio.h>
void main()
{
int rows = 40, cols = 60;
int maxiter = 255;
double fViewRectReal = -2.3, fViewRectImg = -1.0;
double fMagLevel = 0.05;
for (int y=0; y < rows; y++)
{
double fCImg = fViewRectImg + y * fMagLevel;
for (int x=0; x < cols; x++)
{
double fCReal = fViewRectReal + x * fMagLevel;
double fZReal = fCReal;
double fZImg = fCImg;
bool bInside = true;
int n;
for (n=0; n < maxiter; n++)
{
double fZRealSquared = fZReal * fZReal;
double fZImgSquared = fZImg * fZImg;
if (fZRealSquared + fZImgSquared > 4)
{
bInside = false;
break;
}
fZImg = 2 * fZReal * fZImg + fCImg;
fZReal = fZRealSquared - fZImgSquared + fCReal;
}
if (bInside)
{
putchar(' ');
}
else
{
putchar('+');
}
}
putchar(10);
}
while (!kbhit());
}
A console app???
The first thing you'll notice is that this is not a Windows app, it's a plain C app.
Q: Why?
A: There's no point in obfuscating a whole MFC app. There is too much code required for even the simplest dialog app. But a console app can be literally a single function.
Obfuscation
The idea behind obfuscation is to cause the person looking at the code it to wonder how the code can possibly do what it does. There are many ways to approach this. The basic method is to remove any hints in the code as to the algorithm, the data structures, the flow or the purpose - make the code as hard to follow as possible. This is what I'll show here. Another level of obfuscation is to change the actual algorithm used so that it's impossible to tell what the program is doing, even if the code is perfectly formatted and even if it's commented. I'll only approach this level here.
Get to it already
The first thing I'll do is a little optimization. Notice that we're looping over the output pixels with the x and y loops, and calculating c (c is just z
i from the formula - the initial input) based on x and y. Also notice that x and y are only used to do two things: count off rows and columns, and calculate the real and imaginary parts of c. And, to top it off, the distance between the one value for c and the next is constant in both dimensions:
double fCImg = fViewRectImg + y * fMagLevel;
.
So????
This means we can get rid of x and y and simply loop over the real and imaginary parts of c. x and y were really only there to make things easy to understand, and that's exactly what we're trying to avoid. So,
for the y loop, instead of this:
for (int y=0; y < rows; y++)
{
double fCImg = fViewRectImg + y * fMagLevel;
...we get this:
for (double fCImg = fViewRectImg;
fCImg < fViewRectImg + rows * fMagLevel; fCImg+=fMagLevel)
{
Well, that's certainly hard to read, but it's too long and does too much unnecessary calculation. everything in there is a constant. So, let's get rid of the math.
for (double fCImg = -1; fCImg < 1; fCImg+=fMagLevel)
{
And now we don't need the fViewRectImg variable anymore, either. Likewise, for the x loop, we go from:
for (int x=0; x < cols; x++)
{
fCReal = fViewRectReal + x * fMagLevel;
....
to
for (double fCReal =-2.3; fCReal < .7; fCReal+=fMagLevel)
{
...
And, now we can get rid of fViewRectReal and the "rows" and "cols" variables, too. And while we're at it, let's get rid of fMagLevel too. It's just a constant. One more simple thing, let's move all of the variable declarations to the top of the function, out of the loops. And, let's put all of the double's on the same line. This will cut down the number of times we see the keyword "double". So, we've removed seven variables and two lines of code: no big deal. The code is still readable, though it's somewhat less flexible - before you could change the number of rows and columns or the level of magnification, or the view rect pretty easily. Now, those variables are all combined - we're increasing the specialization of the code - and that's a good thing, for obfuscation.
void main()
{
int maxiter = 255;
double fZReal, fZImg, fZRealSquared, fZImgSquared, fCReal, fCImg;
bool bInside;
int n;
for (fCImg = -1; fCImg < 1; fCImg+=.05)
{
for (fCReal =-2.3; fCReal < .7; fCReal+=.05)
{
fZReal = fCReal;
fZImg = fCImg;
bInside = true;
for (n=0; n < maxiter; n++)
{
fZRealSquared = fZReal * fZReal;
fZImgSquared = fZImg * fZImg;
if (fZRealSquared + fZImgSquared > 4)
{
bInside = false;
break;
}
fZImg = 2 * fZReal * fZImg + fCImg;
fZReal = fZRealSquared - fZImgSquared + fCReal;
}
if (bInside)
{
putchar(' ');
}
else
{
putchar('+');
}
}
putchar(10);
}
while (!kbhit());
}
The first trick
The maxiter variable holds the maximum number of iterations we'll allow before bailing out of the calculation section: 255. That number should feel important to you; it's the number of values a char can hold. So, we can use that special property of a char in a tricky way. Get rid of the variable 'n'. Add a variable at the top of the function called "iter" of type char. Take the inner loop:
for (n=0; n < maxiter; n++)
{
and replace it with
for (iter=1; iter; iter++)
{
This is counting on the fact that, when iter reaches -1, adding one to it will cause it to go to zero. How does it get to -1 from +1 ? By going up to 127, then flipping to -128 and counting
up to 0. And because for loops work by exiting when the middle expression turns to zero, our loop still works like we want, even though the path iter takes to get there is not exactly what you see in counters. It's ugly and non-intuitive and that's what we want.
Trick two?
Next, look at the
if (bInside)
section. The only time bInside is true is if we go through all 255 iterations of the loop; otherwise, we bailed early. Well, if we went through all 255 iterations, iter is going to be zero. So, bInside is redundant. We could replace
if (bInside)
with
if (!iter)
, but that doesn't buy us very much. Instead, lets make use of my favorite C operator, the "?", or "in-line if". So, get rid of bInside and replace that whole
if (bInside)...else
section with this:
putchar(iter ? '+' : ' ');
. The "?" operator is awesome.
Trick three
One nice feature of the for loop is that the middle expression, the one that decides when to stop the loop, can be as complex as you want, as long as it can be evaluated as an integer or boolean expression. That means we can take the
if (fZRealSquared + fZImgSquared > 4)
test out of the body of the inner loop and put it in the middle expression of the controlling for expression. So, the inner loop goes from:
for (iter=1; iter; iter++)
{
fZRealSquared = fZReal * fZReal;
fZImgSquared = fZImg * fZImg;
if (fZRealSquared + fZImgSquared > 4)
{
break;
}
fZImg = 2 * fZReal * fZImg + fCImg;
fZReal = fZRealSquared - fZImgSquared + fCReal;
}
to
fZRealSquared = fZImgSquared = 0;
for (iter=1; iter && (fZRealSquared + fZImgSquared < 4); iter++)
{
fZRealSquared = fZReal * fZReal;
fZImgSquared = fZImg * fZImg;
fZImg = 2 * fZReal * fZImg + fCImg;
fZReal = fZRealSquared - fZImgSquared + fCReal;
}
We have to initialize fZImgSquared and fZRealSquared before we test them. In the original version, the test was done after the values were calculated. Now we'll just set them to zero, which guarantees they'll get into the loop at least once. Note that this will cause the value of iter to be off by one in many cases. But, we don't care about the actual value here - we only care if it takes less than 255 iterations to escape. This trick won't effect the overall result in any meaningful way, and anyway, we'll fix it in a minute.
Even better, because the middle expression in a for loop is evaluated each time through the loop, you can actually get rid of the third expression in the loop, the place where you normally increment the counter, and move the increment to the middle section:
for (iter=1; ++iter && (fZRealSquared + fZImgSquared < 4);)
{
Ahhh... now it's getting ugly.
Even better than that, let's abuse the equals operator, too! As you may know, when you assign something in C, the assignment actually produces a return value: a = 3 + 5;
returns, in a sense, "8". Normally we just let that return value vanish. But, we can use it. Here's our inner loop:
for (iter=1; ++iter && (fZRealSquared + fZImgSquared < 4);)
{
fZRealSquared = fZReal * fZReal;
fZImgSquared = fZImg * fZImg;
fZImg = 2 * fZReal * fZImg + fCImg;
fZReal = fZRealSquared - fZImgSquared + fCReal;
}
Let's move the first two calculations into the middle part of the for expression:
for (iter=1; ++iter && ((fZRealSquared = fZReal * fZReal) +
(fZImgSquared = fZImg * fZImg) < 4);)
{
fZImg = 2 * fZReal * fZImg + fCImg;
fZReal = fZRealSquared - fZImgSquared + fCReal;
}
Increment? nah
Hmmm... Now, what can we do with the third part of that for expression? It looks so empty. I know, let's put the rest of the inner loop in there! We can use the magic "comma" operator to combine the two statements. The comma operator is similar to the semicolon, but with some special side-effects. None of the special side-effects are interesting here, we just want to put two expressions into the for statement, where a semicolon wouldn't work. And while we're at it, let's put the
fZRealSquared = fZImgSquared = 0;
line from immediately before the for expression into the first part:
So, here's our inner loop now, with a semi-colon at the end because there's no loop body.
for (fZRealSquared = fZImgSquared = 0, iter=1;
++iter && ((fZRealSquared = fZReal * fZReal) +
(fZImgSquared = fZImg * fZImg) < 4);
fZImg = 2 * fZReal * fZImg + fCImg,
fZReal = fZRealSquared - fZImgSquared + fCReal);
How's
that for ugly? I love it. Here's the whole thing:
void main()
{
char iter;
double fZReal, fZImg, fZRealSquared, fZImgSquared, fCImg, fCReal ;
for (fCImg = -1; fCImg < 1; fCImg+=.05)
{
for (fCReal =-2.3; fCReal < .7; fCReal+=.05)
{
fZReal = fCReal;
fZImg = fCImg;
for (fZRealSquared = fZImgSquared = 0, iter=1;
++iter && ((fZRealSquared = fZReal * fZReal) +
(fZImgSquared = fZImg * fZImg) < 4);
fZImg = 2 * fZReal * fZImg +
fCImg, fZReal = fZRealSquared - fZImgSquared + fCReal);
putchar(iter ? '+' : ' ');
}
putchar(10);
}
while (!kbhit());
}
More of that!
It's starting to look like we can put anything we want into a for expression. Let's see how far we can take that idea. The first putchar call looks like it's part of the inner loop, but it's not. The inner loop has no body - everything happens inside the for statement itself and the loop body is empty (note the semicolon after the final parenthesis on that for(...)). So, the putchar actually belongs to the middle loop. Let's make that more obvious by moving it into the middle for expression:
for (fCReal =-2.3; fCReal < .7; fCReal+=.05, putchar(iter ? '+' : ' '))
{
Let's move the fZReal and fZImg assignments in there, too.
for (fCReal =-2.3; fCReal < .7;
fCReal+=.05, putchar(iter ? '+' : ' '),
fZReal = fCReal, fZImg = fCImg)
{
OK, let's put the other putchar call into its for loop, too:
for (fCImg = -1; fCImg < 1; fCImg+=.05, putchar(10))
And here's the whole thing:
void main()
{
char iter;
double fZReal, fZImg, fZRealSquared, fZImgSquared, fCImg, fCReal;
for (fCImg = -1; fCImg < 1; fCImg+=.05, putchar(10))
{
for (fCReal =-2.3; fCReal < .7; fCReal+=.05,
putchar(iter ? '+' : ' '), fZReal = fCReal, fZImg = fCImg)
{
for (fZRealSquared = fZImgSquared = 0, iter=1;
++iter && ((fZRealSquared = fZReal * fZReal) +
(fZImgSquared = fZImg * fZImg) < 4);
fZImg = 2 * fZReal * fZImg + fCImg,
fZReal = fZRealSquared - fZImgSquared + fCReal);
}
}
while (!kbhit());
}
I have no body!
C is clever. If you have a controlling statement like an
if
,
while
or
for
that only has a single statement in its body, you don't need curly braces. So, we can get rid of the curly braces in the other for loops, too.
void main()
{
char iter;
double fZReal, fZImg, fZRealSquared, fZImgSquared, fCImg, fCReal;
for (fCImg = -1; fCImg < 1; fCImg+=.05, putchar(10))
for (fCReal =-2.3; fCReal < .7; fCReal+=.05,
putchar(iter ? '+' : ' '), fZReal = fCReal, fZImg = fCImg)
for (fZRealSquared = fZImgSquared = 0, iter=1;
++iter && ((fZRealSquared = fZReal * fZReal) +
(fZImgSquared = fZImg * fZImg) < 4);
fZImg = 2 * fZReal * fZImg + fCImg,
fZReal = fZRealSquared - fZImgSquared + fCReal);
while (!kbhit());
}
Now in my opinion, that's awesome.
Final tweaks
Let's change this line:
for (fZRealSquared = fZImgSquared = 0, iter=1;
to this:
for (fZRealSquared = fZImgSquared = (iter=1)--;
And just for fun, let's take that "fZRealSquared =" expression and put it in the final part of the preceding for expression. It's nice to start a for loop with an empty initialization section, once in a while. And let's change the second putchar, to this
putchar(iter ? '?' : ':')
. The repetition of the ? and the : make it hard to read. That's
nice.
And finally...
Let's replace all the long variable names with single letters - like "O", "l", "i", "o", "I" and "_", get rid of any extra white space and lose the
while(kbhit())
line:
void main()
{
double O,x,o,I,l,i;char _;for(l=-1;l<1;l+=.05,
putchar(10))for(i=-2.3;i<.7;i+=.05,putchar(_?
'?':':'),O=i,x=l,o=I=(_=1)--)for(;++_&&((o=O*
O)+(I=x*x)<4);x=2*O*x+l,O=o-I+i);
}
Conclusion
We took a 1030+ character, 60 line program and reduced it to 7 lines and 182 characters, not bad. But most importantly, we reduced it to a nearly incomprehensible stream of ASCII garbage. Mission accomplished.
P.S.
If you'd like to see some more fractals, look
here.
P.P.S.
After some more tinkering, I managed to get it down to 156 characters. Here's the result:
main()
{
float C,l,c,o,O,I=-20;char _;for(;I++<20;puts(""))
for(O=-46;O<14;putchar(_?42:32),O++)for(C=l=_=0;o
=l*l,c=C*C,l=2*C*l+I/20,C=c-o+O/20,o+c<4&&++_;);
}
What's different?
First, the obvious:
- "float" instead of "double", saves one character.
- puts("") instead of putchar(10), saves three.
- Removed the void from "main" saves five chars.
The other characters were much harder to come by and involved removing some redundant initializations, and changing the outer two loops to increment by one instead of by a fraction (forcing two "/20"s later on, but saving two characters overall).
For something completely different...
As was pointed out in a comment, the fact that there are still three "for" loops makes deciphering the code somewhat easy. So, though I'm not going to explain how I
did it (maybe in another article...), here's a version of the program that uses only one for loop:
#include "stdio.h"
main()
{
int b=0,c=0,q=60,_=q;for(float i=-20,o,O=0,l=0,j,p;j=O*O,p=l*l,
(!_--|(j+p>4)?fputc(b?q+(_/3):10,(i+=!b,p=j=O=l=0,c++,stdout)),
_=q:l=2*O*l+i/20,O=j-p+o),b=c%q,c<2400;o=-2+b*.05);
}
The cool things about this one are:
- One for loop
- 186 characters
- Multi-valued output. While the other two versions only output two characters (one for inside the set and one for outside the set), this one allows for many different output characters, depending on the number of iterations required to escape. This means you can see contours outside the set itself, which is closer to how most Mandelbrot viewers show things. You can change the number of characters used by changing... well, you can figure that out yourself.
- Some variables are doing multiple duty; don't read the rest of this sentence if you want to figure out the code for yourself! "q", for example is the number of columns, the max number of iterations and the base character code for output; "b" is a boolean that determines when to print a newline, when to increment the real part of z, etc..
Cheers!
-c