Click here to Skip to main content
Click here to Skip to main content

Big O Algorithm Analyzer for .NET

By , , 16 Feb 2010
 
This Developer centered article with an index rating 5 out of 5

Article Index Rating:
Developer - Developer            (5)     of 5      (Intermediate)
Architect - Architect             (1.67) of 5     (Beginner)
Graphic Designer - Graphic Designer   (3.33) of 5     (Intermediate)

Download Article_Demo.zip - 64.83 KB

Download Article_Src.zip - 110.18 KB

 Article2.PNG

Table of Contents

Introduction 

   After doing some research on 'Big O Notation' I read some posts from some developers explaining that you have to do 'Big O Equations' in your head.  Perhaps they were pulling some ones leg?  I don't know.  But it gave me the idea to create this tool, which can help to find the 'Big O function' for a given algorithm.  The application uses C# Reflections to gain access to the target .net assembly.  Infinite asymptotic's and instrumentation are used to graph the function in relationship to time.  When using infinite asymptotic's it is necessary to create a method or property which takes an integer as input.  The integer value is the infinity point, the point at which the function evaluates to an infinite quantity.  This point can be time, value, etc.  In the examples Fibonacci functions are evaluated. 

Here is a graph of Two linear recursive Fib's: Article1.PNG

Here is the original prototype graphing an exponential recursive Fib:

Image 1 - Exponential Fibonacci

ExponentialFib.PNG

I used Derek Bartram's WPFGraph in the Demo which is not licensed for commercial or non profit.  If you need to use this tool for commercial or non profit, you can extend the functionality of the prototype.  There are a few minor glitches with my implementation of Derek's WPFGraph.  Major problem is with the performance scaling.  In my prototype I use modulus scaling which is very exacting.  I tried to use the same scaling with Derek's graph, however the performance grid lines don't exactly match the points on the grid.  If you need exact grid lines and fast performance extend the prototype. 

Image 2 - Example of the prototype displaying O(n log n) Fibonacci

Article3.PNG

  The application is broiler plate, this should give you the ability to extend the features to your own specification.  Here is the same graph using Derek's graph:

Image 3 - Modified graph with grid lines and power graph

Article.PNG

Another minor glitch is the time base jump at the start of the graph line.  Graphing with high infinity values improves this, however a threading fix is probably necessary.  I have a request into MSDN for a fix.  I will post the fix once I get it.

The tool uses heuristic to find the 'best guess' Big O Function:

Image 4 - The heuristic graph can also be displayed

Article5.PNG 

And allows the user to review the details:

Image 5 - The calculated results from the heuristic

Article4.PNG

Special thanks to Derek for all his hard work!  Great Graph!

Mathematical Approach:

The mathematical approach to fining the Big 'O' is to mark up the algorithm using function indexes.  The is the 'it takes brains' method of finding the solution the developers were discussing in the online help thread.  Here is an example I found on the internet which illustrates the 'mark-up' and calculation.

int foo(int n)
{
  int p = 1;	//c1	  x	1
  int i = 1;	//c1	  x	1
  while(i < n)	//c2	  x	n
  {
    int j = 1;	//c1	  x	(n - 1)
    
    while(j < i)//c2	  x	((1/2)n^2 - (3/2)n + 2)
    {
      p = p * j;//(c1+c3) x	((1/2)n^2 - (3/2)n + 1)
      j = j + 1;//(c1+c4) x	((1/2)n^2 - (3/2)n + 1)
    }
    i = i + 1;	//(c2+c4) x	(n - 1)
  }
  return p;	//c5	  x	1
}

Where:
c1 = Assignment
c2 = Comparison
c3 = Multiplication
c4 = Addition
c5 = Return


The mathematical equation is simplified:

Step 1:
(c1 + 1/2c2 + 1/2c3 + 1/2c3)n^2 +
(-c1 - 1/2c2 - 3/2c3 - 1/2c4)n +
(2c1 + 2c2 + c3 + c5)

Step 2:
(c1 + 1/2c2 + 1/2c3 + 1/2c3)n^2

Step 3:
n^2

Therefore the Big 'O' for this foo algorithm is n^2 or quadratic.

The full example can be viewed at:

Big O Notation - CS Animated  

About the Code

The application uses the Model, Viewer, Presenter; Factory, and Publisher / Subscriber patterns. The viewer is the XAML Window which allows for easy design by designers, the presenter is implemented in the xaml.cs of the window. The factory and publisher-subscriber pattern is implemented in the AnalysisFactory. The MVP pattern is a good stepping stone in prototype development as it is easy to make custom abstractions with out sacrificing code base.

About the Examples:

The examples featured in this demo use the Fibonacci sequence to illustrate different methods of calculating the Fibonacci number for a given index.  The examples are to be used with the infinite asymptotic approach in the future this article will be updated with examples using the instrumented approach.

The asymptotic is a 'running time' asymptotic not an 'operation count' asymptotic.  This is important to understand.  Mathematics and computer code can both be represented using Big 'O' notation for operation count which are evaluated differently then currently featured in this article.  In the future an addition will decompile the reference code and calculate both the operational count and the running time.  The running time does offer some 'real world' values which would not be discovered suing the operation counting method.  For example: The code under test has an algorithm running in log time and at the surface the algorithm might appear to be linear.  The operating system, and the application background process can also affect real world results.

To create a simple asymptotic test all that is required is that the method invocation be publicly visible and that it take a single argument which is the number of iterations to run.  The code performing the logic uses .NET Reflections to gain access to the assembly and display the public methods.

Here are some examples of the Fibonacci algorithms used in the example:

Figure 1 - Closed Form Fib Class

    class fib 
    {
        double n = 0;
        int n2 = 0;
        public double next
        {
            get
            {
                return n;
            }

            set
            {
                n = Math.Floor(((Math.Pow(fib.golden(), 
                    Convert.ToDouble(value))) -
                    Math.Pow((1 - fib.golden()), 
                    Convert.ToDouble(value))) / Math.Sqrt(5));
            }
        }

        private static double golden()
        {
            return (1 + Math.Sqrt(5)) / 2;

        }
    }

Figure 2 - Closed Form Test Code O(1)

        public static double ClosedFormFib(double n)
        {
            fib f = new fib();
            f.next = n;
            return f.next;
        }

Figure 3 - Linear Fib - Actually O(n log n)

The author who posted this algorithm incorrectly marked it as a linear solution when in fact it runs in log time.

        public static double LinerFib(double n)
        {
            double previous = -1;
            double result = 1;
            double sum = 0;

            for (double i = 0; i <= n; ++i)
            {
                sum = result + previous;
                previous = result;
                result = sum;
            }
            return result;
        }

Figure 4 - Linear Recursive Fib - Again O(n log n)

        public static double LinerRecursiveFib(double n)
        {
            return Fib(n, 1, 1);
        }

        private static double Fib(double n, double n1, double n2)
        {
            if (n == 1 || n == 2)
            {
                return 1;
            }
            else if (n == 3)
            {
                return n1 + n2;
            }
            else
            {
                return Fib(n - 1, n1 + n2, n1);
            }
        }

Figure 5 - Exponential Fib - O(n^2)

        public static int ExponentialRecursiveFib(double n)
        {
            if (n == 1 || n == 2)
                return 1;
            else
                return ExponentialRecursiveFib(n - 1) + 
                       ExponentialRecursiveFib(n - 2);
        }

At this point you may be thinking 'well this is good if the algorithm takes an integer as input for loading. What about other algorithms which don't such as sorting etc.' To answer this question first the data type and algorithm must first be considered. For example a string array bubble sort. For testing this algorithm using asymptotic approach I would create a constructor in my class which will be tested. Something like:

Figure 6 - Testing non-numeric data types and algorithms

  

  ArrayList al = new ArrayList();

  void LoadMe(){ LoadMe(1000); }

  void LoadMe(double d)
  {
    //create random string elements and add to string array
    //add each element to the ArrayList.
  }

  public static double TestBubbleSort(double n)
  {
    //Get each string array from ArrayList then call the bubble sort
    //on the string array.
  }

This method will work for loading and testing with out having to custom instrument the test code. The code of data load the ArrayList should be incurred at start up, however the size must be hard coded into the test code.

Points of Interest:

Interesting things about heuristics

The heuristics uses the Meta.Numerics.Statistics class library to find a strong correlation between the items in two series. Series 1 (test results) contains the values of the test run in actual form and value. Series 2 (control group) is a series of values from the equation in form of the following:

Figure 7 - Big 'O' function series generation

        //O(n log n)
        static double nLogn(int n)
        {
            double r = Math.Log(n) * n;

            if (r.ToString() == "NaN" || 
                r.ToString().IndexOf("Infinity")
                != -1)
                return 0;
            else
                return r;
        }

        public static DecoratedMarker nLognLine(int n)
        {
            double[] ret = new double[n];

            for (int y = 0; y < n; y++)
                ret[y] = Functions.nLogn(y);

            return new DecoratedMarker(ret, "O(n log n)", null);
        }
The above is computed for each Big 'O' function under test. A function library can be found in the solution and is easily updated. The exact statistical formula used is 'Person's Rio'. More information on Person's Rio can be found on other wiki's. Other formulas could also be used. Some ideas I have considered are: Finding the difference under the arch of the curve formula, correlation of ratios, etc. The statistical package placed in its own class so additional packages can easily be loaded or overloaded.

Interesting things about the graph

I really don't totally understand the magic of modulus.  I do value it's power though.  I had to tinker with the modulus equations to get the scaling functions to work in the prototype.  Perhaps I need to brush up on modulus.  Here are the modulus functions for scaling:

((dm.Point[1] * PointScale % infinityPoint) / 10) * PointRullerDivision;  //Gets the vertical scaling.

(aggT * TimeScale % totTime) / 10 * TimeScaleDivision; //Gets the horizontal scaling.

PointScale is calculated by dividing 100 by infinityPoint (this is an inverse function), making the function 100 base. PointRullerDivision is calculated by dividing the graph Canvas by 100 (compliment to inverse function). The horizontal (x axis, time) is caculate the same way, however the aggregate time ; aggT, is used in place of the point index dm.Point[1]. Total time is calculated using a LINQ to Object lambda function (also pure magic as far as I know).

Math.Round(dms.Sum(p => p.Point[0]),0)

I thought it was interesting that after refactoring the prototype to use Derek's graph the modulus functions still worked.  Derek's scaling calculates the scale by dividing sections of the scale in a recursive loop.  I did run into a small problem with a rounding error after refactoring.  Sometimes it's the little things that make a big difference.  Rounding the totTime to Zero decimal places and not rounding the aggT causes a displacement in the performance graph where the x axis grid lines do not match a point on the graph, but will not work with Derek's graph unless the total time is rounded.  The prototype works perfectly.

Another point of interest is Derek's grid points.  If you get the rendered geometry of one of his grid points, the bounds evaluates to infinity.  I tried all methods of getting the position to correct the rounding error.  Nothing worked!  Must be magic.  So the featured application has many workarounds for the integration of the WPFGraph.

The Power Graph

The power graph is a what I think of as a clever way to visualize the performance of the algorithm under test. The formula used is only a weighted metric.

Pseud Code

If unit n-1 is two times n then color value = color value - 10

this does of coarse have a dependency on the grid lines which are equally spaced vertically in increments of 10 and then asymmetrically spaced on the horizontal according to where the vertical line and the new point cross x,y axis. This produces a visual which has does not distort the mathematical formula of the graph Ie. Linear, Log, or Exponential.

Conclusion and the Next Phase:

In conclusion I discovered this research is unique to what I have done and demonstrated. While it is currently causing controversy I believe it is a significant advancement in where the 'Rubber Meets the Road'. Mathematical proofs for algorithms can be useful, but when engineering is concerned the proof doesn't meet the pudding! This is often the case in theory and development and engineering. It is my hope that this tool will become useful to mathematicians and engineers for establishing real and live estimations of how an algorithm runs in real time. I plan to complete this prototype and add features that allow an algorithm to be written with out the use of Visual Studio, right in the application it self. I will also be converting the MVP pattern to MVVM. And take a further look at the heuristic model being used, which is what seems to be causing controversy. If anyone with expertise would like to join me in this venture just let me know. The final product will be posted on The CodePlex, and have follow up articles here on The Code Project to demonstrate it's usefulness, and areas that it is not acceptable for use. That is it for now this article is a wrap!

External Links:

Prototype Research

Graphic Designer - Graphic Designer (3.33) Developer - Developer (5) Architect - Architect (1.67)

Revision History:

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Authors

dawright
United States United States
Member
I am a .NET developer who works daily on enterprise-scale applications using C#, SQL, XML, ASP.NET, and myriad other technologies. My academic background is in physics and economics.
 
I am the original architect of Sandcastle managed reference documentation engine and of the Meta.Numerics library for scientific computation.

Proto-Bytes
CEO AW Proto-Code, Inc.
United States United States
Member
Organisation (No members)

I’m a self learned wiz kid turned Architect. Stared with an Apple IIe, using AppleSoft Basic, ProDos and Begal Basic at age 10.
 
Picked up LPC in college before the Dot Com Explosion. Wrote some Object C in the USAF for one of my instructors. Got a break from a booming computer manufacture testing server software. Left the Boom and went to work for a Dot Com built from the ashes of Sun Micro in CS. Mentoring in Java solidified me as a professional developer. Danced in the light of the sun for 13 years, before turning to the dark side. An evil MVP mentored me in the ways of C# .NET. I have not looked back since.
 
Interests include:
 
~ Windows Presentation Foundation and Silverlight
~ Parallel Programming
~ Instruction Set Simulation and Visualization
~ CPU to GPU code conversion
~ Performance Optimizations
~ Mathematics and Number Theory
~ Domain Specific Languages
~ Virtual Machine Design and Optimization
~ Graphics Development
~ Compiler Theory and Assembler Conversion Methodology
~ CUDA, OpenCL, Direct Compute, Quantum Mechanics
 
IEEE Associate Member 2000

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralI'm Derekmemberderek_bartram25 Feb '13 - 12:50 
Always nice to come across someone using my (now somewhat old) code! Very interesting article, nicely done.
 
If anyone has a pressing feature they would like adding, give me a shout.
Do you like fishes? I do.

GeneralMy vote of 1membermojtaba ali Zadeh1 Mar '12 - 14:52 
because he introduced new software!
GeneralMy vote of 5memberFilip D'haene27 May '11 - 4:29 
Excellent article!
 
Thanks for sharing. Smile | :)
GeneralMy vote of 5memberChantiPDM11 Jan '11 - 11:19 
Useful tool. Nice article. 5 points.
NewsNotes on building the solution using Visual Studio 2010 RCmemberTheArchitectmc1 Mar '10 - 19:36 
I have just successfully built the solution and tested it using Visual Studio 2010, everything works just fine; however there are some notes:
 
1. Meta Numeric's must be installed from CodePlex.
2. The Binaries for the WPFGraph, ColorPicker, and Generics are in the demo download bin folder and not a part of the solution. They must be added to the bin folder of the solution to resolve the references.
3. the WPFGraph must be removed from the solution and then added as a reference, not sure why this is the case but a compiler error occurs and the reference is not automatically resolved unless you do this manually.
Other then that it works perfectly!

 
Cool | :cool:
GeneralRe: Notes on building the solution using Visual Studio 2010 RCmemberMember 774982210 Dec '12 - 17:19 
Is anyone else getting an error in VS2010 (even after following above VS2010-specific steps) that Meta.Numerics.MultivariateStatistics doesn't contain a definition for PearsonRTest method? I can't build the solution (specifically the main WPFBigOAnalyzer project) in VS2010 currently - this is where I'm stuck
AnswerRe: Notes on building the solution using Visual Studio 2010 RCmemberJoe Skeen29 Apr '13 - 12:03 
I had the same issue when I added the reference to Meta.Numerics. The current version (v2.1) doesn't have that method, but if you add the version it originally referenced (v1.2) it compiles fine.
Meta Numerics Version 1.2 (CodePlex)[^]
NewsBlog on this article updated:memberTheArchitectmc28 Feb '10 - 14:16 
Here is the link to the Blog:
 
Big O Analyser for .NER - Febuary Update[^]
 
Thumbs Up | :thumbsup:
GeneralRe: need help maybe im just a noob..memberTheArchitectmc5 Feb '10 - 9:18 
Hmm, I don't know what the problems are with the WPF graph. I did specialize the WPF graph a little bit, you can get the built assembly from the demo. As for Meta.Numerics you will have to go to The Code Plex and download it there here is the link:
 
Meta.Numerics at The Code Plex[^]
 
Since they both projects have different licenses then the one I use I could not add them to the source of the project since the licenses are not compatible.
 
Once you install the Meta.Neumerics it will be added to the .NET assembly bin folder on your machine and then it should have no problem resolving the missing assembly.
 
See thread below on the WPF Graph changes and where Derek's graph is located, it's here on The Code Project. The changes were minor and if you need to know what they are simple use any disassembler on the WPF Graph I have in the demo bin folder.
 
Sniff | :^)
 
Did you try this with the demo or the source code for the project?
 
"Make everything as simple as possible, but not simpler."
-- Albert Einstein
 
"It didn't matter to us whether people believed in us. We believed in ourselves. We had the courage to follow our own path."
~~Nvidia's Jen-Hsun Huang

GeneralMy vote of 1mvpPaul Conrad1 Jan '10 - 11:23 
Could not compile the source code. Please provide information about what libraries are needed from WPFGraph to successfully compile the project and I will consider revoting for a higher score.
GeneralRe: My vote of 1 - [Mising Derek's WPF Graph Modifications]memberTheArchitectmc3 Jan '10 - 5:49 
The source code for the WPFGraph is included, at lease the part I have made modifications to. You will need to add the assembly to the project if it is not already. Or you could simply just add the assembly got the WPFGraph from the demo. Additionally the Meta.Neumerics lib is not in the download since it is a CodePlex project.
 
The source for the WPFGraph is not in the downloadable source. The binary is in the 'Release' directory. You can use any decompiler to get the full source listing. However, there are very few modifications to the graph it self. The modifications made 'Power Graph' and 'Grid Lines' are made the source of the project. Look at Window.XAML and window.cs.
 
Derek's WPFGraph...
 
A WPF Graph Control Library[^]
 
Let me know if I can be of more help.
 
~TheArch
 
modified on Sunday, January 3, 2010 11:55 AM

 
modified on Friday, February 5, 2010 3:21 PM

GeneralRe: My revote of 5mvpPaul Conrad4 Jan '10 - 8:01 
Thank you and I'll go through Derek's source code and take it from there... I did check out the demo and it looks really cool, thus, an appropriate revote Big Grin | :-D
 
"The clue train passed his station without stopping." - John Simmons / outlaw programmer
 
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon
 
"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago." - Rob Graham


NewsRe: My revote of 5 [Missing Derek's WPF Graph modifications]memberTheArchitectmc4 Jan '10 - 11:34 
Blush | :O
 
Thank you Paul!
 

Paul Conrad wrote:
I did check out the demo and it looks really cool

 
Yes, I had much fun doing the R&D on this one. It was always one of those things to learn; 'Big O', I still have many more ideas that will make it into the next increment. I will add an 'algorithm experimenter' where you can write .NET code and then click a run button to see how well the code performs and whet Big O it finds. Maybe the same for the Big O functions as well... If you have any other creative ideas let me know I'll add it..
 
Rose | [Rose]
 
modified on Friday, February 5, 2010 3:20 PM

QuestionDid you vote lower then a 5?memberTheArchitectmc28 Dec '09 - 4:19 
I am trying to improve this article and would like to know if you voted lower then a five why, and how I should improve the article.
 
Also, I would ask everyone who reads the article to vote, esp. if you book mark the article or try the code.
 
Thanks,
TheArch
NewsDid you create a new Big O Function Heuristic (Share um' here)memberTheArchitectmc21 Jul '09 - 14:51 
If you would like to share any new Big O Functions to add to the function library post them to this thread.
 
Thanks,
~TheArch
Thumbs Up | :thumbsup:
 
modified on Wednesday, July 22, 2009 10:24 PM

NewsBugs in the Tool (Report um' here)memberTheArchitectmc21 Jul '09 - 14:21 
Know bugs as of 22JUL2009:
 
1. The first time an algorithm is tested the heuristic graph does not display.
<font color="blue">   Workaround: Right click on the empty graph and click 'Refresh'.</font>
<font color="green">   Need ideas from the community on how to fix.</font>
 
2. When switching from one algorithm to another an exception is thrown.
<font color="green">   Need ideas from the community on how to fix.</font>
 
3. (Unknown) Instrumentation works.  I have not tested this feature yet.  It's a 'ToDo'.
 
4. Hick-up in timebase for testing algorithms.<font color="blue">   Workaround: If
 needed you can use the Max (time) property in the Analyzer class to remove any ernios hick-
ups.  I did not add the workaround because I don't think this is a good fix.  I have a request
 into a MSND newsgroup for a solution to the problem.  I think what is necessary is that an
 exception trap must be created when the OS interrupts the executing code.  However I have not
 been able to find how to trap an OS interruption.</font>
   <font color="orange">Temp Fix: I will add a check bow to disable the performance graph if
 checked in the next increment.</font>
 
5. Rounding problem with Derrek's WPF Graph.  This is most certainly cause by the way I use
 modulos to calculating the point scale.  This reduces the performance graph to only working
 correctly when the infinity point is divisible by 100.
   <font color="blue">Workaround:  If you care evaluating lower order infinity points such as
 the Exponential Recursive Fib (infinity reached in time >~70), you could add a disable
 performance graph test if the infinity point is not divisible by 100.  I think this is an okay
 workaround and will add it to the next increment when I add the instrumentation examples.</font>
 
6. The combo list box for Big O function does not do anything at all.  The proof of concept
 prototype I envisioned this feature to allow the user to select an initial best guess
 function.  This is how development works sometimes; you have an idea, write the code, and the
 idea seems not as necessary as was thought.
   <font color="blue">Fix:  After evaluating higher order infinity points I realized this 
feature would be necessary after all.  For example:  The Liner Recursive Fib evaluated at IP: 1000
 sometimes return Big O(n<sup>2</sup>).  The author claims it is linear.  There is only a 
slight variance in the correlation of .000002 between O(n) and O(n<sup>2</sup>), and really 
close variance on O(n log n) and o(log n).  I'm going to add a default threshold of .000002
 and allow this to be edited.  If the user selected Big O O(n) in this situation the threshold
 would be broken and Big O(n) would be reported.  I will also add documentation in the
 Statistical Data tab to indicate the threshold was broken and a different function was
 selected.  I will list all functions that are close to the threshold in the 'Best Guess'
 results.</font>

QuestionWhat are your thoughts? How can I improve this article?memberTheArchitectualizer12 Jul '09 - 4:10 
Please let me know if there is something I am doing wrong, or something which needs improvements. I will make any changes necessary.
 
Thanks.
 
modified on Wednesday, July 22, 2009 10:28 PM

AnswerRe: What are your thoughts? How can I improve this article?memberS. Senthil Kumar12 Jul '09 - 4:48 
1. I'm afraid I don't get the purpose of the application. I get that given a function that takes an integer, it shows a graph of its execution time against the integer value. How does that tell you what the big O notation is? Sure, you could tell a O(n) function from a O(log n) function, but what about other functions that take polynomial time? O(n log n)?
2. Only few algorithms can be translated into a function that takes an integer. How would you analyze the running time of a sorting algorithm, for example? Quick sort, for example, has worst case running time if the input array has the same elements - how'd you figure that out programmatically using a timing approach?
3. I had a quick look at your code and noticed a few strange things.
a. Why do you run at the highest priority?
b. http://msdn.microsoft.com/en-us/library/system.gc.keepalive.aspx[^] doesn't disable garbage collection - it merely prevents the reference from being collected if there are no references from the code
 
Regards
Senthil
_____________________________
My Home Page |My Blog | My Articles | My Flickr | WinMacro

AnswerRe: Big O function finding has not been implemented yet.memberTheArchitectualizer12 Jul '09 - 5:42 
S. Senthil Kumar wrote:
1. I'm afraid I don't get the purpose of the application. I get that given a function that takes an integer, it shows a graph of its execution time against the integer value. How does that tell you what the big O notation is? Sure, you could tell a O(n) function from a O(log n) function, but what about other functions that take polynomial time? O(n log n)?

 
'The purpose is to find the Big O. I am going to complete the heuristics this weekend for finding the 'closest' Big O.' The tools will not magically guess what the limitations of the algorythm are. It simple helps finding the relationships to a known Big O. for example if you look at the exponential recursive fib function you would notice that below 26(n) or so it changes from O(n) to O(n^2) so you could write the function as:
 
O(n) < 26 = O(n^2) > 26
 
Also the linear fib functions show they really are not linear they are closer to O(n log n).

 

S. Senthil Kumar wrote:
2. Only few algorithms can be translated into a function that takes an integer. How would you analyze the running time of a sorting algorithm, for example? Quick sort, for example, has worst case running time if the input array has the same elements - how'd you figure that out programmatic using a timing approach?

 
'This is where instrumenting the function is necessary. You would load the data type according to your data profile, and create a data series at each target loading profile. Other things which are useful through instrumenting are: graphing the relationships of things like: Object pile up and garbage collection, growth of thread stack size, etc.'
 

S. Senthil Kumar wrote:
a. Why do you run at the highest priority?

 
'This blocks all threads from interruption except the OS threads. I will add some configuration options in the next increment to change some of these settings.'
 

S. Senthil Kumar wrote:
b. http://msdn.microsoft.com/en-us/library/system.gc.keepalive.aspx[^] doesn't disable garbage collection - it merely prevents the reference from being collected if there are no references from the code

 
'Yes, this is correct. I disable the collection of objects on the instrumented execution so the object heap can be analyzed. I will add configuration options in the next increment.'
 
Thanks for the questions, hope you got the right answers.
Cool | :cool:
 
modified on Wednesday, July 22, 2009 10:29 PM

GeneralRe: Big O function finding has not been implimented yet.memberS. Senthil Kumar12 Jul '09 - 6:09 
TheArchitectualizer wrote:
I disable the collection of objects on the instrimented execution

 
But you do have a reference to the object in your code, so the GC.KeepAlive call isn't necessary.
TheArchitectualizer wrote:
'This blocks all threads from intruption except the OS threads.

 
The GC thread is not an OS thread. If that gets blocked and the function you are measuring creates objects, then there's a good chance your app will run out of memory very soon.
 

TheArchitectualizer wrote:
You would load the data type acording to your data profile, and create a data series at each target loading profile.

 
But without looking at the algorithm, how will you choose the data profiles? How'll you know what input will get you best, average and worst case running time? If you've already figured that out, it's just a small step to find the running time in terms of Big O.
 

TheArchitectualizer wrote:
O(n) < 26 = O(n^2) > 26

 
If f(n) is O(g(n)), you're expressing the fact that f(n) grows at or less than that rate at which g(n) grows. The constants don't matter (in theory). Therefore your function is O(n^2). For small values of n, most functions look linear anyway.
 
That said, I can see one practical use for this - it helps find the number x beyond which the algorithm starts to take too much time.
 
TheArchitectualizer wrote:
The purpose is to find the Big O

 
Hmm, I'd be impressed if you can pull it off with this approach. You are essentially trying to guess the internals of an algorithm by looking at its running time.
 
How do you plan to deal with cases where a single invocation takes a long time - say 1 hour?
 
Regards
Senthil
_____________________________
My Home Page |My Blog | My Articles | My Flickr | WinMacro

GeneralRe: Big O function finding has not been implemented yet.memberTheArchitectualizer12 Jul '09 - 6:50 
S. Senthil Kumar wrote:
Hmm, I'd be impressed if you can pull it off with this approach. You are essentially trying to guess the internals of an algorithm by looking at its running time.
 
How do you plan to deal with cases where a single invocation takes a long time - say 1 hour?

 
S. Senthil Kumar wrote:
But you do have a reference to the object in your code, so the GC.Keep Alive call isn't necessary

 
Well yes I do have a strong reference, but the executing code may create references to other objects.
 

S. Senthil Kumar wrote:
The GC thread is not an OS thread. If that gets blocked and the function you are measuring creates objects, then there's a good chance your app will run out of memory very soon.

 
'Right, but other threads in the target could cause erroneous results. The OS threads can't be blocked. ie. when the OS does something like a page fault. I'm blocking everything in the application domain.'
 

S. Senthil Kumar wrote:
But without looking at the algorithm, how will you choose the data profiles? How'll you know what input will get you best, average and worst case running time? If you've already figured that out, it's just a small step to find the running time in terms of Big O.

 
'Who said I wasn't going to look at the algorithm? I plan to recompile the target using Reflections.Emit.
S. Senthil Kumar wrote:
If f(n) is O(g(n)), you're expressing the fact that f(n) grows at or less than that rate at which g(n) grows. The constants don't matter (in theory). Therefore your function is O(n^2). For small values of n, most functions look linear anyway.
'
 
That said, I can see one practical use for this - it helps find the number x beyond which the algorithm starts to take too much time.

 
'Yes at this stage of prototype development this is all it is good for. Hopefully the next increment will have more value'
 
'Yeah, this is a heuristic method. Using both Engineering Heuristics and Computer Science Heuristics, and maybe a little human.
 
One approach to finding long running functions is just to let it run. I plan on adding a method of running the heuristic at different levels of complexity and have a break point at which execution time is seen as almost infinite. From the low level runs a close relationship can be found to approximate where the curve extends. This is how heuristics works, it's not an exact mathematical proof, rather aids in discovering a proof.
'
 
Cool | :cool:
 
modified on Wednesday, July 22, 2009 10:30 PM

GeneralRe: Big O function finding has not been implimented yet.memberS. Senthil Kumar12 Jul '09 - 8:04 
TheArchitectualizer wrote:
Well yes I do have a strong refrence, but the executing code may create refrences to other objects.

 
Ok, so how does that make you call GC.KeepAlive on the strong reference you have?
TheArchitectualizer wrote:
'Right, but other threads in the target could cause ernious results. The OS threads can't be blocked. ie. when the OS does something like a page fault. I'm blocking everything in the application domain.'

 
Thread priorities are maintained machine wide, so it's not just your app domain. My point is that setting your thread priority to the max might cause the GC and finalizer threads to be not scheduled, and that would be disastrous.
 
Regards
Senthil
_____________________________
My Home Page |My Blog | My Articles | My Flickr | WinMacro

AnswerRe: Big O function finding has not been implimented yet. [modified]memberTheArchitectualizer12 Jul '09 - 8:36 
S. Senthil Kumar wrote:
Thread priorities are maintained machine wide, so it's not just your app domain. My point is that setting your thread priority to the max might cause the GC and finalizer threads to be not scheduled, and that would be disastrous.

 
'I think you have a good point. Most of my research on GC is from working on Java for years. I guess it's a bad assumption to think it works the same in .net. I have never seen a parallel in .net to Sun's Java Language Specification. So I do not have total domain knowledge for .net. This is as close as possible solution to what I am accustom to in Java.'
 
modified on Wednesday, July 22, 2009 10:32 PM

NewsRe: Big O function finding has not been implemented yet. (Add Heuristics)memberTheArchitectmc21 Jul '09 - 13:38 
Heuristics are now in the code base. The tool correctly finds the Big O function for a given know set of Big O functions.
 
Big O Functions implemented:
Linear:
O(1)
O(n)
 
Log:
O(n log n)
O(log n)
O(log log n)
 
Exponential:
O(n2)
O(n3)
O(nn)
 
Factorial:
O(n!)
 
I do not know how to correctly implement the following:
O(α(n))
O((log n)c)
O(nc)
O(nc)
 
If any one can provide solutions for these functions I will add it to the code base.
 
Tips:
 
When evaluating a possible O(1) function, higher order Infinity Points must be used, additionally running the test multiple times will improve the best guess.
 
The above is also generally true for all functions, however the ones I tested at infinity point 100 returned the correct result about 80% of the time.
 
Oh yes,
The project uses the Meta.Numeric Scientific lib by dawright. You will have to download the code from CodePlex. dawright helped alot, so I gave him credit for the article as a co-author.
 
~TheArch
 
modified on Wednesday, July 22, 2009 10:35 PM

AnswerRe: What are your thoughts? How can I improve this article?mvpPete O'Hanlon13 Jul '09 - 2:47 
First of all, you have introduced your own standard for target audience/skill level. This is a fairly confusing muddle, and is at odds with CPs own rating system.
 
Second point - why MVP? MVVM is well on the way to becoming the de-facto standard in WPF development. Would you mind elaborating on why you chose MVP over MVVM (and yes, I appreciate they appear very similar, but MVVM is closer to the passive presenter pattern, whereby the presenter is not responsible for any of the model-sync work).
 

"WPF has many lovers. It's a veritable porn star!" - Josh Smith

As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.

My blog | My articles | MoXAML PowerToys | Onyx



GeneralRe: What are your thoughts? How can I improve this article?memberTheArchitectualizer13 Jul '09 - 3:26 
Pete O'Hanlon wrote:
First of all, you have introduced your own standard for target audience/skill level. This is a fairly confusing muddle, and is at odds with CPs own rating system.

 
'Sorry didn't intend to be at odds with CP. In my articles I try to address developers, architects, and designers. I simply index the article for those audiences. If an architect is browsing and seems that the article only has a 1.67 (Golden Ratio) rating for an article of mine it give an indication of who my target audience is.
 
The article is index according to CP: Architect, Dev, QA, and Design. But has no way indicating relevance.
'
 

Pete O'Hanlon wrote:
Second point - why MVP? MVVM is well on the way to becoming the de-facto standard in WPF development. Would you mind elaborating on why you chose MVP over MVVM (and yes, I appreciate they appear very similar, but MVVM is closer to the passive presenter pattern, whereby the presenter is not responsible for any of the model-sync work).

 
'Hmm, Okay good point. I honestly have not done any research on presentation patterns since 2008. I will look at MVVP MVVM today. I was going to do some refactoring on this project any way, before I produce the next increment.'
 
modified on Wednesday, July 22, 2009 10:36 PM

GeneralRe: What are your thoughts? How can I improve this article?mvpPete O'Hanlon13 Jul '09 - 3:42 
OK - MVVM is well described by Josh Smith and Karl Shifflett - and I have an article here[^] on CP that uses MVVM exclusively (and as it's a fairly clean implementation, extraneous details don't get in the way).
 

"WPF has many lovers. It's a veritable porn star!" - Josh Smith

As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.

My blog | My articles | MoXAML PowerToys | Onyx



GeneralRe: What are your thoughts? How can I improve this article?memberTheArchitectualizer13 Jul '09 - 5:34 
Pete O'Hanlon wrote:
OK - MVVM is well described by Josh Smith and Karl Shifflett - and I have an article here[^] on CP that uses MVVM exclusively (and as it's a fairly clean implementation, extraneous details don't get in the way).

 
Thanks, I will look at it. I'm looking at the MSDN article right now. They name Martin Fowler as the source of Presentation Model(PM), and that MVVM uses the same abstractions.
 
WPF Apps With The Model-View-ViewModel Design Pattern[^]
GeneralRe: What are your thoughts? How can I improve this article? (Use MVVM Pattern)memberTheArchitectualizer13 Jul '09 - 8:36 
Okay,
 
Got the basics going to give it a try. It's much more complicated than MVP. I am going to provide a link for the MVP prototype for less experienced readers. I'm also going to add a "Dynamic View/ViewModle" for configuration. I will post the update to this code base, and use it in the following increments.
 
Thanks for the suggestion.
Wink | ;)
QuestionRe: What are your thoughts? How can I improve this article?memberTheArchitectmc23 Jul '09 - 10:51 
Hmm,
 
The MVVM Patter is really difficult to implement. Would you like to give me a hand. I don't understand the routed event model very well.
 
I have refactored about 80% the only thing left to do the eventing wire up. I will give you total credit if you like.
 
The other option which I think is better from a design stand point is to use the new Behavior model in Blend 3. What are your thoughts?
 
Thanks!
~TheArch
AnswerRe: What are your thoughts? How can I improve this article?membersupercat913 Jul '09 - 5:57 
I'm not quite sure I see the practical usefulness of discovering the supposed O(n) of a function, since on modern machines secondary effects like CPU caching have an increasing impact on execution time. A program which accesses memory in cache-friendly fashion may execute ten times as many instructions as one which accesses data more randomly, and yet still run faster. If it's necessary for any data to get swapped out to disk, even increasing instruction count 1,000 fold may be worthwhile if it reduces the amount of swapping.
 
Unfortunately, I don't know any really good way of figuring out how to make programs run optimally except to test out various implementations and see how they fare with data sets of the expected size and "texture".
GeneralRe: What are your thoughts? How can I improve this article?memberTheArchitectualizer13 Jul '09 - 6:44 
supercat9 wrote:
I'm not quite sure I see the practical usefulness of discovering the supposed O(n) of a function, since on modern machines secondary effects like CPU caching have an increasing impact on execution time. A program which accesses memory in cache-friendly fashion may execute ten times as many instructions as one which accesses data more randomly, and yet still run faster. If it's necessary for any data to get swapped out to disk, even increasing instruction count 1,000 fold may be worthwhile if it reduces the amount of swapping.

 
'I don't know. They teach Big O in the Ninth grade now. I figured that I should learn Big O since the new crop of developers will all be using it. 'IA' tests the bounds of an algorithm under a certain circumstance. Instrumentation + 'IA' allows the algorithm to be tested for specific conditions like you are describing. It is useful to explore the design of the algorithm using different technologies. For for example: I have x running on y, how does x run if I use Parallel Extensions on z?
 
Random access is faster than sequential or cache-friendly fashion esp. if you use the correct data type like a hash.
 
Increasing the instruction count 1,000 fold might make a different if the factor is executed in a loop. 1,000 quickly becomes 10^9.
'
 

supercat9 wrote:
Unfortunately, I don't know any really good way of figuring out how to make programs run optimally except to test out various implementations and see how they fare with data sets of the expected size and "texture".

 
'Maybe you are missing the point, this is what this tool is designed to do. Test the bounds of an algorithm to find the optimal conditions.
 
I'm refactoring for MVVM right now, I will add examples of instrumentation and data bounds checking.
'
 
Cool | :cool:
 
modified on Wednesday, July 22, 2009 10:38 PM

GeneralRe: What are your thoughts? How can I improve this article?membersupercat914 Jul '09 - 6:06 
Random access is faster than sequential or cache-friendly fashion esp. if you use the correct data type like a hash.
 
Actually, hash tables are a common example of what I'm talking about. In many situations, it may be faster to use a hash table which is small enough that it has some collisions, and then use an efficient means of handling those in cache-localized structures (e.g. have each bucket point to an array, and have each array reallocated at twice the size if it gets full) than it would be to have a hash table which is large enough to avoid collisions. The latter approach might manage to only require looking at one data item for each lookup, but nearly every single lookup would result in a cache miss. Each cache line could hold multiple hash-table entries, but most of the time all but one would be blank, so the effective capacity of the cache would be limited. By contrast, if the hash table were more dense, a bigger portion of it (or perhaps the whole thing) could fit in cache. The increased hash-collision rate for the smaller tables could be more than offset by the improved cache performance.
 
Of course, variations in architecture and database size may cause a larger table to be better. Even if a particular size of cache would be optimal for machine X, a larger one may be optimal on both machine Y which has a larger cache (that could thus handle a larger table without too many extra cache misses) or machine Z with a smaller cache (which would have mostly cache misses even with the smallest practical table). Or the database could get so large that the extra time spent linear-searching each item would get to be more than the time required for a direct lookup that causes a cache miss.
 
I miss the days when computers executed code in nicely-predictable fashion; actually, I do a lot of programming for such machines (embedded controllers, generally). On the other hand, the new systems do manage some pretty incredible performance.
GeneralRe: What are your thoughts? How can I improve this article?memberTheArchitectualizer14 Jul '09 - 11:29 
Hash table actually have a load factor. When the load reaches this factor, the the table re-hashes and grows in size. This is why it's important to know how much load and what size to start out with. You then set the load factor by your expectations on load.
 
I think the default load factor is 70%. So if you have a hash size 100 after 70 items in are added to the hash it will re-hash to size 200 and so on. The only time there are collisions is when: 1. your hash function is no good. 2. you are adding the an object to the hash with a duplicate key. Hash functions are guaranteed to have no collisions as long as the key is unique.
 
Re-hashing causes performance problems, this is why it is so important to 'size right' and set the load accordingly.
 
The hash for a given key is calculated based on the size and load of the table. So when the hash scales to a larger size, it re-indexes all the keys.
 
modified on Wednesday, July 22, 2009 10:40 PM

GeneralRe: What are your thoughts? How can I improve this article? (Add More Examples)memberTheArchitectualizer13 Jul '09 - 8:39 
Okay I am going to add some examples of for:
 
Instrumentation
IA + Instrumentation
Data Loading
 
They will be posted to this code base with the addition of the MVVM refactor.

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130516.1 | Last Updated 16 Feb 2010
Article Copyright 2009 by dawright, Proto-Bytes
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid