 |
|
 |
If you would like to share any new Big O Functions to add to the function library post them to this thread.
Thanks, ~TheArch
modified on Wednesday, July 22, 2009 10:24 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Know bugs as of 22JUL2009:
1. The first time an algorithm is tested the heuristic graph does not display. Workaround: Right click on the empty graph and click 'Refresh'. Need ideas from the community on how to fix.
2. When switching from one algorithm to another an exception is thrown. Need ideas from the community on how to fix.
3. (Unknown) Instrumentation works. I have not tested this feature yet. It's a 'ToDo'.
4. Hick-up in timebase for testing algorithms. Workaround: If needed you can use the Max (time) property in the Analyzer class to remove any ernios hick- ups. I did not add the workaround because I don't think this is a good fix. I have a request into a MSND newsgroup for a solution to the problem. I think what is necessary is that an exception trap must be created when the OS interrupts the executing code. However I have not been able to find how to trap an OS interruption. Temp Fix: I will add a check bow to disable the performance graph if checked in the next increment.
5. Rounding problem with Derrek's WPF Graph. This is most certainly cause by the way I use modulos to calculating the point scale. This reduces the performance graph to only working correctly when the infinity point is divisible by 100. Workaround: If you care evaluating lower order infinity points such as the Exponential Recursive Fib (infinity reached in time >~70), you could add a disable performance graph test if the infinity point is not divisible by 100. I think this is an okay workaround and will add it to the next increment when I add the instrumentation examples.
6. The combo list box for Big O function does not do anything at all. The proof of concept prototype I envisioned this feature to allow the user to select an initial best guess function. This is how development works sometimes; you have an idea, write the code, and the idea seems not as necessary as was thought. Fix: After evaluating higher order infinity points I realized this feature would be necessary after all. For example: The Liner Recursive Fib evaluated at IP: 1000 sometimes return Big O(n2). The author claims it is linear. There is only a slight variance in the correlation of .000002 between O(n) and O(n2), and really close variance on O(n log n) and o(log n). I'm going to add a default threshold of .000002 and allow this to be edited. If the user selected Big O O(n) in this situation the threshold would be broken and Big O(n) would be reported. I will also add documentation in the Statistical Data tab to indicate the threshold was broken and a different function was selected. I will list all functions that are close to the threshold in the 'Best Guess' results.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Please let me know if there is something I am doing wrong, or something which needs improvements. I will make any changes necessary.
Thanks.
modified on Wednesday, July 22, 2009 10:28 PM
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
 |
1. I'm afraid I don't get the purpose of the application. I get that given a function that takes an integer, it shows a graph of its execution time against the integer value. How does that tell you what the big O notation is? Sure, you could tell a O(n) function from a O(log n) function, but what about other functions that take polynomial time? O(n log n)? 2. Only few algorithms can be translated into a function that takes an integer. How would you analyze the running time of a sorting algorithm, for example? Quick sort, for example, has worst case running time if the input array has the same elements - how'd you figure that out programmatically using a timing approach? 3. I had a quick look at your code and noticed a few strange things. a. Why do you run at the highest priority? b. http://msdn.microsoft.com/en-us/library/system.gc.keepalive.aspx[^] doesn't disable garbage collection - it merely prevents the reference from being collected if there are no references from the code
|
| Sign In·View Thread·PermaLink | 5.00/5 (2 votes) |
|
|
|
 |
|
 |
S. Senthil Kumar wrote: 1. I'm afraid I don't get the purpose of the application. I get that given a function that takes an integer, it shows a graph of its execution time against the integer value. How does that tell you what the big O notation is? Sure, you could tell a O(n) function from a O(log n) function, but what about other functions that take polynomial time? O(n log n)?
'The purpose is to find the Big O. I am going to complete the heuristics this weekend for finding the 'closest' Big O.' The tools will not magically guess what the limitations of the algorythm are. It simple helps finding the relationships to a known Big O. for example if you look at the exponential recursive fib function you would notice that below 26(n) or so it changes from O(n) to O(n^2) so you could write the function as:
O(n) < 26 = O(n^2) > 26
Also the linear fib functions show they really are not linear they are closer to O(n log n).
S. Senthil Kumar wrote: 2. Only few algorithms can be translated into a function that takes an integer. How would you analyze the running time of a sorting algorithm, for example? Quick sort, for example, has worst case running time if the input array has the same elements - how'd you figure that out programmatic using a timing approach?
'This is where instrumenting the function is necessary. You would load the data type according to your data profile, and create a data series at each target loading profile. Other things which are useful through instrumenting are: graphing the relationships of things like: Object pile up and garbage collection, growth of thread stack size, etc.'
S. Senthil Kumar wrote: a. Why do you run at the highest priority?
'This blocks all threads from interruption except the OS threads. I will add some configuration options in the next increment to change some of these settings.'
S. Senthil Kumar wrote: b. http://msdn.microsoft.com/en-us/library/system.gc.keepalive.aspx[^] doesn't disable garbage collection - it merely prevents the reference from being collected if there are no references from the code
'Yes, this is correct. I disable the collection of objects on the instrumented execution so the object heap can be analyzed. I will add configuration options in the next increment.'
Thanks for the questions, hope you got the right answers.
modified on Wednesday, July 22, 2009 10:29 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
TheArchitectualizer wrote: I disable the collection of objects on the instrimented execution
But you do have a reference to the object in your code, so the GC.KeepAlive call isn't necessary.
TheArchitectualizer wrote: 'This blocks all threads from intruption except the OS threads.
The GC thread is not an OS thread. If that gets blocked and the function you are measuring creates objects, then there's a good chance your app will run out of memory very soon.
TheArchitectualizer wrote: You would load the data type acording to your data profile, and create a data series at each target loading profile.
But without looking at the algorithm, how will you choose the data profiles? How'll you know what input will get you best, average and worst case running time? If you've already figured that out, it's just a small step to find the running time in terms of Big O.
TheArchitectualizer wrote: O(n) < 26 = O(n^2) > 26
If f(n) is O(g(n)), you're expressing the fact that f(n) grows at or less than that rate at which g(n) grows. The constants don't matter (in theory). Therefore your function is O(n^2). For small values of n, most functions look linear anyway.
That said, I can see one practical use for this - it helps find the number x beyond which the algorithm starts to take too much time.
TheArchitectualizer wrote: The purpose is to find the Big O
Hmm, I'd be impressed if you can pull it off with this approach. You are essentially trying to guess the internals of an algorithm by looking at its running time.
How do you plan to deal with cases where a single invocation takes a long time - say 1 hour?
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
 |
S. Senthil Kumar wrote: Hmm, I'd be impressed if you can pull it off with this approach. You are essentially trying to guess the internals of an algorithm by looking at its running time.
How do you plan to deal with cases where a single invocation takes a long time - say 1 hour?
S. Senthil Kumar wrote: But you do have a reference to the object in your code, so the GC.Keep Alive call isn't necessary
Well yes I do have a strong reference, but the executing code may create references to other objects.
S. Senthil Kumar wrote: The GC thread is not an OS thread. If that gets blocked and the function you are measuring creates objects, then there's a good chance your app will run out of memory very soon.
'Right, but other threads in the target could cause erroneous results. The OS threads can't be blocked. ie. when the OS does something like a page fault. I'm blocking everything in the application domain.'
S. Senthil Kumar wrote: But without looking at the algorithm, how will you choose the data profiles? How'll you know what input will get you best, average and worst case running time? If you've already figured that out, it's just a small step to find the running time in terms of Big O.
'Who said I wasn't going to look at the algorithm? I plan to recompile the target using Reflections.Emit.
S. Senthil Kumar wrote: If f(n) is O(g(n)), you're expressing the fact that f(n) grows at or less than that rate at which g(n) grows. The constants don't matter (in theory). Therefore your function is O(n^2). For small values of n, most functions look linear anyway. '
That said, I can see one practical use for this - it helps find the number x beyond which the algorithm starts to take too much time.
'Yes at this stage of prototype development this is all it is good for. Hopefully the next increment will have more value'
'Yeah, this is a heuristic method. Using both Engineering Heuristics and Computer Science Heuristics, and maybe a little human.
One approach to finding long running functions is just to let it run. I plan on adding a method of running the heuristic at different levels of complexity and have a break point at which execution time is seen as almost infinite. From the low level runs a close relationship can be found to approximate where the curve extends. This is how heuristics works, it's not an exact mathematical proof, rather aids in discovering a proof.'
modified on Wednesday, July 22, 2009 10:30 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
TheArchitectualizer wrote: Well yes I do have a strong refrence, but the executing code may create refrences to other objects.
Ok, so how does that make you call GC.KeepAlive on the strong reference you have?
TheArchitectualizer wrote: 'Right, but other threads in the target could cause ernious results. The OS threads can't be blocked. ie. when the OS does something like a page fault. I'm blocking everything in the application domain.'
Thread priorities are maintained machine wide, so it's not just your app domain. My point is that setting your thread priority to the max might cause the GC and finalizer threads to be not scheduled, and that would be disastrous.
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
 |
S. Senthil Kumar wrote: Thread priorities are maintained machine wide, so it's not just your app domain. My point is that setting your thread priority to the max might cause the GC and finalizer threads to be not scheduled, and that would be disastrous.
'I think you have a good point. Most of my research on GC is from working on Java for years. I guess it's a bad assumption to think it works the same in .net. I have never seen a parallel in .net to Sun's Java Language Specification. So I do not have total domain knowledge for .net. This is as close as possible solution to what I am accustom to in Java.'
modified on Wednesday, July 22, 2009 10:32 PM
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
 |
Heuristics are now in the code base. The tool correctly finds the Big O function for a given know set of Big O functions.
Big O Functions implemented: Linear: O(1) O(n)
Log: O(n log n) O(log n) O(log log n)
Exponential: O(n2) O(n3) O(nn)
Factorial: O(n!)
I do not know how to correctly implement the following: O(α(n)) O((log n)c) O(nc) O(nc)
If any one can provide solutions for these functions I will add it to the code base.
Tips:
When evaluating a possible O(1) function, higher order Infinity Points must be used, additionally running the test multiple times will improve the best guess.
The above is also generally true for all functions, however the ones I tested at infinity point 100 returned the correct result about 80% of the time.
Oh yes, The project uses the Meta.Numeric Scientific lib by dawright. You will have to download the code from CodePlex. dawright helped alot, so I gave him credit for the article as a co-author.
~TheArch
modified on Wednesday, July 22, 2009 10:35 PM
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
 |
First of all, you have introduced your own standard for target audience/skill level. This is a fairly confusing muddle, and is at odds with CPs own rating system.
Second point - why MVP? MVVM is well on the way to becoming the de-facto standard in WPF development. Would you mind elaborating on why you chose MVP over MVVM (and yes, I appreciate they appear very similar, but MVVM is closer to the passive presenter pattern, whereby the presenter is not responsible for any of the model-sync work).
"WPF has many lovers. It's a veritable porn star!" - Josh Smith As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes. My blog | My articles | MoXAML PowerToys | Onyx
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Pete O'Hanlon wrote: First of all, you have introduced your own standard for target audience/skill level. This is a fairly confusing muddle, and is at odds with CPs own rating system.
'Sorry didn't intend to be at odds with CP. In my articles I try to address developers, architects, and designers. I simply index the article for those audiences. If an architect is browsing and seems that the article only has a 1.67 (Golden Ratio) rating for an article of mine it give an indication of who my target audience is.
The article is index according to CP: Architect, Dev, QA, and Design. But has no way indicating relevance. '
Pete O'Hanlon wrote: Second point - why MVP? MVVM is well on the way to becoming the de-facto standard in WPF development. Would you mind elaborating on why you chose MVP over MVVM (and yes, I appreciate they appear very similar, but MVVM is closer to the passive presenter pattern, whereby the presenter is not responsible for any of the model-sync work).
'Hmm, Okay good point. I honestly have not done any research on presentation patterns since 2008. I will look at MVVP MVVM today. I was going to do some refactoring on this project any way, before I produce the next increment.'
modified on Wednesday, July 22, 2009 10:36 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
OK - MVVM is well described by Josh Smith and Karl Shifflett - and I have an article here[^] on CP that uses MVVM exclusively (and as it's a fairly clean implementation, extraneous details don't get in the way).
"WPF has many lovers. It's a veritable porn star!" - Josh Smith As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes. My blog | My articles | MoXAML PowerToys | Onyx
|
| Sign In·View Thread·PermaLink | 5.00/5 (1 vote) |
|
|
|
 |
|
 |
Pete O'Hanlon wrote: OK - MVVM is well described by Josh Smith and Karl Shifflett - and I have an article here[^] on CP that uses MVVM exclusively (and as it's a fairly clean implementation, extraneous details don't get in the way).
Thanks, I will look at it. I'm looking at the MSDN article right now. They name Martin Fowler as the source of Presentation Model(PM), and that MVVM uses the same abstractions.
WPF Apps With The Model-View-ViewModel Design Pattern[^]
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Okay,
Got the basics going to give it a try. It's much more complicated than MVP. I am going to provide a link for the MVP prototype for less experienced readers. I'm also going to add a "Dynamic View/ViewModle" for configuration. I will post the update to this code base, and use it in the following increments.
Thanks for the suggestion. 
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Hmm,
The MVVM Patter is really difficult to implement. Would you like to give me a hand. I don't understand the routed event model very well.
I have refactored about 80% the only thing left to do the eventing wire up. I will give you total credit if you like.
The other option which I think is better from a design stand point is to use the new Behavior model in Blend 3. What are your thoughts?
Thanks! ~TheArch
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
I'm not quite sure I see the practical usefulness of discovering the supposed O(n) of a function, since on modern machines secondary effects like CPU caching have an increasing impact on execution time. A program which accesses memory in cache-friendly fashion may execute ten times as many instructions as one which accesses data more randomly, and yet still run faster. If it's necessary for any data to get swapped out to disk, even increasing instruction count 1,000 fold may be worthwhile if it reduces the amount of swapping.
Unfortunately, I don't know any really good way of figuring out how to make programs run optimally except to test out various implementations and see how they fare with data sets of the expected size and "texture".
|
| Sign In·View Thread·PermaLink | 4.00/5 (1 vote) |
|
|
|
 |
|
 |
supercat9 wrote: I'm not quite sure I see the practical usefulness of discovering the supposed O(n) of a function, since on modern machines secondary effects like CPU caching have an increasing impact on execution time. A program which accesses memory in cache-friendly fashion may execute ten times as many instructions as one which accesses data more randomly, and yet still run faster. If it's necessary for any data to get swapped out to disk, even increasing instruction count 1,000 fold may be worthwhile if it reduces the amount of swapping.
'I don't know. They teach Big O in the Ninth grade now. I figured that I should learn Big O since the new crop of developers will all be using it. 'IA' tests the bounds of an algorithm under a certain circumstance. Instrumentation + 'IA' allows the algorithm to be tested for specific conditions like you are describing. It is useful to explore the design of the algorithm using different technologies. For for example: I have x running on y, how does x run if I use Parallel Extensions on z?
Random access is faster than sequential or cache-friendly fashion esp. if you use the correct data type like a hash.
Increasing the instruction count 1,000 fold might make a different if the factor is executed in a loop. 1,000 quickly becomes 10^9. '
supercat9 wrote: Unfortunately, I don't know any really good way of figuring out how to make programs run optimally except to test out various implementations and see how they fare with data sets of the expected size and "texture".
'Maybe you are missing the point, this is what this tool is designed to do. Test the bounds of an algorithm to find the optimal conditions.
I'm refactoring for MVVM right now, I will add examples of instrumentation and data bounds checking.'
modified on Wednesday, July 22, 2009 10:38 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Random access is faster than sequential or cache-friendly fashion esp. if you use the correct data type like a hash.
Actually, hash tables are a common example of what I'm talking about. In many situations, it may be faster to use a hash table which is small enough that it has some collisions, and then use an efficient means of handling those in cache-localized structures (e.g. have each bucket point to an array, and have each array reallocated at twice the size if it gets full) than it would be to have a hash table which is large enough to avoid collisions. The latter approach might manage to only require looking at one data item for each lookup, but nearly every single lookup would result in a cache miss. Each cache line could hold multiple hash-table entries, but most of the time all but one would be blank, so the effective capacity of the cache would be limited. By contrast, if the hash table were more dense, a bigger portion of it (or perhaps the whole thing) could fit in cache. The increased hash-collision rate for the smaller tables could be more than offset by the improved cache performance.
Of course, variations in architecture and database size may cause a larger table to be better. Even if a particular size of cache would be optimal for machine X, a larger one may be optimal on both machine Y which has a larger cache (that could thus handle a larger table without too many extra cache misses) or machine Z with a smaller cache (which would have mostly cache misses even with the smallest practical table). Or the database could get so large that the extra time spent linear-searching each item would get to be more than the time required for a direct lookup that causes a cache miss.
I miss the days when computers executed code in nicely-predictable fashion; actually, I do a lot of programming for such machines (embedded controllers, generally). On the other hand, the new systems do manage some pretty incredible performance.
|
| Sign In·View Thread·PermaLink | 4.00/5 (1 vote) |
|
|
|
 |
|
 |
Hash table actually have a load factor. When the load reaches this factor, the the table re-hashes and grows in size. This is why it's important to know how much load and what size to start out with. You then set the load factor by your expectations on load.
I think the default load factor is 70%. So if you have a hash size 100 after 70 items in are added to the hash it will re-hash to size 200 and so on. The only time there are collisions is when: 1. your hash function is no good. 2. you are adding the an object to the hash with a duplicate key. Hash functions are guaranteed to have no collisions as long as the key is unique.
Re-hashing causes performance problems, this is why it is so important to 'size right' and set the load accordingly.
The hash for a given key is calculated based on the size and load of the table. So when the hash scales to a larger size, it re-indexes all the keys.
modified on Wednesday, July 22, 2009 10:40 PM
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |
|
 |
Okay I am going to add some examples of for:
Instrumentation IA + Instrumentation Data Loading
They will be posted to this code base with the addition of the MVVM refactor.
|
| Sign In·View Thread·PermaLink | |
|
|
|
 |