|
The whole point of the article is to say “they are wrong, and I’m the only one who knows the Truth, but I won’t tell you”.
No algorithm explanation provided.
The code provided is hard to read, and it's even harder to understand what's going on. Variable names like "array1" or "words2" aren't really informative.
Algorithm complexity estimation does not include time required to prepare all the supplemental data (like words inversion) or extract the final result.
|
|
|
|
|
Thanks for your comment.
You nailed me on "they are wrong", but I didn't intend "they", just "he", and I gave him credit when his idea helped me improve my design. Nor did I intend to sound like the one voice in the wilderness, shouting out the truth in tongues so you wouldn't understand me.
Figuring out what to include in the article, versus letting you read the code, is tough too.
The point of the code was to write a puzzle-solving routine and run tests against it, both to verify that the code works as intended and to see how it performs, whether it performs as expected, and how fast it does it.
The puzzle supplies an array of characters in which you need to find the supplied words. Sorry, I thought the names were obvious: the combination of those names identifies the parts of the puzzle, and the numbers identify the version. When I started, I hadn't designed a naming convention, so 1 was as good as any. Later I thought about dating the puzzles, so 304 means David supplied that puzzle on March 4th. I don't intend to carry on copying puzzles for years.
This was not a well-planned code design where who's affected and how were taken into account, nor did a team of people work out the naming convention. The lifetime of the code wasn't taken into consideration either. I admit it is a sloppy, very poor design for long-term support by a variety of members of the team.
As far as the algorithm goes, I did describe the dictionary's algorithm as NlogN for loading and logN for retrieving data. There is a cost to loading the dictionary in this code, but my gut feel is that defining the dictionary is insignificant compared to retrieving against the array multiple times.
The "real" cost is going through the array, searching. I hope I did get across that, in general, both versions of each word, front to back and back to front, are stored, so the keys start out at roughly twice the number of characters supplied in the search words and are then cut down to just the "unique" definitions needed.
Sorry, I never mentioned that in the zip file the "txt" files are generally results of running the code, and most are "older", so the data won't be the same and the format may not be either, because I stored them as I was developing the code.
By the way, array2 is a copy of the array provided by the referenced author and isn't a "wonderword" puzzle.
So array1 is a 15X15 array of characters:
SASLOBMYSSTIURF
TPPRIESTSGROWLL
UPSGGBITENOTODA
NLEFNROVIDSVODM
NEEWINEAEUEYDTE
OSDPTRHSGATRSHN
IDSRSMEUIIAAHGV
TEEDARAPLHETKIE
ARTSOGOICFYRSNR
VESSRCTROMANCET
IUEMUROMOPARTYU
TQVNECELTICEKAM
LNRFMVILLAGERSN
UOADOOFPROTECTU
CCHTREESILVANUS
in which you need to find 43 words, like Stir and Apples:
SASLOBMYSSTIURF
Stir    x=8,  xd=-1, yd=1
Apples  x=1,  xd=0,  yd=1
Symbols x=8,  xd=-1, yd=0
Fruits  x=14, xd=-1, yd=0
Fruit   x=14, xd=-1, yd=0
Flamen  x=14, xd=0,  yd=1
Stir starts at the ninth character (x=8) of the line above and reads diagonally left (xd=-1) and down (yd=1); Apples starts at the second character and reads vertically. Both Fruits and Fruit start on the last character and read left (Symbols does too). They both exist because this is output from a concatenated puzzle that has maybe 800 search words, and another puzzle wanted to find Fruit, but Flamen was definitely intended to be part of this puzzle (I think; check word1), because it is an unusual word.
Note that in the output I provide stats too; more on those below.
I didn't find Stir starting on this line; I found it starting on the fourth line down, in reverse order: ...RI, RIC, RIG, RIN, RIS, RISE, RISK, RIT...
I had to search for R, RI, and RIT before I got a key with one word in it.
RIT is a reverse key lookup, but I bet a buck RIS is a forward key lookup with two words in it, and I would have to go through the entire keyed list to find those two words. RIT is the last dictionary value it found, and after that there was just one more character to match in the keyword struct.
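If it helps to see the shape of that lookup, here is a minimal sketch of the idea. The names (grid, keyDict, TryDirection) are stand-ins, not my actual code: extend the key one character at a time in the chosen direction, quit on a missing key, and stop at the first key whose list holds exactly one word, which the caller then verifies character by character against the array.

    using System.Collections.Generic;
    using System.Text;

    class LookupSketch
    {
        // Walks one direction from (x, y), growing the key one character
        // at a time. Returns the unique candidate word, or null on a
        // false start.
        static string TryDirection(char[,] grid, int x, int y, int xd, int yd,
                                   Dictionary<string, List<string>> keyDict)
        {
            var key = new StringBuilder();
            int cx = x, cy = y;
            while (cx >= 0 && cx < grid.GetLength(0) &&
                   cy >= 0 && cy < grid.GetLength(1))
            {
                key.Append(grid[cx, cy]);
                List<string> words;
                if (!keyDict.TryGetValue(key.ToString(), out words))
                    return null;       // false start: key not in dictionary
                if (words.Count == 1)
                    return words[0];   // unique key; caller verifies the rest
                cx += xd;
                cy += yd;
            }
            return null;               // ran off the board before a unique key
        }
    }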
In this case, earlier it says
"Loop counts level 0: 4950, 1: 19800, 2: 59580 Array size: 4950, Search words: 855 words found: 1155 Characters in words: 5339 Search Keys: 2472"
level 0 is the number of characters stored in the array: 4950. Divide that by 225 and you get 22 concatenated puzzles, which probably isn't the number of puzzles your zip's text files have, because this grows as I add more as time goes on.
I got "...RI, RIC..." because, further on, the output lists the 2,472 keywords it kept and used.
19,800/4=4,950. No surprise, because all 26 starting characters are keyed; this counts the starting characters that are keyed and advance to the 4-direction loop, whether the search goes forward or not. 59,580 is a combination of key lookups and continued searches. (The S in Stir is counted, even when it isn't keyed, if it is found.) So false starts and successful searches are both counted. A false start has at least the second character matching, but stops counting when the next character in that direction isn't found.
So many duplicate words found has to mean that I appended the same puzzle(s) multiple times.
Also, I did warn you about how easy (not) it is to read, because I mentioned the bubble and quick sort methods and how the second is much more efficient, but much harder to read.
This code has many more comments than I am used to seeing; did they help you out at all?
|
|
|
|
|
VadimAlk wrote: Algorithm complexity estimation does not include time required to prepare all the supplemental data (like words inversion) or extract the final result.
When giving an algorithm complexity estimation, you don't say this part of the code is going to be 10%, part 2 is 25%, part 3 is 15%, and part 4 is 50%.
The main ratings are N, NlogN, and N^2; that's your algorithm complexity estimation.
If part of your code is N^2 and another part is NlogN, the estimation is N^2; if part is NlogN and another part is N, it's NlogN.
Instantiating a keyword is technically an N process: it first goes through every character and makes it upper case (N = the number of characters in the string) using a built-in process.
Looking up an individual character in a string is 1, converting it to uppercase is 1, and putting it in a StringBuilder is 1, so doing that to N characters is N. I'm not sure, but I think converting it back into a string is 1. Since all the words are about the same length, creating an array of keywords is N (creating one keyword costs about the same on average as every other one). When I create a dictionary of string keys with a list of keywords, I actually create 2n key strings for each keyword (every prefix of the word and of its reversal, where n is the word's length) and load each one into a dictionary. Loading into a dictionary is NlogN.
Then I create a second dictionary: I go through the list of keywords, copying keys from front to back and back to front, looking each key up, and if it has 1 value I stop looking. Again, an NlogN process. The net effect is that the key strings left in the dictionary are pretty consistently 3 times the number of search words. Now I go through the array, looking in 4 directions from each starting point. That's N, but it's N2 versus N1. Then I look up each successive key, logN2, until I find the last key, and it converts to an N process.
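As a rough sketch of those two passes (illustrative names and shapes, not my actual code): pass 1 keys every prefix of each uppercased word and of its reversal; pass 2 re-walks each word's prefixes, copying keys across until it hits the first key whose list holds a single word.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class KeyBuildSketch
    {
        static Dictionary<string, List<string>> BuildKeys(IEnumerable<string> searchWords)
        {
            var words = searchWords.Select(w => w.ToUpperInvariant()).ToList();

            // Pass 1: every prefix of each word, forward and reversed.
            var full = new Dictionary<string, List<string>>();
            foreach (var word in words)
                foreach (var form in new[] { word, Reverse(word) })
                    for (int len = 1; len <= form.Length; len++)
                    {
                        string key = form.Substring(0, len);
                        List<string> list;
                        if (!full.TryGetValue(key, out list))
                            full[key] = list = new List<string>();
                        list.Add(word);
                    }

            // Pass 2: keep only the prefixes needed to reach a unique key.
            var reduced = new Dictionary<string, List<string>>();
            foreach (var word in words)
                foreach (var form in new[] { word, Reverse(word) })
                    for (int len = 1; len <= form.Length; len++)
                    {
                        string key = form.Substring(0, len);
                        reduced[key] = full[key];
                        if (full[key].Count == 1)
                            break;    // unique: longer keys aren't needed
                    }
            return reduced;
        }

        static string Reverse(string s)
        {
            var chars = s.ToCharArray();
            Array.Reverse(chars);
            return new string(chars);
        }
    }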
The time complexity is NlogN, even if strictly it is N1logN1, then N2logN1, which then converts to N3.
|
|
|
|
|
Maybe we are talking about different things, but what do you denote by N in these circumstances?
On the input we have a 2-dimensional array of size (A x A) plus (B) arrays [words] of variable length (let's call the average length of a word C). All three variables (A, B, and C) affect the speed of the calculation and therefore should be present in the complexity formula.
The computational complexity of the program provided is roughly:
B * C^2 * Log(B) * Log(C) + A^2 * Log(A) * Log(B)
|
|
|
|
|
Technically, we are talking about a one-dimensional array of type string and another one-dimensional array of type string. Since the code works with individual characters within the strings as if they were a two-dimensional array, I'm not going to quibble about that. If you double both length and width, you quadruple the size the code is working with. Viewed that way, yes, it is N^2. But we are NOT evaluating the size of the problem, we are evaluating the PROCESS. If you quadruple the size of the problem and the time quadruples, the process is N, NOT N^2.
With a bubble sort, if you double the length of an int array, the time it takes to process is quadruple the time for the half-sized array. There is nothing funny about the size of the int fields passed that causes the sort to take longer; it's the process and how it handles the data that determines the complexity. When I show the statistics, I show the total number of characters in the array and in the words. For the words I also show the total number of words, because that is related to how many items are being tracked against the array. In the linear article, each word is evaluated at every location, so if you double the words and double the size of the array, you quadruple the processing time: 2 words and 20 characters in the array is 40 operations; 4 words and 40 characters is 160 operations. (Yes, boundary conditions change these numbers, but not significantly.)
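For contrast, here is roughly how that per-word scan behaves; this is a sketch of the approach, not the referenced author's actual code. Every word is tried at every cell in every direction, so doubling both the array and the word list quadruples the work.

    class BruteForceSketch
    {
        // Per-word brute-force scan: (cells x words x directions) attempts.
        // Only four directions are shown; my dictionary version covers the
        // other four by also keying reversed words.
        static int CountMatches(char[,] grid, string[] words)
        {
            int rows = grid.GetLength(0), cols = grid.GetLength(1), found = 0;
            int[,] dirs = { { 0, 1 }, { 1, 0 }, { 1, 1 }, { 1, -1 } };
            foreach (string word in words)
                for (int r = 0; r < rows; r++)
                    for (int c = 0; c < cols; c++)
                        for (int d = 0; d < 4; d++)
                        {
                            int i = 0, cr = r, cc = c;
                            while (i < word.Length && cr >= 0 && cr < rows &&
                                   cc >= 0 && cc < cols && grid[cr, cc] == word[i])
                            {
                                i++;
                                cr += dirs[d, 0];
                                cc += dirs[d, 1];
                            }
                            if (i == word.Length)
                                found++;   // word fits starting at (r, c)
                        }
            return found;
        }
    }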
OK, when you create the first dictionary of values, the number of string keys in the dictionary is less than twice the number of characters in all the words. I don't know exactly how many, and I don't care. The number of keyword objects stored in the lists IS twice as many. With NW words in the word list, the number of keys written to the dictionary isn't predetermined; each word will initially try to store twice its length in keys, but some keys will be shared. When I was debugging the keys before reducing the size, the count was generally over 4 times the number of words. After reduction, the keys are fewer than three per word. Again, writing to a dictionary is an NlogN process, and I would expect adding a keyword to an existing keyed list to count the same. The second dictionary is less than 3 times the number of words; this time it stops copying to the second dictionary as soon as the list reduces to one keyword.
When I was testing, I never changed the width of the array, so adding length increased the size linearly with the number of lines. At one point I had increased the length 11 times. Multiplying the log of the larger set of keys by 11 and dividing by the log of the smaller set of keys, the expected time increase was slightly over 18 times, and, what the heck, it took slightly over 18 times as long. Other tests weren't so satisfactory. So, just theory and one test support the NlogN process.
Anyway, one N is the number of characters in the array and the other N is the number of keys in the dictionary. The amount of work creating the dictionary is insignificant compared to the number of times the dictionary is accessed going through the array. The main work is NAlogNW, where NA is the number of characters in the array and NW is the final number of keys retained in the dictionary for the words.
In your terms, A is the number of characters in the array and B is the number of keys in the dictionary. It is immaterial that 4 directions are being checked for each position in A; it is immaterial that part of the process doesn't access the dictionary and does N amount of work (a 1-to-1 check of the characters in the final word against the characters in the array). The O complexity remains NlogN; there is never a complexity calculation like B * C^2 * Log(B) * Log(C) + A^2 * Log(A) * Log(B).
The complexity is 1, N, N^2, logN, or NlogN; I don't know any other way to state it. The dimension of the array is immaterial: the number of characters in the array is one N, and the number of keys in the dictionary is the other N. It is certainly a little incredible to me that the keys end up at less than 3 times the number of words defined in them, especially when the first word will produce twice as many keys as it has characters.
Just know that THE main amount of work is going through the array and retrieving the keys: failing to retrieve a key, finding the final key that has ONE word in its list, and a minor additional verification that the ONE word's characters all match the characters in the array.
|
|
|
|
|
|
I have had absolutely NO formal training in Big O notation. What I know about it is what I've read about it on the web, and there is NEVER conflicting information available there.
Wikipedia wrote: The sort has a known time complexity of O(n^2), and after the subroutine runs the algorithm must take an additional 55n^3+2n+10 time before it terminates. Thus the overall time complexity of the algorithm can be expressed as
T(n)=55n^3+O(n^2).
Here the terms 2n+10 are subsumed within the faster-growing O(n^2). Again, this usage disregards some of the formal meaning of the "=" symbol, but it does allow one to use the big O notation as a kind of convenient placeholder.
Note that I altered the quote because the superscript characters didn't copy so well here. I fully understand subsuming 2n+10, but I don't understand why "+O(n^2)" didn't get subsumed too, because if n is 1000, who cares about a million when you are comparing it to 55 billion? That's 55.001 billion, or more accurately 55.00100201 billion. (And then, it IS only an estimate.) The whole article is hard for me to read, since it's been four decades since I really used algebraic equations. Even then, = was NOT order dependent, so switching the order would not change the meaning, though the example was perfectly clear on how and why it could be order dependent here. I certainly didn't know enough to know why the sides couldn't be swapped.
Why anyone would pick a bubble sort routine when the quick sort has a known complexity of NlogN is beyond me. (Going by memory, sorting 150K items took 1.6 minutes with the bubble sort vs. 4 milliseconds with the quick sort; sorting 150M with the quick sort took 49+ seconds.) The 55 is also irrelevant: 55 what? N^3 is really the only relevant value in the cost. If N=100 costs 10 seconds, N=1000 costs 10,000 seconds (about 13 minutes short of 3 hours). You would need to divide 10 and 10,000 by 55 to find the cost if N^3 weren't done 55 times, but the ratio of the costs is still cubed.
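If anyone wants to reproduce that gap, here's a throwaway benchmark sketch; my numbers above were from memory, and yours will differ. It pits a hand-rolled bubble sort against Array.Sort, which .NET implements as an NlogN introspective sort.

    using System;
    using System.Diagnostics;

    class SortDemo
    {
        static void Main()
        {
            var rand = new Random(1);
            var a = new int[150000];
            for (int i = 0; i < a.Length; i++) a[i] = rand.Next();
            var b = (int[])a.Clone();

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < b.Length - 1; i++)        // N^2 bubble sort
                for (int j = 0; j < b.Length - 1 - i; j++)
                    if (b[j] > b[j + 1])
                    {
                        int t = b[j]; b[j] = b[j + 1]; b[j + 1] = t;
                    }
            Console.WriteLine("Bubble sort: " + sw.Elapsed);

            sw.Restart();
            Array.Sort(a);                                // NlogN introsort
            Console.WriteLine("Array.Sort:  " + sw.Elapsed);
        }
    }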
Also, subsuming processes that aren't relevant to the overall cost makes sense to me. In this case, N1 is the number of characters in the array and N2 is 3 times the number of words to search. The process of my routine is then N1logN2. I don't care what dimensions you pass in the array; N1 is the number of characters I have to process. In estimations, edge cases are ignored, so when I bypass looking in a direction because the shortest word would go beyond the boundary, that isn't counted in any estimate, because it is, overall, irrelevant to the cost.
|
|
|
|
|
If it helps, I am continuing to prove that I can't prove anything about the performance of my code, except that it doesn't follow the behavior I expected. Nor can I explain its behavior.
I wrote a second version of the routine where I removed the write statements and the counter variables and output the dictionaries and byte array I created back to the calling code (like a real version of the code instead of a test version). I am not running the stopwatch during these tests.
These were all run sequentially. I started the test with an array of 15X60 characters and then incremented by 15 rows (225 characters) for each following test. The times are loaded into a list, and so are the numbers. Size is the number of characters in the array, Words is the number of search words, KeyCnt is the number of keys defined from the words, and LogN is the power of 2 needed to search through the keys, i.e. 2^9 is 512, so the dictionary can find or retrieve one of 468 keys in 9 steps. FYI, 2^12 = 4096.
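In other words, the LogN column is just the smallest power of 2 that covers KeyCnt; something like this one-liner (an illustration, not the exact code) reproduces it:

    using System;

    class LogNSketch
    {
        // 468 keys -> 2^9 = 512 -> 9 steps; 2054 keys -> 2^12 = 4096 -> 12.
        static int LogN(int keyCnt)
        {
            return (int)Math.Ceiling(Math.Log(keyCnt, 2));
        }
    }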
Note that the other "Linear time" solution allows a maximum of 2,500 characters.
Time: 00:00:00.0313997, Size: 900, Words: 160, KeyCnt: 468, LogN: 9
Time: 00:00:00.0154774, Size: 1125, Words: 214, KeyCnt: 622, LogN: 10
Time: 00:00:00.0156245, Size: 1350, Words: 248, KeyCnt: 728, LogN: 10
Time: 00:00:00.0157654, Size: 1575, Words: 291, KeyCnt: 849, LogN: 10
Time: 00:00:00.0154845, Size: 1800, Words: 331, KeyCnt: 955, LogN: 10
Time: 00:00:00.0156250, Size: 2025, Words: 370, KeyCnt: 1067, LogN: 11
Time: 00:00:00.0156250, Size: 2250, Words: 409, KeyCnt: 1181, LogN: 11
Time: 00:00:00.0156266, Size: 2475, Words: 444, KeyCnt: 1287, LogN: 11
Time: 00:00:00.0156244, Size: 2700, Words: 487, KeyCnt: 1411, LogN: 11
Time: 00:00:00.0312506, Size: 2925, Words: 525, KeyCnt: 1512, LogN: 11
Time: 00:00:00.0470131, Size: 3150, Words: 568, KeyCnt: 1642, LogN: 11
Time: 00:00:00.0311107, Size: 3375, Words: 606, KeyCnt: 1749, LogN: 11
Time: 00:00:00.0312505, Size: 3600, Words: 606, KeyCnt: 1749, LogN: 11
Time: 00:00:00.0468733, Size: 3825, Words: 643, KeyCnt: 1860, LogN: 11
Time: 00:00:00.0312516, Size: 4050, Words: 675, KeyCnt: 1952, LogN: 11
Time: 00:00:00.0467241, Size: 4275, Words: 710, KeyCnt: 2054, LogN: 12
Time: 00:00:00.0312489, Size: 4500, Words: 746, KeyCnt: 2149, LogN: 12
Time: 00:00:00.0312472, Size: 4725, Words: 788, KeyCnt: 2275, LogN: 12
Time: 00:00:00.0468755, Size: 4950, Words: 819, KeyCnt: 2372, LogN: 12
Time: 00:00:00.0313909, Size: 5175, Words: 853, KeyCnt: 2476, LogN: 12
Time: 00:00:00.0468300, Size: 5400, Words: 888, KeyCnt: 2569, LogN: 12
Time: 00:00:00.0467224, Size: 5625, Words: 924, KeyCnt: 2669, LogN: 12
Time: 00:00:00.0468750, Size: 5850, Words: 924, KeyCnt: 2669, LogN: 12
Time: 00:00:00.0781238, Size: 6075, Words: 961, KeyCnt: 2776, LogN: 12
Time: 00:00:00.0468749, Size: 6300, Words: 991, KeyCnt: 2872, LogN: 12
Time: 00:00:00.0781266, Size: 6525, Words: 1034, KeyCnt: 2994, LogN: 12
Time: 00:00:00.0625005, Size: 6750, Words: 1074, KeyCnt: 3107, LogN: 12
Time: 00:00:00.0781243, Size: 6975, Words: 1109, KeyCnt: 3206, LogN: 12
PS: I noticed that 6 rows back the number of words didn't change (924); yup, I appended the same puzzle twice.
|
|
|
|
|
Well, I made a little progress. I finally noticed the increments were always about 15 ms. I loaded the program on my old machine, which had an older VS, and couldn't run the new code there until I installed .NET 4.5. Still 15 ms increments. Compiled with my 2010 VS, I got finer increments, but still inconsistent results:
Time: 00:00:00.0150009, Size: 900, Words: 160, KeyCnt: 468, LogN: 9
Time: 00:00:00.0080004, Size: 1125, Words: 214, KeyCnt: 622, LogN: 10
Time: 00:00:00.0060003, Size: 1350, Words: 248, KeyCnt: 728, LogN: 10
Time: 00:00:00.0070004, Size: 1575, Words: 291, KeyCnt: 849, LogN: 10
Time: 00:00:00.0110006, Size: 1800, Words: 331, KeyCnt: 955, LogN: 10
Time: 00:00:00.0090005, Size: 2025, Words: 370, KeyCnt: 1067, LogN: 11
Time: 00:00:00.0180011, Size: 2250, Words: 409, KeyCnt: 1181, LogN: 11
Time: 00:00:00.0130007, Size: 2475, Words: 444, KeyCnt: 1287, LogN: 11
Time: 00:00:00.0140008, Size: 2700, Words: 487, KeyCnt: 1411, LogN: 11
Time: 00:00:00.0150008, Size: 2925, Words: 525, KeyCnt: 1512, LogN: 11
Time: 00:00:00.0240014, Size: 3150, Words: 568, KeyCnt: 1642, LogN: 11
Time: 00:00:00.0180011, Size: 3375, Words: 606, KeyCnt: 1749, LogN: 11
Time: 00:00:00.0220012, Size: 3600, Words: 643, KeyCnt: 1860, LogN: 11
Time: 00:00:00.0350020, Size: 3825, Words: 675, KeyCnt: 1952, LogN: 11
Time: 00:00:00.0220013, Size: 4050, Words: 710, KeyCnt: 2054, LogN: 12
Time: 00:00:00.0420024, Size: 4275, Words: 746, KeyCnt: 2149, LogN: 12
Time: 00:00:00.0260015, Size: 4500, Words: 788, KeyCnt: 2275, LogN: 12
Time: 00:00:00.0450025, Size: 4725, Words: 819, KeyCnt: 2372, LogN: 12
Time: 00:00:00.0300017, Size: 4950, Words: 853, KeyCnt: 2476, LogN: 12
Time: 00:00:00.0580033, Size: 5175, Words: 888, KeyCnt: 2569, LogN: 12
Time: 00:00:00.0310018, Size: 5400, Words: 924, KeyCnt: 2669, LogN: 12
Time: 00:00:00.0450025, Size: 5625, Words: 961, KeyCnt: 2776, LogN: 12
Time: 00:00:00.0520030, Size: 5850, Words: 991, KeyCnt: 2872, LogN: 12
Time: 00:00:00.0350020, Size: 6075, Words: 1034, KeyCnt: 2994, LogN: 12
Time: 00:00:00.0670038, Size: 6300, Words: 1074, KeyCnt: 3107, LogN: 12
Time: 00:00:00.0540030, Size: 6525, Words: 1109, KeyCnt: 3206, LogN: 12
Time: 00:00:00.0400023, Size: 6750, Words: 1151, KeyCnt: 3308, LogN: 12
30 15X15 arrays concatenated, 15X450, is 6,750 characters
Ratios based on the numbers and NlogN theory:
(6300*12)/(1125*10)=6.72
(6525*12)/(1125*10)=6.96
(6750*12)/(1125*10)=7.2
Actual ratios based on time:
670038/80004=8.375
540030/80004=6.75
400023/80004=5
I've got to have much larger puzzle sizes to remove the time variance problems and see if the code settles into the predicted performance.
|
|
|
|
|
You may be right about the complexity being more complex than NlogN, but I can't figure out what it is, and sometimes the NlogN calculation is almost spot on.
I built a random string routine that returns N strings of 3 to 13 characters. I gave up on getting enough real puzzle combinations of words and array to start showing the NlogN increase I expected, so I just doubled the array size and used the random routine to double the number of strings. With the real strings, the number of new words shrank substantially with each increment because so many words were reused (47 words were used in the last puzzle, but 37 were new; 7,425 characters in the array).
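The random string routine is nothing fancy; a sketch of it (illustrative, not the exact code) just draws 3-to-13-letter uppercase strings:

    using System;
    using System.Collections.Generic;
    using System.Text;

    class RandomWordsSketch
    {
        // Returns n random words, each 3 to 13 uppercase letters long.
        static List<string> RandomWords(int n, Random rand)
        {
            var words = new List<string>(n);
            var sb = new StringBuilder();
            for (int i = 0; i < n; i++)
            {
                sb.Clear();
                int len = rand.Next(3, 14);   // upper bound is exclusive
                for (int j = 0; j < len; j++)
                    sb.Append((char)('A' + rand.Next(26)));
                words.Add(sb.ToString());
            }
            return words;
        }
    }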
Time: 00:00:00.0782659, Size: 6975, Words: 1194, KeyCnt: 3429, LogN: 12
Time: 00:00:00.0937493, Size: 7200, Words: 1234, KeyCnt: 3533, LogN: 12
Time: 00:00:00.0623589, Size: 7425, Words: 1261, KeyCnt: 3614, LogN: 12
Time: 00:00:00.1873698, Size: 14850, Words: 2519, KeyCnt: 6724, LogN: 13
Time: 00:00:00.5156492, Size: 29700, Words: 5032, KeyCnt: 12745, LogN: 14
Time: 00:00:01.1093771, Size: 59400, Words: 10024, KeyCnt: 25507, LogN: 15
Time: 00:00:02.9218735, Size: 118800, Words: 19912, KeyCnt: 50239, LogN: 16
Note that I threw out the time measurement for the last "real" puzzle because it missed on both ends of the 16 ms error range. Real times don't have parentheses around them; the ones that do are the expected time ratios. The following are rounded to two-place accuracy:
1873698/937493=2
(14850*13)/(7200*12)=2.23
5156492/1873698=2.75
(29700*14)/(14850*13)=2.15
1.1093771/.5156492=2.15
(59400*15)/(29700*14)=2.14
2.9218735/1.1093771=2.63
(118800*16)/(59400*15)=2.13
Checking to see if natural logs change the last answer (nope):
50239 ln=10.825
25507 ln=10.147
(118800*10.825)/(59400*10.147)=2.13
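For anyone following along, the predicted ratios above all come from the same little formula; wrapped up as a helper (illustrative only), it is expected T2/T1 = (N2 * logN2) / (N1 * logN1), with N taken as the array size and logN derived from the key count:

    class RatioSketch
    {
        // Predicted NlogN time ratio between two test sizes, e.g.
        // PredictedRatio(59400, 15, 118800, 16) = 2.13, as hand-calculated above.
        static double PredictedRatio(long size1, int logN1, long size2, int logN2)
        {
            return (double)(size2 * logN2) / (size1 * logN1);
        }
    }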
|
|
|
|
|
You may be right about the complexity being more complex than NlogN, but I can't figure out what it is, and sometimes the NlogN calculation is almost spot on.
Well, big O notation describes the limiting behaviour of a function, i.e. the worst possible scenario.
Effectively, it guarantees that for ANY input data you'll get the result no slower than {your Big O}.
Moreover, this is a theoretical limitation of the algorithm, not the code. A real program will follow the Big O estimation only on really big data (we are talking about millions of samples), when the details of the particular implementation and execution (the OS thread's granularity, the time cost of memory allocation, random OS events stealing your execution time, and so on) become irrelevant.
|
|
|
|
|
VadimAlk wrote: Moreover, this is a theoretical limitation of the algorithm, not the code.
OK, just to be sure, I looked it up: "In mathematics, an algorithm is a defined set of step-by-step procedures that provides the correct answer to a particular problem." (Pretty much what I thought.)
In any venue, an algorithm is a step-by-step process. You can't get any more exacting in describing your algorithm's process than code. The linear-process link follows another exacting algorithm that just happens to be N^2 in big O notation. (Big O is a description of the result of following an algorithm.)
VadimAlk wrote: A real program will follow the Big O estimation only on really big data
That's interesting. I'm not even close to millions, and the estimates close in on NlogN as I double the data; some samples process FASTER than NlogN. (We're still in the "subject to timing error" section of testing.) And of course he would have to modify his logic, because it will only go up to 50 in either direction at this time (2,500 characters max). My last test was a 15 by 8,400 array.
Throwing out the additional calculations (moving the pointers, comparing values, etc.), the best the linear process could do is almost 25 seconds on my device. (I'm randomly picking 3-to-13-character strings, and I averaged the word length to 5 characters.)
Real stats:
Time: 00:00:00.0985822, Size: 7650, Words: 1296, KeyCnt: 3709, LogN: 12
Time: 00:00:00.0934091, Size: 7875, Words: 1332, KeyCnt: 3808, LogN: 12
Time: 00:00:00.2056951, Size: 15750, Words: 2663, KeyCnt: 7092, LogN: 13
Time: 00:00:00.5244971, Size: 31500, Words: 5317, KeyCnt: 13454, LogN: 14
Time: 00:00:01.2674775, Size: 63000, Words: 10585, KeyCnt: 26983, LogN: 15
Time: 00:00:02.8451531, Size: 126000, Words: 20999, KeyCnt: 53067, LogN: 16
Some calculations:
2.8451531/1.2674775=2.245 (Actual ratio)
32/15=2.133 (NlogN theory)
2.133*16/15=2.275 (In theory, just adding second logN for creating dictionary, not cutting it down later)
1.2674775/.5244971=2.417 (Note that doubling the amount substantially reduced the ratio closer to my theory of the algorithm's efficiency.)
30/14=2.143
5244971/2056951=2.55
28/13=2.154
2056951/934091=2.202
26/12=2.167
2056951/985822=2.087
126000/15=8400
Estimate for "Linear" (really N^2) process. Number of characters in array, times directions, times average word length, times the number of words to search:
126000*4*5*20999=52,917,480,000
Estimated FLOPs per second on my machine:
2^31 = 2,147,483,648
My machine never comes close to that, because any loop takes 2 FLOPs to process. I would be amazed if it processed in 50 seconds; you might be lucky and get the code to run in 50 seconds if your rating was twice as fast as mine.
52917480000/2147483648=24.6
|
|
|
|
|
Well, I extended it into the low millions; that's about the limit of my willingness to test how long it takes. Instead of getting closer to linear, which NlogN does as N grows, doubling the size almost tripled the time. That's better than N^2, but not much better.
Time: 00:00:00.0878191, Size: 8100, Words: 1359, KeyCnt: 3881, LogN: 12
Time: 00:00:00.2080606, Size: 16200, Words: 2715, KeyCnt: 7199, LogN: 13
Time: 00:00:00.5169059, Size: 32400, Words: 5422, KeyCnt: 13707, LogN: 14
Time: 00:00:01.2398773, Size: 64800, Words: 10800, KeyCnt: 27460, LogN: 15
Time: 00:00:02.9396989, Size: 129600, Words: 21430, KeyCnt: 53858, LogN: 16
Time: 00:00:07.0300787, Size: 259200, Words: 42201, KeyCnt: 100235, LogN: 17
Time: 00:00:17.3356945, Size: 518400, Words: 82250, KeyCnt: 188889, LogN: 18
Time: 00:00:47.7689043, Size: 1036800, Words: 158012, KeyCnt: 371541, LogN: 19
Time: 00:02:19.3798683, Size: 2073600, Words: 299750, KeyCnt: 734074, LogN: 20
|
|
|
|
|
Here are test times, followed by comments in parentheses:
Time: 00:00:00.1249272, Size: 9000, Words: 1450, KeyCnt: 4126, LogN: 13 (40 15X15 puzzles)
Time: 00:00:00.1025983, Size: 9900, Words: 1600, KeyCnt: 4529, LogN: 13 (44 puzzles)
Time: 00:00:00.1115712, Size: 10350, Words: 1680, KeyCnt: 4757, LogN: 13 (46 puzzles, last test using real data)
Time: 00:00:00.2931676, Size: 20700, Words: 3356, KeyCnt: 8861, LogN: 14 (Start of double size, random strings)
Time: 00:00:00.7321922, Size: 41400, Words: 6698, KeyCnt: 17083, LogN: 15
Time: 00:00:01.5926357, Size: 82800, Words: 13335, KeyCnt: 34239, LogN: 16
Time: 00:00:03.7760567, Size: 165600, Words: 26396, KeyCnt: 65767, LogN: 17 (consistently near triple time)
Time: 00:00:09.0619847, Size: 331200, Words: 51772, KeyCnt: 121039, LogN: 17
Time: 00:00:23.5500326, Size: 662400, Words: 100409, KeyCnt: 231045, LogN: 18
Time: 00:01:07.5761955, Size: 1324800, Words: 192046, KeyCnt: 457319, LogN: 19
Time: 00:03:13.0995981, Size: 2649600, Words: 363108, KeyCnt: 895087, LogN: 20
I was thrown a bit when the time ratios were edging toward 3 times longer as the puzzle size was doubled and the words almost doubled. (I asked for double, but some random strings in each doubling were rejected as duplicates.)
Then I remembered that "fail early, fail often" is a precept of good design. I had built this design on the thought that the random-looking data in the array would intersect only a few words, because there really was meaningful data in the array.
By increasing the words to search for in a random manner, I also increased the false paths available. Since the last key count is edging toward double the number of unique 4-character sequences (457K), it is probable that almost every direction is checking against 5-character matches before failing, and it is finding every three-character word defined. Even the second doubling in size should be finding a substantial percentage of the unique three-character sequences. Anyway, it is NOT failing early and often anymore.
Note that if you changed the boundary of the "linear" solution to 140,000 X 15 instead of 50 X 50, just summing an average of 5 characters per word, with no other calculations (i.e. changing x and y positions to find the character), would take nearly 2 hours. I wouldn't be surprised if the actual execution took weeks or months.
The following calculations are all the possible character sequences based on single-direction comparisons, not the actual dual-direction comparison my code makes, which cuts the following counts in half:
26^3 = 17,576
26^4 = 456,976
26^5 = 11,881,376
26^6 = 308,915,776
26^7 = 8,031,810,176
26^13 = 26^6 * 26^7 = 2,481,152,873,203,736,576
|
|
|
|
|
This doesn't have anything to do with the price of beans in China, because the latest increases are too high, but it is interesting to compare this instance of the last real puzzle time to the last bogus increase in size that I used:
Time: 00:00:00.1174948, Size: 10575, Words: 1703, KeyCnt: 4826, LogN: 13
Time: 00:03:34.3048642, Size: 2707200, Words: 368705, KeyCnt: 909030, LogN: 20
2707200/256= 10575
0.1174948*256*7-180=30.55
The last actual (bogus puzzle) time was 3:34.3; from the last real data, the predicted time for the bogus data is 3:30.55.
(7 is LogN 20 minus LogN 13, N is 256 times bigger, and 180 seconds is 3 minutes. I am also using the wrong math, because 20/13 is about 1.5, not 7, times bigger, so this isn't at all a valid time prediction for NlogN. Done properly, NlogN would predict 0.1174948 * 256 * 20/13, about 46 seconds, which is far short of the 3:34.3 actually measured and again says the growth here is worse than NlogN.)
|
|
|
|
|