

How many records are there, how big are the keys, and how big are the records? Do you have one, two, or more disk drives available for processing?
If the keys are small enough (and there are few enough of them) that they can all fit into memory, you should start by sorting the keys (each one accompanied by an integer giving the location in the original file). Then proceed as I described.
If the number of records is so large that, e.g., only 10% of the keys will fit into memory, then I would suggest that you come up with some means of partitioning the keys into, say, 65,536 buckets. It doesn't matter whether the distribution is particularly even, provided that no single bucket holds more than 10%, and preferably no more than 2% or so. Make a pass through the file and count how many keys fall into each bucket.
Once that is done, count how many buckets one could add, starting at the bottom, before they total 10% of the records. Make a pass through the original file, reading into RAM all the records that fall into those buckets. Then sort them in RAM and write them out. Then repeat the procedure, starting with the bucket after the last one used in the first pass, sorting and writing those out as well.
The exact procedure you should use will vary depending upon what your data looks like and the number of separate disks available. Nonetheless, the key observations are: (1) it's often good to sort records containing just the key and a reference to the original record, since many more of those fit in RAM than full records would; (2) it's better to think in terms of reading through a whole file, fetching some data into RAM and ignoring other data, than to think in terms of grabbing lots of little pieces of data scattered through a file; (3) though I haven't touched on this much, for really big jobs, having two or three hard drives will help things a lot.
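As a concrete illustration, here is a rough sketch of the bucket-counting scheme above in Python. The file format, bucket function, and RAM budget are all assumptions for the example: newline-delimited records, bucketed by the first byte of each record so that ascending bucket order matches ascending sort order.

```python
def external_sort(in_path, out_path, mem_limit_records, num_buckets=256):
    """Two-pass bucketed external sort of newline-delimited records.
    Assumes no single bucket exceeds the RAM budget (as the post advises)."""
    # Pass 1: histogram of bucket sizes. Bucket = first byte of the record,
    # so concatenating sorted groups in bucket order yields a sorted file.
    counts = [0] * num_buckets
    with open(in_path, "rb") as f:
        for line in f:
            counts[line[0] if line else 0] += 1

    # Group consecutive buckets until a group would exceed the RAM budget.
    groups, start, running = [], 0, 0
    for b in range(num_buckets):
        if running and running + counts[b] > mem_limit_records:
            groups.append((start, b))
            start, running = b, 0
        running += counts[b]
    groups.append((start, num_buckets))

    # One full sequential read of the input per group: keep only this
    # group's records, sort them in RAM, append to the output.
    with open(out_path, "wb") as out:
        for lo, hi in groups:
            with open(in_path, "rb") as f:
                batch = [line for line in f
                         if lo <= (line[0] if line else 0) < hi]
            batch.sort()
            out.writelines(batch)
```

Note that each pass is a sequential scan of the whole file, which is exactly observation (2): the disk is only ever read front to back, never seeked at random.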





"Mathematicians in California could be in line for a $100,000 prize (£54,000) for finding a new prime number which has 13 million digits."
BBC article[^].
Looks like it was done using distributed computing.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





The title is misleading, since "huge" is relative only to the current record. But still.





Lol, how long will they need to read the number in the official presentation?
Regards.

M.D.V.
If something has a solution... why do we have to worry about it? If it has no solution... for what reason do we have to worry about it?
Help me to understand what I'm saying, and I'll explain it better to you
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.”  Michael A. Jackson
Rating helpful answers is nice, but saying thanks can be even nicer.





Hello!
I'm working on a non-recursive implementation of a scapegoat tree (Scapegoat tree "partial" paper). Section 6.2 summarizes an implementation of a non-recursive rebalancing algorithm. I think I almost understand how I should use those two stacks, but my problem is how to "plug" the nodes into the right position (how should I tell whether a node is connected to another?). I think I've reduced the problem to determining the height of a node in the rebalanced tree from the weight of the whole tree and the node's position in the inorder traversal. The height determination should be O(1), because the whole rebalance should be O(n) in time and O(log n) in space.
Thanks in advance!





Can anyone suggest an algorithm which is interesting to study?
And also any known problems with it?
Thanks...
I desperately need it.





An algorithm to determine all the prime numbers up to some given limit. Start with the sieve of Atkin. The problem with Atkin's algorithm is that, if not coded properly, it can be slower than the Sieve of Eratosthenes (an ancient algorithm).
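For a baseline to compare against, here is a minimal Sieve of Eratosthenes in Python (a straightforward textbook version, not tuned for speed):

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Start marking at p*p: smaller multiples of p were
            # already marked by smaller primes.
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]
```

Any Atkin implementation worth studying should beat this simple loop at large limits; if it doesn't, that is the "not coded properly" case mentioned above.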
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





It all depends on what YOU are interested in. Tell folks what you are interested in, and some ideas might come about.
"The clue train passed his station without stopping."  John Simmons / outlaw programmer
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks"  Pete O'Hanlon
"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago."  Rob Graham





Simulated Annealing.
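In case it helps to see why it's interesting, the core loop is tiny; a minimal sketch in Python (all names, parameters, and the cooling schedule are illustrative choices, not a canonical implementation):

```python
import math
import random

def anneal(cost, neighbor, state, t0=1.0, cooling=0.995, steps=5000):
    """Generic simulated annealing: accept worse moves with
    probability exp(-delta/T) and lower T each step."""
    random.seed(1)  # fixed seed so the toy run below is reproducible
    cur, cur_cost = state, cost(state)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur)
        cand_cost = cost(cand)
        # Always accept improvements; sometimes accept a worse move,
        # less and less often as the temperature drops.
        if cand_cost < cur_cost or \
                random.random() < math.exp((cur_cost - cand_cost) / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t *= cooling
    return best, best_cost

# Toy usage: minimize (x - 3)^2 starting from x = 0.
x, fx = anneal(lambda x: (x - 3) ** 2,
               lambda x: x + random.uniform(-0.5, 0.5),
               state=0.0)
```

The interesting study material is in the parts this sketch glosses over: choosing the neighborhood function, the cooling schedule, and the stopping criterion for a real problem.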
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
 Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
 Iain Clarke
[My articles]





Does that alter NP hard to NP soft?





It's interesting to study.
If the Lord God Almighty had consulted me before embarking upon the Creation, I would have recommended something simpler.
 Alfonso the Wise, 13th Century King of Castile.
This is going on my arrogant assumptions. You may have a superb reason why I'm completely wrong.
 Iain Clarke
[My articles]





Hi everyone,
I'm not sure if this is the correct spot to ask for help, and if it seems as if I am in the wrong place, I apologize in advance.
I'm trying to come up with a kind of algorithm or equation where I can enter a player's statistics for a completed game, such as attempted passes, completions, yards, touchdowns, interceptions, etc. Now you may be saying this is the quarterback "passer rating" from American Football, and it sort of is, but I'd like the calculation to be a little different.
What I'm trying to do is come up with a way for my friends and me to guess how the player will perform during the game; whoever's "number" is closest to the actual "number" after the game is completed wins. Then the losers buy the winner a drink, or something of that nature.
I'd like to be able to make sure certain statistical categories are weighted differently based on how important we think they are, and I'd also like the final number to somehow fall between 100 and 1000, just as an example, so that a wide range creates more possibilities.
Also, I want to do this with all of the offensive positions in the NFL, including quarterback, wide receiver, tight end, running back, and kicker. I realize they will all have different categories that are weighted differently and interact with each other differently as well.
I'm hoping to find an example equation that is not extremely complicated; something that most non-math-genius type people, like me, can understand.
Again, I feel weird posting this here, and I apologize if this is in the wrong place. Whoever is the admin can have my permission to delete it if it seems out of place.
Thank you all for your time. I'm hoping this might be fun for you, hehe!
Pat





Sounds like you're talking about some kind of Bayesian statistic where you have a prior distribution (i.e. the weights that you and your friends will select). The simplest way to implement this would be some kind of weighted sum. So you would use an equation like:
Statistic = weight1*statistic1 + weight2*statistic2 + weight3*statistic3 + ...
The problem is that this statistic needs to be compared to the "true" statistic which you would have to agree on in advance. You would thus need to know how the NFL officially calculates their statistics and compare yours to theirs.
If you want to use your own statistic, then you need some way of estimating the true value from the sample. Your real problem lies in deciding what the "true" value is and how to estimate the weights in the above equation if you aren't going to use an official NFL stat.
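To make the weighted-sum idea concrete, here is a small Python sketch. The category names, weights, and the raw cap used for rescaling into the 100–1000 range are all made-up examples, not real NFL figures:

```python
def performance_score(stats, weights, lo=100.0, hi=1000.0, raw_cap=50.0):
    """Weighted sum of per-game stats, rescaled into [lo, hi].
    raw_cap is an assumed ceiling for the raw weighted sum."""
    raw = sum(w * stats.get(name, 0) for name, w in weights.items())
    raw = max(0.0, min(raw, raw_cap))        # clamp to [0, raw_cap]
    return lo + (hi - lo) * raw / raw_cap    # linear rescale into [lo, hi]

# Example QB weighting: reward TDs and yards, penalize interceptions.
qb_weights = {"touchdowns": 4.0, "yards": 0.04, "interceptions": -3.0}
score = performance_score(
    {"touchdowns": 3, "yards": 280, "interceptions": 1}, qb_weights)
```

Each position (RB, WR, kicker, ...) would just get its own weights dictionary; the clamping and rescaling guarantee every score lands in the agreed 100–1000 window.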
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





I have switched from Office 2003 to Office 2007 (Excel 2007). After moving InterpX function call arguments to column IW and beyond, it returns "#VALUE!". I know Excel 2003 has a 256-column limit (ending at column IV), so I'm guessing the Interp32.xll add-in (the InterpX function) must have a column limitation. Is there a version of this add-in that can be used in Excel 2007?
Thanks,
Michael
michael.slipper@navy.mil
modified on Thursday, September 18, 2008 2:10 PM





Probably it's a built-in limitation, since the interpolation function needs to access data on both sides of the column (i.e. for column IV it would need data in columns IW and IX to do interpolation). Interpolation shouldn't be that hard; you could probably add a VBA macro to handle the calculation for columns beyond the old limit.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





That doesn't ring true. The range of data consists of two columns, with a known value that lies within one column. The function should determine the interpolated value within the other column. Therefore I wouldn't expect any function to be trying to grab data outside the Excel range you specified. ...And yes, interpolation is simple, until you're doing it 5 million times in a spreadsheet; hence the ".xll".





M.Slipper wrote: The range of data consists of two columns, with a known value that lies within one column. The function should determine the interpolated value within the other column.
That's extrapolation. Interpolation fills in missing values, so you need a scenario similar to the following:
x1 (known), x (unknown), x2 (known)
So you need values on either side of the unknown value, which is what I already explained. If earlier versions of Excel had a column limitation, your .xll would be coded to treat IV as the last possible column and thus would not consider columns beyond it; that is most likely the reason for the #VALUE! error, and the reason you continue to get the error even in versions (e.g. Excel 2007) that no longer have this limitation.
If Excel can't handle it (and really, why are you trying to do 5,000,000 calculations using Excel?), write a C++ function, then dump the data back into Excel by calling the COM server. The other thing you might want to think about is: how sparse is your data that you have to interpolate 5,000,000 times?
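For what it's worth, the interpolation itself is one line of arithmetic once the bracketing pair is found; a sketch (the function name and signature are made up for illustration, not the InterpX API):

```python
import bisect  # binary search for the bracketing interval

def interp(x, xs, ys):
    """Linearly interpolate y at x, given ascending xs and matching ys."""
    if not (xs[0] <= x <= xs[-1]):
        raise ValueError("x outside table range: that would be extrapolation")
    i = bisect.bisect_left(xs, x)
    if xs[i] == x:            # exact hit, no interpolation needed
        return ys[i]
    x1, x2 = xs[i - 1], xs[i]
    y1, y2 = ys[i - 1], ys[i]
    # Straight-line blend between the two bracketing points.
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
```

The same few lines would translate directly to C++ (with `std::lower_bound` in place of `bisect`) if the spreadsheet route turns out to be too slow.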
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





Evidently I didn't make my explanation clear enough. I have two columns representing two separate data arrays. What I'm trying to do is clearly interpolation, not extrapolation: I know a value that lies between two adjacent data points within one of the arrays, and I'm trying to interpolate between the two corresponding points in the other array. Again, interpolation, not extrapolation. Clearly simple stuff, but not if it can't be done in one Excel cell per calculation, and not fast enough if it can't be done by a function in an ".xll". The 5 million was an exaggeration, to make you aware that doing these calculations in VBA would require too much computational time. The InterpX function in the Interp32.xll add-in worked just fine in Excel 2003. The issue is having the arrays in Excel columns beyond column IV.
Example:
The following works fine in either Excel 2003 or Excel 2007:
=InterpX(C$670,Report!$IU$734:$IU$765,Report!$IV$734:$IV$765,TRUE)
The issue comes into play when one or both of the arrays are moved beyond column IV.
Example:
=InterpX(C$670,Report!$IW$734:$IW$765,Report!$IX$734:$IX$765,TRUE)





To work around the column limitation, can you link to a column in another sheet? Send column IX data to Sheet2 column A, for example, and use that in the InterpX call?
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





I've thought of that, but my Excel workbook already has well over 20 sheets; it's getting way too hard to manage, which is why I upgraded to Excel 2007. I had already run out of fonts in Excel 2003, and I thought the additional fonts & columns in 2007 would be a godsend. There has to be a way to upgrade the Interp32.xll add-in with an updated InterpX that will allow the use of the additional columns available in Excel 2007. Hopefully the gentleman (JChampion) who created Interp32.xll will read my frustration and apply his expertise to upgrade the add-in. I'm sure the "X" number of scientists and engineers using Excel beyond its limited capabilities will be appreciative.





I haven't had a chance to download and look at the source, but the limitation could very well be hardcoded. You can email the author directly (look for one of his posts and then click the Email link) or you can also post a reply under the article (it should notify him). Maybe he can shed some more light on the limitation problem. I seem to be out of solutions/suggestions at the moment.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





Thanks....
The link to the article & source is provided below
http://www.codeproject.com/KB/macros/InterpolationAddin.aspx?df=100&forumid=25034&exp=0&select=1094573





Had a quick look at the source. There are quite a few places in there where the limits are defined as 255. I think this line from Xlcall.h is the important one, though:
#define xlUDF 255
From MSDN:
xlUDF
Calls a user-defined function (UDF). This function allows a DLL to call Visual Basic for Applications (VBA) user-defined functions, XLM macro language functions, and registered functions contained in other add-ins.
[...]
Zero or more arguments to the user-defined function. When you are calling this function in versions earlier than Excel 2007, the maximum number of additional arguments that can be passed is 29, which is 30 including pxFnRef. In Microsoft Office Excel 2007, this limit is raised to 254, which is 255 including pxFnRef.
It looks like it could be hardcoded into Excel itself. Now, I'm not an expert on this, but if you are trying to pass cell references beyond the old column limit, this could be what is causing the problem.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.





That call seems to address the maximum number of arguments, but not necessarily a column limitation... Excel 2003 and earlier versions have 256 columns (0:255); Excel 2007 has 16,384 columns, and I expect that limit is what keeps InterpX from working. So I guess you're on the right track; maybe there is another 255 embedded in the code elsewhere.
Also, there is a Microsoft site,
http://msdn.microsoft.com/en-us/library/aa730920.aspx[^],
that addresses some issues with "Developing Add-ins (XLLs) in Excel 2007".
If I had the time and inclination I would tackle this issue myself, but with my current workload that isn't possible. If you know how to contact the original author of the InterpX function and the Interp32.xll article (JChampion), maybe he would tackle the mods needed to enable his Interp32.xll to work in Excel 2007.
Again, thanks for any and all your help.
Michael Slipper
michael.slipper@navy.mil
modified on Tuesday, September 23, 2008 5:32 PM





"Mathmaking seems the opposite of automatic, which is why scientists long thought it had nothing to do with our ancient, preverbal sizeemup ways. Yet a host of new studies suggests that the two number systems, the bestial and celestial, may be profoundly related, an insight with potentially broad implications for math education. "
NY Times article[^].
An interesting article from the New York Times about the human capacity for mathematics and its implications for math education.
...that mortally intolerable truth; that all deep, earnest thinking is but the intrepid effort of the soul to keep the open independence of her sea; while the wildest winds of heaven and earth conspire to cast her on the treacherous, slavish shore.




