Add your own alternative version
Stats
37.1K views 344 downloads 34 bookmarked
Posted
4 Jun 2017

Comments and Discussions


Peter:
This subject is of personal interest to me and I suspect to many others as well. For what it worth, here are things I would do if I were the author of this document. I fully agree that this is your document and you should call make your own decisions I am only a reader trying to under stand what you've written.
1 The first thing I would do is address the issue of this being a long and somewhat complex
document, so I would work on reducing the length and getting rid of nonessentials like 'less than or equal to'. The people reading this document understand basic math so use <= .
2 streamline your sentences fo example:
A common use case of RNDINTRANGE is to simulate die rolls. For example, "to simulate rolling a sixsided die, generate a random number from 1 through 6 by calling RNDINTRANGE(1, 6)"
to..
"To simulate rolling a sixsided die, call RNDINTRANGE(1, 6)"
Percentagewise this is a huge reduction in words with no loss of clarity.
Another example:
The following method, RNDU01(), generates a random number 0 or greater and 1 or less:
METHOD RNDU01()
return RNDINT(X) / X
END METHOD
Why not convert this to " RNDINT(X) / X generates a random number in [0,1] "
Where you explain what [0,1] means versus (0,1] and (0,1) in "Notation and Definitions "
Also delete trivia like:
* true and false are the two Boolean values.
* sin(a), cos(a), and tan(a) are the sine, cosine, and tangent of the angle a, respectively, where a is in radians.
* asin(a), acos(a), and atan(a) are the inverse sine, inverse cosine, and inverse tangent of a, respectively.
* pow(a, b) is the number a raised to the power b.
* abs(a) is the absolute value of a.
* sqrt(a) is the square root of a.
Do you honestly believe anyone likely to read your document doesn't know what these these items mean ??
There are hundreds of places in your document that could be reduced in wording and by doing so would collectively reduce the size of your document and make it easier to read.





This is really a good paper that can be greatly improved by changing some elements of the writing style and correcting some errors. It is not very easy to read.
A few specific examples:
1. change the wording "0 or greater and less than .." that endlessly appears in the document to " >= 0 and < .." The former greatly slows down the reading of this document
2. The authors definition of "PDF" in the "Notations and Definitions" section is incomprehensible to the average reader.
Why not write "for a continuous CDF function, PDF(x) = First derivative of the CDF(x)" which is correct and easy to understand.
3. The Fdistribution, Tdistribution and their noncentral versions are not 'less common distributions' as the author implies; rather they VERY extensively used on a daily basis by those involved in data analysis, simulation studies and numerous research, scientific, and engineering applications
4. after the section "Notation and Definitions used in this document"
The author should point out, that to generate a random variable from any distribution whose CDF is known, one first generates a uniform random variable( say X) then calculates Y= ICDF(X) where ICDF is the inverse CDF function corresponding to CDF
I have been following the various iterations of this document, and each iteration is better than the previous but it is still hard reading





 I use the term "0 or greater and less than X" rather than, say, "between 0 and X" or "from 0 to X", because those two forms are ambiguous. Which of the following do you think is best?
 "0 or greater, but less than X"
 "from 0 to X, but not X"
 "from 0 up to but not including X"
 "greater than or equal to 0 and less than X"
 "greater than or equal to 0, but less than X"
 Some other form.
 The issue will be addressed in the next version.
 The list of "other" distributions became unwieldy, so I separated the list into two, one of which contains the "most common" distributions. The classification is not intended to be a value judgment on those distributions in particular situations.
 I will incorporate this into the section "Generating Random Numbers from an Arbitrary Distribution".
Do you have any other suggestions to make this document easier to read?





Technically speaking, every single number generation algorithm you described is a pseudorandom number generator (PRNG). In fact, every single number generation algorithm in existence is a PRNG, even all the cryptographic secure ones. The one and only exception is the (explicit twosource extractor and resilient function), which you did not describe in your article. Their paper was unsurprisingly titled, "Explicit TwoSource Extractors and Resilient Functions".
Another PRNG you could have covered is the excellent and very simple PCG PRNG. An example of a PCG PRNG is:
typedef struct { uint64_t state; uint64_t inc; } pcg32_random_t;
uint32_t pcg32_random_r(pcg32_random_t* rng) {
uint64_t oldstate = rng>state;
rng>state = oldstate * 6364136223846793005ULL + (rng>inc1);
uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
uint32_t rot = oldstate >> 59u;
return (xorshifted >> rot)  (xorshifted << ((rot) & 31)) }
Yet another very useful PRNG is the Galois or Fibonacci sequence Linear Feedback Shift Register (LFSR), which is used for satellite transmissions and for error correction algorithms.





You probably meant to comment in my other article, "Random Number Generator Recommendations for Applications", not here. Describing the underlying nature of an RNG is outside the scope of this document.





I didn't mean to comment on any other article other than this one.
If any the algorithms you described actually outputted random values, there would be no need for any of them except that one algorithm that could actually output truly random values. So the question is, how effective is each algorithm at generating almost random numbers? You (or someone) needs to make statistical comparisons of the outputs of each and every algorithm so a programmer can then compare or judge one algorithm to another for effectiveness at generating "random" numbers in their application.
None of the algorithms you describe in your article are cryptographically secure, and while a cryptographically secure PRNG may be cost and time prohibitive to predict, they can be cracked. There is only one truly random number generator algorithm in existence, and since it is truly random it is impossible to predict, and that algorithm is called the explicit twosource extractor and resilient function.





All of the randomnumber generation algorithms in this document assume the existence of an underlying random number generator (as I define it in the introduction) of some kind  and this document doesn't describe how that generator works or, in general, even whether that generator is a twosource extractor, a pseudorandom number generator, or some other kind. Not only you know that "costprohibitive" does not mean "impossible" to predict, but I know that, as well.
In general, the methods in this document do not assume any particular quality of the underlying RNG.
modified 4Sep17 2:34am.





Just remember, since none of the algorithms you described are actually random, for a given input they will always give the same output.





Excellent article with extensive coverage of this subject area. Also very well written
Easily gets 5 star rating.
My only negative comment is the article should specify mode detailed pseudocode for RNDINT(N)





I've included a detailed section on how RNDINT(N) can be implemented.





Peter, have you any observations on employing a reservoir sampling algorithm to select a subset of items from a large dataset, please? I’ve not used reservoir sampling myself, but it appears to be a neat way of dealing with a large amount of data without the need for a correspondingly large amount of memory.





Thanks for the excellent article. It seems to me that the initial if statement of your Shuffle method may be redundant as the while loop will not run unless size(list)>=2 .





If the list is empty, i might be 1, which may cause issues if i is implemented using an unsigned type available in certain programming languages.





Thanks, Peter. I was thinking in terms of signed integers. +5 from me.






The BoxMuller example is already adequately explained in the comments to that method's pseudocode. Also, next time, type your comment in the message, not just in the title, which is meant to summarize the comment only.






General News Suggestion Question Bug Answer Joke Praise Rant Admin Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

