|
You could replace 65521 (the largest prime in range) by 65413 (the next prime) with the same result, forgoing the extra counters/flags. Intuitively I agree that summing powers mod a prime would hold all the information; extracting it, though, seems non-obvious.
However IMO all that is way too complex. Here is mine:
You can accumulate the sum; that's good enough for 1 missing/extra number.
For two numbers, you can accumulate the sum and also the product, like so:
- treat zero separately (with a simple flag);
- use a double ("d0") to multiply all numbers from 1 to 65535
- use a double ("d2") to multiply all numbers present
- rescale both to prevent overflow (say divide both by 65536 each time d0 exceeds 65536)
- finally divide d0 by d2 (or d2 by d0) to get the product.
- even though d0 and d2 are both inaccurate (they lost most of the digits), their quotient
accurately represents the product of the missing (or extra) numbers.
With a 32-bit mantissa you can discriminate all possible products that have the same sum, the
extreme case is 65532*65535 against 65533*65534.
Having sum and product, a simple equation generates the two numbers.
For three numbers (say x y z):
- accumulate the sum x+y+z
- accumulate the product xyz
- accumulate the product (x+1)(y+1)(z+1)
- the products need a 48-bit mantissa, so a double is still OK
- a simple substitution yields xy+yz+zx
From these three (xyz, xy+yz+zx, x+y+z) you can find x, y and z either analytically,
or by trial-and-error: just try all possible values of z (1..65535), and solve what remains
for x and y. Most z values will have no solution, you end up finding six permutations of the
one real solution.
For four numbers, you need a 64-bit mantissa, which a double does not offer, so you need a higher-precision math package. Then accumulate the sum, the product and two shifted products, in the end leading to a fourth-order set of equations; I haven't yet figured out whether there is an acceptable way to solve it.
|
|
|
|
|
Your approach for n=3 seems to devolve into trial and error; the same sort of trial and error would allow for a solution using the (mod 65521) maths. Using a smaller modulus would require having more flags to handle the numbers between the modulus and 65535.
A nice (but sub-optimal) bit-based approach for handling the two-number case would be to keep sixteen words, holding the xor of all the items in the list with bit 0 set, all the items with bit 1 set, etc. At least one of the numbers must have at least one bit set which is clear in the other number, and so that number may be read out easily.
For the case of n=3, a similar approach could work, except that instead of doing an xor one would have to hold a count of how many numbers had the various bits set. If there is exactly one missing number which had a particular bit set, that number may be easily identified. With three numbers, though, it's possible that several bits will each be present in two of the missing numbers and no bit in exactly one. This situation may be resolved, however, by noting that there must be exactly three missing numbers, and so if there are three different patterns of missing bits they must derive from the missing numbers. I don't quite know how to describe the analysis algorithmically, but it works out.
I think a similar approach will work with n=4, though it starts to get messier. I have no idea about n=5 and beyond.
|
|
|
|
|
supercat9 wrote: Your approach for n=3 seems to devolve into trial and error; the same sort of trial and error would allow for a solution using the (mod 65521) maths. Using a smaller modulus would require having more flags to handle the numbers between the modulus and 65535.
I tend to disagree. First of all there is an analytical solution for one root of a cubic equation;
second, when a trial loop over z is executed, what is left is a simple quadratic equation, so each z-value needs an IsSquare(discriminant) test, which is much more straightforward than your modulo-prime arithmetic.
I did look into xoring the numbers for n=2 earlier, but decided against it, the product being very easy to handle. I am not sure I see how xoring helps for n=2 or higher. For n=2, if the two missing numbers are identical except for one bit, then it's easy; otherwise...
Just try numbers 0-7 with 2 and 3 missing (easy), then 2 and 5 (now what?).
|
|
|
|
|
For the n=2 case, if there were 16 xors produced (xor of all numbers with bit 0 set, with bit 1 set, etc.) then if the missing numbers were 2 and 5, then xor[0] and xor[2] would both be 5, while xor[1] would be 2. Easy.
If the numbers shared a bit but each had at least one bit the other lacked (e.g. the values were 3 and 5) then the xor's would have three non-zero values: xor[0] would be 6 (3 xor 5), xor[1] would be 3, and xor[2] would be 5. Simplest thing to do in that scenario is to ignore the non-zero xor value which does not have the 'indicated' bit set (i.e. xor[0] in this case, since the value 6 has bit 0 clear). The other two non-zero xor values are the missing numbers.
If one number is a subset of the other (e.g. 2 and 3), then one of the non-zero xors will have its corresponding bit clear (e.g. xor[1] will equal 1, which has bit 1 clear) and the other will have it set. The one that has the bit set is one of the missing numbers; xor that with the other one to get the other missing number.
For n=3, one will have a count of bit values, mod 4. Bits that are missing from none of the numbers or from all three will be trivial to place accurately. Bits that are missing from exactly one of the numbers will be easy to place; if any such bits exist, the problem will then decompose to n=2. The only hard case is where three different bits are each used by exactly two of the missing numbers (e.g. if the missing numbers are 3, 5, and 6, then tally[0][2..0] = (1,1,2); tally[1][2..0] = (1,2,1); tally[2][2..0] = (2,1,1)). The tally values make it possible to determine which bits are isomorphic; that in turn will make it possible to determine which numbers are missing.
|
|
|
|
|
Another approach:
run 17 sums: one simple sum (S) as before and then S0 is the sum of all numbers with bit 0 set, S1 is the sum of all with bit 1 set .. etc.
Now we can precompute the expected answers for S0 .. S15, call these E0 .. E15.
If there are two numbers missing, then we get a+b from E-S, and then by looking at En-Sn, these can take the values {0,a,b,a+b} and as a and b must differ in at least one place, a or b must appear. So we can always solve the 2 case.
It is possible to do 3 and 4 in most cases, and I suspect you can prove you can do 3 in all cases but someone may prove otherwise.
Peter
"Until the invention of the computer, the machine gun was the device that enabled humans to make the most mistakes in the smallest amount of time."
|
|
|
|
|
Very interesting indeed.
With numbers a, b and c missing, the difference between the 17 Ei and Si values could take the values
{0, a, b, c, a+b, a+c, b+c, a+b+c} which may or may not be 8 different numbers (a could equal b+c). Some of these should appear in the Ei-Si values; how many different numbers are we expecting? Not sure yet.
If a=arbitrary, b=arbitrary and c=0, then c would not help in providing different numbers, so
we would find at most {0, a, b, a+b}; we are sure to find a+b in E-S
If a=arbitrary, b=65535-a and c=0, then we would find at most {0, a, 65535-a, 65535}
Another point of view: if it were to always provide the unique solution for n=3, and n=4, and...
at what value of n would it start to fail, and why? Is 16 a relevant number, i.e. does widening
the numbers enable the algorithm to find more missing numbers? I guess not, since all the missing
numbers COULD be in the range [0, 65535] however large the word width, so adding bits does not
strengthen the algorithm.
|
|
|
|
|
Just adding together numbers is apt to cause some confusion as bits carry from one place to the next. For n=3 or n=4 I would think it would be best to do something like:
for (i=0; i<16; i++)
{
    if (x & (1 << i))
    {
        count1[i] ^= (count0[i] & x);
        count0[i] ^= x;
    }
}
That will indicate whether a bit is set in 0, 1, 2, or 3 (mod 4) of the missing numbers that have a particular bit set. The case where a bit is missing in all four numbers can be resolved by noting that there must be some other bit which does not appear in all four numbers, so the count0/1[] values for that bit will show that it's missing in at least some numbers.
|
|
|
|
|
Modify the process slightly to use one of supercat's ideas, as well as having 17 sums, also have 17 counts, so that you know how many are in each of the Sn sums. As well, this may not be necessary, but I'm lazy, so set a flag to detect 0.
For n=3, if 0 is missing, treat as n=2; done. Otherwise, if any of the Sn is missing only one number, you get one missing number directly; simply add it back in and process as for n=2. If any of the Sn is missing two numbers, then since E-S gives you a+b+c you can work out one number, add it back in and process as for n=2. Done. As the numbers are not all the same, the sums can't all be short by 0 or 3 numbers.
For n=4, if 0 is missing, treat as n=3; done. Otherwise, if any of the Sn is missing only one number, you get one missing number directly; simply add it back in and process as for n=3. Done. Otherwise, if any of the Sn is missing three numbers, you get one missing number from (a+b+c+d) - (a+b+c), so process as if one were found. Done. Otherwise all are missing either 0, 2 or 4. Can anyone with more time than me show that under these conditions you can get three different pairs? My guess is that it is likely.
[edit] I guessed wrongly - {1,3,5,7} breaks it. [/edit]
Peter
"Until the invention of the computer, the machine gun was the device that enabled humans to make the most mistakes in the smallest amount of time."
modified on Sunday, December 21, 2008 9:42 PM
|
|
|
|
|
1,3,5,7 would yield a total of 16 missing, a bit 0 subtotal of 16 (1+3+5+7), a bit 1 subtotal of 10 (3+7), and a bit 2 subtotal of 12 (5+7). Even the total and bit 0 subtotal suffice to show that the numbers are 1, 3, 5, and 7; since the bit 0 subtotal matches the total, all four numbers have to be odd. The smallest 4 odd numbers are 1, 3, 5, and 7; if any other numbers were chosen the total would exceed 16.
A breaking set of numbers would be {1,6,10,13} or {2,5,9,14}. The carries from each bit position to the next serve to confuse things; using modulo arithmetic on each bit avoids that complication.
|
|
|
|
|
Storing the list of "already found" numbers isn't a huge issue if you, say, encode them as runs. However, the degenerate cases in this problem are quite nasty indeed. It wouldn't surprise me if this is actually equivalent to a restricted case of the subset-sum problem.
|
|
|
|
|
Mark Churchill wrote: Storing the list of "already found" numbers isn't a huge issue if you say encode them as runs.
After having read 32768 numbers from a set of 65536 possible values, there will be more than 2^65,527 possible states. No sort of run-length coding will get the worst-case (or even average-case) storage requirement below 65,527 bits (8191 bytes). The only way to make the problem workable is to rely upon the fact that the source of numbers also has considerable state information (no number will be read more than once). If there are four missing numbers, it's only necessary for the reader to distinguish (65536*65535*65534*65533)/4! different states at any time while reading the file (not counting the loop index).
|
|
|
|
|
Calculate the index sum and the value sum for the first index.
Compute the difference between the sums - if it's zero then the first number is not missing.
Increment the index and recalculate the index sum and the value sum.
Again compute the difference between the sums and also the 2nd difference - if this value is zero then the first missing number has not been found; if the second difference is 1 then the current index is the first missing number.
Continue until the second difference is 2; the index at that point is the second missing number, etc.
Only 3 variables are required (index, index sum, value sum) - the differences are temporary storage - and it works for any number of missing values in a single pass!
Apathy Rules - I suppose...
Its not the things you fear that come to get you but all the things that you don't expect
|
|
|
|
|
There are 100 unrepeated numbers, the sum of which will be (1 + 2 + 3 + ... + 100) - the summation equivalent of the factorial; not sure if there is a proper mathematical name for it. If you then add up all the numbers in your array, the difference between these two answers must be the "rogue" number. I've not tried it but I think that's it.
Edit: Ah! ... just realised there were loads of other answers on the next page ... so this is history not news
|
|
|
|
|
Papadimitriou and Steiglitz wrote the following [see citation, section 15.2]:
"A combinatorial optimization problem is thus the following form of computational problem.
"Given representations of the parameters... find the optimal feasible solution.
"We call this the optimization version of the problem. However, a combinatorial optimization problem can also be posed in the following, more relaxed, form.
"Given [the parameters]... find the cost of the optimal solution.
"This will be referred to as the evaluation version of a combinatorial optimization problem. It is immediate that, under the assumption that... [the cost] is not too hard to compute--the evaluation version of a combinatorial optimization problem cannot be much harder than the optimization version."
I omitted some notation and terms, the definitions of which were earlier in the chapter. As I understand it, they are saying that these two versions are pretty much the same, though the evaluation version may be a bit harder. I don't see how that could be. The evaluation version only seeks the cost of the optimal solution, whereas the optimization version seeks the optimal solution itself. In order to find that optimal solution, surely you need to compute its cost. Am I missing something?
Thanks,
Alex
Citation: Papadimitriou & Steiglitz, Combinatorial Optimization, Dover, 1982.
|
|
|
|
|
There may be a shortcut that lets you calculate the cost of an optimal solution (or at least prove some bounds on it) without actually calculating it.
This is similar to an existence proof in mathematics, where it's shown that a solution exists without actually finding it.
|
|
|
|
|
Hi,
I need to find some way to get the LCS for some special strings.
Imagine we have two CSV files (tables where ',' is the column delimiter)
1)
a b c,d e f
One row with 2 columns
2)
a,d
b,e
c,f
Three rows with 2 columns
When I'm trying to get the regular LCS I'm missing a lot of data:
a b c,d e f
a, d b,e c,f
LCS:
a b c f
But it must be totally equal, because three rows can be fitted in one.
Can someone suggest some algorithm, or some preprocessing method?
The main pain is that there can be adds and deletes, so I need to find some longest path.
Thank you.
|
|
|
|
|
Follow this algorithm; what is its big O?
int sum = 0;
for(int i = 0; i < n; i++){
    for(int j = 0; j < n*n; j++){
        sum += j;
    }
}
Thank you for Answer
|
|
|
|
|
Hi,
you really should be able to figure it out yourself.
Just imagine you were to execute the code for some value of n and estimate the
number of important operations; and then for a value twice as large.
The ratio between those two "measures of complexity" should be visible in the big O.
|
|
|
|
|
bosszy wrote: What is there bigO
You should really do your own homework. What do you think it is? You should really try it rather than waiting for an answer. The answer is really easy, and if you can't figure it out, then, well, don't know what to tell you.
"The clue train passed his station without stopping." - John Simmons / outlaw programmer
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon
"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago." - Rob Graham
|
|
|
|
|
Paul Conrad wrote: don't know what to tell you
you don't know either?
|
|
|
|
|
Luc Pattyn wrote: you don't know either?
Heck yeah, I know the answer to it (it is a polynomial), but not going to announce it for him to cheat on his assignment.
"The clue train passed his station without stopping." - John Simmons / outlaw programmer
"Real programmers just throw a bunch of 1s and 0s at the computer to see what sticks" - Pete O'Hanlon
"Not only do you continue to babble nonsense, you can't even correctly remember the nonsense you babbled just minutes ago." - Rob Graham
|
|
|
|
|
|
Assuming n>0, if you want to know how often "sum += j;" is executed, it's n cubed exactly -- never mind the big O.
|
|
|
|
|
I am trying to implement Cholesky's algorithm for a 5 by 5 matrix. When I compare the results to Scilab, I find that my answer is different. Therefore, I am assuming that I am wrong. The code below is based on an algorithm given in the book "Numerical Analysis" by Burden, Faires and Reynolds. Here is the code:
void Cholesky(double a[5][5], double l[5][5])
{
    const int n = 5;
    l[0][0] = sqrt(a[0][0]);
    for (int j = 2; j <= n; j++)
        l[j-1][0] = a[j-1][0] / l[0][0];
    for (int i = 2; i <= (n-1); i++) {
        double sum = a[i-1][i-1];
        for (int k = 1; k <= i-1; k++)
            sum -= (l[i-1][k-1]) * (l[i-1][k-1]);
        l[i-1][i-1] = sqrt(sum);
        for (int j = i+1; j <= n; j++) {
            double sum = a[j-1][i-1];
            for (int k = 1; k <= (i-1); k++)
                sum -= l[j-1][k-1] * l[i-1][k-1];
            l[j][i] = sum / l[i-1][i-1];
        }
    }
    double sum = a[n-1][n-1];
    for (int k = 1; k <= (n-1); k++)
        sum -= l[n-1][k-1] * l[n-1][k-1];
    l[n-1][n-1] = sqrt(sum);
}
I am hoping that somebody can tell me where my problem is. Thanks.
Bob
|
|
|
|
|
Hi Bob,
I think you should check your indices. I prefer the Numerical Recipes in C[^] version. See Chapter 2 Solution of Linear Algebraic Equations for other code. The Cholesky code for the Numerical Recipes routine is:
#include <math.h>

void choldc(float **a, int n, float p[])
{
    void nrerror(char error_text[]);
    int i, j, k;
    float sum;
    for (i=1; i<=n; i++) {
        for (j=i; j<=n; j++) {
            for (sum=a[i][j], k=i-1; k>=1; k--) sum -= a[i][k]*a[j][k];
            if (i == j) {
                if (sum <= 0.0)
                    nrerror("choldc failed");
                p[i] = sqrt(sum);
            } else a[j][i] = sum/p[i];
        }
    }
}
Try the NR code and see how it works.
|
|
|
|
|