Algorithms

Sanmayce12-May-14 0:18

Sanmayce

12-May-14 0:18

Thanks, I checked all the links you gave me, also searched for LZ, LZSS, compression in article names.
Strange, not a single article on LZ/LZSS, not to mention fast, let alone fastest.
You see Mr.MacCutchan, I want to share a stand-alone file-to-file compressor written in C featuring fastest decompression on textual data. I don't see it as utility but rather as an illustration/sample how to decompress some needed text-like data at speeds 18%-40% of memcpy().
The focus I want to be precisely on DECOMPRESSION, I don't want to be dragged into mumbo-jumbisms as "How do I compress? The LZSS is not explained. This is not an article. Poor description and similar complains."
Seeing some fellow members' articles on LZW and Huffman I see my approach in different light, I like benchmarks real-world ones, disassembly listings and comparisons with the best decompressors today.
That's why I ask for some preemptive feedback/directions.

Personally, it is annoying for me to want to share some code etude and instead of making some interesting place/topic for sharing ideas/benching with fellow coders to receive complains "THIS ARTICLE IS NOT GOOD", I am not a journalist to write articles, after all the CP logo says 'For those who code' not 'write articles'.

My point, new ideas/improvements are to be easily shareable, I think.

Richard MacCutchan12-May-14 1:24

12-May-14 1:24

If you are going to write then write, rather than discussing the what and how in these forums. If your article does not meet the requirements of the site then you have to accept the feedback, and either improve or change it as necessary.

image processing toolkits | batch image processing

C^hris L^osinger23-May-14 3:12

C^hris L^osinger

23-May-14 3:12

Sanmayce wrote:
I don't want to be dragged into mumbo-jumbisms as "How do I compress?

it's unlikely that many people are going to be familiar with the LZSS algorithm. and people are going to ask you for the compression side of it. so, i'd suggest including a simpler compressor with a description of the algorithm. you have to at least get the readers up to the point where they can follow your discussion of the decompressor. otherwise, you're starting the story in the middle.

The Bessel-Overhauser Spline interpolation - suitable values for the weight function

Sanmayce24-May-14 7:17

Sanmayce

24-May-14 7:17

Mr.Losinger,
have you seen the Wikipedia's article on LZSS: http://en.wikipedia.org/wiki/LZSS[^]

Now check my amateurish attempt to shed light on LZSS: http://www.sanmayce.com/Nakamichi/[^]

LZSS is simple enough yet writing an article is not easy, my wish is to contribute to the decompression part with one of the fastest (in top 3) etudes, benchmarks, C/Assembly snippets, not to define/explain the algorithm.

For a start, I need modern machine to test on, sadly I own only Core2, that's a nasty break for me since I use DWORD fetching (sucks on Core2), XMM and even ZMM which is so interesting to be seen what can bring to the table.
My naive expectations were that some fellow coders will help me at least with its benchmarking, sadly I see no interested coders on this topic.

One of my wishes is to show an etude, Nakamichi 'Sanagi' to be exact, 512bit targeted, luckily Intel C optimizer v14 (which I used) supports them but still no computer in my reach can run the executable.

"Tianhe-2 the world's fastest supercomputer according to the TOP500 list for June and November 2013 utilizes Xeon Phi accelerators based on Knights Corner."
/Wikipedia/

In my view such an benchmark is both important and interesting, thus the incoming Knights Corner/Landing processors will show the power of ZMMWORD moves.

"... Far more interesting is that Intel's Knights Landing not only will be an expansion card but also will serve as a standalone platform or processor ..."
/http://www.admin-magazine.com/Articles/Exploring-the-Xeon-Phi/

Simply, the near future will bring monstrous bandwidths, not utilizing them is just ... lame. I see how some 30 years old algorithms are/have to be 'pimped' and pumped.

>it's unlikely that many people are going to be familiar with the LZSS algorithm.
Agreed, but it is with everything like that, isn't it!

>i'd suggest including a simpler compressor with a description of the algorithm.
Sure.

>you have to at least get the readers up to the point where they can follow your discussion of the decompressor. otherwise, you're starting the story in the middle.
Of course, I am not some villain wanting to obscure information, but as I said my focus is entirely on textual decompression speed boosts, they alone took a lot of my time to wrestle with.

And for those who want to see what one of the best in the world has done:
http://fastcompression.blogspot.com/p/lz4.html

Comparing an eventual etude with two of the fastest (LzTurbo being the other) is a must, I see no better way to evaluate its worth.

Kenneth Haugland3-Apr-14 23:34

Kenneth Haugland

3-Apr-14 23:34

The formula that I'm using to create this is from the book "Mathematical tools in computer graphics with C# implementation", however it only had the formula and no code. You could find the same kind of formulations available online here[^], and skipback and forwards on the slides to see a bit more.

My Implementation looks like this:

''' <summary>
''' The Bessel-Overhauser Spline interpolation
''' </summary>
''' <param name="p0">First point</param>
''' <param name="p1">Second point</param>
''' <param name="p2">Third point</param>
''' <param name="p3">Forth point</param>
''' <param name="t"></param>
''' <param name="u">The value from 0 - 1 where 0 is the start of the curve (globally) and 1 is the end of the curve (globally)</param>
''' <returns></returns>
''' <remarks></remarks>
Public Function PointOnBesselOverhauserCurve(ByVal p0 As System.Windows.Point, ByVal p1 As System.Windows.Point, ByVal p2 As System.Windows.Point, ByVal p3 As System.Windows.Point, ByVal t() As Double, ByVal u As Double) As System.Windows.Point
    Dim result As New System.Windows.Point()

    Dim ViXPlusHalf, VixMinusHalf, ViYPlusHalf, ViYMinusHalf, ViX, ViY As Double
    ViXPlusHalf = (p2.X - p1.X) / (t(2) - t(1))
    VixMinusHalf = (p1.X - p0.X) / (t(1) - t(0))
    ViYPlusHalf = (p2.Y - p1.Y) / (t(2) - t(1))
    ViYMinusHalf = (p1.Y - p0.Y) / (t(1) - t(0))

    ViX = ((t(2) - t(1)) * VixMinusHalf + (t(1) - t(0)) * ViXPlusHalf) / (t(2) - t(0))
    ViY = ((t(2) - t(1)) * ViYMinusHalf + (t(1) - t(0)) * ViYPlusHalf) / (t(2) - t(0))


    ' Location of Controlpoints
    Dim PointList As New PointCollection
    PointList.Add(p1)
    PointList.Add(New Point(p1.X + (1 / 3) * (t(2) - t(1)) * ViX, p1.Y + (1 / 3) * (t(2) - t(1)) * ViY))

    ViXPlusHalf = (p3.X - p2.X) / (t(3) - t(2))
    VixMinusHalf = (p2.X - p1.X) / (t(2) - t(1))
    ViYPlusHalf = (p3.Y - p2.Y) / (t(3) - t(2))
    ViYMinusHalf = (p2.Y - p1.Y) / (t(2) - t(1))

    ViX = ((t(3) - t(2)) * VixMinusHalf + (t(2) - t(1)) * ViXPlusHalf) / (t(3) - t(1))
    ViY = ((t(3) - t(2)) * ViYMinusHalf + (t(2) - t(1)) * ViYPlusHalf) / (t(3) - t(1))

    PointList.Add(New Point(p2.X - (1 / 3) * (t(3) - t(2)) * ViX, p2.Y - (1 / 3) * (t(3) - t(2)) * ViY))
    PointList.Add(p2)

    ' Get the calcualted value from the 3rd degree Bezier curve
    Return PointBezierFunction(PointList, u)
End Function

I assumed that t(), and array of equal length to the number of points, would be the same in both X and Y directions. Now, how should adjust my t() values, and should they be dependent on the values X and Y, meaning that they have one value for X and one value for Y? And, is the implementation correct?

Re: The Bessel-Overhauser Spline interpolation - suitable values for the weight function

Kenneth Haugland6-Apr-14 0:30

Kenneth Haugland

6-Apr-14 0:30

I found out that if I number the t vector = {1,2,3 ... ,n} the Bessel-Overhauser implementation turns into Catmull-Rom Spline, so I guess it means that it is correct.

A complete test project can be downloaded here:
Spline Interpolation - history, theory and implementation[^]

Factoring algorithm

Member 41945931-Apr-14 4:46

Member 4194593

1-Apr-14 4:46

Does anyone have any comment on my factoring algo below:

A new method to factor large semi primes.

The plan is to factor large semi prime numbers. Start by picking 2 Prime Numbers of
appropriate digit size. Choose one as a 200 digit Prime Number, and the other as a 300
digit Prime Number, where their product would be a 500 digit prime number. Then convert
the two decimal digit prime numbers to binary values, and multiply them to form the binary
semi prime product. Calculate the bit size of the binary product. Save a copy of this semi
prime into another value whose bit size is the size of the semi prime rounded up to a
DWORD (mod 32 bit) bound, but left shift it until it fills the largest DWORD. Then save
another copy of this left shifted semi prime, but this time right shifted by 1 bit. These
copies of the semi prime will be used to compare the products of the highest test DWORDS
calculated in the algorithm below (see below for the explanation of why two copies are
used).

The semi prime product would then be factored using a new algorithm.

It has been suggested (by S. C. Coutinho in "The Mathemetics of Cipher", Chapter 11) that
to select the prime numbers for a key of a digit size of r, select one between a digit
size of ((4 / 10) * r) and ((45 / 100) * r). The starting point for the factoring will be
chosen as the middle value between these limits, and will step alternately higher and
lower until the factors are discovered. The first starting factor will then be the p (the
smaller number), and the other starting factor will be q (the larger number) which will be
set to ((10^r ) / p).

With an r of 500, the lowest limit (in digits) will be:

p = ((4 / 10) * r)
p = ((4 / 10) * 500)
p = (4 * 50)
p = 200 digits

The upper limit will be:

p = ((45 / 100) * r)
p = ((45 / 100) * 500)
p = (45 * 5)
p = 225 digits

The mid-point will be:

P = ((200 + 225) / 2)
p = 212 digits

Thus the starting p is a 212 digit number. The other starting factor will be:

q = ((10^r) / p)
q = ((10^500) / (10^212))
q = 288 digits

The p value is then the calculated digit size (converted to bits and then DWORDS), but
the q value must be calculated by subtracting the selected p value bit size from the
selected semi prime bit size (and then rounded up to DWORDS). These bit sizes define bit
fields that will be filled in from both ends with actual values during the factoring
process.

The bit size, digit size, and DWORD sizes of these values are as follows (all of these
powers of 2 are the largest exponent that would not exceed the associated power of 10,
and the DWORD count is the minimum count of DWORDS to contain that binary bit count):

2^707 = 10^212 = 23 DWORDS starting low
2^960 = 10^288 = 30 DWORDS starting high
2^999 = 10^300 = 32 DWORDS max high
2^1664 = 10^500 = 52 DWORDS max product

Three sets of pre-computed tables must be constructed to aid (actually enable) the
factoring of this semi prime.

The first table is saved as 32 bit odd factors (DWORDS), which, when multiplied and
masked to a DWORD, give the same masked product value. This table is saved as multiple
files with names matching the masked product value.

A second, similar, table is constructed for factors which all have the most significant
bit set and saved as files with names matching the highest 32 bits of the product value
(masked to 32 bits).

The third table is just a list of the prime numbers to be considered (all prime numbers
less than 300 decimal digits), but saved in a special way as a multi-level directory
structure as described below.

The entries for this third table consist of DWORD pairs where one of the DWORDS is the
lowest DWORD of the prime number and the other DWORD is the highest 32 bits of the prime
number. All prime numbers that share both of these end values and have the same bit size
will be saved in a directory whose name matches the two DWORDS (high DWORD value and low
DWORD value). The directory sets will be saved under a directory that relates to the bit
size.

One bit of explanation about the information that follows. When a string is enclosed in
quotation marks (""), it is an encoded alphanumeric string. When the characters are not
enclosed in quotes, then the characters are treated as independent decoded DWORD values.

At this point, a diagram may be appropriate. Consider a large prime number with DWORDS
indicated as Hx and Lx, and bit size as a 3 BYTE value indicated by Sx:

H1 H2 H3 H4 ... L4 L3 L2 L1 length Sx

The lowest level directory structure would be (where "Sx" are the different prime number
bit sizes from 2 to 1664):

"Sx"
"Sy"
...
"Sz"

Beneath each of these size directories would be a layer of directories named "H1x L1x"
(whose prime number bit size is "Sx" and where "H1x" and "L1x" are the first level
different highest and lowest DWORD values of the semi prime):

"H1x L1x"
"H1y L1y"
...
"H1z L1z"

Beneath each of these directories would be another layer of directories named "H2x L2x":

"H2x L2x"
"H2y L2y"
...
"H2z L2z"

Essentially, you are creating piecemeal path names on some drive or drives (at some
directory level) as a relative path name:

".\Sx\H1x L1x\H2x L2x\H3x L3x\H4x L4x\ ..."
".\Sx\H1y L1y\H2y L2y\H3y L3y\H4y L4y\ ...
...
".\Sx\H1z L1z\H2z L2z\H3z L3z\H4z L4z\ ..."

".\Sy\H1x L1x\H2x L2x\H3x L3x\H4x L4x\ ..."
".\Sy\H1y L1y\H2y L2y\H3y L3y\H4y L4y\ ...
...
".\Sy\H1z L1z\H2z L2z\H3z L3z\H4z L4z\ ..."

".\Sz\H1x L1x\H2x L2x\H3x L3x\H4x L4x\ ..."
".\Sz\H1y L1y\H2y L2y\H3y L3y\H4y L4y\ ...
...
".\Sz\H1z L1z\H2z L2z\H3z L3z\H4z L4z\ ..."

One other thing to note is that this method greatly reduces the duplication of some of
the DWORD values that would otherwise occur if the actual prime numbers were given as
just a list of full N digit values, i.e., consider 32 DWORD prime numbers of the
following form (where each prime number contained the same low 4 DWORDS and same high 4
DWORDS):

H1 H2 H3 H4 ... ... ... L4 L3 L2 L1

The "... ... ..." represent 24 DWORDS, some of which must be different (remember, this
is from a list of prime numbers that are all unique), yet H1 H2 H3 H4 and L4 L3 L2 L1
DWORDS would all be the same and thus not duplicated in this level structure. In the
maximum duplication form, the H1 through H15 and L1 through L15 DWORDS could all be the
same, and only the H16 and L16 DWORDS would be different.

Note that the data is not really binary data, but directory names which must be formed
with alphanumeric or special characters. What will be done to reduce the size of the data
is that the numeric DWORD values will be converted to encoded character values. Several
things were considered in this quest. If decimal conversion were done (0-9), then the
DWORDS would need 10 characters for each DWORD or 20 characters total. If 4 bit hex
conversions were used (0-9 and A-F), then the DWORDS would need 8 characters for each
DWORD or 16 characters total. If 6 bit ASCII64 encoding were used, then the DWORDS would
need 6 characters for each DWORD or 12 characters total but the resulting upper and lower
case letters would require the use of Unicode filenames, and this alone would double the
size of the data to 24 BYTES. Using a character set of upper case and numeric characters
and allowed special characters (26 + 10 + 16 or a 52 character conversion set) would allow
6 character conversions for each DWORD or 12 characters total, but would require divisions
to be used to extract the actual character indexes. Using a 32 character conversion set
(A-Z and 0-5) would result in 7 character conversions for each DWORD (splitting each
DWORD into seven 5 bit fields and then encoding each 5 bits) giving 14 characters for
both DWORDS, but since it would be better to treat both DWORDS as a single QWORD and
split the 64 bits into 13 parts saving even one more character, that method will be
selected. A multi-level table lookup will be used to convert the encoded 13 characters
back into a binary value (a column of 32 QWORD values indexed by the character value, and
13 columns of these QWORD values indexed by which character of the encoded value was
being decoded, all 13 resulting values would be accumulated by adding to yield the QWORD
binary value, then split to two DWORDS).

The size values in the first (lowest level) directory (three BYTE size values) also need
to be converted to character values. For small prime number size values (2, 3, 4, ...)
this would create character strings such as "AAAAB", and there would also be no need for
the High DWORD, thus such small prime numbers would be saved as only a truncated size
field (delete any leading 'A' characters) and would only contain a single DWORD value in
the next lower directory level (second level) such as:

First Second
Level Level

"B" for 2 bits
"AAAAAAB" for prime number 2
"AAAAAAC" for prime number 3
"C" for 3 bits
"AAAAAAE" for prime number 5
"AAAAAAG" for prime number 7
"D" for 4 bits
"AAAAAAK" for prime number 11
...
...

No attempt will be made to delete the leading 'A' characters in these short prime numbers.

When the bit size exceeds 32 bits, the second DWORD would be created to precede the first
DWORD in the directory name. This second (most significant) DWORD would consist of the
most significant 32 bits of the prime number and would always have the most significant
bit set (always be at least an encoded "B..."), and would result in an encoded 13
character directory name.

The max size of a prime number of 300 digits is 32 DWORDS or 16 QWORDS which would convert
to a directory name of 232 characters ((13 + 1) * 16) + (5 + 1) + (1 + 1)) (max directory
name size plus a separator times 16 sub directory levels plus size directory name plus a
separator plus the drive name plus a separator). This is well within the MAX_PATH for a
directory entry which is 260 BYTES so unicode path names are not required.

The factoring algorithm starts out by factoring the semi prime's lowest DWORD and highest
32 bits into lists of DWORD pairs (masked to a low DWORD for the low DWORD pair and to a
high DWORD for a high 32 bit pair) whose DWORD products match the appropriate end values
of the semi prime. Prime numbers whose ends match these pairs are the only prime numbers
that need to be considered in the factoring process, all other prime numbers cannot
possibly be the correct values. All four of the possibilities must be considered (one at
a time) when considering which value of a pair belongs to the larger prime number and
which value belongs to the smaller prime number. Assume that the end values for the first
(odd number) table pair values were A and B and the second (MSD bit set) table values were
C and D. The possible test primes would then be:

Small and Large

C ... A and D ... B
C ... B and D ... A
D ... A and C ... B
D ... B and C ... A

Start the building process by creating four number buffers (bit fields) whose bit size is
the bit size of the semi prime extended to a mod 32 bit size. Now process each of the
above mentioned 4 possible pairs, let us say, starting with the C ... A and D ... B pair.
Multiply them as follows:

C ... A
* D ... B
--------
RS = B * A (R ignored)
TU = D * C (U ignored)

One word about my selection of variable names R, S, T, U, W, X, Y, and Z (in the
example above and in the example below). I did not use the letter "V" because in my
text editor (PFE), while using the OEM fixed pitch font, the "U" (Uniform) and "V"
(Victor) letters appear almost the same and are easily confused.

Now, S would be equal to the lowest DWORD of the semi prime because the A and B characters
were entries in the factor table for the lowest DWORD in the semi prime, but R needs to be
calculated and saved. T would be equal to the highest 32 bits of the semi prime for the
same reason and U needs to be calculated. S and T will always match the end DWORDS of the
left shifted semi prime (also, see below). The R must be calculated as (A * B) >>= 32 and
T must be calculated as (C * D) >>= 32. This will be done once as the factor pair lists are
read and the R values and U values (1 for each pair) saved in two arrays for later use.

Look up the directory ".\Ss\C A" and the directory ".\Sl\D B" (where "Ss" is the small
test prime number size and "Sl" is the large prime number size), and the lowest of their next
level sub-directories (treat the sub-directories as a list of possible useful matches). If
either such directory or sub-directory entry does not exist, then skip to the next low or
high list entry and retry.

Access the first of the ".\Ss\C A" sub-directory entries and assume that it contains the 2
DWORDS as E F and that the first of the ".\Sl\D B" sub-directory entries contains the 2
DWORDS as G H. Combine the sub directory values with the parent directory values to
continue building the prime numbers as:

C E ... F A
* D G ... H B
------------

Multiply F A and H B and mask the product to a QWORD and check if the result matches the
least significant QWORD of the semi prime:

F A
* H B
----
RS = B * A (already computed above)
TU = B * F (T ignored)
WX = H * A (W ignored)
YZ = H * F (YZ ignored)
----
IJKL (IJ ignored)

K ((R + U + X) / DWORD) should match second lowest DWORD of the semi prime. If not, select
the next low pair and try again,

Multiply C E and D G and check if the most significant 64 bits of the product matches the
most significant 64 bits of the semi prime:

C E
* D G
----
RS = G * E
TU = G * C
WX = D * E
YZ = D * C (already computed above)
----
MNOP (OP ignored)

MNOP may possibly need to be adjusted because the product may just be 127 bits and not 128
bits. If and only if (IFF) the most significant two bits of C and D are both set, then the
product would be 128 bits, otherwise it would be 127 bits and require a shift to be
correct for a compare with the high semi prime value (the semi prime high end would be
already left shifted to set the most significant bit of a DWORD). Because of this anomoly,
all products starting with these pairs (for all 16 levels) must be left shifted by one
bit. This is time comsuming. To avoid this, the left shifted semi prime has been also
saved as a right shifted by one bit value. When the high factor pair is first selected, a
check will be made and the correct pointer to the semi prime check value will be
established for that pair, and product shifting will not be required.

M and N should match the highest 64 bits of the check semi prime. If not, select a new
high pair and reset the low pair to the beginning (working with a new high pair) and start
again until both pairs match. Note that O and P cannot be checked at this time because it
is unknown how much the carry DWORDS will be until the next level testing can select the
third DWORD pair.

If the end of either factor list is reached before the double match, change which of the
four initial pairs is being used, and if all four pairs have been checked, then change the
test number bit sizes (alternate lower and higher around the smaller value mid-point that
was selected and then recalculate the larger number bit count), and reset the lists and
start again until the actual factors are found.

If both end points match then check if you are at the lowest level. Do this by maintaining
the bit sizes by decrementing by 64 as you go to each next level - when the smaller test
number lower size is less than 64, then hold the smaller test number and just process the
larger test number for subsequent levels.

If both sizes are less than 64 then you have the final test primes. Just multiply out the
DWORD values (to get all middle carries) to verify whether this is a valid solution. To
get these values (they are split into 4 parts during this testing), the low DWORD values
are correct, it is only the high values that need to be adjusted and concatenated to the
top of the low values and then multiplied. To do this for the smaller test value, subtract
the bit size of the smaller test value lower test DWORDS (DWORD count * 32) from the total
bit size of the smaller test value. Divide the result by 32 (a shift works really well)
and what remains is the bit count to right shift the smaller test value high DWORD values
deleting low bits that are shifted out of the lowest DWORD value in the high piece (use
right shift double on pairs of adjacent DWORDS), then concatenate the result to the low
DWORDS. Do the same thing to adjust the larger test value high DWORDS. Then multiply and
check the product with the semi prime itself.

If both sizes are > 63 then drop down both of the directory levels and get the next two
pair of DWORDS for the test prime numbers and extend the multiplies by a DWORD each and
check again until you have the final test primes.

Obviously, quit when you find the two final prime numbers that, when multiplied, result in
the semi prime product.

Dave.

Bernhard Hiller1-Apr-14 20:44

Bernhard Hiller

1-Apr-14 20:44

That's bigger than the common homework questions here. Is it for your thesis?

Member 41945932-Apr-14 6:17

Member 4194593

2-Apr-14 6:17

Bernhard,

No this is not for any homework assignment. It is just an attempt to find another algorithm for factoring numbers - primarily to factor a semi prime. I know that this will be Big Data, but before I start implementing, is there any problem within the algorithm itself?

Dave.

Kornfeld Eliyahu Peter2-Apr-14 9:30

2-Apr-14 9:30

Maybe not thesis but can be a good resource - http://www.codeproject.com/Insider.aspx/undefined/?fid=1658735&tid=4791393[^]

I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)

Peter_in_278023-May-14 15:34

Peter_in_2780

23-May-14 15:34

Member 4194593 wrote:
The third table is just a list of the prime numbers to be considered (all prime numbers
less than 300 decimal digits), but saved in a special way as a multi-level directory
structure as described below.

A quick consultation with the revered Dr Riemann indicates that there are over 10^297 primes less than 10^300. Since there are somewhere on the order of 10^80 atoms in the known universe, your compression scheme must achieve a minimum compression ratio in excess of 10^217 : 1. Can't see that happening in a hurry... Sometimes "big data" is just too big.

Cheers,
Peter

Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012

Re: Team Contract Algorithm

Member 419459323-May-14 17:03

Member 4194593

23-May-14 17:03

Peter,

I was waiting for someone to figure that out. Check with Ravi, I proposed this to him in Email in March. The following is the second half of the original Email to him. Too bad this would not work, I think that the actual algo would work except for the fact that the file requirements are insane:

<pre>
------------------------------------------------------------------------------------------

The following discussion is why this algorithm (the part above this point) should be
published in Algorithms on April 1st without this lower section - let the wrecking crew of
experts in CP determine what is so wrong with the solution (the algorithm is correct but
just will not work).

------------------------------------------------------------------------------------------

Now, the number of primes with 300 digits is:

((10^300) / ln(10^300)) = (3.33 * (10^297))

The max size of a prime number in that range is 52 DWORDS or 26 QWORDS which would convert
to 232 characters so the total size of all path names describing all of these prime
numbers will be:

(232 * 3.33 * (10^297)) = (7.7 * (10^299)) BYTES

The number of terabytes to contain this data would be:

(7.7 * (10^299) / (10^12)) = (7.7 * (10^287)) TB

The number of 4 TB drives to contain this total size would thus be:

((1/4) * 7.7 * (10^287)) = (1.9 * (10^287)) drives

I don't think that Seagate has enough material to create that many drives. Penrose in his
book "The Emperor's New Mind" estimated that the number of particles in the visible
universe was 10^80, and after that problem has been solved, you still have another
problem about how much time it would take to write out these prime number directories to
the drives. So much for having the prime numbers available to use as trial multipliers to
see if the product matches the semi prime.

Lest you think it would make a difference (in the count and total size of the prime
numbers) if you only saved the prime numbers that lie between 10^200 and 10^300:

((10^300) / ln(10^300) - (10^200) / ln(10^200)) =
(1.447 * (10^297) - 2.171 * (10^197)) =
(1.447 * (10^297))

or

(1.447 * (10^297)) = (1.447 * (10^297))

Saving the number of prime numbers in a range of digits (from 200 to 300) would not even
make a visible dent in the count because the actual calculation is subtracting a 197 digit
number from a 297 digit number, leaving a 297 digit number with a diminished lowest value
(the last 196 digits if a borrow was needed else 197 digits), but the most significant 100
(or at least 99) digits would be unchanged. Only if you displayed the result in its actual
297 digit decimal form (instead of in truncated scientific notation) would you be able to
see the difference.

Another way to visualize this is to see how big a block of (1.9 * (10^287)) 4 TB drives
would be. Now my drives are (in inches) about 1.75 * 4.75 * 7. The diagonal from the
center of the drive to any corner is:

((((1.75 / 2)^2 + (4.75 / 2)^2 + (7 / 2)^2))^(1 / 2))) = 4.32 inches

A slightly smaller version of the block of drives would be to block the drives as a 4 wide
and 2 deep block, almost making them a cube. This would group the drives in blocks of 8
drives giving a total number of blocks as:

((1.9 / 8 * (10^287))) = (2.375 * (10^286)) blocks

The diagonal from the center of the block to any corner is:

((((1.75 * 4 / 2)^2 + (4.75 * 2 / 2)^2 + (7 / 2)^2))^(1 / 2))) = 6.86 inches

The dimensions (in the number of blocks) of a block with the same number of blocks in each
row, column, and layer would be:

((2.375 * (10^286))^(1/3)) = (2.874 * (10^95))

The diagonal from the center of this block to any corner would be

(6.86 * 2.874 * (10^95)) = (1.97 * (10^96)) inches
((6.86 * 2.874 / 12) * (10^95)) = (1.64 * (10^95)) feet
((6.86 * 2.874 / 5280) * (10^95)) = (3.11 * (10^91)) miles

The radius of a sphere that would enclose this block of drives would then be:

(3.11 * (10^91)) miles.

The radius of the the sphere that would enclose our solar system would be equal to the
orbit size of the planetoid Pluto (I guess I should be politically correct and not call
it a planet anymore) which is:

4.583 billion miles

or

(4.583 * (10^9)) miles

How much bigger would the radius of the sphere enclosing the block of drives be relative
to the radius of the sphere enclosing the solar system? To compare two spheres for "which
one is bigger, and by how much", consider comparing a baseball with a basketball. You
compare their height or width differences and not their circumference or volumes
differences (whether you are comparing diameters or radii), so try:

((radius of drives enclosing sphere) / (radius of solar system enclosing sphere))
((3.11 * (10^91)) / (4.583 * (10^9))) = (6.79 * (10^81))

It would be (6.79 * (10^81)) times as big as the radius of our solar system, and,
furthermore, you would be filling that space with 4 TB drives instead of the almost
complete vacuum that exists there today.

Where would you get that much material to build the drives? Maybe we could be seeing some
quantum sharing of particles between (10^287) different drives, all at the same time,
without dropping any recorded bits??? Let's not even consider the power requirements, or
the write time to populate the directory entries before the factoring algorithm starts.
</pre>

Dave.

Team Contract Algorithm

Laurence Senna27-Mar-14 20:23

Laurence Senna

27-Mar-14 20:23

Hi,

I want to make a simple algorithm for F1 Challenge. Basically i want to make an external program that takes my race results and at the end of the season i get "contract offers" depending on how well i did. I already know how i want it to work, i just want to know if there's already an algorithm similar to this?

Richard MacCutchan28-Mar-14 0:22

Re: Detecting File Changes

28-Mar-14 0:22

This hardly needs an algorithm. It is just a matter of adding up all the points and sorting the drivers and teams into ascending order by points.

Detecting File Changes

Richard Andrew x6427-Mar-14 13:44

Richard Andrew x64

27-Mar-14 13:44

I use Carbonite backup service on my main machine. Carbonite claims that they are able to transmit only changes in a file to their servers so that the whole file needn't be transmitted again when there are changes to that file.

How are they doing this? Wouldn't they need to compare the changed file against a complete copy of the original? And to do this across the net would require transmitting the entire file anyway, right?

Confused | :confused:

The difficult we do right away...
...the impossible takes slightly longer.

Richard MacCutchan27-Mar-14 22:19

Re: Detecting File Changes

27-Mar-14 22:19

Well I guess only Carbonite could give a definitive answer. But, it is possible they are keeping a copy of the file's directory information so they can just compare disk segments, and they backup by copying the segments direct from disk. Then on the next incremental backup they only need to copy any segments that have been added since the previous backup.

Matt T Heffron28-Mar-14 7:22

Matt T Heffron

28-Mar-14 7:22

That is likely to miss changes to files that do updates in place with mapped files...

Re: Detecting File Changes

Richard MacCutchan28-Mar-14 8:04

How much is my encryption algorithm worth?

28-Mar-14 8:04

Which is why I added the initial sentence to my response.

Daniel Mullarkey22-Mar-14 20:47

Daniel Mullarkey

22-Mar-14 20:47

I developed a lossless, key-based, encryption algorithm that can work in any modern programming language and it has a 1-to-1 relationship between the unencrypted values and the encrypted values as long as the key remains the same and the minimum values and maximum values are known. The key can also be of virtually limitless size or of a very limited size. The algorithm uses carefully calculated mathematics to give the appearance of gibberish until it is decoded with the proper key or combination of keys. Other than that, I cannot reveal anything, lest I lose any intellectual property right opportunities that I have to the algorithm. How much is this encryption algorithm potentially worth if I patent it?

Kornfeld Eliyahu Peter22-Mar-14 21:19

22-Mar-14 21:19

How many keys are you using? (I mean public/private).
How you exchange keys?
There is default values for key size?
Did you run some performance tests on real life data?

I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)

Richard MacCutchan22-Mar-14 22:10

22-Mar-14 22:10

Daniel Mullarkey wrote:
How much is this encryption algorithm potentially worth if I patent it?

Not much, since there are already high quality encryption methods available at no cost.

Daniel Mullarkey23-Mar-14 3:34

Daniel Mullarkey

23-Mar-14 3:34

Well, to address both replies, the core function headers look something like this:

Public Function Encode(ByVal UnencodedValue As Integer, ByVal KeyValue As Integer, ByVal MinimumValue As Integer, ByVal MaximumValue As Integer) As Integer

Public Function Decode(ByVal EncodedValue As Integer, ByVal KeyValue As Integer, ByVal MinimumValue As Integer, ByVal MaximumValue As Integer) As Integer

Keep in mind that UnencodedValue and EncodedValue can range anywhere from MinimumValue to MaximumValue and that the KeyValue can range anywhere from MinimumValue + 2 to MaximumValue.

Kornfeld Eliyahu Peter23-Mar-14 4:04

23-Mar-14 4:04

Encryption is not only (and not mainly) about hide your data behind numbers. It can be smart but that's not nearly enough...You have to think about the data exchange between the side encrypting and the other that decrypting the data. For every encryption has helpers - these are the keys - and those helpers are have to pass (or be presented) on both side. You method signatures are telling nothing about that - just from this we can't tell nothing about, how your encryption is good. It also tells nothing about it speed of execution...

I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)

modified 23-Mar-14 10:49am.

Daniel Mullarkey23-Mar-14 4:38

Daniel Mullarkey

23-Mar-14 4:38

Well, I can say that the core methods involve 10 lines of code, one with a conditional (if...then) recursion that at best only occurs once per original use and the rest involves conditional (if...then) mathematics at three levels, thereby making it very quick and very fast. Besides, ever heard of lossless compression? Well, I might as well have created perpetual lossless, key-based, encryption. Keep in mind that the encryption algorithm is symmetrical, so the very idea of having public keys would be a bad idea. The core of my encryption algorithm is essentially a block cipher in which the unencrypted value does not grow or shrink in size when encrypted. Also, keep in mind that it can not only encrypt bulk data, but it can also use bulk keys. And one day I will show that I can deliver on the goods. I just have to find a way to afford to pay LegalZoom $1,000 to patent it.

"In cryptography, a block cipher is a deterministic algorithm operating on fixed-length groups of bits, called blocks, with an unvarying transformation that is specified by a symmetric key. Block ciphers are important elementary components in the design of many cryptographic protocols, and are widely used to implement encryption of bulk data."
--Quoted from Wikipedia from Block cipher

Kornfeld Eliyahu Peter23-Mar-14 4:49