Introduction
This article suggests a way to compress, and still use, an array of float values in which most of the entries are null or, in this case, NaN.
Background
When working with sparse arrays, the most common approach is to store two items for every value in the array: its index and the value itself. This was the situation I was facing. The arrays I handle are 128 to 4096 elements long and represent a collection of harmonic values, or various calculations on them, with an interesting characteristic: most of the harmonics do not exist, and those that do exist come in groups close to one another. Since the values are clustered, I was looking for a way to store them while spending as little space as possible on the indexes.
The Algorithm
In a single line, the main idea of the algorithm is to store the index information inside the NaN values themselves. In a paragraph, it goes like this:
Assume the array looks like this:
1.14 2.44 3.36 NaN NaN NaN 5.6 NaN NaN NaN NaN NaN NaN 25.6 7.46 3.4 3.2 NaN
Compressing the numbers themselves might be hard, as the values are usually close to random, but it would be nice to make the array look more like the following:
1.14 2.44 3.36 3xNaN 5.6 6xNaN 25.6 7.46 3.4 3.2 1xNaN
This looks something like a ‘partial’ RLE (run-length encoding). In an array of 1024 elements, there might be 5-35 valid values in groups, and the rest are NaNs. Writing the array in such a way would lead to very good compression. The only problem is how to write the number of NaNs without wasting valuable space. The answer came from the structure of NaN itself. float.NaN is actually a special value defined by IEEE 754. I won’t go into the whole floating-point format, but will just state that a NaN is any 32-bit number (in the single-precision case) of the following form:
X1111111 1XXXXXXX XXXXXXXX XXXXXXXX
The first X (the sign bit) can be 0 or 1, and the remaining X bits can be anything except all zeros (according to IEEE 754). If they are all zeros, the value is infinity rather than NaN.
This means that the following are NaNs:
X1111111 10000000 00000000 00000001
X1111111 10000000 00000000 00001010
X1111111 10000000 00000000 01111010
X1111111 10000000 00000000 00110010
.
.
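The bit rule above can be checked directly. The following is a minimal sketch, assuming a hypothetical helper class named NaNBits (it is not part of the article's code): it tests whether a 32-bit pattern has an all-ones exponent and a non-zero mantissa, and converts a pattern to a float so the result can be compared with float.IsNaN.

```csharp
using System;

// Hypothetical helper for illustration only; not part of SparseFloatArray.
public static class NaNBits
{
    // Reinterpret a raw 32-bit pattern as a float without changing the bits.
    public static float FromBits(uint bits)
    {
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    // A single-precision NaN has all 8 exponent bits set
    // and at least one of the 23 mantissa bits set.
    public static bool HasNaNPattern(uint bits)
    {
        uint exponent = (bits >> 23) & 0xFF;
        uint mantissa = bits & 0x7FFFFF;
        return exponent == 0xFF && mantissa != 0;
    }
}
```

For example, NaNBits.HasNaNPattern(0x7F80000A) is true, while 0x7F800000 (an all-zero mantissa) fails the test because that pattern is positive infinity.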
What the algorithm does is write the length of each run of NaNs into these otherwise unused bits of a single NaN (the higher bits, in little-endian byte order).
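To make the idea concrete, here is a minimal sketch of writing a run length into a NaN and reading it back. The exact bit layout used by the downloadable code is not shown in this article, so the layout here is an assumption: the class name NaNPayload is hypothetical, the count lives in the 16 lowest mantissa bits, the quiet-NaN bit is always set so the value stays a NaN, and the stored number is the run length minus one so that an ordinary float.NaN reads back as a run of 1.

```csharp
using System;

// Hypothetical helper for illustration; the bit layout is an assumption.
public static class NaNPayload
{
    const int QuietNaNBits = 0x7FC00000; // exponent all ones + top mantissa bit
    const int CountMask = 0xFFFF;        // assumed: low 16 mantissa bits hold the count
    public const int MaxRun = 65536;     // 2^16, because runLength - 1 is stored

    // Build a NaN whose low 16 mantissa bits carry (runLength - 1).
    public static float Encode(int runLength)
    {
        if (runLength < 1 || runLength > MaxRun)
            throw new ArgumentOutOfRangeException(nameof(runLength));
        int bits = QuietNaNBits | (runLength - 1);
        return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
    }

    // Read the run length back from a NaN marker.
    public static int Decode(float marker)
    {
        if (!float.IsNaN(marker))
            throw new ArgumentException("Value is not a NaN.", nameof(marker));
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(marker), 0);
        return (bits & CountMask) + 1;
    }
}
```

With this layout, NaNPayload.Decode(NaNPayload.Encode(6)) gives back 6, and the encoded value still answers true to float.IsNaN, so code that only checks for NaN keeps working.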
Using the Code
Using the code couldn't be easier. There are three main methods: CompressArray, DeCompressArray and GetValue, the last of which works on both compressed and uncompressed arrays.
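// Input array; missing harmonics are expected to be float.NaN.
float[] Harmonics = new float[1024];
// Collapse each run of NaNs into a single NaN that carries the run length.
float[] Compressed = SparseFloatArray.CompressArray(Harmonics);
// Read values by their original index, directly from the compressed array.
float Value26 = SparseFloatArray.GetValue(Compressed, 26);
float Value13 = SparseFloatArray.GetValue(Compressed, 13);
// Restore the original, full-length array.
float[] UnCompressed = SparseFloatArray.DeCompressArray(Compressed);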
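For readers who want to see how three such methods could fit together, here is a rough sketch. It is not the source of SparseFloatArray: the class name SparseSketch, the method names, and the bit layout (run length minus one in the 16 lowest mantissa bits of a quiet NaN) are all assumptions made for illustration. It also assumes the NaNs in the input are ordinary float.NaN values with no payload of their own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical re-creation for illustration; not the article's SparseFloatArray.
public static class SparseSketch
{
    const int QuietNaNBits = 0x7FC00000; // exponent all ones + top mantissa bit
    const int CountMask = 0xFFFF;        // assumed: low 16 mantissa bits hold the count
    const int MaxRun = 65536;            // longest run a single marker can describe

    // NaN marker carrying (run - 1) in its low mantissa bits.
    static float Marker(int run)
    {
        return BitConverter.ToSingle(BitConverter.GetBytes(QuietNaNBits | (run - 1)), 0);
    }

    // Run length stored in a NaN; a plain float.NaN counts as a run of 1.
    static int RunLength(float nan)
    {
        return (BitConverter.ToInt32(BitConverter.GetBytes(nan), 0) & CountMask) + 1;
    }

    // Replace every run of NaNs with one marker NaN.
    public static float[] Compress(float[] source)
    {
        var result = new List<float>();
        int i = 0;
        while (i < source.Length)
        {
            if (!float.IsNaN(source[i])) { result.Add(source[i++]); continue; }
            int run = 0;
            while (i < source.Length && float.IsNaN(source[i]) && run < MaxRun) { i++; run++; }
            result.Add(Marker(run));
        }
        return result.ToArray();
    }

    // Expand every marker back into its run of plain NaNs.
    public static float[] Decompress(float[] packed)
    {
        var result = new List<float>();
        foreach (float f in packed)
        {
            if (!float.IsNaN(f)) { result.Add(f); continue; }
            for (int n = RunLength(f); n > 0; n--) result.Add(float.NaN);
        }
        return result.ToArray();
    }

    // Look up a value by its original index; walks the array from the start.
    public static float GetValue(float[] packed, int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));
        int position = 0; // original index covered by the current element
        foreach (float f in packed)
        {
            int span = float.IsNaN(f) ? RunLength(f) : 1;
            if (index < position + span) return float.IsNaN(f) ? float.NaN : f;
            position += span;
        }
        throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

Applied to the 18-element example array from the algorithm section, Compress returns 11 elements, matching the 'partial' RLE form shown there. Because a plain NaN is read as a run of 1, the same GetValue also works on an uncompressed array.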
Remarks
- I use standard arrays instead of List as they are twice as fast (except where the compressed array is built).
- The maximum number that can be written in the unused bits is 2^16. This means that the number of consecutive NaNs must not exceed 2^16.
- This code works with float; you can easily convert it to work with double.
- Don't try to store the float values in a database such as SQL Server, as its single-precision columns are not fully IEEE 754 compliant and may not preserve NaN values. It is better to convert the array to bytes and store it as an Image.
- If you use this for saving harmonics, in some special cases it would be wise to keep two arrays, one for the odd harmonics and one for the even, with some modifications to the code.
- GetValue(a,b) works on both compressed and uncompressed arrays. Its complexity, unfortunately, is O(n).
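The remark about storing the array as bytes can be sketched as follows. This is one possible way to do the conversion, using a hypothetical helper class named FloatBytes. Buffer.BlockCopy copies the raw memory of the array, so the bits embedded in the NaN values are carried over unchanged. The bytes come out in the machine's native byte order, which is an assumption to keep in mind if the data moves between platforms.

```csharp
using System;

// Hypothetical helper for illustration; not part of SparseFloatArray.
public static class FloatBytes
{
    // Copy the raw bits of every float into a byte array (4 bytes per value).
    public static byte[] ToBytes(float[] values)
    {
        var bytes = new byte[values.Length * sizeof(float)];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // Rebuild the float array from bytes produced by ToBytes.
    public static float[] FromBytes(byte[] bytes)
    {
        if (bytes.Length % sizeof(float) != 0)
            throw new ArgumentException("Length must be a multiple of 4.", nameof(bytes));
        var values = new float[bytes.Length / sizeof(float)];
        Buffer.BlockCopy(bytes, 0, values, 0, bytes.Length);
        return values;
    }
}
```

The resulting byte array can then be written to a binary column and turned back into a compressed float array with FromBytes after it is read.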
History
- 26th March, 2008: Initial post
Gilad holds a B.Sc in Computer Eng. from the Technion IIT.