Click here to Skip to main content
Click here to Skip to main content
Go to top

Optimized Sparse Float Array

, 26 Mar 2008
Rate this:
Please Sign up or sign in to vote.
Saving and working on a sparse float array while hiding the index data

Introduction

This article comes to suggest a way to use/compress an array of float numbers where most of the array is filled with nulls or, in this case, NaNs.

Background

In most cases when there is a need to work with sparse arrays, the most common way to do it is to save two items for every value in the array. The first is usually the index and the second is the value itself, this was the situation I was facing. The arrays I handle are 128-4096 long and they represent a collection of harmonic values or various calculations with an interesting characteristic. Most of the harmonics do not exist and when some harmonics exist, they come in groups with close proximity to one another. Since there were groups of values close together, I was looking for a way to save these values with spending minimum space on the indexes.

The Algorithm

In a single line, the main idea of the algorithm is storing the indexes embedded inside the NaN values. In a paragraph, it goes like this:

Assume the array looks like this:

1.14 2.44 3.36 NaN NaN NaN 5.6 NaN NaN NaN NaN NaN NaN 25.6 7.46 3.4 3.2 NaN 

Compressing the numbers might be a hard task as the values are usually close to random but a nice thing to do would be to have the array look more like the following:

1.14 2.44 3.36 3xNaN 5.6 6xNaN 25.6 7.46 3.4 3.2 1xNaN 

This might look something like ‘partial’ RLE. In an array of 1024, there might be 5-35 valid values in groups and the rest NaNs. Writing the array in such a way would lead to very good compression. The only problem is how to write the number of NaNs, of course, without wasting valuable space. The answer came from the NaN structure. float.NaN is actually a special number which is part of the IEEE 754. I won’t go into the whole floating point structure but just will state that a NaN is any 32bit (in a Single precision case) number with the flowing form:

X1111111 1XXXXXXX XXXXXXXX XXXXXXXX

The first X can be 0 or 1 and the rest of the bits can be anything but straight zeros (according to IEEE 754).

This means that the following are NaNs:

X1111111 10000000 00000000 00000000 
X1111111 10000000 00000000 00001010 
X1111111 10000000 00000000 01111010 
X1111111 10000000 00000000 00110010 
.
.

What the algorithm does is write the number of NaNs in the higher bits (little endian).

Using the Code

Using the code can't be easier. There are 3 main methods, compress, uncompress and GetValue which work both on the compressed and on uncompressed arrays.

 float[] Harmonics = new float[1024];

 float[] Compressed = SparseFloatArray.CompressArray(Harmonics);

 float Value26 = SparseFloatArray.GetValue(Compressed, 26); 
 float Value13 = SparseFloatArray.GetValue(Compressed, 13);

 float[] UnCompressed = SparseFloatArray.DeCompressArray(Compressed);

Remarks

  • I use standard arrays instead of List as they are twice as fast (except for where the compressed array is built).
  • The maximum number that can be written in the unused bits is 2^16. This means that the maximum number of concurrent NaNs must not exceed 2^16.
  • This code works with float, you can easily convert it to work with doubles.
  • Don't try to store the float values in a database such as SQL as the Single precision values SQL uses are not IEEE 754 compliant. Better convert the array to bytes and store as an Image.
  • If you use this for saving harmonics, in some special cases, it would be wise to have 2 arrays. One for the odd and one for the even and have some modifications to the code.
  • GetValue(a,b) works both on the compressed and non-compressed arrays. Operation complexity, unfortunately, is O(n).

History

  • 26th March, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Gilad Kapelushnik
Team Leader
Israel Israel
Gilad holds a B.Sc in Computer Eng. from the Technion IIT.

Comments and Discussions

 
GeneralJava Edition Pinmembertomz918-Mar-09 22:05 
GeneralCurious... PinmemberAndrew Rissing26-Mar-08 7:27 
Just a thought, what if you made it more into a class, such as:
 
public class CompressedFloat
{
  public class CompressedFloat(float[] array) { /* ... */ }
  public float this[int Index]                { /* ... */ }
  public ToCompressedArray()                  { /* ... */ }
  public ToUncompressedArray()                { /* ... */ }
}
 
It might help reduce the chances someone uses the compressed array blindly. Plus, you could put some additional logic into the indexor to help reduce your lookup times. Since, that seems to be the only problematic part of using it.
GeneralRe: Curious... PinmemberGilad Kapelushnik27-Mar-08 7:26 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140916.1 | Last Updated 26 Mar 2008
Article Copyright 2008 by Gilad Kapelushnik
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid