12,823,976 members (42,999 online)
alternative version

#### Stats

19.2K views
8 bookmarked
Posted 26 Mar 2008

# Optimized Sparse Float Array

, 26 Mar 2008 CPOL
 Rate this:
Saving and working on a sparse float array while hiding the index data

## Introduction

This article comes to suggest a way to use/compress an array of float numbers where most of the array is filled with nulls or, in this case, NaNs.

## Background

In most cases when there is a need to work with sparse arrays, the most common way to do it is to save two items for every value in the array. The first is usually the index and the second is the value itself, this was the situation I was facing. The arrays I handle are 128-4096 long and they represent a collection of harmonic values or various calculations with an interesting characteristic. Most of the harmonics do not exist and when some harmonics exist, they come in groups with close proximity to one another. Since there were groups of values close together, I was looking for a way to save these values with spending minimum space on the indexes.

## The Algorithm

In a single line, the main idea of the algorithm is storing the indexes embedded inside the NaN values. In a paragraph, it goes like this:

Assume the array looks like this:

`1.14 2.44 3.36 NaN NaN NaN 5.6 NaN NaN NaN NaN NaN NaN 25.6 7.46 3.4 3.2 NaN `

Compressing the numbers might be a hard task as the values are usually close to random but a nice thing to do would be to have the array look more like the following:

`1.14 2.44 3.36 3xNaN 5.6 6xNaN 25.6 7.46 3.4 3.2 1xNaN `

This might look something like ‘partial’ RLE. In an array of 1024, there might be 5-35 valid values in groups and the rest NaNs. Writing the array in such a way would lead to very good compression. The only problem is how to write the number of NaNs, of course, without wasting valuable space. The answer came from the NaN structure. `float.NaN `is actually a special number which is part of the IEEE 754. I won’t go into the whole floating point structure but just will state that a NaN is any 32bit (in a Single precision case) number with the flowing form:

`X1111111 1XXXXXXX XXXXXXXX XXXXXXXX`

The first X can be 0 or 1 and the rest of the bits can be anything but straight zeros (according to IEEE 754).

This means that the following are NaNs:

```X1111111 10000000 00000000 00000000
X1111111 10000000 00000000 00001010
X1111111 10000000 00000000 01111010
X1111111 10000000 00000000 00110010
.
.```

What the algorithm does is write the number of NaNs in the higher bits (little endian).

## Using the Code

Using the code can't be easier. There are 3 main methods, `compress`, `uncompress `and `GetValue `which work both on the compressed and on uncompressed arrays.

```float[] Harmonics = new float[1024];

float[] Compressed = SparseFloatArray.CompressArray(Harmonics);

float Value26 = SparseFloatArray.GetValue(Compressed, 26);
float Value13 = SparseFloatArray.GetValue(Compressed, 13);

float[] UnCompressed = SparseFloatArray.DeCompressArray(Compressed);
```

## Remarks

• I use standard arrays instead of `List `as they are twice as fast (except for where the compressed array is built).
• The maximum number that can be written in the unused bits is 2^16. This means that the maximum number of concurrent NaNs must not exceed 2^16.
• This code works with float, you can easily convert it to work with doubles.
• Don't try to store the float values in a database such as SQL as the Single precision values SQL uses are not IEEE 754 compliant. Better convert the array to bytes and store as an Image.
• If you use this for saving harmonics, in some special cases, it would be wise to have 2 arrays. One for the odd and one for the even and have some modifications to the code.
• `GetValue(a,b)` works both on the compressed and non-compressed arrays. Operation complexity, unfortunately, is O(n).

## History

• 26th March, 2008: Initial post

## Share

Gilad holds a B.Sc in Computer Eng. from the Technion IIT.

## You may also be interested in...

 First Prev Next
 Java Edition tomz918-Mar-09 23:05 tomz9 18-Mar-09 23:05
 ```import java.util.*; import java.io.*; import java.util.zip.*; /** * Modified from Gilad Kapelushnik's C# edition. * @refer http://www.codeproject.com/KB/recipes/SparseFloatArray.aspx * @author Zheng, Yanbin */ pubilc class Sparse { public static final int MAX_COUNTER = 65536/*2^16*/ - 1 ; public static float[] compressArray(float[] values) { int jumpCounter = 0; List compressed = new LinkedList(); for(int i = 0; i < values.length; i++) { if(Float.isNaN(values[i]) && jumpCounter < MAX_COUNTER) { jumpCounter++; } else { if(jumpCounter != 0) { compressed.add(GetFloatNaNWithNumber(jumpCounter - 1)); jumpCounter = 0; // reset jump counter } compressed.add(values[i]); } } if(jumpCounter != 0) compressed.add(GetFloatNaNWithNumber(jumpCounter - 1)); return toArray(compressed);//compressed.toArray(new float[compressed.size()]);//compressed.ToArray(); } private static float[] toArray(List data) { float[] ret = new float[data.size()]; Iterator it = data.iterator(); int i = 0; while(it.hasNext()) { ret[i++] = ((Float) it.next()).floatValue(); } return ret; } public static float[] deCompressArray(float[] values) { List uncompressed = new LinkedList(); // At least the size of the original array. for(int i = 0; i < values.length; i++) { if(Float.isNaN(values[i])) { int NaNCount = getNumberFromFloatNaNWithNumber(values[i]); // number of NaN here for(int j = 0; j < NaNCount + 1; j++) uncompressed.add(Float.NaN); // insert 'standard' NaNs } else { uncompressed.add(values[i]); } } return toArray(uncompressed);//uncompressed.ToArray(); } public static float getValue(float[] values, int index) { int counter = 0; for(int i = 0; i < values.length; i++) { if(Float.isNaN(values[i])) { if(counter >= index) return Float.NaN; counter += getNumberFromFloatNaNWithNumber(values[i]) + 1; } else { if(counter == index) return values[i]; counter++; } } if(counter > index) return Float.NaN; throw new ArrayIndexOutOfBoundsException("index was out of range"); } private static float GetFloatNaNWithNumber(int value) { byte[] bytes = new byte[]{(byte) 0x0, (byte) 0x0, (byte) 0xc0, (byte) 0xFF}; // base NaN bytes[0] = (byte) (value & 0x000000FF); bytes[1] = (byte) (value >> 8); System.out.println("To encode number of NaN:" + value); return toFloat(bytes);//BitConverter.ToSingle(bytes, 0); } private static int getNumberFromFloatNaNWithNumber(float value) { int i = Float.floatToRawIntBits(value); i = i & 0x0000FFFF; // System.out.println("Number of NaN:" + i); return i; // byte[] Bytes = BitConverter.GetBytes(value); // Bytes[2] = 0x0; // Bytes[3] = 0x0; // return BitConverter.ToInt32(Bytes, 0); } private static float toFloat(byte[] addr) { int address; address = addr[0] & 0xFF; address |= ((addr[1] << 8) & 0xFF00); address |= ((addr[2] << 16) & 0xFF0000); address |= ((addr[3] << 24) & 0xFF000000); return Float.intBitsToFloat(address); } //The following is for testing purpose ================================================================ private static float[] createTestCase() { float[] test = new float[1000000]; for (int i = 0; i < 100 ; i++) { test[i] = i; } for (int i = 100; i < 100000 ; i++) { test[i] = Float.NaN; } for (int i = 100000; i < (100000 + 100) ; i++) { test[i] = i; } for (int i = 200000; i < 800000; i++) { test[i] = Float.NaN; } return test; } public static void main(String[] args) { float[] arr = createTestCase();// new float[]{1, 2, 3, 4, 5, Float.NaN, Float.NaN, Float.NaN, Float.NaN, Float.NaN, 6, 7, Float.NaN, Float.NaN, 8}; float[] compressed = compressArray(arr); System.out.println("Compressed.length:" + compressed.length); float[] decompressed = deCompressArray(compressed); if(arr.length != decompressed.length) { System.out.println("Errror...1"); return; } for(int i = 0; i < arr.length; i++) { if(arr[i] != decompressed[i]) { if(Float.isNaN(arr[i]) && Float.isNaN(decompressed[i])) continue; System.out.println("Not equal. i=" + i + "; val1:" + arr[i] + "; val2:" + decompressed[i]); return; } } saveZip(compressed); float [] unzippedFloats = readZip(); System.out.println( "unzippedFloats.length:" + unzippedFloats.length); if(unzippedFloats.length != compressed.length) { System.out.println( "unzippedFloats.length not matched"); return; } for (int i = 0; i < compressed.length ; i++) { if(compressed[i] != unzippedFloats[i]) { if(Float.isNaN(compressed[i]) && Float.isNaN(unzippedFloats[i])) continue; System.out.println( "Unzipped not matched"); return; } } System.out.println( "Success"); } private static float [] readZip() { try { List compressed = new LinkedList(); // final DataInputStream zip = new DataInputStream(new ZipInputStream(new FileInputStream("test.zip"))); FileInputStream fin = new FileInputStream("test.zip"); ZipInputStream zin = new ZipInputStream(fin); ZipEntry ze = null; final DataInputStream zip = new DataInputStream(zin); while ((ze = zin.getNextEntry()) != null) { System.out.println("Unzipping " + ze.getName()); while(true) { try { float f = zip.readFloat(); compressed.add(f); } catch(EOFException ef) { System.out.println( "EOFException ..........."); break; } } zin.closeEntry(); } zin.close(); System.out.println( "compressed.size:" + compressed.size()); return toArray(compressed); } catch(Exception e) { System.out.println( e); return null; } } private static void saveZip(float[] unzipped) { try { final DataOutputStream zip = new DataOutputStream(createZipOutputStream(new FileOutputStream("test.zip"))); for (int i = 0; i < unzipped.length ; i++) { zip.writeFloat(unzipped[i]); } zip.flush(); zip.close(); } catch(Exception e) { System.out.println( e); } } private static OutputStream createZipOutputStream(OutputStream zipped) throws IOException { final ZipOutputStream zip = new ZipOutputStream(zipped); zip.putNextEntry(new ZipEntry("data")); return zip; } } ```
 Curious... Andrew Rissing26-Mar-08 8:27 Andrew Rissing 26-Mar-08 8:27