Click here to Skip to main content
Click here to Skip to main content
Go to top

Optimized Sparse Float Array

, 26 Mar 2008
Rate this:
Please Sign up or sign in to vote.
Saving and working on a sparse float array while hiding the index data

Introduction

This article comes to suggest a way to use/compress an array of float numbers where most of the array is filled with nulls or, in this case, NaNs.

Background

In most cases when there is a need to work with sparse arrays, the most common way to do it is to save two items for every value in the array. The first is usually the index and the second is the value itself, this was the situation I was facing. The arrays I handle are 128-4096 long and they represent a collection of harmonic values or various calculations with an interesting characteristic. Most of the harmonics do not exist and when some harmonics exist, they come in groups with close proximity to one another. Since there were groups of values close together, I was looking for a way to save these values with spending minimum space on the indexes.

The Algorithm

In a single line, the main idea of the algorithm is storing the indexes embedded inside the NaN values. In a paragraph, it goes like this:

Assume the array looks like this:

1.14 2.44 3.36 NaN NaN NaN 5.6 NaN NaN NaN NaN NaN NaN 25.6 7.46 3.4 3.2 NaN 

Compressing the numbers might be a hard task as the values are usually close to random but a nice thing to do would be to have the array look more like the following:

1.14 2.44 3.36 3xNaN 5.6 6xNaN 25.6 7.46 3.4 3.2 1xNaN 

This might look something like ‘partial’ RLE. In an array of 1024, there might be 5-35 valid values in groups and the rest NaNs. Writing the array in such a way would lead to very good compression. The only problem is how to write the number of NaNs, of course, without wasting valuable space. The answer came from the NaN structure. float.NaN is actually a special number which is part of the IEEE 754. I won’t go into the whole floating point structure but just will state that a NaN is any 32bit (in a Single precision case) number with the flowing form:

X1111111 1XXXXXXX XXXXXXXX XXXXXXXX

The first X can be 0 or 1 and the rest of the bits can be anything but straight zeros (according to IEEE 754).

This means that the following are NaNs:

X1111111 10000000 00000000 00000000 
X1111111 10000000 00000000 00001010 
X1111111 10000000 00000000 01111010 
X1111111 10000000 00000000 00110010 
.
.

What the algorithm does is write the number of NaNs in the higher bits (little endian).

Using the Code

Using the code can't be easier. There are 3 main methods, compress, uncompress and GetValue which work both on the compressed and on uncompressed arrays.

 float[] Harmonics = new float[1024];

 float[] Compressed = SparseFloatArray.CompressArray(Harmonics);

 float Value26 = SparseFloatArray.GetValue(Compressed, 26); 
 float Value13 = SparseFloatArray.GetValue(Compressed, 13);

 float[] UnCompressed = SparseFloatArray.DeCompressArray(Compressed);

Remarks

  • I use standard arrays instead of List as they are twice as fast (except for where the compressed array is built).
  • The maximum number that can be written in the unused bits is 2^16. This means that the maximum number of concurrent NaNs must not exceed 2^16.
  • This code works with float, you can easily convert it to work with doubles.
  • Don't try to store the float values in a database such as SQL as the Single precision values SQL uses are not IEEE 754 compliant. Better convert the array to bytes and store as an Image.
  • If you use this for saving harmonics, in some special cases, it would be wise to have 2 arrays. One for the odd and one for the even and have some modifications to the code.
  • GetValue(a,b) works both on the compressed and non-compressed arrays. Operation complexity, unfortunately, is O(n).

History

  • 26th March, 2008: Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Gilad Kapelushnik
Team Leader
Israel Israel
Gilad holds a B.Sc in Computer Eng. from the Technion IIT.

Comments and Discussions

 
GeneralJava Edition Pinmembertomz918-Mar-09 22:05 
import java.util.*;
import java.io.*;
import java.util.zip.*;
/**
 * Modified from Gilad Kapelushnik's C# edition. 
 * @refer http://www.codeproject.com/KB/recipes/SparseFloatArray.aspx
 * @author Zheng, Yanbin
 */
pubilc class Sparse
{
    public static final int MAX_COUNTER = 65536/*2^16*/ - 1 ;
    public static float[] compressArray(float[] values)
    {
        int jumpCounter = 0;
        List<float> compressed = new LinkedList<float>();
        for(int i = 0; i < values.length; i++)
        {
            if(Float.isNaN(values[i]) && jumpCounter < MAX_COUNTER)
            {
                jumpCounter++;
            }
            else
            {
                if(jumpCounter != 0)
                {
                    compressed.add(GetFloatNaNWithNumber(jumpCounter - 1));
                    jumpCounter = 0; // reset jump counter
                }
                compressed.add(values[i]);
            }
        }
        if(jumpCounter != 0) compressed.add(GetFloatNaNWithNumber(jumpCounter - 1));
        return toArray(compressed);//compressed.toArray(new float[compressed.size()]);//compressed.ToArray();
    }
 
    private static float[] toArray(List<float> data)
    {
        float[] ret = new float[data.size()];
        Iterator it = data.iterator();
        int i = 0;
        while(it.hasNext())
        {
            ret[i++] = ((Float) it.next()).floatValue();
        }
        return ret;
    }
 
    public static float[] deCompressArray(float[] values)
    {
        List<float> uncompressed = new LinkedList<float>(); // At least the size of the original array.
        for(int i = 0; i < values.length; i++)
        {
            if(Float.isNaN(values[i]))
            {
                int NaNCount = getNumberFromFloatNaNWithNumber(values[i]); // number of NaN here
                for(int j = 0; j < NaNCount + 1; j++) uncompressed.add(Float.NaN); // insert 'standard' NaNs
            }
            else
            {
                uncompressed.add(values[i]);
            }
        }
        return toArray(uncompressed);//uncompressed.ToArray();
    }
 
    public static float getValue(float[] values, int index)
    {
        int counter = 0;
        for(int i = 0; i < values.length; i++)
        {
            if(Float.isNaN(values[i]))
            {
                if(counter >= index) return Float.NaN;
                counter += getNumberFromFloatNaNWithNumber(values[i]) + 1;
            }
            else
            {
                if(counter == index) return values[i];
                counter++;
            }
        }
        if(counter > index) return Float.NaN;
        throw new ArrayIndexOutOfBoundsException("index was out of range");
    }
 
    private static float GetFloatNaNWithNumber(int value)
    {
 
        byte[] bytes = new byte[]{(byte) 0x0, (byte) 0x0, (byte) 0xc0, (byte) 0xFF}; // base NaN
        bytes[0] = (byte) (value & 0x000000FF);
        bytes[1] = (byte) (value >> 8);
 
        System.out.println("To encode number of NaN:" + value);
        return toFloat(bytes);//BitConverter.ToSingle(bytes, 0);
    }
 
    private static int getNumberFromFloatNaNWithNumber(float value)
    {
        int i = Float.floatToRawIntBits(value);
        i = i & 0x0000FFFF;
        // System.out.println("Number of NaN:" + i);
        return i;
        // byte[] Bytes = BitConverter.GetBytes(value);
        // Bytes[2] = 0x0;
        // Bytes[3] = 0x0;
        // return BitConverter.ToInt32(Bytes, 0);
    }
 
    private static float toFloat(byte[] addr)
    {
        int address;
        address = addr[0] & 0xFF;
        address |= ((addr[1] << 8) & 0xFF00);
        address |= ((addr[2] << 16) & 0xFF0000);
        address |= ((addr[3] << 24) & 0xFF000000);
        return Float.intBitsToFloat(address);
    }
 
    //The following is for testing purpose ================================================================

    private static float[] createTestCase()
    {
        float[] test = new float[1000000];
        for (int i = 0; i < 100 ; i++) 
        {
            test[i] = i;
        }
        for (int i = 100; i < 100000 ; i++) 
        {
            test[i] = Float.NaN;
        }
        for (int i = 100000; i < (100000 + 100) ; i++) 
        {
            test[i] = i;
        }
        for (int i = 200000; i <  800000; i++) 
        {
            test[i] = Float.NaN;
        }
        return test;
    }
    public static void main(String[] args)
    {
        float[] arr = createTestCase();// new float[]{1, 2, 3, 4, 5, Float.NaN, Float.NaN, Float.NaN, Float.NaN, Float.NaN, 6, 7, Float.NaN, Float.NaN, 8};
        float[] compressed = compressArray(arr);
 
        System.out.println("Compressed.length:" + compressed.length);
        float[] decompressed = deCompressArray(compressed);
 
        if(arr.length != decompressed.length)
        {
            System.out.println("Errror...1");
            return;
        }
        for(int i = 0; i < arr.length; i++)
        {
            if(arr[i] != decompressed[i])
            {
                if(Float.isNaN(arr[i]) && Float.isNaN(decompressed[i])) continue;
                System.out.println("Not equal. i=" + i + "; val1:" + arr[i] + "; val2:" + decompressed[i]);
                return;
            }
        }
        saveZip(compressed);
        float [] unzippedFloats = readZip();
        System.out.println( "unzippedFloats.length:" + unzippedFloats.length);
        if(unzippedFloats.length != compressed.length)
        {
            System.out.println( "unzippedFloats.length not matched");
            return;
        }
        for (int i = 0; i < compressed.length ; i++) 
        {
            if(compressed[i] != unzippedFloats[i])
            {
                if(Float.isNaN(compressed[i]) && Float.isNaN(unzippedFloats[i])) continue;
                System.out.println( "Unzipped not matched");
                return;
            }
        }
        System.out.println( "Success");
    }
 
    private static float [] readZip()
    {
        try
        {
            List<float> compressed = new LinkedList<float>();
            // final DataInputStream zip = new DataInputStream(new ZipInputStream(new FileInputStream("test.zip")));

 
            FileInputStream fin = new FileInputStream("test.zip");
            ZipInputStream zin = new ZipInputStream(fin);
            ZipEntry ze = null;
            final DataInputStream zip = new DataInputStream(zin);
            while ((ze = zin.getNextEntry()) != null) 
            {
                System.out.println("Unzipping " + ze.getName());
                while(true)
                {
                    try
                    {
                        float f = zip.readFloat();
                        compressed.add(f);
                    }
                    catch(EOFException ef)
                    {
                        System.out.println( "EOFException ...........");
                        break;
                    }
                }
                zin.closeEntry();
            }
            zin.close();
 
            System.out.println( "compressed.size:" + compressed.size());
            return toArray(compressed);
        }
        catch(Exception e)
        {
            System.out.println( e);
            return null;
        }
    }
    private static void saveZip(float[] unzipped) 
    {
        try
        {
            final DataOutputStream zip = new DataOutputStream(createZipOutputStream(new FileOutputStream("test.zip")));
            for (int i = 0; i < unzipped.length ; i++) 
            {
                zip.writeFloat(unzipped[i]);
            }
            zip.flush();
            zip.close();
        }
        catch(Exception e)
        {
            System.out.println( e);
        }
    }
    private static OutputStream createZipOutputStream(OutputStream zipped) throws IOException
    {
        final ZipOutputStream zip = new ZipOutputStream(zipped);
        zip.putNextEntry(new ZipEntry("data"));
        return zip;
    }
 

}
 

 
</float></float></float></float></float></float></float>

GeneralCurious... PinmemberAndrew Rissing26-Mar-08 7:27 
GeneralRe: Curious... PinmemberGilad Kapelushnik27-Mar-08 7:26 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140922.1 | Last Updated 26 Mar 2008
Article Copyright 2008 by Gilad Kapelushnik
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid