
Using memory mapped files to conserve physical memory for large arrays

12 Nov 2009 · LGPL3 · 4 min read
The article shows how to implement a value type array as a memory mapped file to conserve physical memory.

Introduction

I work with large data structures which are kept in memory for speed, and over the years, I have encountered out of memory exceptions numerous times, especially on 32 bit systems. With 64 bit OS rapidly becoming the standard, we can now get a fast performing disk version of the structures using Memory Mapped Files for storage.

"Large" here means roughly 800MB+ on 32 bit, and on 64 bit, as much physical memory as you have available. When you move close to those limits, .NET is bound to throw an out of memory exception, which in most cases will break your application in unpredictable places.

Background

I have long thought about creating a disk based version of an array to store my data, but this would require a lot of caching logic to make it perform fast enough compared to physical memory.

A couple of years ago, I stumbled across Memory Mapped Files, which have long existed in operating systems and are typically used in Windows for the swap space. With even more data in my current project, and running on a 64 bit platform, the time seemed right to wrap this into a small library.

Last time, I used a library from MetalWrench, but this time around, I got hold of Winterdom's much nicer implementation of the Win32 API. I've included the patch from Steve Simpson, but removed the dynamic paging since it slows things down and isn't necessary on 64 bit systems. (If you want to use arrays holding over 2GB of data on 32 bit systems, I recommend reverting to Steve's original version and setting a view size of 200-500MB.)

The beauty of 64 bit is that you have virtually unlimited address space, so each thread can get its own view of the mapped file without running out of address space. 32 bit Windows can only address 4GB.

As for performance, my theory is that Microsoft has implemented a fairly good caching algorithm for its swap file, so it should prove good enough for me. A few tests show much better disk I/O with the Memory Mapped API than with .NET's file I/O library. I haven't tested performance with the SEC_LARGE_PAGES flag added, but it might help some.

Requirements

The project is made in Visual Studio 2008 compiled against .NET 3.5, but you should be able to take the files and create projects in VS2005 as well. It should mostly be .NET 2.0 compatible. If I get enough requests, I'll create a VS2005 version.

The class accepts only value types. The reason is that the data is stored as bytes, and to keep track of each element's offset, TValue needs to serialize to a fixed, known size.
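As a sketch of why a fixed size matters: once every element marshals to the same number of bytes, an element's byte offset in the file follows directly from its index. The struct and method names below are illustrative, not taken from the library.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative value type with a well-defined marshaled size.
[StructLayout(LayoutKind.Sequential)]
struct Point3
{
    public double X;
    public double Y;
    public double Z;
}

static class OffsetDemo
{
    // With a fixed element size, offset of element i is simply i * dataSize.
    static long OffsetOf(long index, int dataSize)
    {
        return index * dataSize;
    }

    static void Main()
    {
        int dataSize = Marshal.SizeOf(typeof(Point3)); // 3 doubles = 24 bytes
        Console.WriteLine(dataSize);
        Console.WriteLine(OffsetOf(10, dataSize));
    }
}
```

A reference type offers no such guarantee, which is why the `struct` constraint below is needed.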

Implementation

The initial signature of my class looked like this:

C#
public class GenericMemoryMappedArray<TValue> : IDisposable, IEnumerable<TValue>
{
}

The problem I then encountered was that a user could instantiate the array with any class, and an arbitrary class won't serialize to a fixed size. Supporting that would require a key file to store all the offsets into the value file as well. That is left for my next project: implementing a memory mapped Dictionary.

So I ended up with:

C#
public class GenericMemoryMappedArray<TValue> : IDisposable, IEnumerable<TValue>
where TValue : struct
{
}

which restricts TValue to structs and other value types. IDisposable is implemented in order to free up the MMF, delete the file used, and release the unsafe memory areas allocated for working buffers. The buffers are the size of TValue, calculated in the constructor. The constructor takes the size of the array and where to store the MMF as parameters.

C#
/// <summary>
/// Create a new memory mapped array on disk
/// </summary>
/// <param name="size">The length of the array to allocate</param>
/// <param name="path">The directory where the memory mapped file
///          is to be stored</param>
public GenericMemoryMappedArray(long size, string path)
{
    ...
    // Get the size of TValue
    _dataSize = Marshal.SizeOf(typeof(TValue));

    // Allocate a global buffer for this instance
    _buffer = new byte[_dataSize];

    // Allocate a global unmanaged buffer for this instance
    _memPtr = Marshal.AllocHGlobal(_dataSize);
    ...
}

The generic class also implements thread safety, so more than one thread can access the array at the same time. Since each thread has to set the position first and then read or write, I keep a pool of views, one per thread. .NET reuses its internal threads, so the pool will not grow very large; even if it did, that's not a problem on 64 bit due to the large address space available. A timer runs every hour to clean up unused threads.
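A minimal sketch of that per-thread bookkeeping (class and member names here are my own, not necessarily the library's): each thread ID maps to a last-used timestamp, and a periodic sweep drops entries that have been idle too long.

```csharp
using System;
using System.Collections.Generic;

class ThreadViewPool
{
    private readonly Dictionary<int, DateTime> _lastUsedThread =
        new Dictionary<int, DateTime>();
    private readonly object _sync = new object();

    public int Count { get { return _lastUsedThread.Count; } }

    // Record that a thread touched its view at 'now'.
    public void Touch(int threadId, DateTime now)
    {
        lock (_sync)
        {
            _lastUsedThread[threadId] = now;
        }
    }

    // Remove entries idle longer than maxIdle; in the article,
    // a timer runs a sweep like this every hour.
    public int Sweep(DateTime now, TimeSpan maxIdle)
    {
        lock (_sync)
        {
            var stale = new List<int>();
            foreach (var pair in _lastUsedThread)
            {
                if (now - pair.Value > maxIdle)
                    stale.Add(pair.Key);
            }
            foreach (int id in stale)
                _lastUsedThread.Remove(id);
            return stale.Count;
        }
    }

    static void Main()
    {
        var pool = new ThreadViewPool();
        DateTime t0 = new DateTime(2009, 11, 12);
        pool.Touch(1, t0);
        pool.Touch(2, t0.AddMinutes(90));

        // Sweep two hours in: thread 1 has been idle too long, thread 2 has not.
        int removed = pool.Sweep(t0.AddMinutes(120), TimeSpan.FromHours(1));
        Console.WriteLine(removed);
        Console.WriteLine(pool.Count);
    }
}
```

In the real class, each surviving entry would also own a mapped view of the file; the sweep is what keeps abandoned views from accumulating.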

Here's an example of the Write method which gets the current thread ID, then gets a view of the MMF for that thread, and finally writes the data to the MMF.

C#
public void Write(byte[] buffer)
{
    int threadId = Thread.CurrentThread.ManagedThreadId;
    _lastUsedThread[threadId] = DateTime.UtcNow;
    Stream s = GetView(threadId);
    s.Write(buffer, 0, buffer.Length);
}

The class supports auto-growing the array through a property which defaults to true. This is useful for adding more data as you go along.
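The guard an indexer might run before each write can be sketched like this; the names and the grow-to-fit policy are hypothetical, and a real implementation would remap the file rather than just track a length.

```csharp
using System;

// Simplified stand-in for the array's bounds logic: only the logical
// length is tracked here, no file is actually remapped.
class GrowableArray
{
    private long _length;
    public bool AutoGrow = true;    // mirrors the article's default

    public GrowableArray(long size) { _length = size; }
    public long Length { get { return _length; } }

    // Called before every write at 'index'.
    public void EnsureCapacity(long index)
    {
        if (index < _length)
            return;
        if (!AutoGrow)
            throw new ArgumentOutOfRangeException("index");
        // Grow to fit; a real version might grow in larger chunks,
        // since remapping flushes the underlying pages.
        _length = index + 1;
    }

    static void Main()
    {
        var a = new GrowableArray(1024 * 1024);

        a.AutoGrow = false;
        try { a.EnsureCapacity(1024 * 1024); }        // one past the end
        catch (ArgumentOutOfRangeException) { Console.WriteLine("out of range"); }

        a.AutoGrow = true;
        a.EnsureCapacity(1024 * 1024);                // now grows the array
        Console.WriteLine(a.Length);
    }
}
```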

Using the Code

C#
string path = AppDomain.CurrentDomain.BaseDirectory;
var myList = new GenericMemoryMappedArray<int>(1024*1024, path);
using (myList) // automatically dispose the mmf when done
{
    myList.AutoGrow = false;
    try
    {
        myList[1024 * 1024] = 1;
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
        //will give exception
    }
    myList.AutoGrow = true;
    myList[0] = 1;
    myList[1024 * 1024] = 1; // will now increase the file
}

Conclusion

For my needs, memory mapped files have proven to be a good trade-off between speed and physical memory usage. Of course, physical memory is still used for caching behind the scenes, but having the OS work its magic is much better than getting out of memory exceptions from .NET. I've tried both sequential and random reads/writes, and it works pretty well. Be sure not to resize the array too often, as unmapping will flush the underlying pages; this will impact performance if the structure is in constant use.

The code can also be modified to keep the temp files it uses for permanent storage, rather than deleting them when the class is disposed.
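One way to make that modification is a persist flag consulted in Dispose; the flag and the class below are illustrative, not part of the library's API.

```csharp
using System;
using System.IO;

// Sketch: a 'Persist' flag decides whether Dispose deletes the backing file.
class BackingFile : IDisposable
{
    public bool Persist;                 // false = original temp-file behavior
    public readonly string FilePath;

    public BackingFile(string path)
    {
        FilePath = path;
        File.WriteAllBytes(path, new byte[16]);   // stand-in for the mapped file
    }

    public void Dispose()
    {
        // Default behavior: the temp file is removed with the instance.
        // With Persist set, the file stays on disk for reuse next run.
        if (!Persist && File.Exists(FilePath))
            File.Delete(FilePath);
    }
}

static class PersistDemo
{
    static void Main()
    {
        string p = Path.Combine(Path.GetTempPath(), "mmf_persist_demo.bin");
        using (var f = new BackingFile(p)) { f.Persist = true; }
        Console.WriteLine(File.Exists(p));   // file survived disposal
        File.Delete(p);                      // demo cleanup
    }
}
```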

License

This article, along with any associated source code and files, is licensed under The GNU Lesser General Public License (LGPLv3)