DIY String Interning

Rob Philpott

25 Oct 2018

Saving time and memory reading repetitive data

A Quick Performance Trick

Take a look at this peculiar thing:

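using System.Collections.Generic;

// Returns one shared instance for each distinct string value it is given.
public class StringCache
{
	// Key and value are the same object; the dictionary only finds the first instance seen.
	private readonly Dictionary<string, string> _cache;

	public StringCache()
	{
		_cache = new Dictionary<string, string>();
	}

	public virtual string Get(string value)
	{
		// A null key would make the dictionary throw, so pass nulls straight through.
		if (value == null) return null;

		string result;
		if (!_cache.TryGetValue(value, out result))
		{
			// First sighting: remember this instance and hand it back.
			result = value;
			_cache[result] = result;
		}
		return result;
	}
}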

As you can see, it has a Dictionary that maps each string to itself, with key and value being the same object, which seems fairly pointless. But it is useful: when you call the Get method, it returns a reference to the string you supplied if it hasn't seen that value before, or a reference to the earlier instance of an equal string if it has.
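To make that behaviour concrete, here is a minimal sketch. The `StringCacheDemo` class and its `SharesInstance` method are made up for illustration, and the cache is repeated in condensed form only so the snippet compiles on its own. It builds two equal strings at run time, so they are separate objects, and checks that `Get` hands back the first instance for both.

```csharp
using System;
using System.Collections.Generic;

// Condensed copy of the StringCache shown earlier, repeated so this compiles alone.
public class StringCache
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public virtual string Get(string value)
    {
        if (value == null) return null;
        if (_cache.TryGetValue(value, out string result)) return result;
        _cache[value] = value;
        return value;
    }
}

// Hypothetical demo class, not part of the original article.
public static class StringCacheDemo
{
    public static bool SharesInstance()
    {
        var cache = new StringCache();

        // Built from char arrays so that a and b are equal in value but distinct objects.
        string a = new string(new[] { 'G', 'B', 'P' });
        string b = new string(new[] { 'G', 'B', 'P' });
        bool distinctBefore = !object.ReferenceEquals(a, b);

        string first = cache.Get(a);   // not seen before: a itself comes back
        string second = cache.Get(b);  // equal value seen before: a comes back, not b

        return distinctBefore
            && object.ReferenceEquals(first, a)
            && object.ReferenceEquals(second, a);
    }
}
```

`SharesInstance()` returns true: after the second call nothing needs to hold on to `b`, so it is free to be collected.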

I created this as an experiment to use in the Data Access Layer of a project I was working on. ADO.NET was calling a stored procedure which was returning a lot of rows, and often I would see the same string value appearing for the same field over and over again. Millions of times in some cases. You might think this is indicative of a less than perfectly normalised database, to which I would reply that it's a 'real world' example. :)

Anyway, I figured I could save some memory by using this - lots of references to the same string instance rather than lots of references to individual instances of the same string. At the start of the read loop, I'd create one of these, pass the strings through it and let it drop out of scope at the end of the loop.
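The read-loop pattern described above might look roughly like the sketch below. Everything except `StringCache` is hypothetical: `ReadCityColumn` stands in for a data reader by yielding a fresh string object per row, and `ByReference` exists only to count how many distinct string objects end up in the results. The cache is again repeated in condensed form so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Condensed copy of the StringCache shown earlier, repeated so this compiles alone.
public class StringCache
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public virtual string Get(string value)
    {
        if (value == null) return null;
        if (_cache.TryGetValue(value, out string result)) return result;
        _cache[value] = value;
        return value;
    }
}

// Hypothetical demo class, not part of the original article.
public static class ReadLoopDemo
{
    // Stand-in for a data reader: every row produces a newly allocated string.
    private static IEnumerable<string> ReadCityColumn(int rows)
    {
        string[] cities = { "London", "Paris", "Tokyo" };
        for (int i = 0; i < rows; i++)
            yield return new string(cities[i % cities.Length].ToCharArray());
    }

    // Compares strings by object identity rather than by value.
    private sealed class ByReference : IEqualityComparer<string>
    {
        public bool Equals(string x, string y) => object.ReferenceEquals(x, y);
        public int GetHashCode(string s) => RuntimeHelpers.GetHashCode(s);
    }

    // Reads all rows and returns how many distinct string objects the results reference.
    public static int CountInstances(int rows, bool useCache)
    {
        var cache = new StringCache();          // one cache per read loop
        var results = new List<string>(rows);

        foreach (string raw in ReadCityColumn(rows))
            results.Add(useCache ? cache.Get(raw) : raw);

        // The cache drops out of scope on return; the results keep the shared instances alive.
        return new HashSet<string>(results, new ByReference()).Count;
    }
}
```

With these made-up rows, `CountInstances(9000, false)` gives 9000 and `CountInstances(9000, true)` gives 3. Unlike `string.Intern`, whose pool lives as long as the runtime, the cache here lasts only for the loop, so strings that nothing else references can be collected afterwards.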

It reduced the memory footprint of the service nicely, but I was expecting to take a hit in the overall read time; after all, each string needs its hash code calculated and a dictionary lookup done on top of the read itself. In fact, it sped the read up. I'm not really sure why. There's more work to do, and perhaps there was room for it because the IO was the bottleneck, but that could only make it the same speed, not faster. So I assume it's something to do with garbage collection: my guess is that a duplicate string which becomes garbage as soon as it has been looked up is cheaper for the collector than one that stays alive alongside the rest of the results.

Anyway, who cares? I've since retrofitted these little caches into lots of DAL functions which return endless records, and the result is faster reads and lower memory consumption.

Simple, and a little bit beautiful.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Rob Philpott

Architect

United Kingdom

I am a .NET architect/developer based in London working mostly on financial trading systems. My love of computers started at an early age with BASIC on a 3KB VIC20 and progressed onto a 32KB BBC Micro using BASIC and 6502 assembly language. From there I moved on to the blisteringly fast Acorn Archimedes using BASIC and ARM assembly.

I started developing with C++ in 1990, when it was introduced to me in my first year studying for a Computer Science degree at the University of Nottingham. I started professionally with Visual C++ version 1.51 in 1993.

I moved over to C# and .NET in early 2004 after a long period of denial that anything could improve upon C++.

Recently I did a bit of work in my old language of C++ and I now realise that frankly, it's a total pain in the arse.
