Download memo.zip - 8.1 KB

Introduction

Memoization is an optimization technique where we store the results of expensive function calls and return the cached results when the same inputs occur again (https://en.wikipedia.org/wiki/Memoization). It was first described by Donald Michie in his 1968 article titled “Memo” Functions and Machine Learning in the Nature magazine. He called it a rote-learning facility that allows programs to improve their own efficiency during execution.

While dynamic programming is the oft-cited application for memoization, there are many others that can benefit from this technique, such as distributed computing, mathematical modeling, financial portfolio valuations etc. As is usually the case with such techniques, the more you use memoization, the more likely you are to identify other applications for it. In this article, I present a generic and modular memoizer that can be configured to suit the needs of the particular application.

Background

The memoizer presented in this article makes extensive use of modern C++ features such as variadic templates, rvalue-references, universal references, perfect forwarding and return-type deduction. Familiarity with these concepts would help in understanding the code samples.

The Components of a Memoizer

Before we can begin to design a modular memoizer, it helps to think about the components of a memoizer. A memoizer has two parts:

1. Memoizing function: This is a higher-order function that takes another function as an argument, evaluates it and stores its results in a cache.

2. Results cache: This is where the memoizing function stores the results.

Even though memoization is widely used, most memoizer implementations are all-or-none, featuring tight-coupling between the memoizing function and the the results cache. They have hard-coded dependencies on the cache type, offer limited ability to configure the behavior of the cache and make pessistic choices about data structures in an attempt to be generic. This limits their applicability to a small range of problems and leads to programmers re-implementing memoizers for each new set of applications. As users, we either don't get what we want, or end up paying (in terms of space/time complexity) for something we don't need or use. Modular or loosely-coupled designs, on the other hand, promote code re-use and make it easy to adapt to new applications as necessary.

The Memoizing Function

Let's start by thinking about what we want out of the memoizing function.

1. Generic i.e. accept any callable object: We should be able to memoize any callable object, such as free-functions, member functions, function objects, lambdas etc.

2. Preserve types: We should not be required to wrap the functions in a type-erased interface such as std::function. Type-erased interfaces use virtual functions to delegate calls to the underlying objects. In additon to adding an extra level of indirection, virtual functions prevent inlining. While there certainly are situations where we need to use a type-erased interface (such as when memoizing recursive functions), we shouldn't be forced to do so.

3. Non-intrusive: We should be able to memoize a function without having to modify it. However, we need to make an exception for recursive functions as we will see later.

4. No hard-coded dependencies: This goes without saying. After all, we are talking about a loosely-coupled memoizer.

Now, let's look at a memoizing function that satisfies all of these requirements.

namespace memo {

template<typename CacheType, typename CallableType, typename... Args>
auto invoke(CacheType&& cache, CallableType&& fn, Args&&... args)
{
    auto&& pY = cache.lookup(std::tie(args...));
    if (!!pY) {
        return *pY;
    }

    auto&& y = fn(std::forward<Args>(args)...);
    cache.remember(std::tie(args...), std::forward<decltype(y)>(y));
    return y;
}

template<typename CacheType, typename CallableType>
class Memoizer {
   
public:
    template<typename C, typename... FnArgs>
    explicit Memoizer(C&& cache, FnArgs&&... args)
    : cache_(std::forward<C>(cache))
    , fn_(std::forward<FnArgs>(args)...)
    {
    }
    
    template<typename... Args>
    auto operator()(Args&&... args) const
    {
        return memo::invoke(cache_, fn_, std::forward<Args>(args)...);
    }

private:
    mutable CacheType cache_;
    mutable CallableType fn_;
};

}

There's a lot going on here. Let's break it down.

1. Firstly, memo::invoke is the memozing function - it evaluates the given function and stores its results in the cache. It is parameterized by CallableType and CacheType - so no hard-coded dependencies.

2. The return type of the CacheType's lookup method must allow testing for validity using operator!() and should be dereferenceable. Pointers, including smart pointers, exhibit this behavior, as does std::optional.

3. The Memoizer template is a generator for function decorators that ties together the memoizing function and the results cache. Applying this decorator to a function gives us its memoized version.

The Results Cache

Now, let's think about what we want out of the results cache. But before we begin, let's take look at a couple of basic cache types that adapt std::map-like data structures to work with our memoizer.

namespace memo {
namespace cache {

struct construct_in_place_t { };

template<typename Container>
class AssociativeContainerAdapter {

public:
    AssociativeContainerAdapter()
    {
    }

    template<typename... ConsArgs>
    explicit AssociativeContainerAdapter(construct_in_place_t, ConsArgs&&... args)
    : data_(std::forward<ConsArgs>(args)...)
    {
    }

    template<typename K>
    auto lookup(K&& key) const
    {
        auto iter = data_.find(std::forward<K>(key));
        if (iter == data_.end()) {
            return (decltype(&iter->second))nullptr;
        }

        return &iter->second;
    }

    template<typename K, typename V>
    void remember(K&& key, V&& value)
    {
        data_.emplace(std::forward<K>(key), std::forward<V>(value));
    }

private:
    Container data_;
};

template<typename K, typename V, typename Comparator = std::less<K> >
using BasicOrderedCache = AssociativeContainerAdapter<std::map<K, V, Comparator> >;

template<typename K, typename V, typename Hash = std::hash<K>, typename KeyEqual = std::equal_to<K> >
using BasicUnorderedCache = AssociativeContainerAdapter<std::unordered_map<K, V, Hash, KeyEqual> >;

}
}

These caches will work just fine for many applications. They also support custom comparators, meaning we can define our own equivalence relationships. This is especially useful when working with floating-point numbers, where we need to use "fuzzy" comparisons. The listing below shows a fuzzy less-than comparator.

struct FuzzyLess {

    explicit FuzzyLess(double tolerance)
    : tolerance_(tolerance)
    {
    }

    bool operator()(const std::tuple<double>& arg1, const std::tuple<double>& arg2) const
    {
        return std::get<0>(arg1) < (std::get<0>(arg2) - tolerance_);
    }

private:
    double tolerance_;
};

There are, however, applications where we need specialized caches. Let's take a look at some of the considerations involved in choosing the optimal cache type for such scenarios.

1. Limited resources: Applications need to be able to work with a variety of resource limitations. Memory is getting cheaper but it is still a limited resource. While it would be great to be able to store the result of every invocation, it might not be feasible due to available memory and the size of the results. If memory usage is a concern, we will need an evicting cache, such as an LRU cache.

2. Location of the cache: Does it have to be an in-memory cache, or do we need a remote cache? An in-memory cache works very well for most applications. However, consider a distributed computing grid with many worker nodes. If the tasks performed on the workers share common sub-problems that are expensive to compute, we should memoize them. But using an in-memory cache would mean that the rest of the workers won't be able access the memoized results. A shared remote cache, on the other hand, will allow the workers to benefit from each other's work. And remember that there are any number of vendors that provide remote caching solutions. Do we want to be able to switch vendors with minimal changes? Or perhaps we'd like to simuate cache failures and study the impact on the overall throughput of the computing grid.

3. Thread-safety: Do we need a synchronized data structure? Yes, if the memoized function is invoked from multiple threads. However, we shouldn't have to pay the cost in a single-threaded application. The following listing shows a synchronized adapter for std::map-like data structures.

namespace memo {
namespace cache {

namespace detail {

template<typename T, typename Mutex>
class ContainerLockGuard {
    // on a cache miss, this allows us to hold the lock until the
    // result is memoized.

public:
    ContainerLockGuard(const T *ptr, Mutex *mutex) throw()
    : ptr_(ptr)
    , mutex_(mutex)
    {
    }

    ContainerLockGuard(const ContainerLockGuard&) = delete;
    ContainerLockGuard& operator=(const ContainerLockGuard&) = delete;

    ContainerLockGuard(ContainerLockGuard&& that)
    : ptr_(nullptr)
    , mutex_(nullptr)
    {
        std::swap(ptr_, that.ptr_);
        std::swap(mutex_, that.mutex_);
    }

    ~ContainerLockGuard()
    {
        if (mutex_) {
            mutex_->unlock();
        }
    }

    bool operator!() const
    {
        return !ptr_;
    }

    const T& operator*()
    {
        return *ptr_;
    }  

private:
    const T *ptr_;
    Mutex *mutex_;
};

}

template<typename Container, typename Mutex = std::recursive_mutex>
class SynchronizedContainerAdapter {

    typedef typename Container::mapped_type MappedType;
    friend class detail::ContainerLockGuard<MappedType, Mutex>;

public:
    SynchronizedContainerAdapter()
    {
    }

    template<typename... ConsArgs>
    explicit SynchronizedContainerAdapter(construct_in_place_t, ConsArgs&&... args)
    : data_(std::forward<ConsArgs>(args)...)
    {
    }

    template<typename K>
    auto lookup(K&& key) const
    {
        std::unique_lock<Mutex> lock(*mutex_);
        auto iter = data_.find(std::forward<K>(key));
        const MappedType *ptr = nullptr;
        if (iter != data_.end()) {
            ptr = &iter->second;
        }

        lock.release(); // dissociates from the mutex without unlocking
        return detail::ContainerLockGuard<MappedType, Mutex>(ptr, mutex_.get());
    }

    template<typename K, typename V>
    void remember(K&& key, V&& value)
    {
        std::unique_lock<Mutex> lock(*mutex_);
        data_.emplace(std::forward<K>(key), std::forward<V>(value));
    }

private:
    Container data_;
    mutable std::unique_ptr<Mutex> mutex_ = std::make_unique<Mutex>();
};

}
}

template<typename K, typename V, typename Comparator = std::less<K> >
using SynchronizedOrderedCache = SynchronizedContainerAdapter<std::map<K, V, Comparator> >;

template<typename K, typename V, typename Hash = std::hash<K>, typename KeyEqual = std::equal_to<K> >
using SynchronizedUnorderedCache = SynchronizedContainerAdapter<std::unordered_map<K, V, Hash, KeyEqual> >

The ContainerLockGuard ensures that on a cache-miss, the memoizer continues to hold the lock until the function is evaluated and the results stored. Failing to do so could lead to race conditions where multiple threads evaluate the function for the same inputs.

4. Exploiting apriori knowledge: How much do we know about the function and can we use that knowledge to choose an optimal caching strategy? For example, if the valid inputs belong to a small range of integers, we can use an array to get constant-time insertions and lookups. If we know the derivative of the function, we can use a custom comparator to avoid re-calculating the function for (application-defined) negligible changes in inputs, effectively filtering out "noise".

In a tightly-coupled design, supporting all of these use-cases would result in an explosion of configuration options. Also, supporting a new use-case would require extensive changes to the memoizer's implementation. Clearly, such a design is brittle, error-prone and does not scale well. We get limited ability to configure cache behavior and/or are forced to use pessimistic data structures (for example, always using a synchronized data structure). A modular memoizer, on the other hand, frees the library writer from having to anticipate every possible use-case, and gives library users the freedom to configure it to suit the particular application.

Putting The Pieces Together

It's time to put the pieces together. The Memoizer does not erase types, meaning the template needs to be instantiated with the type of the function to be memoized. This can get complicated really fast, so there is a helper function (memoize) that deduces the type, and decorates the function and returns its memoized version.

namespace memo {

template<typename CacheType, typename CallableType>
auto memoize(CacheType&& cache, CallableType&& f)
{
    typedef Memoizer<std::remove_reference_t<CacheType>,
                     std::remove_reference_t<CallableType> > MemoizerType;
    return MemoizerType(std::forward<CacheType>(cache), std::forward<CallableType>(f));
}

}

Now let's see how to use the memoizer. The listing below shows multiple implementations of a simple logistic function - a free-function, a lambda, and a user-defined callable type - that are memoized with a BasicOrderedCache to store the results. Because we are dealing with floating-point numbers, the cache is instantiated with a fuzzy less-than comparator.

double logisticFn(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

struct LogisticFn {

    double operator()(double x) const
    {
        return 1.0 / (1.0 + std::exp(-x));
    }

    double evaluate(double x) const
    {
        return operator()(x);
    }
};

void memoizer_test()
{
    using namespace memo;
    using namespace memo::cache;

    // there is a single input of type double, and the result type is double
    typedef BasicOrderedCache<std::tuple<double>, double, FuzzyLess> CacheType;
    auto cache = CacheType(cache::construct_in_place_t(), FuzzyLess(1e-6));
    
    // memoize a free-function
    auto logistic1 = memo::memoize(cache, &logisticFn);
    std::cout << logistic1(5.0) << std::endl;

    // memoize a lambda
    auto logistic2 = memo::memoize(cache,
                                    [](double x) -> double {
                                        return 1.0 / (1.0 + std::exp(-x));
                                    });
    std::cout << logistic2(5.0) << std::endl;

    // memoize a functor (std::function)
    auto logistic3 = memo::memoize(cache, std::function<double(double)>(logisticFn));
    std::cout << logistic3(5.0) << std::endl;

    // memoize a user-defined callable object
    auto logistic4 = memo::memoize(cache, LogisticFn());
    std::cout << logistic4(5.0) << std::endl;

    // memoize a member that is not a call operator
    auto logistic5 = memo::memoize(cache, std::bind(&LogisticFn::evaluate, 
                                                     LogisticFn(), std::placeholders::_1));
    std::cout << logistic5(5.0) << std::endl;
}

Good so far. However, types with overloaded methods create an interesting situation. Let's look at an example before we get into the details.

struct LogisticFn {

    double operator()(double x) const
    {
        return 1.0 / (1.0 + std::exp(-x));
    }

    double operator()(float x) const
    {
        return 1.0 / (1.0 + std::exp(-x));
    }
};

void memoizerTest()
{
    using namespace memo;
    using namespace memo::cache;

    typedef BasicOrderedCache<std::tuple<double>, double, FuzzyLess> CacheType;
    auto cache = CacheType(construct_in_place_t(), FuzzyLess(1e-6));

    // case 1: selects a different overload based on the type of the argument
    auto logistic1 = memo::memoize(cache, LogisticFn());
    std::cout << logistic1(5.0) << std::endl; // calls double operator()(double)
    std::cout << logistic1(5.0f) << std::endl; // calls double operator()(float)

    // case 2: always calls double operator()(double)
    auto logistic2 = memo::memoize(cache,
                        std::bind((double (LogisticFn::*)(double) const)&LogisticFn::operator(),
                                  LogisticFn(), std::placeholders::_1));
    std::cout << logistic2(5.0) << std::endl; // calls double operator()(double)
    std::cout << logistic2(5.0f) << std::endl; // calls double operator()(double)
}

In the first case, the overload that is called depends on the types of the arguments. The code compiles because the overloads have the same return type (if not, return type deduction would have failed in the Memoizer), and float is implicitly convertible to the type of the input in the cache (double). But we don't have to leave it to the compiler. The second case shows how to select a specific overload by casting the operator()().

Memoizing Recursive Functions

Ok, we have a memoizer that is generic and modular. Are we all done? Not so fast, because Memoizer does not work for recursive functions. To see why, consider the following example where we attempt to memoize a recursive fibonacci function.

int fibonacci(int n)
{
    if (n == 0 || n == 1) {
        return 1;
    }
    
    return fibonacci(n - 1) + fibonacci(n - 2);
}

void recursiveTest()
{
    memo::cache::BasicOrderedCache<std::tuple<int>, int> cache;
    auto memFibonacci = memo::memoize(cache, &fibonacci);
    memFibonacci(5);
}

Because the memoizer is non-intrusive, fibonacci knows nothing about it. When it makes the recursive call, it is calling itself and not its memoized version. We still get the right answer, but without the benefit of memoization.

The way to solve this is through the use of a Y-combinator. A Y-combinator generalizes recursion and can be used to implement recursive functions in languages that don't support recursion. The listing below shows the implementation of the memoizer for recursive functions.

namespace memo {

template<int N>
struct placeholder {};

}

template<int N>
struct std::is_placeholder<memo::placeholder<N> > : public std::integral_constant<int, N> {
};

namespace memo {

template<typename CacheType, typename Signature>
class RecursiveMemoizer;

template<typename CacheType, typename R, typename... Args>
class RecursiveMemoizer<CacheType, R(Args...)> {

public:
    template<typename C, typename F>
    RecursiveMemoizer(C&& cache, F&& fn)
    : cache_(std::forward<C>(cache))
    , fn_(std::forward<F>(fn))
    {
        bindThis(std::make_index_sequence<sizeof...(Args)>());
    }

    RecursiveMemoizer(RecursiveMemoizer&& that)
    : cache_(std::move(that.cache_))
    , fn_(std::move(that.fn_))
    {
        bindThis(std::make_index_sequence<sizeof...(Args)>());
        that.this_ = decltype(that.this_)();
    }

    RecursiveMemoizer& operator=(RecursiveMemoizer&& that)
    {
        if (this != &that) {
            cache_ = std::move(that.cache_);
            fn_ = std::move(that.fn_);

            bindThis(std::make_index_sequence<sizeof...(Args)>());
            that.this_ = decltype(that.this_)();
        }
        return *this;
    }

<span id="cke_bm_97E" style="display: none;"> </span>    RecursiveMemoizer(const RecursiveMemoizer&) = delete;
    RecursiveMemoizer& operator=(const RecursiveMemoizer&) = delete;

    R operator()(Args... args)
    {
        auto&& pY = cache_.lookup(std::tie(args...));
        if (!!pY) {
            return *pY;
        }

        R&& y = fn_(this_, args...);
        cache_.remember(std::tie(args...), std::forward<R>(y));
        return y;
    }

private:
    template<std::size_t... Is>
    void bindThis(std::index_sequence<Is...>)
    {
        this_ = std::bind(&RecursiveMemoizer::operator(), this, memo::placeholder<Is + 1>()...);
    }

private:
    CacheType cache_;
    std::function<R(std::function<R(Args...)>&, Args...)> fn_;
    std::function<R(Args...)> this_;
};

When memoizing recursive functions, the function needs to know about its memoized version and vice-versa. We cannot have a cyclic-dependency between types at compile-time. The RecursiveMemoizer uses a type-erased interface (std::function) to break the cycle. The listing below shows how to memoize a recursive fibonacci function.

int fibonacci(std::function<int(int)>& self, int n)
{
    if (n == 0 || n == 1) {
        return 1;
    }
    
    return self(n - 1) + self(n - 2);
}

void recursiveTest()
{
    memo::cache::BasicOrderedCache<std::tuple<int>, int> cache;
    memo::RecursiveMemoizer<decltype(cache), int(int)> memFibonacci(cache, &fibonacci);
    std::cout << memFibonacci(15) << std::endl;
}

Pre-requisites for Memoization

When we memoize a function, we are implicitly making the following assumptions about it:

First, that given the same inputs, the function returns the same value everytime i.e. there are no "hidden" inputs such as global variables etc.
Second, that the return value fully encapsulates the effect of invoking that function i.e. there are no side-effects

A function that satisfies these requirements is called a pure function. If a function is not pure, its memoization is invalid. Particular care must be taken when memoizing member functions. Non-static member functions have an implicit this argument and changes to the state of the object might invalidate the memoized results.

Points of Interest

Tightly-coupled memoizers cannot be easily adapted to new applications. They also make pessimistic choices about data structures in order to be generally applicable.
A modular memoizer de-couples the cache type from the memoizing function, and can be configured to meet the needs of the particular application.
C++ templates allow us to write a generic memoizer without requiring the use of type-erased interfaces (except in the case of recursive functions).
Recursive functions can be memoized through the use of a Y-combinator.