Posted 16 Mar 2019
Licensed MIT

L1 Cache Lines

Martin Vorbrodt, 16 Mar 2019

Herb Sutter gave an interesting talk titled "Machine Architecture: Things Your Programming Language Never Told You".

In it, he gives a tiny but very interesting example of code that illustrates hardware destructive interference, also known as false sharing: how the size of an L1 cache line and a poor data layout can hurt the performance of your code.
The example program allocates two ints on the heap, one right next to the other. It then starts two threads; each thread repeatedly reads and writes one of the two ints. Let's do just that 100'000'000 times per thread and see how long it takes:

Duration: 4338.55 ms

Now let us do exactly the same thing, except this time we'll space the ints apart by… you guessed it, the L1 cache line size (64 bytes on Intel and AMD x64 chips):

Duration: 1219.50 ms

The same code now runs roughly 3.5 times faster (4338.55 ms versus 1219.50 ms). The two ints no longer share a cache line, so the L1 caches of the CPU cores no longer have to be synchronized every time one thread writes to its int.
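To make the "same cache line" condition concrete, here is a minimal sketch of the address arithmetic behind it. The helper same_cache_line and the Buffer type are hypothetical names introduced only for this illustration, and the sketch assumes a 64-byte line and a platform where converting a pointer to std::uintptr_t yields its numeric address (true on mainstream x64 systems).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when both addresses fall into the same
// line_size-byte block, i.e. they would share one cache line.
// Assumption: line_size is 64 bytes unless the caller says otherwise.
inline bool same_cache_line(const void* a, const void* b,
                            std::size_t line_size = 64)
{
    assert(line_size != 0);
    const auto block_a = reinterpret_cast<std::uintptr_t>(a) / line_size;
    const auto block_b = reinterpret_cast<std::uintptr_t>(b) / line_size;
    return block_a == block_b;
}

// Hypothetical buffer aligned to a 64-byte boundary so that element 0
// starts exactly at the beginning of a cache line.
struct alignas(64) Buffer
{
    int data[17]; // 17 ints: element 16 sits 64 bytes after element 0
};
```

With a Buffer instance buf, same_cache_line(&buf.data[0], &buf.data[1]) is true (the slow case above), while same_cache_line(&buf.data[0], &buf.data[16]) is false (the fast case).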

The lesson here is that data layout in memory matters. If you must run multiple threads that write to separate memory locations, make sure those locations are at least one L1 cache line apart. C++17 helps us with that: std::hardware_destructive_interference_size and std::hardware_constructive_interference_size, declared in <new>.
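As a rough sketch of how the C++17 constant could be used, the snippet below pads each per-thread value out to its own cache line with alignas. The names kLineSize, PaddedCounter and Counters are made up for this example. Not every standard library ships the constant, so the sketch falls back to 64 bytes, which is an assumption that matches the x64 chips discussed above.

```cpp
#include <cstddef>
#include <new>

// Use the C++17 constant when the standard library provides it;
// otherwise assume 64 bytes (an assumption, typical for x64).
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLineSize = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLineSize = 64;
#endif

// Hypothetical wrapper: alignas forces every instance to start on its own
// kLineSize boundary, and the compiler pads sizeof up to that alignment.
struct alignas(kLineSize) PaddedCounter
{
    int value = 0;
};

// Two counters intended to be written by two different threads.
struct Counters
{
    PaddedCounter first;
    PaddedCounter second;
};

// Compile-time checks that the two values cannot share a cache line.
static_assert(alignof(PaddedCounter) == kLineSize,
              "each counter starts on a line boundary");
static_assert(offsetof(Counters, second) - offsetof(Counters, first) >= kLineSize,
              "counters are at least one line apart");
```

One thread would then update first.value and the other second.value. The trade-off is memory: each int now occupies a whole cache line.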

Complete Listing

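#include <chrono>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <thread>

using namespace std;
using namespace chrono;

// Distance in bytes between the two ints. Use sizeof(int) to place them
// side by side (same cache line); 64 puts them one cache line apart.
const int CACHE_LINE_SIZE = 64;//sizeof(int);
// Number of ints needed so the first and last are CACHE_LINE_SIZE bytes apart.
const int SIZE = CACHE_LINE_SIZE / sizeof(int) + 1;
const int COUNT = 100'000'000;

int main(int argc, char** argv)
{
    srand((unsigned int)time(NULL));

    // Value-initialize so the threads never read indeterminate values.
    int* p = new int [SIZE]();

    // Each thread repeatedly reads and writes its own int.
    // Note: rand() is not guaranteed to be thread-safe.
    // The addition is done in unsigned arithmetic so wrap-around is well defined.
    auto proc = [](int* data) {
        for(int i = 0; i < COUNT; ++i)
            *data = (int)((unsigned int)*data + (unsigned int)rand());
    };
    
    auto start_time = high_resolution_clock::now();
    
    // First and last element: CACHE_LINE_SIZE bytes apart.
    std::thread t1(proc, &p[0]);
    std::thread t2(proc, &p[SIZE - 1]);
    
    t1.join();
    t2.join();
    
    auto end_time = high_resolution_clock::now();
    cout << "Duration: " << duration_cast<microseconds>
             (end_time - start_time).count() / 1000.f << " ms" << endl;

    delete[] p;
    return 0;
}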

License

This article, along with any associated source code and files, is licensed under The MIT License


About the Author

Martin Vorbrodt
Software Developer (Senior)
United States
Article Copyright 2019 by Martin Vorbrodt