Hi!
I have a question about multithreading and "relaxed" memory consistency in C++, but I guess it applies just as well to C#, Java, and a lot of other imperative languages. A professor at my university said, in a lecture in a course called "Parallel Programming", that the following two programs,
 
program 1:
x=0           
x=1           
read y  

program 2:
y=0
y=1
read x
 

executed in parallel could result in both threads seeing 0 after their read instruction. This is supposedly because the new value 1 of x and/or y had not yet been written back from the cache / store buffer to the actual shared memory, and therefore both processes would be able to read 0.
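
If I try to write this down in C++11 atomics (just my attempt; the thread and variable names are mine), I believe it looks something like the following, and as far as I understand both reads returning 0 really is a permitted outcome when relaxed ordering is used:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};   // both start at 0
int r1 = -1, r2 = -1;          // results of the two reads

void program1() {
    x.store(1, std::memory_order_relaxed);   // x = 1
    r1 = y.load(std::memory_order_relaxed);  // read y
}

void program2() {
    y.store(1, std::memory_order_relaxed);   // y = 1
    r2 = x.load(std::memory_order_relaxed);  // read x
}

int main() {
    std::thread t1(program1), t2(program2);
    t1.join(); t2.join();
    // r1 == 0 && r2 == 0 is a legal outcome under relaxed ordering.
    std::printf("r1=%d r2=%d\n", r1, r2);
}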
 
To me this sounds like a huge problem: how can one then guarantee, even if locks are used, that the shared memory is consistent and up to date? I guess you would somehow need to do as you do with locks, i.e. make sure a value is written to and read from the shared memory directly. How can you achieve that, and what approach do you recommend?
 
I would be very thankful for an explanation and/or an approach for tackling this problem.
/ WaZoX
Posted 7-Apr-13 6:26am by WaZoX

Solution 1

What you say may be completely correct, but I believe the problem goes deeper than one of cache coherency.
The issue, I think, is rather to do with the language specification and the current state of the art in optimising compilers.
The C++ language specification does not, contrary to common assumption, specify that statements must be executed in the order they are written. In fact the compiler is free to rearrange statements at will, so long as the result is 'correct' according to the constraints that the specification does impose.
Authoring compiler software is one of the most difficult challenges in computing, and authoring an optimising compiler even more so.
Given this, almost all the optimisers I'm aware of work linearly, and most work with a limited scope. In other words they operate within the scope of one function, or the functions in one file, and they optimise under the assumption of a single thread of execution.
 
In the case you mention, this can lead to the two pieces of code being compiled to machine code equivalent to, for example:
 
x=0
read y
x=1
 
and
 
read x
y=0
y=1
 
Each code section on its own is still guaranteed to be correct and to produce the same result if it is the only piece of code executing. However, if multiple threads of execution are running amok through this code, then the result will be highly dependent on the final linearised sequence of instructions that is executed. There are many such possible sequences, for example:
 
x=0
read y
.
read x
y=0
.
x=1
.
y=1
(dots representing context switches)
 
While I can't immediately see one that would result in both x and y being 0, it would be hard to rule out, given that there are sequences where at least one of x or y is indeterminate. This gets even worse when you have truly parallel hardware that can actually execute read y and y=0 at the same time. This gives many more possible outcomes, and many of them are as good as random.
 
It may sound like the situation is therefore hopeless and that little or nothing can ever be guaranteed, but it is not so. The above code can very simply be made reliable and free of synchronisation trouble by not sharing x or y between threads, making it a design issue rather than a language or compiler problem.
When such sharing is necessary, it can only ever be made safe by using hardware facilities for atomic operations, which include memory fences, pipeline flushes and bus locks, to ensure consistency at the hardware level. This is where caching may enter the picture, but it does not present an insoluble problem.
In the past such low-level atomic operations had to be provided by operating-system-specific libraries pre-built in assembly language, because the C++ language specification was independent of the availability of atomic operations at the hardware level and neither required nor relied on them. C++11, however, mandates a new atomic operations section of the standard library, effectively requiring underlying hardware locking support in order to use the full language.
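
To make that concrete, here is a sketch (only a sketch; the function names are mine) of the original pair of programs written with the C++11 std::atomic facilities. With the default sequentially consistent ordering, the "both read 0" outcome is no longer allowed:

#include <atomic>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = -1, r2 = -1;

void t1() {
    x.store(1);     // default std::memory_order_seq_cst
    r1 = y.load();  // read y
}

void t2() {
    y.store(1);     // default std::memory_order_seq_cst
    r2 = x.load();  // read x
}

int main() {
    std::thread a(t1), b(t2);
    a.join(); b.join();
    // Under sequential consistency r1 == 0 && r2 == 0 cannot happen:
    // whichever store comes first in the single global order is seen
    // by the other thread's load.
}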

Solution 2

The professor was correct.
 
The short answer to your question is that the code sequence illustrated needs to turn the operations into atomic operations, which, amongst other things, will flush the cache and disable or work around the instruction reordering described.
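
One way to picture this (a hedged sketch, not the only approach): placing a full fence between each store and the following load, via std::atomic_thread_fence, rules out the problematic outcome for the example above:

#include <atomic>

std::atomic<int> x{0}, y{0};

int program1() {
    x.store(1, std::memory_order_relaxed);               // x = 1
    std::atomic_thread_fence(std::memory_order_seq_cst); // full fence
    return y.load(std::memory_order_relaxed);            // read y
}

int program2() {
    y.store(1, std::memory_order_relaxed);               // y = 1
    std::atomic_thread_fence(std::memory_order_seq_cst); // full fence
    return x.load(std::memory_order_relaxed);            // read x
}

// With both fences in place, program1() and program2() cannot both
// return 0, no matter how the two threads interleave.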
Comments
Matthew Faithfull at 8-Apr-13 5:01am
   
I wish I could be so concise. My 5.
H.Brydon at 8-Apr-13 10:21am
   
Thanks, and I like your answer - a +5 for you too.

Solution 3

After some research it turns out this isn't a problem on modern computers. The problem is solved in the hardware implementation of the cache, using one of several cache-coherence protocols (e.g. MESI). It may however affect the performance of the program, since a core will need to fetch the data from the shared memory instead of its cache if the data has been modified elsewhere. It may also decrease performance if unrelated data is placed in the same cache block: if some variable in the same block is modified which isn't even used by the core, the core will still need to load the data from the shared memory again. This is an issue for the compiler, though, and not something we can generally do much about. This normally applies to a single multicore processor; you may need to be careful if you have more processors, as it depends highly on the system.
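
The "unrelated data in the same block" effect is usually called false sharing, and it is one thing you can mitigate in code. A hedged sketch (assuming 64-byte cache lines, which is hardware-dependent):

#include <atomic>

// Two counters updated by two different threads.
struct Unpadded {
    std::atomic<long> a;   // 'a' and 'b' will typically share one
    std::atomic<long> b;   // cache line, so writes to either one
};                         // invalidate the other core's copy.

// Aligning each counter to its own (assumed) 64-byte line avoids
// the extra coherence traffic described above, at the cost of space.
struct Padded {
    alignas(64) std::atomic<long> a;
    alignas(64) std::atomic<long> b;
};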
Comments
H.Brydon at 17-Apr-13 12:31pm
   
You need to do a little more research. It is not a problem for single-threaded programs, or for programs where the sequence of operations is not an issue (e.g. search, a load-balancing counter). The artifacts you described in the question are features of the C++ memory consistency model, and are a "problem" for multi-process (or multi-thread or multi-fiber) programming on multi-processor hardware. There will be different artifacts in other languages and machine architectures.
WaZoX at 17-Apr-13 14:49pm
   
I can't see that that's the case.
My sources are,
http://www.roguewave.com/portals/0/products/threadspotter/docs/2010.4/manual/ch_intro_coherence.html
 
http://en.wikipedia.org/wiki/MESI_protocol
 
http://stackoverflow.com/questions/558848/can-i-force-cache-coherency-on-a-multicore-x86-cpu
 
They may not be very reliable sources, but they all suggest that this is solved and not an issue for multicore programs.
 
Plus a book called "Computer Organization and Design"; I'm not sure about the revision right now, but I can check if it matters.
 
I'm not saying you must be wrong, but everywhere else I look it seems that what I said is the case, and for multicore programs as well.
 
Could you give me some references to verify that what you're saying is correct?
 
EDIT: I'm talking about Intel's x86 and x64. In general you can't assume this behavior.
