Concurrent execution versus Out-of-Order execution

Concurrent execution of threads

When threads execute concurrently, they are unordered: no thread semantically precedes another.
Any thread can produce a value consumed by any other thread.
A thread t may produce a value consumed by another thread t', which in turn produces a value consumed by t.
The programmer must ensure that each producing instruction runs before the corresponding consuming one, however the OS interleaves the threads. To do so, the programmer inserts synchronisation primitives that enforce the required ordering.
In the following figure, thread 0 produces a, which thread 1 consumes. Thread 1 then produces b, which thread 0 consumes. Thread 3 produces c, which thread 2 consumes. The programmer must synchronise the execution to ensure this order.
These are the semantics assumed by any OpenMP parallel construct.
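
The exchange described for the figure can be sketched in Python with one event per produced value. This is only an illustration under stated assumptions: the function name run_concurrent, the concrete values, and the arithmetic are hypothetical and do not come from the figure; only the producer/consumer pattern (a from thread 0 to thread 1, b from thread 1 back to thread 0, c from thread 3 to thread 2) follows the text.

```python
import threading


def run_concurrent():
    """Run four unordered threads; events enforce produce-before-consume."""
    shared = {}   # values exchanged between threads
    out = {}      # final results, keyed by consuming thread
    a_ready = threading.Event()
    b_ready = threading.Event()
    c_ready = threading.Event()

    def thread0():
        shared["a"] = 1                  # produce a (hypothetical value)
        a_ready.set()
        b_ready.wait()                   # block until thread 1 has produced b
        out["t0"] = shared["b"] + 1      # consume b

    def thread1():
        a_ready.wait()                   # block until thread 0 has produced a
        shared["b"] = shared["a"] * 10   # consume a, produce b
        b_ready.set()

    def thread2():
        c_ready.wait()                   # block until thread 3 has produced c
        out["t2"] = shared["c"] * 2      # consume c

    def thread3():
        shared["c"] = 7                  # produce c (hypothetical value)
        c_ready.set()

    # Start order is deliberately scrambled: the events, not the start
    # order, guarantee that every value is written before it is read.
    threads = [threading.Thread(target=f)
               for f in (thread2, thread1, thread0, thread3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Note that thread 0 both produces for and consumes from thread 1, so neither thread can be said to precede the other. Removing any wait() call would make the result depend on how the OS interleaves the threads.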

Out-of-order (OoO) execution of threads

When threads are executed out of order (OoO), they are semantically ordered.
A thread t that precedes a thread t' may run in parallel with t' wherever their parts are independent.
A thread can produce a value consumed by any thread after it.
If a thread t produces a value for a thread t', then t fully precedes t', and so t' cannot produce anything consumed by t.
In the following figure, thread 0 semantically precedes thread 1, which semantically precedes thread 2, which semantically precedes thread 3.
Threads 0 to 3 decompose an application into successive pieces (thread 0 to thread 3, in this semantic order). The OoO execution of the four threads has the same semantics as their serial execution.
For any valid OoO execution, the two RAW (read-after-write) dependencies on a and b must be synchronised so that each read follows the corresponding write (e.g. thread 1 reads a after thread 0 has written it).
When all RAW dependencies are preserved, any OoO execution is deterministic and produces the same output as the serial execution.
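
The determinism claim can be illustrated with a small Python sketch that runs four semantically ordered pieces both serially and as threads started in an arbitrary order. Everything concrete here is an assumption made for the example: the names piece0 to piece3, run_serial, and run_ooo, the values, and in particular the dependency chain (piece 1 reads a, piece 2 reads b, piece 3 is independent), since the figure itself is not reproduced in the text.

```python
import threading


# Hypothetical pieces of one application, in semantic order 0..3.
def piece0(s):
    s["a"] = 2               # writes a


def piece1(s):
    s["b"] = s["a"] + 3      # RAW dependency on a, writes b


def piece2(s):
    s["c"] = s["b"] * 4      # RAW dependency on b


def piece3(s):
    s["d"] = 100             # independent of the other pieces


def run_serial():
    """Reference semantics: run the pieces one after another."""
    s = {}
    for piece in (piece0, piece1, piece2, piece3):
        piece(s)
    return s


def run_ooo(start_order):
    """Run the pieces as threads started in start_order.

    Only the two RAW dependencies are synchronised; everything else
    is free to overlap.
    """
    s = {}
    written = {"a": threading.Event(), "b": threading.Event()}

    def t0():
        piece0(s)
        written["a"].set()

    def t1():
        written["a"].wait()  # read a only after thread 0 wrote it
        piece1(s)
        written["b"].set()

    def t2():
        written["b"].wait()  # read b only after thread 1 wrote it
        piece2(s)

    def t3():
        piece3(s)            # no dependency, may run at any time

    bodies = [t0, t1, t2, t3]
    threads = [threading.Thread(target=bodies[i]) for i in start_order]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return s
```

Calling run_ooo with any permutation of [0, 1, 2, 3] should return the same dictionary as run_serial(). Values only flow from an earlier piece to a later one, which is what distinguishes this from the concurrent case, where two threads may feed each other.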