p_fc or p_fn?

Deterministic OpenMP fills the computing resources in an order chosen to favour performance.
Forking on the same core (p_fc) favours latency hiding, while forking on the next core (p_fn) favours true parallelism.

The set of cores and harts in a manycore processor is described by the following numbers:
  • c is the number of cores,
  • h is the number of harts per core,
  • f is the minimum number of active harts a core needs to fetch every cycle (f < h).
Deterministic OpenMP applies the following rule, where a is the number of hart allocations to be done in the parallel region (a is assumed to be less than c*h):
  • if a < f*c, repeat the pattern (f-1 times p_fc, then one p_fn) until a harts have been allocated;
  • otherwise, let k be the integer with 0 <= k < h-f such that (f+k)*c <= a < (f+k+1)*c (such a k always exists, since f*c <= a < h*c), and repeat the pattern (f+k times p_fc, then one p_fn) until a harts have been allocated.

For example, a 4-core LBP processor has c=4, h=4 and f=2. If a is 10, then f*c = 8 <= 10 < 12 = (f+1)*c, so k = 0 and the pattern is two successive p_fc (two harts allocated on the same core) followed by one p_fn (the third hart allocated on the next core), repeated until all 10 harts have been allocated.