Merging the next hart with the join hart

The p_merge t0, t0, t6 instruction merges the next hart identification into register t0, which holds the join hart identification. The join hart is left unmodified in the upper half of t0 (bits 30-16). The next hart, i.e. the hart allocated by the preceding p_fc t6 or p_fn t6 instruction, is inserted into the lower half of t0 (bits 15-0). The most significant bit of t0 (bit 31) is set to indicate that the next call will be a forking one (p_jal). The p_merge t0, t0, t6 instruction is placed between the forking instruction (p_fc t6 or p_fn t6) and the parallelized call (p_jal), after t0 has been transmitted with a p_swcv t6, t0, 4 instruction.

The next hart identification is used by the return instruction (p_jalr zero, ra, t0) to send a join signal to its successor.
Any hart except the first one may end only after it has received the join signal from its predecessor.
This ensures in-order hart termination, which in turn guarantees correct join synchronization: all the harts forming a parallel block must have finished before execution resumes at the join point.

In the following example, the first p_merge t0, t0, t6 in core 0, hart 0 places the lower half of t6 (core 0, hart 1) into register t0. When f returns (instruction p_jalr zero, ra, t0), the join signal is sent to core 0, hart 1.

A p_merge t0, t0, t6 instruction is used after each forking p_fc t6 or p_fn t6 instruction.