Synchronizing memory accesses within a hart

The memory accesses of a hart placed before a p_syncm machine instruction are performed before the ones placed after it.
In the example, the p_syncm instruction prevents the forked hart from reading the transmitted values before they have been written. One of the p_swcv instructions preceding the p_syncm might be delayed, for example because the register to be transmitted is not yet set. Because the execution model is out-of-order, if the p_syncm instruction were missing, a bad scenario would be that the p_jal instruction in the forking hart and the p_lwcv instruction in the forked hart are both issued before the matching p_swcv has written its value. The p_syncm instruction forces the forking hart to wait until all its memory accesses are done before it fetches the p_jal.
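
The scenario above can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions, not a description of the actual hardware: the function name run_fork, the one-instruction-per-cycle issue rate, the one-cycle delay before the forked hart loads, the cycle at which the transmitted register becomes ready, and the value 42 are all hypothetical choices made for the illustration.

```python
def run_fork(use_syncm, reg_ready_cycle=3):
    """Toy model of a forking hart and a forked hart.

    Returns the value loaded by the forked hart, or None if the load
    happened before the matching store had written the slot.
    All timings are assumptions made for illustration.
    """
    memory = {"slot": None}  # transmission slot; None means "not written yet"
    # Forking hart program: store, optional barrier, then fork.
    program = ["p_swcv"] + (["p_syncm"] if use_syncm else []) + ["p_jal"]
    pending_store = False    # store issued but not yet written to memory
    forked_start = None      # cycle at which the forked hart is started
    loaded = None
    pc = 0

    for cycle in range(20):
        # A delayed store completes once its source register is ready.
        if pending_store and cycle >= reg_ready_cycle:
            memory["slot"] = 42
            pending_store = False

        # Forked hart: its p_lwcv reads the slot one cycle after the fork.
        if forked_start is not None and cycle == forked_start + 1:
            loaded = memory["slot"]

        # Forking hart: issue at most one instruction per cycle.
        if pc < len(program):
            op = program[pc]
            if op == "p_swcv":
                pending_store = True      # issued, but the write is delayed
                pc += 1
            elif op == "p_syncm":
                if not pending_store:     # stall while an access is pending
                    pc += 1
            elif op == "p_jal":
                forked_start = cycle      # start the forked hart
                pc += 1

    return loaded
```

With these assumed timings, run_fork(False) returns None, because the forked hart loads before the delayed store has written, whereas run_fork(True) returns 42, because the barrier stalls the fork until the store is done.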

The p_syncm instruction simplifies the core hardware: memory accesses within a hart are performed out of order, which eliminates the classic load/store queue. Where memory accesses must be performed in program order, a p_syncm instruction should be inserted.