LBP

LBP is the design of a 64-core parallelizing processor implementing the RISC-V PISC ISA.
Unlike other manycore processors, the cores in LBP are interconnected so that parallelization control signals can be sent, and registers copied, between cores.
The cores are interconnected in a line topology.
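
As an illustration of what a line topology implies, here is a minimal Python sketch. It is not taken from the LBP design: the core numbering (0 to 63), the open-ended line, and the function names `neighbors` and `hops` are all assumptions made for the example. It only shows that each core has at most two direct neighbours and that the distance between two cores grows linearly with the difference of their positions.

```python
from typing import Optional, Tuple

NUM_CORES = 64  # from the text: LBP has 64 cores


def neighbors(core: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (predecessor, successor) of a core on the line.

    Assumption: cores are numbered 0..63 and the line is open,
    so the two end cores have only one neighbour each.
    """
    if not 0 <= core < NUM_CORES:
        raise ValueError("core index out of range")
    pred = core - 1 if core > 0 else None
    succ = core + 1 if core < NUM_CORES - 1 else None
    return pred, succ


def hops(src: int, dst: int) -> int:
    """Number of core-to-core links crossed between src and dst on a line."""
    for core in (src, dst):
        if not 0 <= core < NUM_CORES:
            raise ValueError("core index out of range")
    return abs(dst - src)
```

Under these assumptions, `neighbors(0)` has no predecessor and `hops(3, 10)` crosses 7 links.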
The memory organization is flat (no caches, no virtual memory).
The memory is divided into three regions: code, local data/stack, and shared global data.
Code and local data/stack are placed in local memory banks.
Global data are placed in a set of distributed memory banks with one bank facing each core. Each bank has two access ports, one for local accesses and one for global accesses.
The distributed memory banks are interconnected by a hierarchical set of buses and routers.
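
The sketch below gives one possible reading of this organization. The bank size `BANK_BYTES`, the contiguous address-to-bank mapping, and the names `locate` and `port_for` are hypothetical; the source specifies neither an address layout nor bank sizes. It only illustrates the idea that a global address selects one of 64 banks, and that an access from the core facing that bank can use the local port while any other core goes through the global port.

```python
NUM_CORES = 64          # from the text: one global-data bank faces each core
BANK_BYTES = 64 * 1024  # hypothetical bank size, not given in the source


def locate(addr: int) -> tuple:
    """Map a global-data address to (bank index, offset within the bank).

    Assumption: banks hold contiguous address ranges of BANK_BYTES each.
    """
    if not 0 <= addr < NUM_CORES * BANK_BYTES:
        raise ValueError("address outside the global data region")
    return divmod(addr, BANK_BYTES)


def port_for(core: int, addr: int) -> str:
    """Pick the bank port an access would use under this sketch's assumptions."""
    if not 0 <= core < NUM_CORES:
        raise ValueError("core index out of range")
    bank, _ = locate(addr)
    # The facing core uses the local port; every other core uses the
    # global port, reached through the buses and routers.
    return "local" if bank == core else "global"
```

For example, with these assumed sizes, address `BANK_BYTES + 4` falls in bank 1 at offset 4, so core 1 would use the local port and core 0 the global port.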
The pipeline has 5 stages (fetch, decode, issue, writeback, commit).
The implementations of the parallelizing instructions (p_fc, p_fn, p_swcv/p_lwcv, p_swre/p_lwre, p_jal and p_ret) are shown in hardware schematics.
The performance of the LBP processor has been evaluated on an example matrix multiplication program. Results are presented as histograms.

64-core processor, interconnect, pipeline, par instr impl, LBP performance