Scaling an augmented RISC-V processor design with high-level synthesis.
Keywords
Abstract
Motivated by major applications in machine learning, novel hardware architectures increasingly implement on-chip accelerators. The open RISC-V ISA is well suited to this trend because it offers the flexibility to add instructions for domain-specific architectures. In addition, recent advances in high-level synthesis tools significantly reduce the effort required for hardware design. Furthermore, program transformations, which rewrite a given program with some specified aim, are a powerful software technique that, to date, has not benefited to a large extent from hardware support. The goal of this note is to analyze the potential of combining a custom scalable RISC-V processor design written for high-level synthesis with a particular type of program transformation that acts on individual elementary operations. In these transformations, each scalar operation is augmented with a so-called transformed operation that changes the semantics of the given program. An augmented RV32I processor design is introduced that implements not only the original operation but also its transformed operation in hardware. The new design is simulated for the AMD Alveo U50 Data Center Accelerator Card. For each scalar operation in the original program, it enables the computation of 63 scalar transformed operations in parallel, with a lower bound on the speedup of roughly 13.
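To make the idea of an augmented scalar operation concrete, the following minimal C++ sketch pairs each original operation with 63 transformed operations. The abstract does not name the transformation, so the sketch assumes a derivative-propagating (forward-mode) transformation purely for illustration. The names `Augmented`, `seed`, `add`, and `mul` are hypothetical and are not taken from the design described here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Number of transformed operations per original scalar operation
// (the figure 63 comes from the abstract).
constexpr std::size_t kLanes = 63;

// Hypothetical augmented value: the original scalar plus kLanes
// transformed scalars. All members are zero-initialized.
struct Augmented {
    float value{};
    std::array<float, kLanes> t{};
};

// Create an input whose transformed component is 1 on a single lane.
// ASSUMPTION: the transformation propagates derivatives, so this marks
// the input as the independent variable tracked by that lane.
inline Augmented seed(float v, std::size_t lane) {
    assert(lane < kLanes);
    Augmented a;
    a.value = v;
    a.t[lane] = 1.0f;
    return a;
}

// Augmented addition: one original add plus kLanes transformed adds.
inline Augmented add(const Augmented& a, const Augmented& b) {
    Augmented r;
    r.value = a.value + b.value;
    for (std::size_t i = 0; i < kLanes; ++i) {
        r.t[i] = a.t[i] + b.t[i];
    }
    return r;
}

// Augmented multiplication: one original multiply plus kLanes
// transformed operations (product rule under the stated assumption).
inline Augmented mul(const Augmented& a, const Augmented& b) {
    Augmented r;
    r.value = a.value * b.value;
    for (std::size_t i = 0; i < kLanes; ++i) {
        r.t[i] = a.t[i] * b.value + a.value * b.t[i];
    }
    return r;
}
```

The per-lane loops have a fixed trip count and no dependence between iterations, which is the kind of structure that high-level synthesis tools can typically unroll into parallel hardware. Whether the design discussed here is organized this way is not stated in the abstract.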