Data Rearrange Unit for Efficient Data Computation

Akiyuki Mamiya, Nobuyuki Yamasaki


Recently, the demand for computation-intensive applications such as multimedia and AI applications has increased. Data-parallel execution units are typically used for calculations in these applications to increase computational throughput. However, the data required for computation may need to be accessed at discontinuous memory addresses, which can reduce computation efficiency.

Generally, normal memory access instructions access data blocks of continuously allocated memory addresses, containing both valid and invalid data for computation. These memory access patterns result in low computation density in the data parallel execution units, wasting computation resources. Therefore, simply increasing the number of data-parallel execution units leads to an increase in wasted computational resources, which will become a significant issue in embedded systems where multiple resource limitations exist. It is essential to improve computational efficiency to perform practical computation in such systems.

This paper introduces a Data Rearrange Unit (DRU), which gathers and rearranges valid computation data between main memory and execution units. The DRU improves the performance of multimedia and AI application by significantly reducing the access rate from/to main memory and increasing computation efficiency. It is applicable to most hardware architectures, and its effectiveness can be further enhanced by the execution unit interface that directly connects the DRU to the execution unit. We demonstrate the effectiveness of our DRU by implementation on the RMTP SoC, improving convolution throughput on a data-parallel execution unit by a maximum of 94 times while only increasing the total cell area by about 12.7%. 


convolutional neural network; data-parallel; data rearrange

Full Text:



  • There are currently no refbacks.