Argus CNN Accelerator Based on Kernel Clustering and Resource-Aware Pruning

Authors

  • Damjan M. Rakanovic, Faculty of Technical Sciences, Department of Power, Electronics and Telecommunications, University of Novi Sad, Serbia
  • Vuk Vranjkovic, Faculty of Technical Sciences, Department of Power, Electronics and Telecommunications, University of Novi Sad, Serbia
  • Rastislav J. R. Struharik, Faculty of Technical Sciences, Department of Power, Electronics and Telecommunications, University of Novi Sad, Serbia

DOI:

https://doi.org/10.5755/j02.eie.28922

Keywords:

Machine Learning, Accelerator architecture, Convolutional Neural Network pruning, Edge-based computing

Abstract

This paper proposes a two-step Convolutional Neural Network (CNN) pruning algorithm and a resource-efficient Field-Programmable Gate Array (FPGA) CNN accelerator named “Argus”. The proposed CNN pruning algorithm first combines similar kernels into clusters, which are then pruned using the same regular pruning pattern. The pruning algorithm is tailored to FPGAs and takes their resource characteristics into account. Regular sparsity results in high Multiply-Accumulate (MAC) efficiency and reduces the amount of logic required to balance workloads among different MAC units. As a result, the Argus accelerator requires about 170 Look-Up Tables (LUTs) per Digital Signal Processor (DSP) block. This number is close to the average LUT/DSP ratio of various FPGA families, which enables balanced resource utilization when implementing Argus. Benchmarks conducted on a Xilinx Zynq UltraScale+ Multi-Processor System-on-Chip (MPSoC) indicate that Argus achieves up to 25 times higher frames per second than NullHop, 2 and 2.5 times higher than NEURAghe and Snowflake, respectively, and 2 times higher than NVDLA. Argus shows performance comparable to MIT’s Eyeriss v2 and Caffeine, while requiring up to 3 times less memory bandwidth and utilizing 4 times fewer DSP blocks, respectively. Besides the absolute performance, Argus has at least 1.3 and 2 times better GOP/s/DSP and GOP/s/Block-RAM (BRAM) ratios, while being competitive in terms of GOP/s/LUT, compared to some of the state-of-the-art solutions.
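
The two-step idea described in the abstract (cluster similar kernels, then prune each cluster with one shared pattern) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the clustering method or how the pruning pattern is chosen. The sketch assumes a toy k-means over flattened kernels and a per-cluster mask that keeps the `keep` positions with the largest summed magnitude. The function names `cluster_kernels` and `prune_with_shared_masks` and all parameters are hypothetical.

```python
import numpy as np

def cluster_kernels(kernels, n_clusters, n_iter=20, seed=0):
    """Toy k-means over flattened kernels (an assumed stand-in for the
    paper's clustering step). Returns one cluster label per kernel."""
    rng = np.random.default_rng(seed)
    flat = kernels.reshape(len(kernels), -1)
    # Initialise centers from randomly chosen kernels (fancy indexing copies).
    centers = flat[rng.choice(len(flat), size=n_clusters, replace=False)]
    labels = np.zeros(len(flat), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every kernel to every center.
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = flat[labels == c]
            if len(members):  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return labels

def prune_with_shared_masks(kernels, labels, keep):
    """Apply one pruning mask per cluster (assumed rule: keep the `keep`
    weight positions with the largest summed magnitude in the cluster)."""
    flat = kernels.reshape(len(kernels), -1)
    out = np.zeros(flat.shape, dtype=kernels.dtype)
    masks = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        importance = np.abs(flat[idx]).sum(axis=0)
        mask = np.zeros(flat.shape[1], dtype=bool)
        mask[np.argsort(importance)[-keep:]] = True
        out[idx] = flat[idx] * mask  # same zero pattern for the whole cluster
        masks[int(c)] = mask
    return out.reshape(kernels.shape), masks

# Usage: 16 random 3x3 kernels, 4 clusters, 4 of 9 weights kept per kernel.
rng = np.random.default_rng(0)
kernels = rng.normal(size=(16, 3, 3))
labels = cluster_kernels(kernels, n_clusters=4)
pruned, masks = prune_with_shared_masks(kernels, labels, keep=4)
```

Because every kernel in a cluster shares the same nonzero positions, each kernel needs the same number of multiply-accumulate operations, which is the kind of regularity the abstract credits with reducing workload-balancing logic.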

Published

2021-06-28

How to Cite

Rakanovic, D. M., Vranjkovic, V., & Struharik, R. J. R. (2021). Argus CNN Accelerator Based on Kernel Clustering and Resource-Aware Pruning. Elektronika Ir Elektrotechnika, 27(3), 57-70. https://doi.org/10.5755/j02.eie.28922

Issue

Vol. 27 No. 3 (2021)

Section

ELECTRONICS