Literature Notes (4) (2018 ISSCC 13.3)


Source paper: UNPU: A 50.6TOPS/W Unified Deep Neural Network Accelerator with 1b-to-16b Fully-Variable Weight Bit-Precision

1 Abbreviations

  • CL: convolutional layer
  • FCL: fully-connected layer
  • RL: recurrent layer
  • PE: processing element
  • UNPU: unified neural processing unit
  • IF: input feature
  • LBPE: lookup-table-based bit-serial PE
  • AFL: aligned feature loader
  • Psum: partial sum
  • OF: output feature

2 Overall architecture

In this paper, we present a unified neural processing unit (UNPU) supporting CLs, RLs, and FCLs with fully-variable weight bit-precision from 1b to 16b.

  • reuse of input feature
  • the lookup-table-based bit-serial PE (LBPE) is implemented for energy-optimal DNN operations with variable weight bit-precisions from 1b to 16b
  • an aligned feature loader (AFL) minimizes the amount of off-chip memory accesses
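The LBPE idea above can be illustrated with a distributed-arithmetic-style sketch: because a bit-serial weight contributes only adds of input features, a small lookup table of precomputed feature sums can be shared across all weight bit-planes, and each cycle one bit-plane indexes the table and shift-accumulates the result. This is a minimal conceptual sketch (unsigned weights, one table per feature group); the function names and table organization are illustrative, not the paper's exact LBPE design.

```python
def build_lut(features):
    """Precompute the sum of every subset of the input features.
    Bit i of the table index selects features[i]."""
    n = len(features)
    lut = [0] * (1 << n)
    for idx in range(1 << n):
        lut[idx] = sum(features[i] for i in range(n) if idx & (1 << i))
    return lut

def lut_bit_serial_dot(features, weights, bits):
    """Dot product of `features` with unsigned `weights`, processed one
    weight bit-plane per cycle; latency scales with `bits` (1b to 16b)."""
    lut = build_lut(features)           # built once, reused every cycle
    acc = 0
    for b in range(bits):               # one cycle per weight bit-plane
        idx = 0
        for i, w in enumerate(weights):
            idx |= ((w >> b) & 1) << i  # gather bit b of each weight
        acc += lut[idx] << b            # one table lookup + shift-accumulate
    return acc

# sanity check against a plain MAC
fs, ws = [3, 1, 4, 1], [5, 9, 2, 6]
assert lut_bit_serial_dot(fs, ws, 4) == sum(f * w for f, w in zip(fs, ws))
```

Note how the table lookup replaces four multiplications per cycle with a single add retrieved from the LUT, which is why fewer weight bits directly translate into fewer cycles and lower energy.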
    [Figure: overall UNPU architecture]
    The UNPU consists of:
  • 4 DNN cores
    • 6 lookup-table-based bit-serial PEs (LBPEs)
    • 6 aligned feature loaders (AFLs)
    • a weight memory
    • an instruction decoder
    • a controller
  • an aggregation core: the partial sums (Psums) calculated by each DNN core are aggregated into an output feature (OF) in the aggregation core
  • a 1D SIMD core
  • a RISC controller: performs the remaining operations, such as non-linear activation or pooling
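The division of labor among these blocks can be sketched as a dataflow: each DNN core produces a Psum over its slice of the inputs, the aggregation core sums the Psums into an OF, and the RISC controller applies the non-linear step. This is a hypothetical functional model only; the function names, the even 4-way input split, and the choice of ReLU as the activation are assumptions for illustration.

```python
def dnn_core_psum(if_slice, w_slice):
    # one DNN core: MAC over its assigned input-feature slice
    return sum(f * w for f, w in zip(if_slice, w_slice))

def unpu_layer(ifs, ws, n_cores=4):
    # split the input features evenly across the 4 DNN cores (assumed split)
    step = len(ifs) // n_cores
    psums = [dnn_core_psum(ifs[c * step:(c + 1) * step],
                           ws[c * step:(c + 1) * step])
             for c in range(n_cores)]   # cores run in parallel in hardware
    of = sum(psums)                     # aggregation core: Psums -> OF
    return max(of, 0)                   # RISC controller: e.g. ReLU
```

In hardware the four cores operate concurrently and only the narrow Psums travel to the aggregation core, which keeps the per-core feature traffic local.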


Reposted from blog.csdn.net/tiaozhanzhe1900/article/details/83141183