Notes excerpted from the paper UNPU: A 50.6TOPS/W Unified Deep Neural Network Accelerator with 1b-to-16b Fully-Variable Weight Bit-Precision
1 Abbreviations
- CL: convolutional layer
- FCL: fully-connected layer
- RL: recurrent layer
- PE: processing element
- UNPU: unified neural processing unit
- IF: input feature
- LBPE: lookup-table-based bit-serial PE
- AFL: aligned feature loader
- Psum: partial sum
- OF: output feature
2 Overall architecture
In this paper, we present a unified neural processing unit (UNPU) supporting CLs, RLs, and FCLs with fully-variable weight bit-precision from 1b to 16b.
- reuse of input features (IFs)
- a lookup-table-based bit-serial PE (LBPE) implemented for energy-optimal DNN operations with variable weight bit-precision from 1b to 16b (a software sketch follows this list)
- an aligned feature loader (AFL) that minimizes the number of off-chip memory accesses
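The LUT-based bit-serial idea can be modeled in a few lines of Python. This is a minimal sketch, not the paper's implementation: the name `lbpe_dot`, the group size of 3, and the two's-complement weight encoding are assumptions for illustration. Each group of input features has all of its 2^group possible sums precomputed once into a lookup table; every weight bit-plane then costs one LUT read instead of several adds, and shifting by the bit index restores each bit's significance, which is why the same datapath serves any weight precision from 1b to 16b.

```python
def lbpe_dot(features, weights, wbits, group=3):
    """Software model of a LUT-based bit-serial dot product (hypothetical).

    features: input-feature values (ints)
    weights:  signed ints representable in `wbits` two's-complement bits
    """
    assert len(features) == len(weights)
    acc = 0
    for g in range(0, len(features), group):
        f = features[g:g + group]
        w = weights[g:g + group]
        # Precompute every possible sum of this feature group once; a hardware
        # LBPE would hold these 2^len(f) values in its lookup table and reuse
        # them for all weight bit-planes.
        lut = [sum(f[j] for j in range(len(f)) if (idx >> j) & 1)
               for idx in range(1 << len(f))]
        for b in range(wbits):
            # Bit b of each weight in the group concatenates into a LUT index.
            idx = 0
            for j, wj in enumerate(w):
                idx |= ((wj >> b) & 1) << j
            p = lut[idx]
            # The shift restores bit significance; the MSB carries negative
            # weight in two's complement.
            acc += -(p << b) if b == wbits - 1 else (p << b)
    return acc

# quick check: 2*1 + 3*(-2) = -4
print(lbpe_dot([2, 3], [1, -2], wbits=4))
```

Lowering `wbits` skips bit-planes entirely, which is the mechanism behind the energy scaling with precision: a 4b weight costs a quarter of the LUT reads of a 16b weight on the same datapath.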
The UNPU consists of:
- 4 DNN cores, each containing:
  - 6 lookup-table-based bit-serial PEs (LBPEs)
  - 6 aligned feature loaders (AFLs)
  - a weight memory
  - an instruction decoder
  - a controller
- an aggregation core: the partial sums (Psums) calculated by each DNN core are aggregated into an output feature (OF) in the aggregation core (see the dataflow sketch after this list)
- a 1-D SIMD core
- a RISC controller: performs the remaining operations, such as non-linear activation or pooling
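To tie the components together, here is a toy dataflow model reusing `lbpe_dot` from the sketch above. It is only an illustration of the Psum-to-OF aggregation described in the notes: the even split of features across cores and the ReLU activation are assumptions, not details from the paper.

```python
def unpu_layer(features, weights, wbits, n_cores=4):
    """Toy dataflow: each DNN core produces a Psum over its slice of the
    input features; the aggregation core adds the Psums into one output
    feature (OF); a non-linear activation is applied afterwards."""
    assert len(features) % n_cores == 0          # assumed even split for simplicity
    chunk = len(features) // n_cores
    psums = [lbpe_dot(features[c * chunk:(c + 1) * chunk],
                      weights[c * chunk:(c + 1) * chunk], wbits)
             for c in range(n_cores)]            # one Psum per DNN core
    of = sum(psums)                              # aggregation core: Psums -> OF
    return max(of, 0)                            # e.g. ReLU (assumed activation)
```

Splitting the input features across cores is what makes the IF reuse pay off: each core works on its own slice while the weight bit-planes stream through, and only the narrow Psums travel to the aggregation core.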