Introduction to Cyclone device resource structure knowledge for FPGA learning

Next week I will be learning how to instantiate the RAM IP core and do system design, so at the end of this week I am going through the Cyclone IV device manual to learn some basic knowledge of FPGA devices (the devices studied in this article are Cyclone IV E series devices). The goal is to become familiar with how the device resources are defined and with the details to watch for when using them in a design.

An earlier post, https://blog.csdn.net/sinat_41653350/article/details/103955348, also mentioned the hardware resources of FPGA devices, but what it covered was one-sided and basic, and not very friendly or understandable for beginners.

Device naming

Insert picture description here

The overall naming scheme is: device series + device type (whether it contains high-speed serial transceivers) + number of logic elements (LEs) + number of high-speed serial transceivers (omitted if there are none) + package type + number of pins + operating temperature range + speed grade + optional suffix.
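
As a rough illustration of this scheme, the Python sketch below splits an ordering code into its fields. The regular expression, the field names, and the two example part numbers are my own assumptions for illustration and are not taken from the device manual. It also assumes that on GX parts the transceiver count code sits directly after the LE count, as in EP4CGX30CF19C7N.

```python
import re

# Hypothetical field layout for Cyclone IV ordering codes (illustration only).
PART_RE = re.compile(
    r"^(?P<family>EP4C)"     # device series: Cyclone IV
    r"(?P<variant>GX|E)"     # device type: GX has transceivers, E does not
    r"(?P<le_code>\d+)"      # logic element count code
    r"(?P<xcvr>[A-Z])?"      # transceiver count code (GX only, optional)
    r"(?P<package>[A-Z])"    # package type
    r"(?P<pin_code>\d+)"     # pin count code
    r"(?P<temp>[A-Z])"       # operating temperature range
    r"(?P<speed>\d)"         # speed grade
    r"(?P<suffix>[A-Z]*)$"   # optional suffix
)


def parse_part_number(name):
    """Split an ordering code into named fields; absent fields become ''."""
    match = PART_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised Cyclone IV part number: {name!r}")
    return {key: value or "" for key, value in match.groupdict().items()}


if __name__ == "__main__":
    for part in ("EP4CE10F17C8N", "EP4CGX30CF19C7N"):
        print(part, parse_part_number(part))
```

The pin count field is a code rather than the literal pin count, so the sketch leaves it as a string instead of guessing a mapping.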

Device resources

Insert picture description here

The above table comes from the Cyclone IV E device manual, which describes the resources and structure of this series of devices:

In table order, the resources are:

1. Logic elements (LEs)
2. Embedded memory (Kbits)
3. Embedded 18x18 multipliers
4. General-purpose PLLs
5. Global clock networks
6. User I/O blocks
7. Maximum user I/O

Note: the user I/O pin count includes all general-purpose I/O pins, dedicated clock pins, and dual-purpose configuration pins. Transceiver pins and dedicated configuration pins are not included in this pin count.

For comparison, the following table shows the resources of the Cyclone IV GX series devices:

Insert picture description here

FPGA architecture

If we want to get started with FPGA design and subsequent timing analysis, we must understand the internal workings of the FPGA device.

Why are FPGAs designed with a hardware description language? Because the inside of an FPGA is much like a PCB, and the logic array of the FPGA is like the discrete components on that board. A PCB connects signals with the appropriate electrical characteristics through traces, and an FPGA likewise connects the relevant logic nodes through its internal routing. FPGA design is therefore essentially circuit design.

First, let's look at what each of these resources means and what circuit it represents.

1. Logic element (LE)

The logic element (LE) is the smallest unit of logic in Cyclone IV devices. Its features are shown in the figure below:
Insert picture description here

The core of the logic element described in the figure above is a four-input look-up table (LUT), which can implement any function of four variables (the first feature in the figure above). The following figure shows the detailed structure of the LE.
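
To see why a LUT with four inputs can implement any function of four variables, it helps to model it as a 16-entry truth table: the four inputs form an address, and the stored bit at that address is the output. The sketch below is a minimal behavioural model in Python; the function names and the bit ordering of the mask are my own assumptions, not the actual configuration format of the device.

```python
def make_lut4(func):
    """Build a 16-bit mask holding the truth table of a 4-input function."""
    mask = 0
    for index in range(16):
        # Unpack the address bits: a is bit 0, d is bit 3.
        a, b, c, d = ((index >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            mask |= 1 << index
    return mask


def lut4_eval(mask, a, b, c, d):
    """Look up the output bit addressed by the four inputs."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (mask >> index) & 1


if __name__ == "__main__":
    xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
    and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
    print(hex(xor4), hex(and4))
    print(lut4_eval(xor4, 1, 0, 1, 1))
```

Any of the 65,536 possible masks is a valid function, which is all that "any function of four variables" means here.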

Insert picture description here

The logic element has two operating modes in the device: normal mode and arithmetic mode. The Quartus II software automatically selects the appropriate mode for common functions such as counters, adders, subtractors, and other arithmetic functions, together with parameterized functions such as the library of parameterized modules (LPM) functions. The figure below shows the structure of the LE in normal mode, which largely matches the features listed in the figure above.

Insert picture description here

Because the logic element (LE) is the smallest unit of logic in Cyclone IV devices, multiple LEs are grouped together to form what is called a logic array block (LAB). Without further ado, here are the figures. This article is basically a walk through the device manual, so there are many of them.

Insert picture description here

Insert picture description here

From the figure above, we can see that the logic array blocks built from logic elements (LEs) are laid out in a regular structure across the device. Each logic array block has a total of 8 control signals: two clocks, two clock enables, two asynchronous clears, one synchronous clear, and one synchronous load.

Please remember the structural layout of the logic array. The layout of each resource in the device will be summarized later and combined into a resource distribution map of the whole FPGA device.

2. Embedded memory (Kbits)

The embedded memory structure is composed of a series of M9K memory modules. By configuring these M9K memory modules, various memory functions can be realized, such as RAM, shift register, ROM and FIFO buffer.

Insert picture description here

The table above shows that each M9K memory block provides 8192 memory bits, or 9216 bits per block when the parity bits are included.
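
The block size gives a quick way to estimate how many M9K blocks a memory will use. The Python sketch below is only a rough estimate under my own assumptions: the list of depth x width configurations is written from my reading of the manual rather than copied from this article, and the real mapping chosen by Quartus II may differ.

```python
import math

# Assumed M9K depth x width configurations (8192 bits, or 9216 with parity).
M9K_CONFIGS = [
    (8192, 1), (4096, 2), (2048, 4), (1024, 8), (1024, 9),
    (512, 16), (512, 18), (256, 32), (256, 36),
]


def m9k_blocks_needed(depth, width):
    """Return (blocks, block_depth, block_width) for the cheapest tiling."""
    best = None
    for block_depth, block_width in M9K_CONFIGS:
        blocks = math.ceil(depth / block_depth) * math.ceil(width / block_width)
        if best is None or blocks < best[0]:
            best = (blocks, block_depth, block_width)
    return best


if __name__ == "__main__":
    print(m9k_blocks_needed(1024, 8))   # a 1K x 8 buffer
    print(m9k_blocks_needed(4096, 16))  # a 4K x 16 buffer
```

Widths of 9, 18, and 36 are the ones that use the parity bits, which is where the 9216-bit figure comes from.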

Insert picture description here
The figure above notes that each port of the M9K memory has its own independent read enable (rden) and write enable (wren) signals.

In packed mode, an M9K memory block can be split into two 4.5 Kbit single-port RAMs.

How to configure each memory mode (RAM, shift register, ROM, and FIFO buffer) will be covered in detail next week, when the IP cores are used.

3. Embedded multiplier

Insert picture description here
The figure above shows an embedded multiplier column and the adjacent logic array blocks (LABs). Each multiplier can be configured as one 18x18 multiplier or as two 9x9 multipliers. For multiplications wider than 18x18, the Quartus II software cascades multiple multipliers together. Although there is no limit on the data width of a multiplication, the wider the data, the slower the multiplication will be.
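
To make the idea of cascading concrete, here is a minimal Python sketch that builds a 36x36 unsigned multiplication out of four 18x18 products. This is the textbook decomposition, written under my own assumptions. It does not show how Quartus II actually arranges the cascade, and the function names are hypothetical.

```python
MASK18 = (1 << 18) - 1  # largest unsigned 18-bit value


def mult18(a, b):
    """Stand-in for one embedded 18x18 unsigned multiplier."""
    assert 0 <= a <= MASK18 and 0 <= b <= MASK18
    return a * b


def mult36(a, b):
    """Multiply two unsigned 36-bit values using four 18x18 products."""
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    return (
        (mult18(a_hi, b_hi) << 36)
        + ((mult18(a_hi, b_lo) + mult18(a_lo, b_hi)) << 18)
        + mult18(a_lo, b_lo)
    )


if __name__ == "__main__":
    x, y = 0x9ABCDEF12, 0x123456789
    print(mult36(x, y) == x * y)
```

The extra adders and shifts around the four products are one way to see why wider multiplications end up slower.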

M9K memory blocks can also be used as look-up tables (LUTs) to implement soft multipliers. The partial products of the input data and the coefficients are stored in the LUT. This suits high-performance DSP applications: it provides soft multipliers of variable depth and width, and it increases the number of multipliers available in the device.
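
The soft multiplier idea can be sketched in a few lines of Python: a table holds the products of a fixed coefficient with every possible small chunk of the input, and the final result is the shifted sum of the table entries. The 4-bit chunk size, the unsigned data, and the function names are my own assumptions for illustration and do not describe how the Altera soft multiplier is actually organized.

```python
def build_partial_product_table(coefficient, chunk_bits=4):
    """Precompute coefficient * value for every possible input chunk."""
    return [coefficient * value for value in range(1 << chunk_bits)]


def soft_multiply(table, data, data_bits=16, chunk_bits=4):
    """Multiply unsigned data by the table's coefficient using lookups only."""
    mask = (1 << chunk_bits) - 1
    result = 0
    for shift in range(0, data_bits, chunk_bits):
        chunk = (data >> shift) & mask
        # Each lookup is one partial product, weighted by its chunk position.
        result += table[chunk] << shift
    return result


if __name__ == "__main__":
    table = build_partial_product_table(37)
    print(soft_multiply(table, 12345) == 37 * 12345)
```

In hardware the table would live in an M9K block used as a ROM, so no embedded multiplier is consumed.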

Insert picture description here

The figure above shows the architecture of the multiplier block, which has three components:

1. Multiplier stage
2. Input and output registers (the previous articles mentioned why they are needed)
3. Input and output interfaces.

The above figure also clearly shows the operating mode of the multiplier.

Each input signal of the multiplier either passes through an input register or connects directly to the internal multiplier stage, as a 9-bit or 18-bit value. Whether each input goes through a register can be set individually, and the same applies to the output register.

Each operand of the multiplier stage can independently be a signed or an unsigned value. The signa and signb signals control the multiplier inputs and determine whether each value is treated as signed or unsigned. When the signa and signb signals are not used, the Quartus II software sets the multiplier to unsigned multiplication by default, as shown in the following table.
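
The effect of signa and signb is easy to see in a small behavioural model. In the Python sketch below, the same 18-bit pattern is interpreted either as an unsigned value or as a two's complement value depending on the sign control. The function names and argument order are my own assumptions, and a value of 1 is assumed to mean "signed".

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply(dataa, datab, signa=0, signb=0, bits=18):
    """Multiply two operands, each treated as signed only if its flag is 1."""
    mask = (1 << bits) - 1
    a = to_signed(dataa, bits) if signa else dataa & mask
    b = to_signed(datab, bits) if signb else datab & mask
    return a * b


if __name__ == "__main__":
    all_ones = 0x3FFFF  # 18 bits set
    print(multiply(all_ones, 2))           # unsigned by default
    print(multiply(all_ones, 2, signa=1))  # dataa read as -1
```

Leaving both flags at 0 mirrors the default unsigned behaviour described above.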

Insert picture description here

4. General PLL

Because I have mostly used E series devices, the clock network is not introduced in detail here. The figure below, taken from the GX series, shows where the PLL blocks are located.

Insert picture description here

The figure above shows that the four PLLs are distributed around the edges of the device. When an external reference clock comes in through a clock pin on the top, bottom, or right side of the device, it can reach a PLL quickly. This gives the shortest clock source path and therefore a higher-quality generated clock. In addition, the PLL is an analog circuit, and placing it at the edge of the device makes manufacturing easier.

The PLL can divide, multiply, and phase-shift the input clock signal. Phase shifting is used, for example, with SDR SDRAM, because the design needs two clock signals with the same frequency but a phase difference of 180°. The EP4CE6 and EP4CE10 contain only two PLLs each. All the other devices contain four.
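
The arithmetic behind multiplying, dividing, and phase-shifting a clock can be sketched as follows. The formula f_out = f_in x M / (N x C), with M as the feedback multiplier, N as the input divider, and C as the output divider, is the usual way these PLLs are described. The parameter names and the 50 MHz example are my own assumptions, and the sketch does not check the legal VCO range of a real device.

```python
from fractions import Fraction


def pll_output_mhz(f_in_mhz, m, n, c):
    """Output frequency in MHz for multiplier m and dividers n and c."""
    return Fraction(f_in_mhz) * m / (n * c)


def phase_delay_ns(f_out_mhz, phase_deg):
    """Time delay in ns that corresponds to a phase shift at f_out."""
    period_ns = Fraction(1000) / f_out_mhz
    return period_ns * Fraction(phase_deg, 360)


if __name__ == "__main__":
    f_out = pll_output_mhz(50, m=12, n=1, c=6)  # 50 MHz in, 100 MHz out
    print(f_out, phase_delay_ns(f_out, 180))    # delay for a 180 degree shift
```

For the SDRAM case, the second clock is the same output frequency with the 180° delay applied, which is half of one clock period.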

5. User I/O block

Insert picture description here

The figure above shows that the I/O blocks are distributed around the edges of the device. The voltage and current standards and the slew rate control should be chosen according to the actual circuit.

As a reminder, Altera's recommended data flow strategy is for data to enter through the left-side I/O, be processed (computation and storage), and finally leave through the right-side I/O.

Control signals flow from top to bottom: they enter through the top I/O elements and leave through the bottom I/O elements.

6. Top View

The figure below is the top view of the chip in Quartus II. What do these symbols mean?

Insert picture description here
Open View -> Pin Legend to see the meaning of each symbol on the device. An upward triangle is VCC, a downward triangle is ground, and P and N mark the two pins of a differential signal pair.

Insert picture description here

Resource distribution map

Now that the device resources have been introduced, how are they distributed inside the device? The following figures show the resource distribution of different devices.

Insert picture description here

Insert picture description here
Insert picture description here

Comparing these three resource distribution diagrams clearly shows that the logic array blocks (LABs), the M9K memory, and the multipliers are interleaved. This shortens the data paths and gives better timing performance.

Case analysis

Let's take a typical data acquisition example to analyze the advantages of this resource distribution.
Insert picture description here

Following the figure above, the data flow is:

1. The data is first collected by an external high-speed ADC and enters the FPGA through the I/O port.

2. After preprocessing by a certain logic circuit, it is written into the buffer composed of M9K memory, such as FIFO or dual-port RAM.

3. Then, the logic circuit reads the data out of the M9K memory and processes it. High-speed multiplication may be required during processing, so this part of the data can be sent directly to the on-chip 18x18 multipliers for the operation.

4. After the operation is over, it is processed by the logic circuit and sent to the buffer composed of M9K memory, such as FIFO or dual-port RAM.

5. Wait for other circuits such as the data sending circuit to read the data from the buffer and finally transmit it through the communication circuit connected to the I/O port.
Insert picture description here
If we follow the entire data flow from left to right, we find that the resources required at each stage correspond to where those resources are located in Cyclone IV E. Therefore, if the design above is arranged on the corresponding resources of Cyclone IV E according to the officially recommended data flow strategy, place and route can achieve better timing, and the design can run at a higher clock frequency.
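
The five steps of this case can also be written down as a small software model, which makes the order of the stages explicit. The Python sketch below is purely illustrative: the offset removal as "preprocessing", the single coefficient, and the function name are all my own assumptions, and the two deques merely stand in for FIFO buffers built from M9K memory.

```python
from collections import deque


def run_pipeline(adc_samples, coefficient, offset=0):
    """Model the left-to-right data flow: I/O, buffer, multiply, buffer, I/O."""
    input_fifo = deque()
    output_fifo = deque()

    # Steps 1 and 2: samples arrive through I/O, are preprocessed, and buffered.
    for sample in adc_samples:
        input_fifo.append(sample - offset)

    # Steps 3 and 4: logic reads the buffer, multiplies, and buffers the result.
    while input_fifo:
        output_fifo.append(input_fifo.popleft() * coefficient)

    # Step 5: the sending circuit drains the output buffer toward the I/O.
    sent = []
    while output_fifo:
        sent.append(output_fifo.popleft())
    return sent


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], coefficient=2, offset=1))
```

In the real device these stages run concurrently as hardware, while the model only captures the order in which the data passes through them.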

Origin blog.csdn.net/sinat_41653350/article/details/105736117