On the Principles of C/C++ Compilation (1)

-------------Foreword

I have just finished a quick first read of "Advanced C/C++ Compilation Technology". One pass is not enough, and reading alone is not enough either, so I am writing this post to record my understanding so far. Some of it I have verified; the rest is still conjecture. If anything below is wrong, please correct me. Keep learning and keep improving!

-------------Main text

First, the book itself. It was written by Milan Stevanovic and translated by Lu Yusheng. It is organized as follows: the hardware foundations, the life cycle of a program, an introduction to each stage of that life cycle, and solutions to various practical problems.

The first chapter covers the hardware background. Compilation ultimately produces binary code that the machine can execute, so we should first be clear about the goal: high-level C/C++ source ---> a tool (treat it as a black box for now) ---> binary file. The binary code consists of instructions that a specific processor executes. The more important questions are how the processor finds the address of each instruction, how the binary file is produced, and how the problems that arise along the way are solved. That is roughly what I took away from the chapter. I have not studied operating systems yet, so I have not understood the deeper parts.

To sum up, what a program ultimately controls is the I/O devices, and the data it works on passes through main memory as a stream of bytes. Main memory has to be fast because the CPU exchanges data with it directly, but fast memory is expensive, and the larger it is the more it costs. Caching is the compromise. There are usually several cache levels: the closer a level is to the CPU, the faster and smaller it is; the farther away, the slower and larger. This hierarchy acts as a buffer, but when the amount of data is large, main memory runs out and the system falls back on the hard disk, and the program then feels very sluggish. This is where virtual memory comes in. It makes an application believe it has contiguous available memory (one complete, contiguous address space), when in fact that memory is usually split into multiple physical fragments, some of which are temporarily kept on disk and swapped in when needed.

How to make it look continuous?

The physical storage itself is not contiguous; what is contiguous is the numbering of the addresses. Courses on digital circuits and microcomputer principles teach how addresses are assigned to RAM chips: following rules agreed in advance, the address decoder makes separate chips appear as one continuous range of address numbers. For example, the Intel 8086 splits memory into an even-address bank and an odd-address bank, and uses the A0 address line together with the BHE pin to select which bank is accessed. The RAM chips and the decoder are wired to fit this scheme, so from the CPU's point of view it is simply reading consecutive even and odd addresses. For more detail, see a textbook on the subject.

The life cycle of a program is:

Building has two stages, compiling and linking:

C/C++ source ---> compile ---> assemble ---> link ---> binary file

binary file ---> load ---> assign absolute (addressable) addresses ---> execute

OK, let's go through it step by step.

Compile:

1. Execution unit: Compiler

2. Input: Compilation unit (usually text containing source code)

3. Output: A collection of binary object files

Note: although the output is already binary, there is one object file per source file, and the references between those files have not been resolved yet.

OK, so what does the compiler do to turn that input into that output?

1. Preprocessing stage

Input: C/C++

Output: C/C++

A special text processor handles the preprocessor directives. It pastes the file named by each #include directive into the source file, replaces every macro defined with #define by its value, and keeps or drops the code guarded by #if, #ifdef, #ifndef, #elif, #else and #endif depending on the conditions.

2. Linguistic analysis stage

Input: C/C++

Output: C/C++

The result is compact code that has been checked to be syntactically and semantically valid. The stage has several parts:

      1. Lexical analysis: split the source code into indivisible tokens.

      2. Syntax analysis: assemble the extracted tokens into sequences and check them against the grammar of the programming language to verify that each sequence is well formed.

      3. Semantic analysis: determine whether statements that satisfy the grammar actually have a meaning.

3. Assembly process

Input: C/C++

Output: assembly code

The code is translated into the assembly language of a specific CPU instruction set. After steps 1 and 2, the C/C++ code is guaranteed to be compact and meaningful, so it can now be converted into assembly code. Taking the gcc compiler as an example:

source code ---> gcc ---> ASCII-encoded text file (assembly code)

On the x86 architecture, assembly code can be written in two syntaxes: AT&T format and Intel format.

4. Optimization stage

Input: assembly code

Output: assembly code

initial assembly code ---> optimization ---> final assembly code

The principles of optimization:

         1. Minimize register usage

         2. Analyze the code to find the parts that never actually need to be executed, and remove them

5. Code Generation Phase

Input: assembly code

Output: a set of binary object files

Each object file corresponds to one compilation unit. The assembly instructions are converted into the binary values (opcodes) of the corresponding machine instructions.

OK, that completes compilation. On Linux, we can stop after the assembling stage and get the assembly code with the command

gcc -S <input file> -o <output assembler file>.s //output assembly code

Write a simple hello program.

gcc -S -masm=intel hello.c -o hello.s


That gives the assembly code. I have not studied assembly yet, so I do not understand it and will not analyze it here. It is enough to know that the assembly code can be exported this way.

gcc -c hello.c -o hello.o

This gives hello.o, the binary object file. Although you see only one file, do not forget the mysterious #include that you did not understand when you first started learning C. Where does that mystery lead? To a shared library, which will be introduced later and is not discussed for now.

So this command gives us the binary object file, but viewing it requires a hexadecimal file viewer, which I do not have. Opening it directly in vim just shows garbled characters, so I will not include a screenshot.

That is the end of the compilation process. The next part will follow later.


