Linking (4)

Relocation 

Once the linker has completed the symbol resolution step, it has associated each symbol reference in the code with exactly one symbol definition (i.e., a symbol table entry in one of its input object modules). At this point, the linker knows the exact sizes of the code and data sections in its input object modules. It is now ready to begin the relocation step, where it merges the input modules and assigns run-time addresses to each symbol. Relocation consists of two steps: 

(1) Relocating sections and symbol definitions. In this step, the linker merges all sections of the same type into a new aggregate section of the same type.

For example, the .data sections from the input modules are all merged into one section that will become the .data section for the output executable object file.

The linker then assigns run-time memory addresses to the new aggregate sections, to each section defined by the input modules, and to each symbol defined by the input modules. When this step is complete, every instruction and global variable in the program has a unique run-time memory address.

(2) Relocating symbol references within sections. In this step, the linker modifies every symbol reference in the bodies of the code and data sections so that they point to the correct run-time addresses. To perform this step, the linker relies on data structures in the relocatable object modules known as relocation entries, which we describe next.
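As a rough illustration of step (1), the sketch below places same-type sections from several input modules back to back and derives a run-time address for each input section and for a symbol defined inside it. The names InputSection, layout_sections, and symbol_addr are hypothetical, and the sketch ignores alignment and everything else a real linker has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one input module's section (e.g., its .data). */
typedef struct {
    uint32_t size;   /* size in bytes, known once symbol resolution is done */
    uint32_t addr;   /* run-time address, filled in by layout_sections */
} InputSection;

/* Place the n input sections back to back, starting at base, the run-time
 * address chosen for the aggregate section. Returns the aggregate size. */
uint32_t layout_sections(InputSection *secs, size_t n, uint32_t base)
{
    assert(secs != NULL || n == 0);
    uint32_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        secs[i].addr = base + offset;   /* address of this input section */
        offset += secs[i].size;         /* next section follows immediately */
    }
    return offset;
}

/* Run-time address of a symbol defined at sym_offset within section sec. */
uint32_t symbol_addr(const InputSection *sec, uint32_t sym_offset)
{
    return sec->addr + sym_offset;
}
```

For instance, with two .data sections of 8 and 4 bytes (assumed sizes) and an aggregate base of 0x8049454, the second section starts at 0x804945c, so a symbol at offset 0 within it gets the address 0x804945c.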

Relocation Entries

When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory. Nor does it know the locations of any externally defined functions or global variables that are referenced by the module.

So whenever the assembler encounters a reference to an object whose ultimate location is unknown, it generates a relocation entry that tells the linker how to modify the reference when it merges the object file into an executable. Relocation entries for code are placed in .rel.text. Relocation entries for initialized data are placed in .rel.data. 

Figure 7.8 shows the format of an ELF relocation entry.

The offset is the section offset of the reference that will need to be modified.

The symbol identifies the symbol that the modified reference should point to.

The type tells the linker how to modify the new reference. 
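Figure 7.8 is not reproduced in these notes. As a stand-in, here is a minimal sketch of how the three fields could be represented in C. It follows the 32-bit ELF convention of packing the symbol table index (upper 24 bits) and the relocation type (low 8 bits) into a single word. The type name RelEntry and the helper names are hypothetical, and the symbol index 9 in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type codes for i386 as defined by the ELF ABI. */
enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Sketch of a 32-bit ELF relocation entry (hypothetical type name). */
typedef struct {
    uint32_t offset;   /* section offset of the reference to relocate */
    uint32_t info;     /* symbol index in bits 31..8, type in bits 7..0 */
} RelEntry;

/* Symbol table index of the symbol the reference should point to. */
uint32_t rel_symbol(const RelEntry *r) { return r->info >> 8; }

/* Relocation type, which tells the linker how to modify the reference. */
uint32_t rel_type(const RelEntry *r) { return r->info & 0xff; }

/* Build an entry from its three logical fields. */
RelEntry make_rel(uint32_t offset, uint32_t symbol_index, uint32_t type)
{
    assert(symbol_index < (1u << 24));   /* index must fit in 24 bits */
    RelEntry r = { offset, (symbol_index << 8) | (type & 0xff) };
    return r;
}
```

Calling make_rel(0x7, 9, R_386_PC32) yields an entry whose info word is 0x902, from which rel_symbol and rel_type recover 9 and R_386_PC32.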

We are concerned with only the two most basic relocation types:

(1) R_386_PC32: Relocate a reference that uses a 32-bit PC-relative address.

Recall from Section 3.6.3 that a PC-relative address is an offset from the current run-time value of the program counter (PC).

When the CPU executes an instruction using PC-relative addressing, it forms the effective address (e.g., the target of the call instruction) by adding the 32-bit value encoded in the instruction to the current run-time value of the PC, which is always the address of the next instruction in memory. 

(2) R_386_32: Relocate a reference that uses a 32-bit absolute address.

With absolute addressing, the CPU directly uses the 32-bit value encoded in the instruction as the effective address, without further modifications. 

Relocating Symbol References 

Figure 7.9 gives pseudocode for the linker’s relocation algorithm.

Lines 1 and 2 iterate over each section s and each relocation entry r associated with each section.

For concreteness, assume that each section s is an array of bytes and that each relocation entry r is a struct of type Elf32_Rel, as defined in Figure 7.8.

Also, assume that when the algorithm runs, the linker has already chosen run-time addresses for each section (denoted ADDR(s)) and each symbol (denoted ADDR(r.symbol)).
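Since the figure itself is missing from these notes, the following C sketch shows one way the algorithm could look under the assumptions just stated. It is built from the two update formulas quoted in the examples of this section, not copied from the figure. The Reloc and Section types are hypothetical simplifications: each relocation carries the already-chosen symbol address directly instead of a symbol table index, and the code assumes a little-endian host so that memcpy reads and writes the reference in x86 byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Hypothetical, simplified stand-ins for the linker's data structures. */
typedef struct {
    uint32_t offset;        /* r.offset: section offset of the reference */
    uint32_t symbol_addr;   /* ADDR(r.symbol), already chosen by the linker */
    int      type;          /* R_386_PC32 or R_386_32 */
} Reloc;

typedef struct {
    uint8_t     *bytes;     /* section contents, an array of bytes */
    size_t       size;
    uint32_t     addr;      /* ADDR(s) */
    const Reloc *relocs;
    size_t       nrelocs;
} Section;

void relocate(Section *secs, size_t nsecs)
{
    for (size_t i = 0; i < nsecs; i++) {              /* each section s */
        Section *s = &secs[i];
        for (size_t j = 0; j < s->nrelocs; j++) {     /* each entry r */
            const Reloc *r = &s->relocs[j];
            assert((size_t)r->offset + 4 <= s->size);
            uint8_t *refptr = s->bytes + r->offset;
            uint32_t ref;
            memcpy(&ref, refptr, sizeof ref);   /* current value; avoids unaligned access */

            if (r->type == R_386_PC32) {
                /* PC-relative: subtract the reference's own run-time address. */
                uint32_t refaddr = s->addr + r->offset;
                ref = r->symbol_addr + ref - refaddr;
            } else if (r->type == R_386_32) {
                /* Absolute: symbol address plus the value already stored. */
                ref = r->symbol_addr + ref;
            }
            memcpy(refptr, &ref, sizeof ref);   /* write the relocated value back */
        }
    }
}
```

Unsigned 32-bit arithmetic wraps around, so an initial value of 0xfffffffc behaves as −4 in the PC-relative case without any explicit sign handling.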

Examples:

Relocating PC-Relative References 

(1) The relocation entry r for the call to swap consists of three fields:

r.offset = 0x7
r.symbol = swap
r.type   = R_386_PC32

These fields tell the linker to modify the 32-bit PC-relative reference starting at offset 0x7 so that it will point to the swap routine at run time.  

(2) Now, suppose that the linker has determined that 

ADDR(s) = ADDR(.text) = 0x80483b4
ADDR(r.symbol) = ADDR(swap) = 0x80483c8

(3) Recall from our running example in Figure 7.1(a) that the main routine in the .text section of main.o calls the swap routine, which is defined in swap.o. The objdump listing for the call instruction is:

6: e8 fc ff ff ff     call 7 <main+0x7> swap();
                      7: R_386_PC32 swap relocation entry

From this listing, we see that the call instruction begins at section offset 0x6 and consists of the 1-byte opcode 0xe8, followed by the 32-bit reference 0xfffffffc (−4 decimal).

We also see a relocation entry for this reference displayed on the following line.

(Recall that relocation entries and instructions are actually stored in different sections of the object file. The objdump tool displays them together for convenience.)  

(4) Using the algorithm in Figure 7.9, the linker first computes the run-time address of the reference (line 7): 

refaddr = ADDR(s)   + r.offset
           = 0x80483b4 + 0x7
           = 0x80483bb

(5) It then updates the reference from its current value (−4) to 0x9 so that it will point to the swap routine at run time (line 8): 

*refptr = (unsigned) (ADDR(r.symbol) + *refptr - refaddr)
           = (unsigned) (0x80483c8      + (-4)    - 0x80483bb)
           = (unsigned) (0x9)

(6) In the resulting executable object file, the call instruction has the following relocated form:

80483ba: e8 09 00 00 00      call 80483c8 <swap>     swap();

(7) In conclusion:

At run time, the call instruction will be stored at address 0x80483ba.

When the CPU executes the call instruction, the PC has a value of 0x80483bf, which is the address of the instruction immediately following the call instruction.

To execute the instruction, the CPU performs the following steps: 

1. push PC onto stack
2. PC <- PC + 0x9 = 0x80483bf + 0x9 = 0x80483c8

Thus, the next instruction to execute is the first instruction of the swap routine, which of course is what we want! 
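The walkthrough above can be checked with a small sketch. It computes the displacement the way the linker does, then simulates the two steps of the call on a toy CPU state. The Cpu type and both function names are hypothetical. The sketch also makes the role of the −4 visible: the reference starts 1 byte into the 5-byte call instruction, so the PC at execution time is refaddr + 4, and the initial value of −4 compensates for that difference.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical, heavily simplified CPU state: a PC and a small stack. */
typedef struct {
    uint32_t pc;                   /* address of the next instruction */
    uint32_t stack[STACK_WORDS];
    int      sp;                   /* number of words currently on the stack */
} Cpu;

/* Displacement as the linker computes it: symbol + addend - refaddr. */
uint32_t linker_disp(uint32_t sym_addr, int32_t addend, uint32_t refaddr)
{
    return sym_addr + (uint32_t)addend - refaddr;
}

/* Execute a PC-relative call with 32-bit displacement disp. On entry,
 * cpu->pc already holds the address of the instruction after the call. */
void exec_call_rel32(Cpu *cpu, uint32_t disp)
{
    assert(cpu->sp < STACK_WORDS);
    cpu->stack[cpu->sp++] = cpu->pc;   /* 1. push PC (the return address) */
    cpu->pc += disp;                   /* 2. PC <- PC + disp */
}
```

With the values from the example, linker_disp(0x80483c8, -4, 0x80483bb) is 0x9, and executing the call with the PC at 0x80483bf leaves the PC at 0x80483c8 with 0x80483bf saved on the stack.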

Relocating Absolute References 

Recall that in our example program in Figure 7.1, the swap.o module initializes the global pointer bufp0 to the address of the first element of the global buf array: 

int *bufp0 = &buf[0];

Since bufp0 is an initialized data object, it will be stored in the .data section of the swap.o relocatable object module.

Since it is initialized to the address of a global array, it will need to be relocated. The disassembled .data section of swap.o looks like this:

00000000 <bufp0>:
0: 00 00 00 00                                       int *bufp0 = &buf[0];
                            0: R_386_32 buf          Relocation entry

We see that the .data section contains a single 32-bit reference, the bufp0 pointer, which has a value of 0x0.

The relocation entry tells the linker that this is a 32-bit absolute reference, beginning at offset 0, which must be relocated so that it points to the symbol buf.

Now, suppose that the linker has determined that 

ADDR(r.symbol) = ADDR(buf) = 0x8049454

The linker updates the reference using line 13 of the algorithm in Figure 7.9: 

*refptr = (unsigned) (ADDR(r.symbol) + *refptr)
           = (unsigned) (0x8049454      + 0)
           = (unsigned) (0x8049454)

In the resulting executable object file, the reference has the following relocated form: 

0804945c <bufp0>:
804945c: 54 94 04 08    Relocated!

In words, the linker has decided that at run time the variable bufp0 will be located at memory address 0x804945c and will be initialized to 0x8049454, which is the run-time address of the buf array. 
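The byte sequence 54 94 04 08 in the listing is the address 0x8049454 stored in little-endian order, least significant byte first. The sketch below applies the absolute relocation formula to a 4-byte section image and stores the result explicitly in that byte order, so it does not depend on the host machine. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value starting at p. */
uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store v at p in little-endian byte order. */
void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Absolute relocation: new value = symbol address + value already in place. */
void relocate_abs32(uint8_t *sec, size_t size, uint32_t offset, uint32_t sym_addr)
{
    assert((size_t)offset + 4 <= size);
    store_le32(sec + offset, sym_addr + load_le32(sec + offset));
}
```

Starting from the four zero bytes of bufp0 and calling relocate_abs32 with offset 0 and ADDR(buf) = 0x8049454 leaves the bytes 54 94 04 08 in the section, matching the relocated listing.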

Reposted from www.cnblogs.com/geeklove01/p/9221259.html