Description of various burning file formats (ELF, HEX, BIN)

1. ELF

The Executable and Linkable Format (ELF) is the standard object file format on Linux systems, on x86 and many other architectures. There are three main types of ELF file:

         (1) Relocatable files, which are input to the linker: together with other object files they are linked into executable files or shared object files.
         (2) Executable files, which provide the process image of a program: the loader maps them into memory and the program runs from there.
         (3) Shared object files: the linker can combine one with other relocatable files and shared object files to produce a new object file, and the dynamic linker combines it with an executable and other shared objects to create a process image.
The ELF format is considerably more complex than HEX or BIN.

2. HEX

An Intel HEX file is an ASCII text file made up of lines of text. Each line is one HEX record, which encodes machine code or data constants as hexadecimal digits. Intel HEX files are commonly used to transfer programs or data that are to be stored in ROM or EPROM, and most device programmers and emulators accept them.
         Record format
         An Intel HEX file can contain any number of HEX records. Each record consists of a colon followed by five fields, in this format:
:llaaaatt[dd...]cc
         Each group of letters is a separate field, and each letter stands for one hexadecimal digit. Every field is at least two hex digits (one byte) long. The fields are:
         : The colon marks the start of every Intel HEX record.
         ll is the record-length field: the number of data bytes (dd) in the record.
         aaaa is the address field: the starting address of the data.
         tt is the record-type field, which can take the following values:
         00 - Data record
         01 - End-of-file record
         02 - Extended segment address record
         03 - Start segment address record
         04 - Extended linear address record
         05 - Start linear address record
         dd is the data field. Each dd is one byte of data. A record may contain many data bytes, and their count is given by the ll field.
         cc is the checksum field. To compute it, add together the byte values represented by every pair of hex digits after the colon (ll, aaaa, tt and all dd bytes), and take the sum modulo 256. Then take the two's complement of the result. That value is the checksum byte cc. For example:
         :0300000002005E9D
         cc = 0x01 + NOT((0x03 + 0x00 + 0x00 + 0x00 + 0x02 + 0x00 + 0x5E) % 0x100) = 0x01 + 0x9C = 0x9D
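The checksum rule fits in a small C function. This is a sketch, and `ihex_checksum` is a hypothetical name rather than a library API. It sums the bytes after the colon, lets the 8-bit sum wrap modulo 256, and returns the two's complement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch (hypothetical helper): Intel HEX checksum over the bytes that
 * follow the colon (ll, aaaa, tt and every dd byte; cc itself excluded). */
uint8_t ihex_checksum(const uint8_t *bytes, size_t n)
{
    assert(bytes != NULL || n == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + bytes[i]);   /* uint8_t wraps, so sum stays mod 256 */
    return (uint8_t)(~sum + 1);            /* two's complement: NOT(sum) + 1 */
}
```

Passing the bytes 03 00 00 00 02 00 5E from the example record gives 0x9D. A useful consequence is that cc plus all the other bytes sums to 0 modulo 256. A reader can therefore check a record by summing every byte after the colon, checksum included.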
         Data records
         An Intel HEX file consists mostly of data records. Each record ends with a carriage return and a line feed. For example, in the data record
         :10246200464C5549442050524F46494C4500464C33
         10 is the number of data bytes in this record (0x10 = 16).
         2462 is the starting address of the data in memory.
         00 is the record type (00, a data record).
         464C...464C are the 16 data bytes.
         33 is the checksum of this record.
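A minimal C parser for one record line makes the field layout concrete. It is a sketch: `struct ihex_record` and `ihex_parse` are hypothetical names. It only splits and checks the fields and does not interpret record types. To verify the checksum, it uses the fact that all bytes after the colon, cc included, must sum to 0 modulo 256.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded record; field names follow the text. */
struct ihex_record {
    uint8_t  length;     /* ll */
    uint16_t address;    /* aaaa */
    uint8_t  type;       /* tt */
    uint8_t  data[255];  /* dd...: ll is one byte, so at most 255 data bytes */
};

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Sketch: parse ":llaaaatt[dd...]cc" (a trailing CR/LF is allowed).
 * Returns 0 on success, -1 on a malformed line or a checksum mismatch. */
int ihex_parse(const char *line, struct ihex_record *rec)
{
    assert(line != NULL && rec != NULL);
    size_t len = strcspn(line, "\r\n");          /* ignore the line terminator */
    if (len < 11 || line[0] != ':' || (len - 1) % 2 != 0)
        return -1;

    uint8_t bytes[260];                          /* ll + aaaa + tt + 255 dd + cc */
    size_t n = (len - 1) / 2;
    if (n > sizeof bytes)
        return -1;
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int hi = hex_digit(line[1 + 2 * i]), lo = hex_digit(line[2 + 2 * i]);
        if (hi < 0 || lo < 0)
            return -1;
        bytes[i] = (uint8_t)((hi << 4) | lo);
        sum = (uint8_t)(sum + bytes[i]);
    }
    /* All bytes after the colon, cc included, must sum to 0 mod 256,
     * and the byte count must agree with ll. */
    if (sum != 0 || n != (size_t)bytes[0] + 5)
        return -1;

    rec->length  = bytes[0];
    rec->address = (uint16_t)((bytes[1] << 8) | bytes[2]);
    rec->type    = bytes[3];
    memcpy(rec->data, bytes + 4, rec->length);
    return 0;
}
```

Parsing the example record yields length 0x10, address 0x2462 and type 00. Its data bytes are the ASCII text "FLUID PROFILE", followed by a NUL byte and "FL".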

3. BIN


A BIN file is a raw binary image with no address information inside. When it is burned with a programmer, it is normally written starting at address 0x00. When it is downloaded and run, it must be loaded at the address it was linked for.

Summary: ELF files can be converted into either of the other two formats, and HEX can be converted directly into BIN. Converting BIN into HEX, however, requires a base address, because a BIN file records no addresses. Neither HEX nor BIN can be converted back into ELF, because ELF carries much more information (sections, symbols, relocation and debug data). ADS also produces a debug file format, AXF, which can be converted into a BIN file with the following command:
fromelf -nodebug xx.axf -bin xx.bin
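Code shows why BIN-to-HEX conversion needs a base address: the HEX writer must put an address in every record, and a BIN file has none to offer. The following C sketch uses hypothetical helpers `bin_to_hex` and `put_record`. It writes 16-byte data records, plus a type-04 extended linear address record whenever the upper 16 address bits change, and closes with an end-of-file record. Going the other way, from HEX to BIN, only requires placing each data record's bytes at its address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append one record ":llaaaatt[dd...]cc" plus '\n' at out; returns chars written. */
static size_t put_record(char *out, uint8_t type, uint16_t addr,
                         const uint8_t *data, uint8_t len)
{
    uint8_t sum = (uint8_t)(len + (addr >> 8) + (addr & 0xFF) + type);
    size_t n = (size_t)sprintf(out, ":%02X%04X%02X", len, addr, type);
    for (uint8_t i = 0; i < len; i++) {
        n += (size_t)sprintf(out + n, "%02X", data[i]);
        sum = (uint8_t)(sum + data[i]);
    }
    n += (size_t)sprintf(out + n, "%02X\n", (uint8_t)(~sum + 1));
    return n;
}

/* Sketch (hypothetical helper): convert a BIN image to Intel HEX text.
 * base is the address of the first BIN byte; the BIN itself cannot supply it.
 * out must hold about 3 chars per input byte, plus the extra records and a NUL. */
size_t bin_to_hex(const uint8_t *bin, size_t size, uint32_t base, char *out)
{
    assert(bin != NULL || size == 0);
    assert(size == 0 || (uint64_t)base + size - 1 <= 0xFFFFFFFFu);  /* fits in 32 bits */
    size_t n = 0, i = 0;
    uint32_t upper = 0xFFFFFFFFu;                  /* no type-04 record emitted yet */
    while (i < size) {
        uint32_t addr = base + (uint32_t)i;
        if ((addr >> 16) != upper) {               /* upper 16 bits changed: type 04 */
            upper = addr >> 16;
            const uint8_t ela[2] = { (uint8_t)(upper >> 8), (uint8_t)upper };
            n += put_record(out + n, 0x04, 0x0000, ela, 2);
        }
        /* Up to 16 bytes per record, never crossing a 64 KiB boundary. */
        size_t chunk = size - i < 16 ? size - i : 16;
        size_t to_boundary = 0x10000u - (addr & 0xFFFFu);
        if (chunk > to_boundary)
            chunk = to_boundary;
        n += put_record(out + n, 0x00, (uint16_t)addr, bin + i, (uint8_t)chunk);
        i += chunk;
    }
    n += put_record(out + n, 0x01, 0x0000, NULL, 0);   /* end-of-file record */
    return n;
}
```

Take the 16 bytes from the data-record example and convert them with base 0x2462. The output is :020000040000FA, then :10246200464C5549442050524F46494C4500464C33, then :00000001FF. Without an extended address record, readers take the upper address bits as 0, so a writer may leave out that first line. This sketch always emits it for simplicity.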
 

The ARM file formats discussed here are the ones most often encountered when developing ARM-based embedded systems.
    ARM systems use three basic file formats:
1) BIN, a flat binary format, generally burned directly into Flash. It can also be loaded by a monitor program.
2) ELF, the Executable and Linkable Format, a general-purpose object file format, typically produced by the GNU Compiler Collection (GCC).
3) AXF, the image format produced by ARM's ADS toolchain for debugging with AXD. It contains the same program contents as the BIN image and also carries debugging information. The container itself is ELF, which is why ADS's fromelf utility can convert it to BIN.
    This article mainly discusses BIN and ELF.
    First, ELF is an object file format. Object files generally fall into three categories: relocatable object files, executable object files and shared object files. An ELF file can be any of the three.
    Let's start with relocatable object files. A relocatable object file is normally generated by the assembler (as) in GCC (GCC is a whole toolchain, not just a compiler). Besides binary machine code, it contains information used for relocation. Its main role is as input to the linker (ld), which uses that information to relocate the symbols that need it and so produces an executable object file. A relocatable ELF object file consists of headers and sections.
    The headers are the ELF header and the section header table. The ELF header sits at the start of the file. It records the target machine architecture, the byte order (endianness), the size of the ELF header itself, the object file type, the offset of the section header table in the file, the size of each section header entry, and the number of entries. The section header table describes the type, position, size and other attributes of each section in the file. The linker finds the section header table through the ELF header, looks up the entry for the section it wants, and from that entry locates the section itself.
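The lookup chain just described runs from the ELF header to the section header table to the section. It can be sketched in C with the `Elf32_Ehdr` and `Elf32_Shdr` types from the `<elf.h>` header that glibc ships on Linux. `find_section` is a hypothetical helper, and the sketch assumes a 32-bit ELF image already in memory whose byte order matches the host's. GNU tools keep section names in a dedicated string table, .shstrtab, whose index the ELF header stores in `e_shstrndx`. Each `sh_name` is an offset into that table.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Sketch (hypothetical helper, not a library API): find a section by name in
 * a 32-bit ELF image held in memory. Assumes the image's byte order matches
 * the host's. Returns the section index and copies its header to *out,
 * or returns -1 if the image is not ELF32 or has no such section. */
int find_section(const unsigned char *img, size_t size, const char *name, Elf32_Shdr *out)
{
    assert(img != NULL && name != NULL && out != NULL);
    Elf32_Ehdr eh;
    if (size < sizeof eh || memcmp(img, ELFMAG, SELFMAG) != 0 || img[EI_CLASS] != ELFCLASS32)
        return -1;
    memcpy(&eh, img, sizeof eh);               /* copy out: img may be unaligned */

    /* Step 1: the ELF header locates the section header table. */
    if (eh.e_shentsize != sizeof(Elf32_Shdr) || eh.e_shstrndx >= eh.e_shnum
        || eh.e_shoff > size || (size - eh.e_shoff) / sizeof(Elf32_Shdr) < eh.e_shnum)
        return -1;

    /* e_shstrndx is the index of the string table that holds section names. */
    Elf32_Shdr names;
    memcpy(&names, img + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof names, sizeof names);

    /* Step 2: scan the table. Step 3: the matching entry locates the section. */
    size_t want = strlen(name) + 1;            /* compare the terminating NUL as well */
    for (int i = 0; i < eh.e_shnum; i++) {
        Elf32_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + (size_t)i * sizeof sh, sizeof sh);
        size_t off = (size_t)names.sh_offset + sh.sh_name;
        if (off <= size && size - off >= want && memcmp(img + off, name, want) == 0) {
            *out = sh;                         /* sh_offset and sh_size give the section bytes */
            return i;
        }
    }
    return -1;
}
```

If you call it with ".text" on an ARM relocatable object read into memory, it returns that section's index and copies its header. The header's `sh_offset` and `sh_size` then give the section's bytes in the file.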
    The sections include:
 


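.text    : compiled machine code.
.rodata  : read-only data, such as the string "hello!" in printf("hello!").
.data    : initialized global variables. Local variables are kept on the stack at run time and appear in neither .data nor .bss.
.bss     : uninitialized global variables. In the object file this is only a placeholder and occupies no actual storage.
.symtab  : the symbol table, which holds information about the global variables and functions that the program defines or references.
.rel.text : a list of locations in .text that must be modified when the linker combines this file with other object files. These are generally instructions that reference global variables or external functions. Instructions that reference local variables or local functions need no modification, because local variables and local functions are generally addressed with PC-relative offsets. Note that this section and .rel.data below are not needed at run time. They are removed when the executable ELF object file is generated.
.rel.data : relocation information for global variables. In general, a global variable needs relocation if its initial value is the address of another global variable or of an external function.
.debug  : debugging information.
.strtab  : a string table holding the names used in .symtab and .debug, and the names of the sections. Every name field in .symtab, .debug and the section header table actually stores an offset into this string table, where the corresponding string can be found.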
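A small C file shows where a typical GCC build places each kind of object. The placements in the comments are what GCC usually does, so treat them as assumptions rather than guarantees: they vary with compiler version, options and optimization. For instance, before GCC 10 an uninitialized global such as `scratch` was emitted as a COM (common) symbol by default and only given .bss space at link time. GCC 10 and later default to `-fno-common` and place it in .bss directly. You can check the comments against a real build with `objdump -t` or `readelf -S` on the compiled object.

```c
#include <stdio.h>

/* Typical GCC placement (an assumption; varies with version and flags). */
int counter = 5;            /* initialized global    -> .data */
int scratch[64];            /* uninitialized global  -> .bss (COM with -fcommon) */
static int calls = 1;       /* static, initialized   -> .data, local symbol */

int next_id(void)           /* function body         -> .text */
{
    int id = counter++;     /* ordinary local: on the stack at run time,
                               in neither .data nor .bss */
    calls++;
    return id;
}

void greet(void)
{
    printf("hello!");       /* string literal "hello!" -> read-only data (.rodata) */
}
```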

Let's look at .symtab more closely.
Every relocatable object file has a .symtab. This symbol table holds every symbol defined or referenced in the object file. When the source program is written in C, the symbols in .symtab come from the C compiler (cc1). There are three kinds of symbol:
1) Global symbols defined in this object file, which other object files can reference. In C source these are the non-static global variables and non-static functions. In ARM assembly they are the symbols exported with the EXPORT directive.
2) Global symbols referenced in this object file but defined in other files. In ARM assembly these are the symbols brought in with the IMPORT directive.
3) Local symbols, which are visible only within this object file. "Local" here is meant in the linker's sense and should not be confused with ordinary local variables in a program. It covers global variables declared static, the names of sections in the object file, and the source file name. Local variables in the usual sense are managed by the run-time environment while the program runs, and the linker does not deal with them.
    Each symbol of these kinds has an entry in .symtab, with the following structure:
 

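typedef struct {
    int name;       // symbol name: actually an offset into .strtab
    int value;      // location within the section, as an offset from the section's start
    int size;       // size
    char type;      // type: usually data or function
    char binding;   // whether the symbol is local or global
    char reserved;  // reserved
    char section;   // section the symbol belongs to: .text (represented by 1),
                    // .data (represented by 3), ABS (symbols that must not be relocated),
                    // UND (symbols not defined in this object file, possibly defined
                    // elsewhere), COM (ordinary uninitialized variable symbols)
} ELF_sym;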
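In the real ELF format this entry is declared as `Elf32_Sym` in `<elf.h>` (glibc on Linux), and it is laid out a little differently from the simplified struct above. The name is `st_name`, an offset into a string table. Binding and type are packed into the single byte `st_info`. The section is a 16-bit index `st_shndx`, whose special values `SHN_UNDEF`, `SHN_ABS` and `SHN_COMMON` correspond to UND, ABS and COM. The hypothetical helper below, `classify_symbol`, sorts an entry into the three kinds of symbol listed earlier.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* The three kinds of .symtab symbol described in the text. */
enum sym_kind { SYM_DEFINED_GLOBAL, SYM_UNDEFINED, SYM_LOCAL };

/* Sketch (hypothetical helper): classify one entry the way the linker sees it. */
enum sym_kind classify_symbol(const Elf32_Sym *s)
{
    assert(s != NULL);
    if (ELF32_ST_BIND(s->st_info) == STB_LOCAL)
        return SYM_LOCAL;            /* static globals, section names, file name */
    if (s->st_shndx == SHN_UNDEF)
        return SYM_UNDEFINED;        /* referenced here, defined in another file (UND) */
    return SYM_DEFINED_GLOBAL;       /* defined here; COM and ABS symbols also land here */
}
```

For example, `ELF32_ST_INFO(STB_GLOBAL, STT_FUNC)` builds the `st_info` byte of a non-static function, and `ELF32_ST_BIND` and `ELF32_ST_TYPE` take it apart again.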

Now suppose that all the modules making up the application have been assembled into relocatable object files. These object files share the same structure, each with its own .text and .data sections and its own .symtab. GCC's next step is to use the linker (ld) to combine these object files, together with the necessary libraries, into an executable file with absolute run-time addresses, that is, an executable ELF file.
    The linker's work falls into two parts:

1) Symbol resolution: determine which definition each symbol reference refers to.
2) Relocation: merge sections, assign run-time addresses, and relocate symbol references.

    Symbol resolution:
    An object file contains instructions that define symbols and instructions that refer to them, and a referenced symbol may have more than one definition. Symbol resolution decides which definition each symbol reference in the object file actually refers to.
    During compilation, the compiler generates a symbol table entry for each global symbol defined in the file. When it finds a referenced symbol that is not defined in the file, it also generates a symbol table entry for that symbol and leaves resolving the reference to the linker. The assembler reads these symbol table entries during assembly and produces .symtab. While doing so, it may find references whose targets cannot yet be determined. For each of these it generates an additional entry, called a relocation entry, stores it in the .rel.text or .rel.data section, and leaves it for the linker to resolve. The structure of a relocation entry is:
 

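typedef struct {
    int offset;  // offset of the reference to be relocated within the object,
                 // i.e. the actual location of that reference in the object file
    int symbol;  // the symbol this reference actually points to
    int type;    // relocation type: R_ARM_PC24 relocates the reference with a 24-bit PC-relative address,
                 // R_ARM_ABS32 relocates the reference with a 32-bit absolute address
} Elf32_Rel;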
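The real `Elf32_Rel` in `<elf.h>` likewise has only two fields. `r_offset` holds the location, and `r_info` packs together the symbol-table index (upper 24 bits) and the relocation type (lower 8 bits) that the simplified struct keeps separate. There is no addend field: in this REL form the addend is stored in the bytes being patched. The sketch below uses a hypothetical helper, `make_rel`, to show the packing with the `ELF32_R_INFO`, `ELF32_R_SYM` and `ELF32_R_TYPE` macros.

```c
#include <assert.h>
#include <elf.h>

/* Sketch (hypothetical helper): the entry an assembler might emit for a
 * reference at `offset` to symbol number `sym_index` in .symtab. */
Elf32_Rel make_rel(Elf32_Addr offset, Elf32_Word sym_index, unsigned char type)
{
    assert(sym_index <= 0xFFFFFF);                 /* only 24 bits for the index */
    Elf32_Rel r;
    r.r_offset = offset;
    r.r_info = ELF32_R_INFO(sym_index, type);      /* (sym << 8) | type */
    return r;
}

/* Recover the two packed fields. */
Elf32_Word rel_symbol(const Elf32_Rel *r) { return ELF32_R_SYM(r->r_info); }
unsigned rel_type(const Elf32_Rel *r) { return ELF32_R_TYPE(r->r_info); }
```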

The linker must resolve the references described by these relocation entries. For each relocation entry, it follows the C language's rules to search the input object files for the matching symbol definition, and records that symbol in the entry. At this point, however, the symbol's real address is still unknown. The linker now knows which symbol each reference points to, but it cannot yet determine the address that reference should hold.
Symbol relocation:
    Symbol relocation solves this problem. The linker first merges sections. The merging process is simple: sections with the same attributes are combined. For example, the .text sections of the different object files are merged into a single .text section, and the .symtab sections are likewise merged into a single .symtab.
There are two issues here:
1) The order in which object files are merged. This determines the final run-time addresses of instructions and symbols, and above all which section comes first. That matters most in bare-metal ARM development. After power-on, an ARM CPU automatically fetches and executes instructions starting at address 0x00000000, where memory is mapped, and software cannot change this behavior. The first section must therefore contain the program's entry point, otherwise the system cannot start.
2) The mapping between input sections and output sections. In principle any input section can be mapped to any output section. A .data section could even be combined with a .text section into an output .text, although doing so makes no sense. The linker has to be told which input sections make up each output section.
    Both issues are controlled by a file called the linker script. The linker reads the linker script to determine how input sections map to output sections, to set the program's entry point, and to decide which section is placed at the head of the executable.
    The linker script has one more job: specifying the address of each section. Once sections are merged, the linker walks .symtab and assigns every symbol an absolute run-time address, relative to the address of its section. If the .text section is placed at 0x00000000, for example, the symbols in .text take 0x00000000 as their base address. In embedded development, build-time parameters such as text_base and data_base are ultimately written into the linker script, and that is how the section addresses are assigned.
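Section merging and address assignment can be sketched in a few lines of C. This is a simplification under stated assumptions, and `struct in_section`, `place_sections` and `symbol_address` are hypothetical names. Input sections of one kind are laid out back to back in link order, starting at a base such as text_base, and each piece is rounded up to its alignment. A symbol's absolute address is its section's address plus the symbol's offset from .symtab. The first input section lands exactly at the base. So putting the object that holds the entry point first is what makes a 0x00000000 base work for a bare-metal ARM image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one input section (e.g. one object's .text). */
struct in_section {
    uint32_t size;
    uint32_t align;      /* power of two, e.g. 4 for ARM code */
    uint32_t out_addr;   /* filled in: run-time address after merging */
};

/* Sketch: merge same-named input sections in link order into one output
 * section starting at base, padding each to its alignment. Returns the end address. */
uint32_t place_sections(struct in_section *secs, size_t n, uint32_t base)
{
    uint32_t addr = base;
    for (size_t i = 0; i < n; i++) {
        uint32_t a = secs[i].align;
        assert(a != 0 && (a & (a - 1)) == 0);      /* alignment must be a power of two */
        addr = (addr + a - 1) & ~(a - 1);          /* round up to the alignment */
        secs[i].out_addr = addr;
        addr += secs[i].size;
    }
    return addr;
}

/* A symbol whose .symtab value is an offset into section s gets this absolute address. */
uint32_t symbol_address(const struct in_section *s, uint32_t st_value)
{
    return s->out_addr + st_value;
}
```

Take base 0 and three .text pieces of 0x16, 0x20 and 0x4 bytes, with alignments 4, 8 and 4. The pieces start at 0x00, 0x18 and 0x38. The second piece is padded from 0x16 up to 0x18 to meet its 8-byte alignment.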
    With these two steps done, the linker relocates the symbol references. It walks the relocation sections (.rel.text and .rel.data). For each entry, it uses the symbol field to look up the symbol's real address in .symtab; after the address assignment above, every symbol in .symtab has an absolute run-time address. It then writes that address to the location given by the offset field.
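The patching step can be sketched for the two ARM relocation types mentioned earlier. This is a simplified sketch, not how any particular linker is written, and `apply_rel` is a hypothetical helper. It assumes the section is already merged and placed at `sec_addr`, and that `S` is the symbol's absolute address looked up in .symtab. As REL relocations require, the addend `A` is read from the place being patched.

R_ARM_ABS32 stores S + A. R_ARM_PC24 is used by B and BL instructions, whose 24-bit field holds a word offset from the PC (the instruction address + 8). For it the sketch stores (S + A - P) / 4, where P is the address of the instruction. The assembler normally leaves A = -8 in the field, so the result works out to S - (P + 8). Newer ARM toolchains usually emit R_ARM_CALL or R_ARM_JUMP24 for branches, and these use the same arithmetic.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load/store (the ARM image is assumed little-endian). */
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Sketch (hypothetical helper): apply one REL-style relocation.
 * sec: merged section contents; sec_addr: its run-time address;
 * offset: r_offset; type: ELF32_R_TYPE(r_info); S: symbol address from .symtab.
 * Returns 0 on success, -1 for an unhandled type or an out-of-range branch. */
int apply_rel(uint8_t *sec, uint32_t sec_addr, uint32_t offset, unsigned type, uint32_t S)
{
    assert(sec != NULL);
    uint8_t *place = sec + offset;
    uint32_t P = sec_addr + offset;                /* run-time address of the place */
    uint32_t word = get32(place);

    switch (type) {
    case R_ARM_ABS32:                              /* S + A, A = the existing word */
        put32(place, S + word);
        return 0;
    case R_ARM_PC24: {                             /* S + A - P, A = sign-extended imm24 * 4 */
        int32_t imm = (int32_t)(word & 0x00FFFFFFu);
        if (imm & 0x00800000)
            imm -= 0x01000000;                     /* sign-extend the 24-bit field */
        int64_t v = (int64_t)S + (int64_t)imm * 4 - (int64_t)P;
        if (v % 4 != 0 || v / 4 < -0x800000 || v / 4 > 0x7FFFFF)
            return -1;                             /* misaligned or beyond +/-32 MB */
        uint32_t field = (uint32_t)(v / 4) & 0x00FFFFFFu;
        put32(place, (word & 0xFF000000u) | field);   /* keep condition and opcode bits */
        return 0;
    }
    default:
        return -1;
    }
}
```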
    This completes symbol relocation. The linker then deletes the .rel.text and .rel.data sections, which held the relocation information, and adds a segment header table and an .init section. The result is an executable object file in ELF format.
    The segment header table holds the information the operating system uses to map the program into memory. The .init section contains an _init function. When the program is loaded, the operating system's program loader reads the segment header table to load the program into user memory space. Following the mapping information in that table, it maps the .text and .data segments to the appropriate addresses. It then calls the _init function in .init to complete initialization.
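A loader's use of the segment header table (called the program header table in the ELF specification) can be sketched as follows. This simplified C sketch, `load_elf32`, is a hypothetical helper. It copies each PT_LOAD segment of a 32-bit ELF image into a byte array that stands in for the target's memory, starting at `mem_base`. The first `p_filesz` bytes come from the file, and the rest of `p_memsz` is zero-filled, which is where .bss gets its zeros. It assumes the byte order matches the host, ignores permissions and dynamic linking, and returns the entry point instead of calling _init.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch (hypothetical helper): load the PT_LOAD segments of an ELF32 image into
 * mem, which represents target addresses [mem_base, mem_base + mem_size).
 * Returns 0 and stores e_entry in *entry, or returns -1 on a malformed image. */
int load_elf32(const uint8_t *img, size_t size, uint8_t *mem, uint32_t mem_base,
               size_t mem_size, uint32_t *entry)
{
    assert(img != NULL && mem != NULL && entry != NULL);
    Elf32_Ehdr eh;
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);                   /* copy out: img may be unaligned */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS32
        || eh.e_phentsize != sizeof(Elf32_Phdr) || eh.e_phoff > size
        || (size - eh.e_phoff) / sizeof(Elf32_Phdr) < eh.e_phnum)
        return -1;

    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, img + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;                              /* only loadable segments are mapped */
        /* Reject segments that lie outside the file or the memory window. */
        if (ph.p_filesz > ph.p_memsz || ph.p_offset > size
            || size - ph.p_offset < ph.p_filesz || ph.p_vaddr < mem_base
            || ph.p_vaddr - mem_base > mem_size
            || mem_size - (ph.p_vaddr - mem_base) < ph.p_memsz)
            return -1;
        uint8_t *dst = mem + (ph.p_vaddr - mem_base);
        memcpy(dst, img + ph.p_offset, ph.p_filesz);
        memset(dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);   /* .bss: zero-fill */
    }
    *entry = eh.e_entry;
    return 0;
}
```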
    Because ELF is so versatile, a popular development approach is to first build an executable in ELF format with the compiler toolchain. A separate tool then extracts the relevant parts of the ELF file into a BIN file. The well-known open-source bootloader U-Boot works this way: its toolchain is GCC, and the BIN image is extracted from the ELF output with objcopy (-O binary). ARM's well-known ADS development environment uses its own armcc and armcpp compilers, but works the same way as GNU GCC.
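Extracting a BIN from an ELF can also be sketched in C. This only illustrates the idea, and it assumes the program headers describe everything to be burned. `elf32_to_bin` is a hypothetical name; real tools such as objcopy work from the section headers and handle many more cases. The lowest load address becomes offset 0 of the BIN. Gaps between segments are filled with a chosen byte; 0xFF is common for Flash, since erased Flash reads as 0xFF. Zero-initialized .bss space is not stored at all. The load address is returned separately because, as noted in the BIN section, the BIN file itself does not record it.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch (hypothetical helper): flatten the PT_LOAD contents of an ELF32 image
 * into bin (capacity cap). Returns the BIN size (0 on error) and stores the
 * address the BIN must be loaded at in *load_addr. */
size_t elf32_to_bin(const uint8_t *img, size_t size, uint8_t *bin, size_t cap,
                    uint8_t fill, uint32_t *load_addr)
{
    assert(img != NULL && bin != NULL && load_addr != NULL);
    Elf32_Ehdr eh;
    if (size < sizeof eh)
        return 0;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS32
        || eh.e_phentsize != sizeof(Elf32_Phdr) || eh.e_phoff > size
        || (size - eh.e_phoff) / sizeof(Elf32_Phdr) < eh.e_phnum)
        return 0;

    uint64_t lo = UINT64_MAX, hi = 0;
    for (int pass = 0; pass < 2; pass++) {         /* pass 0: find extent; pass 1: copy */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf32_Phdr ph;
            memcpy(&ph, img + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
            if (ph.p_type != PT_LOAD || ph.p_filesz == 0)
                continue;                          /* nothing stored in the file */
            if (ph.p_offset > size || size - ph.p_offset < ph.p_filesz)
                return 0;
            if (pass == 0) {
                if (ph.p_paddr < lo)
                    lo = ph.p_paddr;
                if ((uint64_t)ph.p_paddr + ph.p_filesz > hi)
                    hi = (uint64_t)ph.p_paddr + ph.p_filesz;
            } else {
                memcpy(bin + (ph.p_paddr - lo), img + ph.p_offset, ph.p_filesz);
            }
        }
        if (pass == 0) {
            if (hi == 0 || hi - lo > cap)
                return 0;                          /* nothing to copy, or too big */
            memset(bin, fill, (size_t)(hi - lo)); /* gap fill */
        }
    }
    *load_addr = (uint32_t)lo;
    return (size_t)(hi - lo);
}
```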
 


Origin blog.csdn.net/boazheng/article/details/104299195