Chapter 3. The /proc Filesystem

3.1. Introduction

One of the big reasons Linux is so popular today is that it combines many of the best features of its UNIX ancestors. One of these features is the /proc filesystem, which Linux inherited from System V and which is a standard part of the kernels shipped with all of the major distributions. Some distributions provide things in /proc that others do not, so there is no single standard /proc specification, and it should be used with a degree of caution.

The /proc filesystem is one of the most important mechanisms that Linux provides for examining and configuring the inner workings of the operating system. It can be thought of as a window directly into the kernel’s data structures and the kernel’s view of the user processes running on the system. It appears to the user as a filesystem just like / or /home, so all the common file manipulation programs and system calls can be used with it such as cat(1), more(1), grep(1), open(2), read(2), and write(2)[1]. If permissions are sufficient, writing values to certain files is also easily performed by redirecting output to a file with the > shell character from a shell prompt or by calling the system call write(2) within an application.

[1] When a Linux command or function name is followed by a number in parentheses, the number refers to a man page section. Section 1 is for executable programs or shell commands, and section 2 is for system calls (functions provided by the kernel). Typing man 2 read displays the read system call man page from section 2.

The goal of this chapter is not to be an exhaustive reference of the /proc filesystem, as that would be an entire publication in itself. Instead the goal is to point out and examine some of the more advanced features and tricks primarily related to problem determination and system diagnosis. For more general reference, I recommend reading the proc(5) man page.

Note: If you have the kernel sources installed on your system, I also recommend reading /usr/src/linux/Documentation/filesystems/procfs.txt.

3.2. Process Information

Along with viewing and manipulating system information, obtaining user process information is another way in which the /proc filesystem shines. When you look at the listing of files in /proc, you will immediately notice a large number of directories whose names are numbers. Each number is a process ID, and the directory contains more detailed information about that process. All Linux systems will have the /proc/1 directory. The process with ID 1 is always the “init” process and is the first user process to be started on the system during bootup. Even though this is a special program, it is a process just like any other, and the /proc/1 directory contains the same kinds of information as the directory of any other process, including the ls command you use to see the contents of this and any other directory! The following sections go into more detail on the most useful information that can be found in the /proc/<pid>[2] directory, such as viewing and understanding a process’ address space, viewing CPU and memory configuration information, and understanding settings that can greatly enhance application and system troubleshooting.

[2] A common way of generalizing a process’ directory name under the /proc filesystem is to write /proc/<pid>, because a process’ number is effectively arbitrary, with the exception of the init process.

3.2.1. /proc/self

As a quick introduction into how processes are represented in the /proc filesystem, let’s first look at the special link “/proc/self.” The kernel provides this as a link to the currently executing process. Typing “cd /proc/self” will take you directly into the directory containing the process information for your shell process. This is because cd is a function provided by the shell (the currently running process at the time of using the “self” link) and not an external program. If you perform an ls -l /proc/self, you will see a link to the process directory for the ls process, which goes away as soon as the directory listing completes and the shell prompt returns. The following sequence of commands and their associated output illustrate this.

Note: $$ is a special shell parameter that expands to the shell’s process ID, and “/proc/<pid>/cwd” is a special link provided by the kernel that points to the absolute path of the process’ current working directory.

penguin> echo $$

2602

penguin> ls -l /proc/self

lrwxrwxrwx 1 root root 64 2003-10-13 08:04 /proc/self -> 2945

penguin> cd /proc/self

penguin> ls -l cwd

lrwxrwxrwx 1 dbehman build 0 2003-10-13 13:00 cwd -> /proc/2602

penguin>

The main thing to understand in this example is that 2945 is the process ID of the ls command. The reason for this is that the /proc/self link, just as all files in /proc, is dynamic and will change to reflect the current state at any point in time. The cwd link matches the same process ID as our shell process because we first used “cd” to get into the /proc/self directory.

3.2.2. /proc/<pid> in More Detail

With the understanding that typing “cd /proc/self” will change the directory to the current shell’s /proc directory, let’s examine the contents of this directory further. The commands and output are as follows:

penguin> cd /proc/self

penguin> ls -l

total 0

-r--r--r-- 1 dbehman build 0 2003-10-13 13:34 cmdline

lrwxrwxrwx 1 dbehman build 0 2003-10-13 13:34 cwd -> /proc/2602

-r-------- 1 dbehman build 0 2003-10-13 13:34 environ

lrwxrwxrwx 1 dbehman build 0 2003-10-13 13:34 exe -> /bin/bash

dr-x------ 2 dbehman build 0 2003-10-13 13:34 fd

-rw------- 1 dbehman build 0 2003-10-13 13:34 mapped_base

-r--r--r-- 1 dbehman build 0 2003-10-13 13:34 maps

-rw------- 1 dbehman build 0 2003-10-13 13:34 mem

-r--r--r-- 1 dbehman build 0 2003-10-13 13:34 mounts

lrwxrwxrwx 1 dbehman build 0 2003-10-13 13:34 root -> /

-r--r--r-- 1 dbehman build 0 2003-10-13 13:34 stat

-r--r--r-- 1 dbehman build 0 2003-10-13 13:34 statm

-r--r--r-- 1 dbehman build 0 2003-10-13 13:34 status

Notice that the sizes of all the files are 0, yet when we start examining some of them more closely it’s clear that they do in fact contain information. The size is 0 because these files are not ordinary files stored on disk; they are a window directly into the kernel’s data structures. When filesystem operations are performed on files within the /proc filesystem, the kernel recognizes what is being requested and dynamically generates the data for the calling process, just as if it were being read from disk.

3.2.2.1. /proc/<pid>/maps

The “maps” file provides a view of the process’ memory address space. Every process has its own address space that is handled and provided by the Virtual Memory Manager. The name “maps” is derived from the fact that each line represents a mapping of some part of the process to a particular region of the address space. For this discussion, we’ll focus on 32-bit x86 hardware. However, 64-bit hardware is becoming more and more important, especially when using Linux, so we’ll discuss the differences with Linux running on x86_64 at the end of this section.

Figure 3.1 shows a sample maps file which we will analyze in subsequent sections.

Figure 3.1. A /proc/<pid>/maps file.

08048000-080b6000 r-xp 00000000 03:08 10667 /bin/bash

080b6000-080b9000 rw-p 0006e000 03:08 10667 /bin/bash

080b9000-08101000 rwxp 00000000 00:00 0

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

40018000-40019000 rw-p 00017000 03:08 6664 /lib/ld-2.3.2.so

40019000-4001a000 rw-p 00000000 00:00 0

4001a000-4001b000 r--p 00000000 03:08 8598 /usr/lib/locale/en_US/LC_IDENTIFICATION

4001b000-4001c000 r--p 00000000 03:08 9920 /usr/lib/locale/en_US/LC_MEASUREMENT

4001c000-4001d000 r--p 00000000 03:08 9917 /usr/lib/locale/en_US/LC_TELEPHONE

4001d000-4001e000 r--p 00000000 03:08 9921 /usr/lib/locale/en_US/LC_ADDRESS

4001e000-4001f000 r--p 00000000 03:08 9918 /usr/lib/locale/en_US/LC_NAME

4001f000-40020000 r--p 00000000 03:08 9939 /usr/lib/locale/en_US/LC_PAPER

40020000-40021000 r--p 00000000 03:08 9953 /usr/lib/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES

40021000-40022000 r--p 00000000 03:08 9919 /usr/lib/locale/en_US/LC_MONETARY

40022000-40028000 r--p 00000000 03:08 10057 /usr/lib/locale/en_US/LC_COLLATE

40028000-40050000 r-xp 00000000 03:08 10434 /lib/libreadline.so.4.3

40050000-40054000 rw-p 00028000 03:08 10434 /lib/libreadline.so.4.3

40054000-40055000 rw-p 00000000 00:00 0

40055000-4005b000 r-xp 00000000 03:08 10432 /lib/libhistory.so.4.3

4005b000-4005c000 rw-p 00005000 03:08 10432 /lib/libhistory.so.4.3

4005c000-40096000 r-xp 00000000 03:08 6788 /lib/libncurses.so.5.3

40096000-400a1000 rw-p 00039000 03:08 6788 /lib/libncurses.so.5.3

400a1000-400a2000 rw-p 00000000 00:00 0

400a2000-400a4000 r-xp 00000000 03:08 6673 /lib/libdl.so.2

400a4000-400a5000 rw-p 00002000 03:08 6673 /lib/libdl.so.2

400a5000-401d1000 r-xp 00000000 03:08 6661 /lib/i686/libc.so.6

401d1000-401d6000 rw-p 0012c000 03:08 6661 /lib/i686/libc.so.6

401d6000-401d9000 rw-p 00000000 00:00 0

401d9000-401da000 r--p 00000000 03:08 8600 /usr/lib/locale/en_US/LC_TIME

401da000-401db000 r--p 00000000 03:08 9952 /usr/lib/locale/en_US/LC_NUMERIC

401db000-40207000 r--p 00000000 03:08 10056 /usr/lib/locale/en_US/LC_CTYPE

40207000-4020d000 r--s 00000000 03:08 8051 /usr/lib/gconv/gconv-modules.cache

4020d000-4020f000 r-xp 00000000 03:08 8002 /usr/lib/gconv/ISO8859-1.so

4020f000-40210000 rw-p 00001000 03:08 8002 /usr/lib/gconv/ISO8859-1.so

40210000-40212000 rw-p 00000000 00:00 0

bfffa000-c0000000 rwxp ffffb000 00:00 0

The first thing that should stand out is the name of the executable /bin/bash. This makes sense because the commands used to obtain this maps file were “cd /proc/self ; cat maps.” Try doing “less /proc/self/maps” and note how it differs.

Let’s look at what each column means, using the first line of the output just listed as an example. From the proc(5) man page we know that 08048000-080b6000 is the address range in the process occupied by this entry; r-xp indicates that this mapping is readable, executable, and private; 00000000 is the offset into the file; 03:08 is the device (major:minor); 10667 is the inode; and /bin/bash is the pathname. But what does all this really mean?

It means that /bin/bash, which is inode 10667 (“stat /bin/bash” to confirm) on partition 8 of device 03 (examine /proc/devices and /proc/partitions for number to name mappings), had the readable and executable sections of itself mapped into the address range of 0x08048000 to 0x080b6000.

Now let’s examine what each individual line means. Because the output is the address mappings of the /bin/bash executable, the first thing to point out is where the program itself lives in the address space. On 32-bit x86-based architectures, the first address to which any part of the executable gets mapped is 0x08048000. This address will become very familiar the more you look at maps files. It will appear in every maps file and will always be this address unless someone went to great lengths to change it. Because of Linux’s open source nature, this is possible but very unlikely. The next thing that becomes obvious is that the first two lines are very similar, and the third line’s address mapping follows immediately after the second line. This is because all three lines combined contain all the information associated with the executable /bin/bash.

Generally speaking, each of the three lines is considered a segment and can be named the code segment, data segment, and heap segment respectively. Let’s dissect each segment along with its associated line in the maps file.

3.2.2.1.1. Code Segment

The code segment is also very often referred to as the text segment. As will be discussed further in Chapter 9, “ELF: Executable and Linking Format,” the .text section is contained within this segment and is the section that contains all the executable code.

Note: If you’ve ever seen the error message text file busy (ETXTBSY) when trying to write to an executable program that is currently running, and that you know to be binary and not ASCII text, the meaning of the error message stems from the fact that executable code is stored in the .text section.

Using /bin/bash as our example, the code segment taken from the maps file in Figure 3.1 is represented by this line:

08048000-080b6000 r-xp 00000000 03:08 10667 /bin/bash

This segment contains the program’s executable instructions, which is confirmed by the r-xp in the permissions column. Program code is not expected to modify itself, so the mapping has no write permission, and because the code is actually executed, the execute permission is set. As a hands-on example of what this really means, consider the following code:

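#include <stdio.h>
#include <unistd.h>   /* getpid(), sleep() */

int main( void )
{
    /* %p prints the pointer with a leading 0x on glibc */
    printf( "Address of function main is %p\n", (void *)main );
    printf( "Sleeping infinitely; my pid is %d\n", (int)getpid() );

    while( 1 )
        sleep( 5 );

    return 0;
}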

Compiling and running this code will give this output:

Address of function main is 0x804839c

Sleeping infinitely; my pid is 4059

While the program is sleeping, examining /proc/4059/maps gives the following maps file:

08048000-08049000 r-xp 00000000 03:08 130198 /home/dbehman/testing/c

08049000-0804a000 rw-p 00000000 03:08 130198 /home/dbehman/testing/c

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

40018000-40019000 rw-p 00017000 03:08 6664 /lib/ld-2.3.2.so

40019000-4001b000 rw-p 00000000 00:00 0

40028000-40154000 r-xp 00000000 03:08 6661 /lib/i686/libc.so.6

40154000-40159000 rw-p 0012c000 03:08 6661 /lib/i686/libc.so.6

40159000-4015b000 rw-p 00000000 00:00 0

bfffe000-c0000000 rwxp fffff000 00:00 0

Looking at the code segment’s address mapping of 08048000 - 08049000, we see that main’s address of 0x804839c does indeed fall within this range. This is an important observation when debugging programs, especially with a debugger such as GDB: when you look at various addresses in a debugging session, knowing roughly what they are can often help you put the puzzle pieces together much more quickly.

3.2.2.1.2. Data Segment

For quick reference, the data segment of /bin/bash is represented by line two in Figure 3.1:

080b6000-080b9000 rw-p 0006e000 03:08 10667 /bin/bash

At first glance it appears to be very similar to the code segment line, but in fact it is quite different. The primary differences are the address mapping and the permissions setting of rw-p, which means read-write, non-executable, and private. Logically speaking, a program consists mostly of instructions and variables. We now know that the instructions are in the code segment, which is read-only and executable. Because variables can certainly change throughout the execution of a program and are not considered to be executable, it makes perfect sense that they belong in the data segment. It is important to know, however, that only certain kinds of variables exist in this segment. How and where variables are declared in the program’s source code dictates which segment and section of the process’ address space they appear in. Variables that exist in the data segment are initialized global variables. The following program demonstrates this.

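#include <stdio.h>
#include <unistd.h>   /* getpid(), sleep() */

int global_var = 3;

int main( void )
{
    /* %p prints the pointer with a leading 0x on glibc */
    printf( "Address of global_var is %p\n", (void *)&global_var );
    printf( "Sleeping infinitely; my pid is %d\n", (int)getpid() );

    while( 1 )
        sleep( 5 );

    return 0;
}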

Compiling and running this program produces the following output:

Address of global_var is 0x8049570

Sleeping infinitely; my pid is 4472

While this program sleeps, examining /proc/4472/maps shows the following:

08048000-08049000 r-xp 00000000 03:08 130200 /home/dbehman/testing/d

08049000-0804a000 rw-p 00000000 03:08 130200 /home/dbehman/testing/d

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

40018000-40019000 rw-p 00017000 03:08 6664 /lib/ld-2.3.2.so

40019000-4001b000 rw-p 00000000 00:00 0

40028000-40154000 r-xp 00000000 03:08 6661 /lib/i686/libc.so.6

40154000-40159000 rw-p 0012c000 03:08 6661 /lib/i686/libc.so.6

40159000-4015b000 rw-p 00000000 00:00 0

bfffe000-c0000000 rwxp fffff000 00:00 0

We see that the address of the global variable does indeed fall within the data segment address mapping range of 0x08049000 - 0x0804a000. Two other very common types of variables are stack and heap variables. Stack variables will be discussed in the Stack Section further below, and heap variables will be discussed next.

3.2.2.1.3. Heap Segment

As the name implies, this segment holds a program’s heap variables. Heap variables are those whose memory is dynamically allocated through programming interfaces such as malloc() in C and the new operator in C++. Both of these can call the brk() system call to extend the end of the segment to accommodate the memory requested. This segment also contains the bss section, a special section that holds uninitialized global variables. A section separate from the data section is used for these variables because it saves space in the file’s on-disk image: no value needs to be stored for them. This is also why the bss section is located at the end of the executable’s mappings: space is only allocated in memory when these variables get mapped. The following program demonstrates how variable declarations in source code correspond to the heap segment.

#include <stdio.h>
#include <stdlib.h>   /* malloc(), system() */
#include <unistd.h>   /* getpid(), sbrk() */

int g_bssVar;

int main( void )
{
    char *pHeapVar = NULL;
    char szSysCmd[128];

    sprintf( szSysCmd, "cat /proc/%d/maps", (int)getpid() );

    printf( "Address of g_bssVar is %p\n", (void *)&g_bssVar );
    printf( "sbrk( 0 ) value before malloc is %p\n", sbrk( 0 ) );
    printf( "My maps file before the malloc call is:\n" );
    system( szSysCmd );

    printf( "Calling malloc to get 1024 bytes for pHeapVar\n" );
    pHeapVar = (char*)malloc( 1024 );

    printf( "Address of pHeapVar after malloc is %p\n", (void *)pHeapVar );
    printf( "sbrk( 0 ) value after malloc is %p\n", sbrk( 0 ) );
    printf( "My maps file after the malloc call is:\n" );
    system( szSysCmd );

    return 0;
}

Note: Notice the unusual variable naming convention used. It is taken from what’s called “Hungarian Notation,” which embeds indications of the type and scope of a variable in the name itself. For example, sz means a zero-terminated (NUL-terminated) string, p means pointer, and g_ means global in scope.

Compiling and running this program produces the following output:

penguin> ./heapseg

Address of g_bssVar is 0x8049944

sbrk( 0 ) value before malloc is 0x8049948

My maps file before the malloc call is:

08048000-08049000 r-xp 00000000 03:08 130260 /home/dbehman/book/src/heapseg

08049000-0804a000 rw-p 00000000 03:08 130260 /home/dbehman/book/src/heapseg

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

40018000-40019000 rw-p 00017000 03:08 6664 /lib/ld-2.3.2.so

40019000-4001b000 rw-p 00000000 00:00 0

40028000-40154000 r-xp 00000000 03:08 6661 /lib/i686/libc.so.6

40154000-40159000 rw-p 0012c000 03:08 6661 /lib/i686/libc.so.6

40159000-4015b000 rw-p 00000000 00:00 0

bfffe000-c0000000 rwxp fffff000 00:00 0

Calling malloc to get 1024 bytes for pHeapVar

Address of pHeapVar after malloc is 0x8049998

sbrk( 0 ) value after malloc is 0x806b000

My maps file after the malloc call is:

08048000-08049000 r-xp 00000000 03:08 130260 /home/dbehman/book/src/heapseg

08049000-0804a000 rw-p 00000000 03:08 130260 /home/dbehman/book/src/heapseg

0804a000-0806b000 rwxp 00000000 00:00 0

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

40018000-40019000 rw-p 00017000 03:08 6664 /lib/ld-2.3.2.so

40019000-4001b000 rw-p 00000000 00:00 0

40028000-40154000 r-xp 00000000 03:08 6661 /lib/i686/libc.so.6

40154000-40159000 rw-p 0012c000 03:08 6661 /lib/i686/libc.so.6

40159000-4015b000 rw-p 00000000 00:00 0

bfffe000-c0000000 rwxp fffff000 00:00 0

When examining this output, it may seem that there is a contradiction as to where the bss section actually exists. I’ve written that it exists in the heap segment, but the preceding output shows that the address of the bss variable lies in the data segment (that is, 0x8049944 lies within the address range 0x08049000-0x0804a000). The reason is that there is unused space at the end of the data segment, due to the small size of the example and the small number of global variables declared, so the bss section appears in the data segment to limit wasted space. This fact in no way changes its properties.

Note: As will be discussed in Chapter 9, the curious reader can verify that g_bssVar’s address of 0x08049944 is in fact in the .bss section by examining readelf -e <exe_name> output and searching for where the .bss section begins. In our example, the .bss section begins at 0x08049940.

For the same reason of limiting wasted space, the brk pointer in this example (determined by calling sbrk with a parameter of 0) appears in the data segment when we would expect to see it in the heap segment. The moral of this example is that the three separate entries in the maps file for the executable do not necessarily correspond to hard segment ranges; rather, they are more of a soft guide.

The next important thing to note from this output is that before the malloc call, the heapseg executable had only two entries in the maps file, which means there was no heap at that point in time. After the malloc call, we see a third line, which represents the heap segment. We also see that the brk pointer now points to the end of the range reported in the maps file, 0x0806b000. Now you may be a bit confused because the brk pointer moved from 0x08049948 to 0x0806b000, a total of 136888 bytes. That is far more than the 1024 bytes we requested, so what happened? Malloc is smart enough to know that the program will quite likely need more heap memory in the future, so rather than calling the expensive brk() system call to move the pointer for every malloc call, it asks for a much larger chunk of memory than is immediately needed. This way, when malloc is called again for a relatively small chunk of memory, brk() need not be called again, and malloc can simply return some of this extra memory. Doing this provides a huge performance boost, especially if the program requests many small chunks of memory via malloc calls.

3.2.2.1.4. Mapped Base / Shared Libraries

Continuing our examination of the maps file, the next point of interest is what’s commonly referred to as the mapped base address, which defines where the shared libraries for an executable get loaded. In standard kernel source code (as downloaded from kernel.org), the mapped base address is a hardcoded location defined as TASK_UNMAPPED_BASE in each architecture’s processor.h header file. For example, in the 2.6.0 kernel source code, the file, include/asm-i386/processor.h, contains the definition:

/* This decides where the kernel will search for a free chunk of vm
 * space during mmap's.
 */
#define TASK_UNMAPPED_BASE (PAGE_ALIGN(TASK_SIZE / 3))

Resolving the definitions of PAGE_ALIGN and TASK_SIZE, this equates to 0x40000000. Note that some distributions, such as SuSE, include a patch that allows this value to be dynamically modified; see the discussion of the /proc/<pid>/mapped_base file in this chapter. Continuing our examination of the mapped base, let’s look at the maps file for bash again:

08048000-080b6000 r-xp 00000000 03:08 10667 /bin/bash

080b6000-080b9000 rw-p 0006e000 03:08 10667 /bin/bash

080b9000-08101000 rwxp 00000000 00:00 0

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

40018000-40019000 rw-p 00017000 03:08 6664 /lib/ld-2.3.2.so

40019000-4001a000 rw-p 00000000 00:00 0

4001a000-4001b000 r--p 00000000 03:08 8598 /usr/lib/locale/en_US/LC_IDENTIFICATION

4001b000-4001c000 r--p 00000000 03:08 9920 /usr/lib/locale/en_US/LC_MEASUREMENT

4001c000-4001d000 r--p 00000000 03:08 9917 /usr/lib/locale/en_US/LC_TELEPHONE

4001d000-4001e000 r--p 00000000 03:08 9921 /usr/lib/locale/en_US/LC_ADDRESS

4001e000-4001f000 r--p 00000000 03:08 9918 /usr/lib/locale/en_US/LC_NAME

4001f000-40020000 r--p 00000000 03:08 9939 /usr/lib/locale/en_US/LC_PAPER

40020000-40021000 r--p 00000000 03:08 9953 /usr/lib/locale/en_US/LC_MESSAGES/SYS_LC_MESSAGES

40021000-40022000 r--p 00000000 03:08 9919 /usr/lib/locale/en_US/LC_MONETARY

40022000-40028000 r--p 00000000 03:08 10057 /usr/lib/locale/en_US/LC_COLLATE

40028000-40050000 r-xp 00000000 03:08 10434 /lib/libreadline.so.4.3

40050000-40054000 rw-p 00028000 03:08 10434 /lib/libreadline.so.4.3

40054000-40055000 rw-p 00000000 00:00 0

40055000-4005b000 r-xp 00000000 03:08 10432 /lib/libhistory.so.4.3

4005b000-4005c000 rw-p 00005000 03:08 10432 /lib/libhistory.so.4.3

4005c000-40096000 r-xp 00000000 03:08 6788 /lib/libncurses.so.5.3

40096000-400a1000 rw-p 00039000 03:08 6788 /lib/libncurses.so.5.3

400a1000-400a2000 rw-p 00000000 00:00 0

400a2000-400a4000 r-xp 00000000 03:08 6673 /lib/libdl.so.2

400a4000-400a5000 rw-p 00002000 03:08 6673 /lib/libdl.so.2

400a5000-401d1000 r-xp 00000000 03:08 6661 /lib/i686/libc.so.6

401d1000-401d6000 rw-p 0012c000 03:08 6661 /lib/i686/libc.so.6

401d6000-401d9000 rw-p 00000000 00:00 0

401d9000-401da000 r--p 00000000 03:08 8600 /usr/lib/locale/en_US/LC_TIME

401da000-401db000 r--p 00000000 03:08 9952 /usr/lib/locale/en_US/LC_NUMERIC

401db000-40207000 r--p 00000000 03:08 10056 /usr/lib/locale/en_US/LC_CTYPE

40207000-4020d000 r--s 00000000 03:08 8051 /usr/lib/gconv/gconv-modules.cache

4020d000-4020f000 r-xp 00000000 03:08 8002 /usr/lib/gconv/ISO8859-1.so

4020f000-40210000 rw-p 00001000 03:08 8002 /usr/lib/gconv/ISO8859-1.so

40210000-40212000 rw-p 00000000 00:00 0

bfffa000-c0000000 rwxp ffffb000 00:00 0

Note the line:

40000000-40018000 r-xp 00000000 03:08 6664 /lib/ld-2.3.2.so

This shows us that /lib/ld-2.3.2.so was the first shared library to be loaded when this process began. /lib/ld-2.3.2.so is the dynamic linker itself, so this makes perfect sense and is in fact the case for all executables that dynamically link in shared libraries. Basically, when an executable is created that links in one or more shared libraries, the dynamic linker is implicitly linked into the executable as well. Because the dynamic linker is responsible for resolving all external symbols in the linked shared libraries, it must be mapped into memory first, which is why it is always the first shared library to show up in the maps file.

After the linker, all shared libraries that an executable depends upon will appear in the maps file. You can check to see what an executable needs without running it and looking at the maps file by running the ldd command as shown here:

penguin> ldd /bin/bash

libreadline.so.4 => /lib/libreadline.so.4 (0x40028000)

libhistory.so.4 => /lib/libhistory.so.4 (0x40055000)

libncurses.so.5 => /lib/libncurses.so.5 (0x4005c000)

libdl.so.2 => /lib/libdl.so.2 (0x400a2000)

libc.so.6 => /lib/i686/libc.so.6 (0x400a5000)

/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

You can now correlate the list of libraries and their addresses to Figure 3.1 and see what they look like in the maps file.

Note: ldd is actually a script that does many things, but the main thing it does is set the LD_TRACE_LOADED_OBJECTS environment variable to a non-empty value. Try the following sequence of commands and see what happens:

export LD_TRACE_LOADED_OBJECTS=1

less

Note: Be sure to do an unset LD_TRACE_LOADED_OBJECTS to return things to normal.

But what about all those extra LC_ lines in the maps file in Figure 3.1? As the full path indicates, they are all special mappings used by libc’s locale functionality. The glibc library call, setlocale(3), prepares the executable for localization functionality based on the parameters passed to the call. Compiling and running the following source will demonstrate this.

#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* getpid() */
#include <locale.h>

int main( void )
{
    char szCommand[64];

    setlocale( LC_ALL, "en_US" );
    sprintf( szCommand, "cat /proc/%d/maps", (int)getpid() );
    system( szCommand );

    return 0;
}

Running the program produces the following output:

08048000-08049000 r-xp 00000000 03:08 206928 /home/dbehman/book/src/l

08049000-0804a000 rw-p 00000000 03:08 206928 /home/dbehman/book/src/l