Is it really impossible to run programs without memory? After reading this article, you can draw your own conclusion.

Main memory (RAM) is a vital resource, and it must be managed with care. Although memory sizes have grown enormously since the days of the IBM 7094, program sizes grow even faster than memory does. As Parkinson's law puts it: programs expand to fill the memory available to hold them. Let's explore how the operating system abstracts memory and manages it.

Over the years, people have developed the notion of a memory hierarchy. The figure below shows how the hierarchy is organized.

[Figure: the memory hierarchy]
The memory at the top of the hierarchy is the fastest but has the smallest capacity and the highest cost per byte. The lower the layer, the slower the access, the larger the capacity, and the lower the cost.

The part of the operating system that manages the memory hierarchy is called the memory manager. Its job is to manage memory efficiently: keep track of which parts of memory are in use, allocate memory to processes when they need it, and reclaim it when they are done.

Below we will discuss different memory management models, from simple to complex. Since the caches at the lowest levels are managed by hardware, we will mainly discuss the model of main memory and how main memory is managed.

No memory abstraction

The simplest memory abstraction is no abstraction at all. Early mainframe computers (before the 1960s), minicomputers (before the 1970s), and personal computers (before the 1980s) had no memory abstraction. Every program accessed physical memory directly. When a program executed an instruction like:

MOV REGISTER1, 1000

the computer moved the contents of physical memory location 1000 into REGISTER1. Thus the memory model presented to the programmer was simply physical memory: a set of addresses from 0 up to some maximum, each address corresponding to a cell, usually 8 bits wide.

Under these conditions it is not possible to keep two running programs in memory at the same time. If the first program writes a value to, say, memory address 2000, that value replaces whatever the second program had stored there. Running two programs at once simply does not work: both will crash almost immediately.

But even when the memory model is just physical memory, several options exist. Three variants are shown below.

[Figure: three ways of organizing memory with an operating system and one user program]
In Figure a, the operating system is at the bottom of RAM (Random Access Memory); in Figure b, it is in ROM (Read-Only Memory) at the top of memory; in Figure c, the device drivers are in ROM at the top and the operating system is in RAM at the bottom. The model of Figure a was formerly used on mainframes and minicomputers but is rarely used anymore. The model of Figure b is used on some handheld computers and embedded systems. The model of Figure c was used by early personal computers, where the portion of the system in ROM is called the BIOS (Basic Input Output System). A disadvantage of models a and c is that a bug in the user program can wipe out the operating system, possibly with disastrous consequences.

When the system is organized in this way, usually only one process can be running at a time. As soon as the user types a command, the operating system copies the requested program from disk to memory and executes it; when the process finishes, the operating system displays a prompt on the terminal and waits for a new command. When it receives the new command, it loads the new program into memory, overwriting the previous one.

One way to get some parallelism in a system with no memory abstraction is to program with multiple threads. Since all the threads in a process share the same memory image by design, this is not a problem for them.

Running multiple programs

However, even without memory abstraction it is possible to run multiple programs after a fashion. The operating system simply saves the entire contents of memory to a disk file, then reads the next program in. As long as only one program is in memory at a time, there are no conflicts.

With the help of some special additional hardware, it is even possible to keep multiple programs in memory at once without swapping. This is how the problem was solved on early models of the IBM 360.

System/360 was an epoch-making mainframe launched by IBM on April 7, 1964, and the world's first family of instruction-set-compatible computers.

On the IBM 360, memory was divided into 2 KB blocks, and each block was assigned a 4-bit protection key held in special registers inside the CPU. A machine with 1 MB of memory needed only 512 such 4-bit keys, for a total of 256 bytes of key storage. The PSW (Program Status Word) also contained a 4-bit key. If a running process accessed memory whose protection key differed from its PSW key, the 360 hardware trapped. Since only the operating system could change the protection keys, user processes were prevented from interfering with one another and with the operating system itself.
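The scheme can be sketched in a few lines of Python. This is only a simulation under the numbers given above (2 KB blocks, 4-bit keys); the array and function names are ours, and the real keys lived in CPU registers, not in memory:

```python
BLOCK_SIZE = 2048       # memory divided into 2 KB blocks
KEY_MASK = 0xF          # each block carries a 4-bit protection key

def make_keys(mem_size, assignments):
    """Build the per-block key array: one 4-bit key per 2 KB block.

    assignments: list of (start_byte, length_bytes, key) regions."""
    keys = [0] * (mem_size // BLOCK_SIZE)
    for start, length, key in assignments:
        for block in range(start // BLOCK_SIZE, (start + length) // BLOCK_SIZE):
            keys[block] = key & KEY_MASK
    return keys

def access(keys, psw_key, addr):
    """True if the access is allowed; False where the 360 hardware would trap."""
    return keys[addr // BLOCK_SIZE] == psw_key

# 1 MB machine: 512 blocks -> 512 * 4 bits = 256 bytes of key storage
keys = make_keys(1 << 20, [(0, 16384, 1), (16384, 16384, 2)])
print(access(keys, 1, 1000))    # True: process 1 touches its own block
print(access(keys, 1, 20000))   # False: that block is keyed 2 -> trap
```

Note that 1 MB / 2 KB gives exactly the 512 keys mentioned above.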

This solution has a flaw, however. Suppose there are two programs, each 16 KB in size, as shown below.

[Figure: (a) a 16 KB program; (b) another 16 KB program; (c) both programs loaded consecutively into memory]
The figure shows the two 16 KB programs being loaded. Program a begins by jumping to address 24, which holds a MOV instruction; program b begins by jumping to address 28, which holds a CMP instruction. Figures a and b show the programs loaded separately. If both are loaded consecutively into memory and execution starts at address 0, memory looks like Figure c. The first program runs first: it executes JMP 24 and then the following instructions in sequence (many of which are not shown). After a while the first program finishes and the second one starts. Its first instruction, JMP 28, jumps to absolute address 28, which lies inside the first program and now holds an ADD instruction rather than the intended CMP. Because of this incorrect memory reference, the program will probably crash well within a second.

The core problem is that both programs reference absolute physical addresses, which is not what we want. What we want is for each program to reference its own private, local set of addresses. The IBM 360 dealt with this using a technique called static relocation: when a program was loaded at address 16384, the constant 16384 was added to every program address during loading (so JMP 28 became JMP 16412). Although this mechanism works if done right, it is not a general solution, and it slows down loading. Furthermore, it requires extra information in every executable to indicate which words contain relocatable addresses and which do not: the JMP 28 in Figure b must be relocated, but an instruction like MOV REGISTER1,28, which moves the number 28 into a register, must not be. The loader needs some way to tell addresses from constants.
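A minimal sketch of static relocation, assuming the loader is given a relocation list telling it which operands are addresses (the data representation here is ours):

```python
def load(program, load_addr, reloc):
    """Statically relocate a program at load time.

    program: list of (opcode, operand) pairs
    reloc:   indices of instructions whose operand is an address;
             operands not listed are constants and must not change."""
    image = []
    for i, (op, operand) in enumerate(program):
        if i in reloc:
            operand += load_addr      # JMP 28 becomes JMP 16412 at 16384
        image.append((op, operand))
    return image

# Second operand is the *constant* 28 (MOV REGISTER1,28), so only index 0 relocates.
prog_b = [("JMP", 28), ("MOV", 28)]
print(load(prog_b, 16384, reloc={0}))   # [('JMP', 16412), ('MOV', 28)]
```

Without the `reloc` information, the loader could not distinguish the two 28s, which is exactly the difficulty described above.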

A memory abstraction: address space

Exposing physical memory to processes has several major drawbacks. First, if user programs can address every byte of memory, they can easily trash the operating system and bring the whole system to a halt (unless there is protection such as the IBM 360's lock-and-key scheme or special hardware). This problem exists even when only one user program is running.

Second, with this model it is awkward to run multiple programs at all (taking turns, if there is only one CPU). On a personal computer it is normal to have several applications open at once, such as an input method, an e-mail client, and a browser; one of them is running at any given moment, and the others can be woken up by, say, a mouse click. This is hard to arrange when processes address physical memory directly.

The concept of address space

If multiple applications are to reside in memory at the same time, two problems must be solved: protection and relocation. Look again at how the IBM 360 handled them: memory blocks were marked with protection keys, and the key of the executing process was compared with the key of every memory word fetched. This approach solves only the protection problem; it does not solve the problem of relocating multiple programs in memory.

A better way is to invent a new memory abstraction: the address space. Just as the process concept creates an abstract CPU to run programs, the address space creates an abstract memory for programs to live in. An address space is the set of addresses that a process can use to address memory. Each process has its own address space, independent of those of other processes, although in some circumstances processes may want to share address spaces.

Base and limit registers

The simplest approach uses dynamic relocation: mapping each process's address space onto a different part of physical memory in a simple way. The classic solution, used on machines from the CDC 6600 (the world's first supercomputer) to the Intel 8088 (the heart of the original IBM PC), is to equip each CPU with two special hardware registers, usually called the base register and the limit register. With base and limit registers, each program is loaded into a contiguous region of memory, with no relocation needed during loading. When a process runs, the physical address where its program begins is loaded into the base register and the length of the program into the limit register. For Figure c above, when the first program runs, the base and limit values loaded into these registers are 0 and 16384; when the second program runs, they are 16384 and 16384. If a third 16 KB program were loaded directly above the second and run, its base and limit would be 32768 and 16384. To summarize:

  • Base register: holds the starting physical address of the program in memory
  • Limit register: holds the length of the program

Every time a process references memory, whether to fetch an instruction or to read or write a data word, the CPU hardware automatically adds the base value to the address generated by the process before sending it out on the memory bus. At the same time, it checks whether the address supplied by the program is equal to or greater than the value in the limit register; if so, a fault occurs and the access is aborted. Thus, for the JMP 28 instruction in Figure c above, the hardware treats it as JMP 16412, so the program lands on the CMP instruction as intended. The process is as follows:

[Figure: address translation with base and limit registers; JMP 28 is sent to the bus as 16412]

Using base and limit registers is an easy way to give each process its own private address space, because every memory address generated by the process has the contents of the base register added to it before being sent to memory. In many implementations, the base and limit registers are protected so that only the operating system can modify them. The CDC 6600 protected these registers; the Intel 8088 did not, and in fact had no limit registers at all. It did have multiple base registers, allowing program code and data to be relocated independently, but it offered no protection against out-of-range memory references.

Now the drawback of base and limit registers is apparent: an addition and a comparison must be performed on every memory reference. Comparisons can be done quickly, but addition is slow because of carry-propagation time, unless special addition circuits are used.
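The whole mechanism, the ADD against the base and the CMP against the limit, can be sketched as a tiny simulation (the class and the use of MemoryError to model the hardware trap are our own choices):

```python
class MMU:
    """Base/limit dynamic relocation, done by hardware on every reference."""

    def __init__(self, base, limit):
        self.base = base      # starting physical address of the program
        self.limit = limit    # length of the program

    def translate(self, virtual_addr):
        if virtual_addr >= self.limit:    # the CMP against the limit register
            raise MemoryError("address out of range: trap to the OS")
        return self.base + virtual_addr   # the ADD of the base register

# The second 16 KB program from the figure: base 16384, length 16384.
mmu = MMU(base=16384, limit=16384)
print(mmu.translate(28))    # JMP 28 goes out on the bus as 16412
```

Any address at or beyond the limit (for example 16384 itself) raises the simulated trap instead of reaching memory.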

Swapping

If the physical memory of the computer were large enough to hold all processes, the schemes described so far would more or less suffice. In practice, however, the total amount of RAM needed by all processes often far exceeds the available memory. On a typical Windows, OS X, or Linux system, some 50-100 processes are started as soon as the computer boots. For example, installing a Windows application often issues a command so that at every subsequent system start, a process runs that does nothing but check for updates to that application; such a process can easily occupy 5-10 MB of memory. Other background processes check for e-mail, network connections, and many other things, all before the first user program starts. Major user applications like Photoshop can require 500 MB just to start, and many gigabytes once they begin processing data. The upshot is that keeping all processes in memory all the time requires a huge amount of memory and cannot be done if there is not enough of it.

Two general approaches to this shortage of memory have been developed. The simplest is swapping: bring a process into memory in its entirety, run it for a while, then put it back on disk. Idle processes are stored on disk, so they take up no memory when they are not running. The other strategy is virtual memory, which allows programs to run even when they are only partially in main memory. Let's discuss swapping first.

The swapping process

The following figure shows how swapping proceeds:

[Figure: memory allocation changing as processes are swapped in and out of memory]
Initially only process A is in memory; then processes B and C are created or swapped in from disk. In Figure d, A is swapped out to disk, and eventually A is brought back in. Because A now occupies a different location (Figure g), the addresses it contains must be relocated, either by software when it is swapped in, or by hardware during execution; base and limit registers are exactly suited to this.


Swapping creates multiple holes in memory. It is possible to move all processes downward as far as possible and merge the free areas into one large hole; this technique is known as memory compaction. It is usually not done, though, because it consumes a lot of CPU time: on a machine with 16 GB of memory that can copy 8 bytes every 8 ns, compacting all of memory takes about 16 s.
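Compaction itself is conceptually simple; the cost is in the copying. A toy sketch (the segment-list representation, with `None` marking a hole, is our own):

```python
def compact(segments):
    """Slide all used segments down and merge the holes into one.

    segments: list of (name, size); name None marks a hole.
    Returns (new segment list, bytes copied). The copied count is
    what makes compaction so expensive in practice."""
    used = [(name, size) for name, size in segments if name is not None]
    hole = sum(size for name, size in segments if name is None)
    copied = sum(size for name, size in used)   # worst case: every used byte moves
    return used + [(None, hole)], copied

mem = [("A", 4), (None, 2), ("B", 3), (None, 5)]
print(compact(mem))   # ([('A', 4), ('B', 3), (None, 7)], 7)
```

The article's 16 s estimate follows from the same arithmetic: 16 GB at 8 bytes per 8 ns is 2 billion copies of 8 ns each.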

One issue worth noting is how much memory should be allocated for a process when it is created or swapped into memory. If the size of the process is fixed after it is created and does not change, then the allocation strategy is relatively simple: the operating system will allocate exactly the size it needs.

But if processes' data segments can grow, for example by dynamically allocating memory from a heap, a problem arises whenever a process tries to grow. As a reminder, at the logical level the operating system divides a program's memory into different segments (regions):

  • Code segment (code segment / text segment):

Also known as the text segment; it is the memory used to store the program's instructions

Its size is determined before the program runs

This memory is generally read-only, although some architectures also allow code to be writable

The code segment may also contain read-only constants, such as string literals.

  • Data segment (data segment):

Readable and writable

Stores initialized global variables and initialized static variables

Data here lives as long as the process: it exists when the process is created and disappears when the process dies

  • BSS segment (bss segment):

Readable and writable

Stores uninitialized global variables and uninitialized static variables

Data in the BSS segment lives as long as the process

Data in the BSS segment defaults to 0

  • rodata segment:

Read-only data, such as the format string of a printf call or the jump table of a switch statement; in other words, the constant area. For example, for const int ival = 10 at global scope, ival is stored in the .rodata section; likewise, in printf("Hello world %d\n", c) inside a function, the format string "Hello world %d\n" is also stored in the .rodata section.

  • Stack:

Readable and writable

Stores local (non-static) variables of functions and blocks

Stack data lives as long as its enclosing block: space is allocated when the block is entered and reclaimed automatically when the block exits

  • Heap:

Readable and writable

Stores space dynamically allocated at run time with malloc/realloc

Heap data lives from malloc/realloc until free, at most as long as the process

Below is part of the assembly listing produced when compiling with Borland C++:

_TEXT	segment dword public use32 'CODE'
_TEXT	ends
_DATA	segment dword public use32 'DATA'
_DATA	ends
_BSS	segment dword public use32 'BSS'
_BSS	ends

A segment definition delimits a region of memory. In assembly language, the segment directive marks the beginning of a segment definition and the ends directive marks its end; a segment definition describes one contiguous piece of memory.

There are three ways to deal with a memory area that grows automatically:

  • If the process is adjacent to a hole, that hole can be allocated to the process so it can grow into it.
  • If the process is adjacent to another process, there are two options: either move the growing process to a hole in memory that is large enough for it, or swap out one or more processes to create a large enough hole.
  • If a process cannot grow in memory and the swap area on disk is also full, the process must be suspended until free space becomes available (or it can be killed).

[Flowchart: the three ways of handling a data segment that grows]
The methods above work when only one process or a few processes grow. If most processes will grow at run time, then to reduce the overhead of swapping and moving processes that no longer fit, one available method is to allocate a little extra memory whenever a process is swapped in or moved. However, when a process is swapped out to disk, only the memory actually in use should be written out; swapping the extra memory as well would be wasteful. The following figure shows a memory configuration in which space for growth has been allocated to two processes.

[Figure: allocating space for a growing data segment and a growing stack]

If a process has two growing segments, for example a data segment used as a heap for variables that are dynamically allocated and released, and a stack segment for local variables and return addresses, the arrangement of Figure b suggests itself. In the figure, each process's stack grows downward from the top of the memory allotted to it, while the data segment, just above the program text, grows upward. When the memory reserved for growth runs out, the process is handled according to the three cases listed above (the three ways of dealing with a growing data segment).

Free memory management

When the memory is dynamically allocated, the operating system must manage it. Generally speaking, there are two ways to monitor memory usage

  • Bitmap
  • Free lists

Let's look at each of these in turn.

Storage management using bitmaps

When using the bitmap method, the memory may be divided into allocation units as small as a few words or as large as several kilobytes. Each allocation unit corresponds to a bit in the bitmap, 0 means free, 1 means occupied (or vice versa). A memory area and its corresponding bitmap are as follows

[Figure: (a) a region of memory with five processes and three holes; (b) the corresponding bitmap; (c) the same information as a linked list]

Figure a shows a section of memory holding five processes and three holes; the tick marks are the allocation units and the shaded regions (0 in the bitmap) are free. Figure b shows the corresponding bitmap, and Figure c shows the same information as a linked list.

The size of the allocation unit is an important design issue: the smaller the unit, the larger the bitmap. However, even with an allocation unit as small as 4 bytes, 32 bits of memory require only 1 bit of bitmap. A memory of 32n bits needs an n-bit bitmap, so the bitmap takes up only 1/32 of memory. With a larger allocation unit the bitmap is smaller, but if a process's size is not an exact multiple of the allocation unit, an appreciable amount of memory is wasted in the last unit.
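The 1/32 figure can be checked directly (the numbers are the ones used above; the helper function is ours):

```python
def bitmap_bits(mem_bytes, unit_bytes):
    """Number of bits in the bitmap: one bit per allocation unit."""
    return mem_bytes // unit_bytes

mem = 1 << 20                       # 1 MB of memory
bits = bitmap_bits(mem, 4)          # 4-byte allocation units
print(bits, (bits / 8) / mem)       # 262144 0.03125  -> exactly 1/32 overhead
```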

A bitmap provides a simple way to keep track of memory usage in a fixed amount of memory, because the size of the bitmap depends only on the size of memory and the size of the allocation unit. The main problem is that when it is decided to bring a process of k allocation units into memory, the memory manager must search the bitmap for a run of k consecutive 0 bits. Searching a bitmap for a run of zeros of a given length is a slow operation, and this is an argument against bitmaps. (Think of it as hunting for a long stretch of free elements in a jumbled array.)
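The search described above is a plain linear scan over every bit, which is exactly why it is slow. A sketch (treating the bitmap as a list of 0/1 values for clarity):

```python
def find_run(bitmap, k):
    """Scan for k consecutive 0 bits; return the start index, or -1 if none.

    Every bit may have to be examined, which is the bitmap's drawback."""
    run_start, run_len = 0, 0
    for i, bit in enumerate(bitmap):
        if bit == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == k:
                return run_start
        else:
            run_len = 0
    return -1

print(find_run([1, 1, 0, 0, 0, 1, 0, 0], 3))   # 2
print(find_run([1, 1, 0, 0, 0, 1, 0, 0], 4))   # -1
```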

Storage management using linked lists

Another way to keep track of memory is to maintain a linked list of allocated and free memory segments, where a segment either contains a process or is a hole between two processes. Figure c above represents memory this way. Each entry in the list specifies whether it is a hole (H) or a process (P), the address at which it starts, its length, and a pointer to the next entry.

In this example, the segment list is kept sorted by address. Sorting this way has the advantage that when a process terminates or is swapped out, updating the list is straightforward. A terminating process normally has two neighbors (except when it is at the very top or bottom of memory), each of which may be either a process or a hole, giving four possible combinations.

[Figure: the four neighbor combinations for a terminating process]
When processes and holes are kept on a list sorted by address, several algorithms can be used to allocate memory for a newly created process (or an existing process being swapped in from disk). We assume the memory manager knows how much memory to allocate. The simplest algorithm is first fit: the memory manager scans along the segment list until it finds a hole that is big enough. Unless the hole is exactly the right size, it is split in two, one part for the process and the other becoming a new hole. First fit is fast because it searches as little of the list as possible.
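First fit can be sketched in a few lines. As a simplification, we keep only the holes, as (start, length) pairs sorted by address (this representation is ours):

```python
def first_fit(free_list, size):
    """Allocate from the first hole that is big enough.

    free_list: address-ordered list of (start, length) holes, mutated in place.
    Returns the start of the allocation, or None if nothing fits."""
    for i, (start, length) in enumerate(free_list):
        if length >= size:
            if length == size:
                free_list.pop(i)                              # exact fit
            else:
                free_list[i] = (start + size, length - size)  # split the hole
            return start
    return None

holes = [(5, 3), (18, 2)]
print(first_fit(holes, 2))   # 5 -- the first hole that is big enough
print(holes)                 # [(7, 1), (18, 2)] -- a tiny leftover hole at 7
```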

A minor variation of first fit is next fit. It works the same way, except that it remembers its position each time it finds a suitable hole; the next search starts from where it left off, instead of from the beginning of the list as first fit does. Simulations by Bays (1977) show that next fit performs slightly worse than first fit.

Another well-known and widely used algorithm is best fit, which searches the entire list from beginning to end for the smallest hole that is adequate. Rather than breaking up a big hole that might be needed later, best fit tries to find a hole close to the size actually requested. For example, if a block of size 2 is needed, first fit would allocate from the hole at position 5, whereas best fit would allocate from the hole at position 18, as shown below.

[Figure: for a request of size 2, first fit allocates from the hole at position 5, best fit from the hole at position 18]
How does best fit perform? Since it must traverse the entire list, best fit is slower than first fit. Somewhat surprisingly, it also wastes more memory than first fit or next fit, because it tends to fill memory with tiny, useless holes, whereas first fit leaves behind larger ones.
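Using the same hole-list sketch as for first fit above, best fit scans every hole and keeps the smallest one that is adequate. With the article's example holes, a request of size 2 indeed lands at 18 and leaves the big hole at 5 intact:

```python
def best_fit(free_list, size):
    """Scan the whole list; allocate from the smallest hole that fits.

    free_list: list of (start, length) holes, mutated in place."""
    best = None
    for i, (start, length) in enumerate(free_list):
        if length >= size and (best is None or length < free_list[best][1]):
            best = i
    if best is None:
        return None
    start, length = free_list[best]
    if length == size:
        free_list.pop(best)                              # exact fit
    else:
        free_list[best] = (start + size, length - size)  # split the hole
    return start

holes = [(5, 3), (18, 2)]     # a request of size 2
print(best_fit(holes, 2))     # 18 -- the exact-fit hole
print(holes)                  # [(5, 3)] -- the larger hole survives
```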

Because best fit splits off many tiny holes, one might consider the opposite, worst fit: always take the largest available hole (now you can see why best fit produces so many small holes), so that the leftover hole is big enough to remain useful. Simulations have shown, however, that worst fit is not a very good idea either.

All four algorithms can be sped up by maintaining separate lists for processes and holes, so they inspect only holes, not processes. The inevitable price of this faster allocation is extra complexity and slower deallocation, since a freed segment must be removed from the process list and inserted into the hole list.

If processes and holes are kept on separate lists, the hole list can be sorted by size to make best fit faster. When best fit searches a hole list sorted from smallest to largest, the first hole that fits is also the smallest one that can hold the job, hence the best fit; since the hole list is a singly linked list, no further searching is needed. With a hole list sorted by size, first fit and best fit are equally fast, and next fit becomes pointless.

Yet another allocation algorithm is quick fit, which maintains separate lists for some of the more common hole sizes. For example, it might have a table of n entries, in which the first entry points to a list of 4 KB holes, the second to a list of 8 KB holes, the third to a list of 12 KB holes, and so on. A hole of, say, 21 KB could be put on the 20 KB list or on a special list of odd-sized holes.
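A sketch of the quick fit table (class shape and fallback behavior are our own; real allocators would fall back to a general list rather than fail):

```python
from collections import defaultdict

class QuickFit:
    """Separate free lists per common size class (4 KB, 8 KB, 12 KB, ...)."""

    def __init__(self):
        self.lists = defaultdict(list)   # size -> list of hole start addresses

    def free(self, start, size):
        self.lists[size].append(start)

    def alloc(self, size):
        if self.lists[size]:
            return self.lists[size].pop()   # O(1) lookup for a common size
        return None                         # would fall back to a general list

qf = QuickFit()
qf.free(0, 4096)
qf.free(8192, 4096)
print(qf.alloc(4096))    # 8192 -- the most recently freed 4 KB hole
print(qf.alloc(12288))   # None -- no 12 KB hole on hand
```

Note what is missing: nothing here merges a freed hole with its neighbors, which is precisely the weakness discussed next.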

With quick fit, finding a hole of the required size is extremely fast, but it shares the disadvantage of all schemes that sort holes by size: when a process terminates or is swapped out, finding its neighbors to see whether a merge is possible is expensive. And if merging is not done, memory quickly fragments into a large number of small holes into which no process fits.

Author: Programmer cxuan
Link: https://juejin.im/post/6844904072496037901
Source: Juejin (Nuggets)

Origin blog.csdn.net/GYHYCX/article/details/109323194