[PHP] PHP recursion, efficiency and analysis

Definition of recursion

 

    Recursion (http://en.wikipedia.org/wiki/Recursive) is a mechanism by which a function calls itself, directly or indirectly. This powerful idea can make some complex concepts extremely simple. Outside computer science, and especially in mathematics, recursion is common. The Fibonacci sequence, the example most often used to explain recursion, is a typical case. Others, such as the factorial (n!), can also be given a recursive definition (n! = n*(n-1)!). Recursive thinking shows up in everyday life too. Suppose that, because of some academic matter, you need the principal's stamp, but the principal says, "I will stamp it only after the dean of teaching has stamped it." When you find the dean of teaching, the dean says, "I will stamp it only after the dean of the department has stamped it"... and so on, until you finally reach your head teacher. Once you have the head teacher's stamp, you go back up through the dean of the department and the dean of teaching, and finally get the principal's stamp. The process is as follows:

 

  The stamp story may be dull (whose university life is free of sadness? How can we prove we are young if we are never sad?), but it reflects the basic idea of recursion, namely its two basic conditions:

  1. The recursive exit condition. This is necessary for the recursion to execute normally and to return correctly. Without it, the recursion continues indefinitely until the resources granted by the system are exhausted (in most languages, this means the stack space). So when you see an error such as "stack overflow" (in C) or "max nest level of 100 reached" (in PHP, typically reported by the Xdebug extension when its nesting limit is exceeded), the cause is usually an incorrect exit condition that leads to excessive recursion depth or infinite recursion.

  2. The recursive process. This is the step from one function call to the next. Take n! as an example: for n > 1, n! = n*(n-1)! is the recursive process of the function, which we can also simply call the "recursive formula".

With these two basic conditions, we get the general pattern of recursion, which can be described by code as:

function Recur($param)
{
    if (reachedBaseCondition($param)) {
        // Exit condition: calculate the result directly and return.
        return calculate($param);
    }
    // Otherwise modify the parameter and enter the next layer.
    $param = modify($param);
    return Recur($param);
}
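As a concrete instance of this pattern, here is a minimal sketch of n! in PHP. The function name factorial and the argument check are assumptions made for illustration; the exit condition and the recursive formula are the ones described above.

```php
<?php
// Sketch only: factorial() is an illustrative name, not a PHP built-in.
function factorial($n)
{
    if (!is_int($n) || $n < 0) {
        throw new InvalidArgumentException('n must be a non-negative integer');
    }
    // Exit condition: 0! and 1! are both 1.
    if ($n <= 1) {
        return 1;
    }
    // Recursive process: n! = n * (n-1)!
    return $n * factorial($n - 1);
}

echo factorial(5), PHP_EOL; // prints 120
```

If the exit condition is removed, the calls never stop and the script ends with the kind of error described in condition 1.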

With the general pattern of recursion, we can easily implement most recursive functions, for example the frequently mentioned recursive implementation of the Fibonacci sequence, or the recursive traversal of a directory:

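// Named scanDirRecursive because PHP function names are case-insensitive,
// so "ScanDir" would collide with the built-in scandir().
function scanDirRecursive($path)
{
    if (!is_dir($path)) {
        return;
    }
    $handler = opendir($path);
    if ($handler === false) {
        return;
    }
    // Compare with false explicitly: an entry named "0" is falsy.
    while (($entry = readdir($handler)) !== false) {
        if ($entry == '.' || $entry == '..') {
            continue;
        }
        $full = rtrim($path, '/') . '/' . $entry;
        if (is_dir($full)) {
            scanDirRecursive($full);   // enter the next layer
        } else {
            echo "file: " . $full . PHP_EOL;
        }
    }
    closedir($handler);
}
scanDirRecursive("./");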
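For comparison, the same traversal can be sketched with the SPL iterators that ship with PHP, which hide the recursion behind an iterator. The helper name listFilesRecursively is hypothetical; RecursiveDirectoryIterator, RecursiveIteratorIterator and FilesystemIterator::SKIP_DOTS are standard SPL names.

```php
<?php
// Sketch only: listFilesRecursively() is a hypothetical helper name.
function listFilesRecursively($path)
{
    $files = [];
    $iterator = new RecursiveIteratorIterator(
        // SKIP_DOTS leaves out the "." and ".." entries.
        new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS)
    );
    // The default mode (LEAVES_ONLY) yields files, not the directories themselves.
    foreach ($iterator as $fileInfo) {
        $files[] = $fileInfo->getPathname();
    }
    sort($files);
    return $files;
}
```

Calling listFilesRecursively("./") returns the file paths as an array, which can then be printed in a loop just like the echo in the hand-written version.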

Attentive readers may have noticed that we used the term "layer" several times above. There are two main reasons:

1. When analyzing recursion, people often use a recursion tree to trace how a recursive function unfolds. Take the Fibonacci sequence as an example. The Fibonacci sequence is defined as:

Fab(0) = 0, Fab(1) = 1, Fab(n) = Fab(n-1) + Fab(n-2) for n > 1

Therefore, to obtain the value of Fab(n), we often expand it into a "recursion tree", as shown in the following figure:

 

The recursive calculation proceeds from top to bottom and from left to right. Once it reaches a leaf node of the recursion tree (that is, the recursive exit condition), it returns layer by layer, as shown in the figure below (reference URL: http://www.csharpwin.com/csharpspace/12292r4006.shtml):
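The top-to-bottom, left-to-right order can also be made visible as text. The following is a small sketch, with the hypothetical helper fabTree, that returns the recursion tree of Fab(n) as indented lines, one layer per indentation level.

```php
<?php
// Sketch only: fabTree() is a hypothetical helper for illustration.
// It returns one text line per call, indented by the layer of that call.
function fabTree($n, $depth = 0)
{
    $lines = [str_repeat('  ', $depth) . "fab($n)"];
    if ($n > 1) {
        // Left branch first, then right branch, one layer deeper.
        $lines = array_merge(
            $lines,
            fabTree($n - 1, $depth + 1),
            fabTree($n - 2, $depth + 1)
        );
    }
    return $lines;
}

echo implode(PHP_EOL, fabTree(4)), PHP_EOL;
// fab(4)
//   fab(3)
//     fab(2)
//       fab(1)
//       fab(0)
//     fab(1)
//   fab(2)
//     fab(1)
//     fab(0)
```

Lines with no children are the leaf nodes, that is, the calls that hit the exit condition.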

 

2. The structure of the stack.

Another important concept related to recursion is the stack. To borrow the explanation of the stack from Baidu Encyclopedia: "Under Windows, the stack is a data structure that grows toward lower addresses and occupies a contiguous area of memory. This means that the address of the top of the stack and the maximum capacity of the stack are predefined by the system. Under Windows, the stack size is 2M (some say 1M; in short, it is a constant determined at compile time). If the requested space exceeds the remaining space of the stack, an overflow is reported. Therefore, the space that can be obtained from the stack is relatively small." On a Linux system, you can use the ulimit -s command to view the maximum stack size.

The stack is characterized by "last in, first out": the element pushed last is the first to be taken out. Each push places data on top of what is already there, and each fetch takes data from the top. It is this property that makes the stack particularly suitable for recursion. Specifically, when a recursive program runs, the system allocates stack space of a fixed size, and the parameters, local variables, and return address of each function call (together called a stack frame) are pushed onto it (this is called "protecting the scene", so that the scene can be "restored" at the appropriate time). When the recursive call at a given layer finishes, execution returns unconditionally to the previously saved return address and continues from there (because the return is unconditional, stack overflow attacks are possible; see http://wenku.baidu.com/view/7fb00bc2d5bbfd0a7956737d.html). In this way, the stack resembles a neatly piled stack of plates:
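The last-in, first-out behaviour can be observed directly from PHP. The sketch below uses a hypothetical function descend that logs when each layer is entered and left; the layer entered last is the first one to return.

```php
<?php
// Sketch only: descend() is a hypothetical function for illustration.
function descend($level, $maxLevel, array &$log)
{
    $log[] = "enter layer $level";   // a new frame has been pushed
    if ($level < $maxLevel) {
        descend($level + 1, $maxLevel, $log);
    }
    $log[] = "leave layer $level";   // this frame is about to be popped
}

$log = [];
descend(1, 3, $log);
echo implode(PHP_EOL, $log), PHP_EOL;
// enter layer 1
// enter layer 2
// enter layer 3
// leave layer 3
// leave layer 2
// leave layer 1
```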

 

As basic exercises in recursion, the following are good practice:

 

1. Recursive traversal of directories.

2. Unlimited-level classification (category trees of arbitrary depth).

3. Binary search and merge sort.

4. PHP built-in functions related to recursive behavior (such as array_merge_recursive, array_walk_recursive, array_replace_recursive, etc., consider their implementation)
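As a starting point for exercise 2, here is a minimal sketch of unlimited-level classification. The row layout (id, parent_id, name), the sample data and the function name buildCategoryTree are all assumptions made for illustration. The recursion stops on its own when a category has no children, because the loop then adds nothing and an empty array is returned.

```php
<?php
// Hypothetical flat category rows; parent_id 0 marks a top-level category.
$rows = [
    ['id' => 1, 'parent_id' => 0, 'name' => 'Books'],
    ['id' => 2, 'parent_id' => 1, 'name' => 'Programming'],
    ['id' => 3, 'parent_id' => 2, 'name' => 'PHP'],
    ['id' => 4, 'parent_id' => 0, 'name' => 'Music'],
];

// Sketch only: collects the children of $parentId, then recurses into each child.
function buildCategoryTree(array $rows, $parentId = 0)
{
    $branch = [];
    foreach ($rows as $row) {
        if ($row['parent_id'] == $parentId) {
            $row['children'] = buildCategoryTree($rows, $row['id']);
            $branch[] = $row;
        }
    }
    return $branch;
}

$tree = buildCategoryTree($rows);
echo $tree[0]['children'][0]['children'][0]['name'], PHP_EOL; // prints PHP
```

This version rescans all rows at every layer, which is simple but slow for large tables; grouping the rows by parent_id first is a common refinement.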

 

Understanding recursion: stack trace of function calls

 

 

In C, you can trace the function call stack with debugging tools such as GDB, and so follow the execution of a function in detail (for the use of GDB, I recommend @左耳game's blog: http://blog.csdn.net/haoel/article/details/2879).

In PHP, the debugging methods available are:

1. The native print, echo, var_dump, print_r and so on. For simpler programs it is usually enough to output the key points inside the function.

2. PHP's built-in stack trace functions: debug_backtrace and debug_print_backtrace.

3. Debugging tools such as xdebug and xhprof.
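Of the two built-in functions, debug_print_backtrace prints the trace, while debug_backtrace returns it as an array with one entry per active stack frame, so it can be inspected in code. The sketch below uses that to record how deep each call sits; the function name factorialWithDepth is hypothetical.

```php
<?php
// Sketch only: factorialWithDepth() is a hypothetical name for illustration.
function factorialWithDepth($n, array &$depths)
{
    // One array entry per active frame; IGNORE_ARGS keeps the result small.
    $depths[] = count(debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS));
    if ($n <= 1) {
        return 1;
    }
    return $n * factorialWithDepth($n - 1, $depths);
}

$depths = [];
$result = factorialWithDepth(4, $depths);
echo "4! = $result, depths: " . implode(',', $depths) . PHP_EOL;
```

Each recursive call reports a depth exactly one greater than its caller; the absolute numbers depend on where the first call is made from.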

To make this easier to understand, take the Fibonacci sequence as an example (here we assume that n is a non-negative integer):

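<?php
function fab($n)
{
    debug_print_backtrace();          // print the call stack on every entry
    if ($n == 1 || $n == 0) {         // exit condition
        return $n;
    }
    return fab($n - 1) + fab($n - 2); // line 8: the recursive calls
}
fab(4);                               // line 10: the initial call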

 

The call stack printed for fab(4) is:

#0  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(3) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(2) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(3) called at [/search/nginx/html/test/Fab.php:8]

#2  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(1) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(2) called at [/search/nginx/html/test/Fab.php:8]

#2  fab(3) called at [/search/nginx/html/test/Fab.php:8]

#3  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(0) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(2) called at [/search/nginx/html/test/Fab.php:8]

#2  fab(3) called at [/search/nginx/html/test/Fab.php:8]

#3  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(1) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(3) called at [/search/nginx/html/test/Fab.php:8]

#2  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(2) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(1) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(2) called at [/search/nginx/html/test/Fab.php:8]

#2  fab(4) called at [/search/nginx/html/test/Fab.php:10]

#0  fab(0) called at [/search/nginx/html/test/Fab.php:8]

#1  fab(2) called at [/search/nginx/html/test/Fab.php:8]

#2  fab(4) called at [/search/nginx/html/test/Fab.php:10]

 

At first glance this mess of output looks bewildering. In fact, each line of the output above contains the following items:

A. The stack level. #0 is the top of the stack (the call currently executing), #1 is the frame one layer below it, #2 the frame two layers below, and so on; the larger the number, the deeper the stack frame.

B. The function called and its arguments. For example, fab(4) means that the function being executed is fab and that 4 is its actual argument.

C. The location of the call, consisting of the file name and the line number.

In fact, we can see the call stack and the calculation process more clearly by adding some extra output. For example, we can print a separator showing the value of n before each trace:

 

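<?php
function fab($n)
{
    echo "---- n = $n ---------------------------------------------".PHP_EOL;
    debug_print_backtrace();          // print the call stack on every entry
    if ($n == 1 || $n == 0) {         // exit condition
        return $n;
    }
    return fab($n - 1) + fab($n - 2); // line 9: the recursive calls
}
fab(4);                               // line 11: the initial call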

Then the call stack after executing fab(4) is:

---- n = 4 ---------------------------------------------
#0  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 3 ---------------------------------------------
#0  fab(3) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 2 ---------------------------------------------
#0  fab(2) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(3) called at [/search/nginx/html/test/Fab.php:9]
#2  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 1 ---------------------------------------------
#0  fab(1) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(2) called at [/search/nginx/html/test/Fab.php:9]
#2  fab(3) called at [/search/nginx/html/test/Fab.php:9]
#3  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 0 ---------------------------------------------
#0  fab(0) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(2) called at [/search/nginx/html/test/Fab.php:9]
#2  fab(3) called at [/search/nginx/html/test/Fab.php:9]
#3  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 1 ---------------------------------------------
#0  fab(1) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(3) called at [/search/nginx/html/test/Fab.php:9]
#2  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 2 ---------------------------------------------
#0  fab(2) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 1 ---------------------------------------------
#0  fab(1) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(2) called at [/search/nginx/html/test/Fab.php:9]
#2  fab(4) called at [/search/nginx/html/test/Fab.php:11]
---- n = 0 ---------------------------------------------
#0  fab(0) called at [/search/nginx/html/test/Fab.php:9]
#1  fab(2) called at [/search/nginx/html/test/Fab.php:9]
#2  fab(4) called at [/search/nginx/html/test/Fab.php:11]

 Explanation of the output (note the first two columns of each line): the program needs to compute the value of fab(4). That value depends on fab(3) and fab(2), so it cannot be computed directly, and fab(4) is pushed onto the stack (1 in the figure below). The left branch of fab(4) is fab(3), whose value cannot be computed directly either, so fab(3) is pushed as well (2 in the figure below). In the same way, fab(2) is pushed, and so on down to a leaf node of the recursion tree. Once the leaf nodes have been computed, the frames are popped in turn until the stack is empty, as shown in the following figure:
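The push and pop behaviour can be imitated with an ordinary PHP array used as a stack. Note that this is a sketch of a different technique from the recursive fab: the array holds the calls still waiting to be evaluated rather than real stack frames, and the result is accumulated from the leaf values. The function name fabWithExplicitStack and the $visited parameter are hypothetical.

```php
<?php
// Sketch only: an explicit stack of pending calls instead of real recursion.
function fabWithExplicitStack($n, array &$visited)
{
    $stack = [$n];   // pending calls, last element is the top of the stack
    $sum = 0;        // fab(n) equals the sum of all leaf values (0 or 1)
    while (!empty($stack)) {
        $k = array_pop($stack);
        $visited[] = $k;
        if ($k <= 1) {
            $sum += $k;          // leaf node: exit condition reached
            continue;
        }
        $stack[] = $k - 2;       // right branch, evaluated later
        $stack[] = $k - 1;       // left branch, on top, evaluated next
    }
    return $sum;
}

$visited = [];
echo fabWithExplicitStack(4, $visited), PHP_EOL;   // prints 3
echo implode(',', $visited), PHP_EOL;              // prints 4,3,2,1,0,1,2,1,0
```

The visiting order 4,3,2,1,0,1,2,1,0 is the same sequence of n values that appears in the separator lines of the trace above.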

Performance: efficiency analysis of recursion

 

 

  Yesterday, while reading Pu Ling's "Non-Simplicity NODE.js", I came across the results of the author's performance test of different languages. Roughly: a simple recursive computation of the Fibonacci sequence is timed in each language, to give a rough estimate of each language's computing performance. The time for PHP surprised me: for n=40, PHP took 1m17.728s, that is 77.728s, to compute the Fibonacci number, far worse than the 0.202s of C, roughly 385 times slower! (The test results can be seen in the figure below.)

 

  As we know, PHP code is executed by scanning the source, performing lexical and syntax analysis, compiling the program into intermediate code (opcodes), and then running that code on the Zend engine. In essence, then, PHP is a high-level language implemented on top of C. Because the PHP compilation step does little optimization, and because the code has to run on the Zend virtual machine, it is bound to be much slower than native C. Even so, a gap this large is hard to believe.
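To get a feel for the cost on your own machine, the kind of measurement described above can be reproduced roughly as follows. This is a sketch, not the book's benchmark: the function name fabNaive is hypothetical, and n is set to 25 instead of 40 (an assumption, so that the script finishes quickly). The timings will differ by PHP version and hardware.

```php
<?php
// Sketch only: naive tree recursion, timed with microtime().
function fabNaive($n)
{
    if ($n == 0 || $n == 1) {
        return $n;
    }
    return fabNaive($n - 1) + fabNaive($n - 2);
}

$n = 25;                              // assumption: small n keeps the run short
$start = microtime(true);
$value = fabNaive($n);
$elapsed = microtime(true) - $start;  // elapsed wall-clock time in seconds

printf("fab(%d) = %d in %.3f s%s", $n, $value, $elapsed, PHP_EOL);
```

Raising n by one multiplies the number of calls by roughly 1.6, which is why n=40 takes so much longer than n=25.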

Why is recursion in PHP so inefficient? (One thing to know is that PHP does not perform tail-call optimization, so every recursive call keeps its own stack frame, and the tree recursion used here recomputes the same values again and again; recursion is therefore much slower and the recursion depth that can be tolerated is much smaller. In C/C++, with gcc -O2 or higher, the compiler optimizes such recursion.) In this article (implementation principle and performance analysis of PHP functions), one of the author's explanations is: "Function recursion is done through the stack, and PHP implements it in a similar way. Zend allocates an active symbol table (active_sym_table) for each PHP function to record the state of all local variables in the current function. All symbol tables are maintained in the form of a stack: whenever a function is called, a new symbol table is allocated and pushed onto the stack, and when the call ends, the current symbol table is popped. This is how state is preserved and recursion is realized. Zend optimizes the maintenance of this stack by pre-allocating a static array of length N to simulate it, a way of simulating a dynamic data structure with a static array that we often use in our own programs as well. This avoids the memory allocation and destruction that each call would otherwise cause; at the end of a function call, Zend only needs to clear the symbol table data at the top of the stack. Because the static array has length N, the program does not overflow the stack once the call depth exceeds N, but from then on Zend has to allocate and destroy symbol tables, which costs a lot of performance. In Zend, the current value of N is 32. Therefore, when we write PHP programs, the function call depth should preferably not exceed 32."

 

In addition, a PHP bug report also notes: "PHP 4.0 (Zend) uses the stack for intensive data, rather than using the heap. That means that its tolerance recursive functions is significantly lower than that of other languages."
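The repeated calculation in tree recursion can be removed by caching results that have already been computed (memoization). The sketch below uses a static array as the cache; the function name fabMemo is hypothetical. It still recurses about n layers deep, so it addresses the repeated work, not the call depth.

```php
<?php
// Sketch only: memoized Fibonacci with a static cache.
function fabMemo($n)
{
    // The cache is seeded with the two exit-condition values.
    static $cache = [0 => 0, 1 => 1];
    if (!isset($cache[$n])) {
        // Each value is computed once and then reused.
        $cache[$n] = fabMemo($n - 1) + fabMemo($n - 2);
    }
    return $cache[$n];
}

echo fabMemo(40), PHP_EOL; // prints 102334155
```

With the cache, fab(40) needs on the order of 40 additions instead of hundreds of millions of calls.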

So, in PHP, unless it is really necessary, it is best to use recursion sparingly, especially when the recursion depth is large or cannot be estimated.
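For the Fibonacci sequence used throughout this article, avoiding recursion is straightforward. The following is a minimal sketch of an iterative version (the name fabIterative is hypothetical); it uses a loop and two variables, so the call depth stays constant no matter how large n is.

```php
<?php
// Sketch only: iterative Fibonacci, no recursion and no growing call stack.
function fabIterative($n)
{
    $prev = 0;   // Fab(0)
    $curr = 1;   // Fab(1)
    for ($i = 0; $i < $n; $i++) {
        $next = $prev + $curr;
        $prev = $curr;
        $curr = $next;
    }
    return $prev;   // after n steps, $prev holds Fab(n)
}

echo fabIterative(40), PHP_EOL; // prints 102334155
```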

References:

1. http://www.csharpwin.com/csharpspace/12292r4006.shtml

2. http://devzone.zend.com/283/recursion-in-php-tapping-unharnessed-power/

3. http://blog.csdn.net/heiyeshuwu/article/details/5840025

4. http://www.nowamagic.net/librarys/veda/detail/2336

5. http://www.cnblogs.com/JeffreyZhao/archive/2009/03/26/tail-recursion-and-continuation.html

6. http://wenku.baidu.com/view/7fb00bc2d5bbfd0a7956737d.html


Origin blog.csdn.net/kexin178/article/details/112728323