Python interpreter virtual machine

Introduction

As we all know, programming languages are mainly divided into two categories: compiled languages and interpreted languages.

A compiled language first compiles the program's source code into machine code that the computer can execute, and this machine code is assembled into the final executable file. Because the machine code runs directly on the CPU, compiled languages are usually far ahead of interpreted languages in efficiency and speed. Representative compiled languages include C/C++, C#, and the popular Rust and Go.

An interpreted language usually starts an interpreter first, and the interpreter then runs the source code step by step. The CPU cannot run source code directly, so the interpreter (in CPython's case) first converts the source code into bytecode and then executes that bytecode on its own virtual machine. The main representatives of interpreted languages are Python, JavaScript, and so on.

If both end up running lower-level code, why doesn't an interpreted language compile everything ahead of time like a compiler does? In fact, this is precisely where interpreted languages get their flexibility. In CPython, a source file is still compiled to bytecode as a whole before it runs. What happens one step at a time is the execution. The interpreter executes the bytecode step by step and resolves names and types only when an operation actually runs, so it does not need to know the full context in advance. This is why languages such as Python do not require variable types to be declared.
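A quick way to see that CPython compiles a whole source text before running any of it: the sketch below (the helper name `run_source` is hypothetical) compiles a two-line program whose second line is a syntax error. If compilation happened line by line, the first `print` would run first. Instead, `compile()` rejects the whole source before anything executes.

```python
# Two-line program: line 1 is valid, line 2 is a syntax error.
SOURCE = 'print("line 1 ran")\nx = = 2\n'


def run_source(source):
    """Hypothetical helper: compile the whole source, then execute it.

    Returns "ran" on success, or a short description of the SyntaxError.
    """
    try:
        # compile() processes the entire source text before anything runs
        code = compile(source, "<demo>", "exec")
    except SyntaxError as exc:
        # Nothing has executed yet, so line 1's print never happened
        return f"SyntaxError on line {exc.lineno}"
    exec(code, {})
    return "ran"


print(run_source(SOURCE))  # SyntaxError on line 2
```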

Therefore, in everyday conversation, many people also loosely call compiled languages statically typed languages and interpreted languages dynamically typed languages. Strictly speaking, though, the type system and the execution model are independent properties.

Components of a Python program

As a typical interpreted language, Python is also executed through an interpreter.
When you run a Python file, the Python interpreter is first loaded into memory. The interpreter then finds the code file at the given .py path, compiles it, and executes it in order from top to bottom and left to right.
Of course, this compilation process differs somewhat from compilation in a static language. First comes the classic front end: lexical analysis and syntax analysis (parsing) of the program code. Through this step, the interpreter obtains the abstract syntax tree of the code to be executed, the so-called AST, so it roughly knows what the code actually does and in what order. The AST is then compiled into an object that Python can execute: the code object.
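Python is an object-oriented programming language in which everything is an object, and compiled code is no exception: it is represented by the code object. Not everything is a code object, though. Functions, classes, and variables are ordinary objects. Code objects are produced for the bodies of modules, functions, and classes, and a function object, for example, keeps its compiled body in its __code__ attribute.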
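To make the distinction concrete, the sketch below compiles a small module. The names `greet` and `Greeter` are made up for illustration. The function and class bodies appear as nested code objects, while `greet` itself is a function object that wraps one.

```python
import types

MODULE_SOURCE = '''
def greet(name):
    return "hello " + name

class Greeter:
    def hi(self):
        return "hi"
'''

# Compile the whole module; the result is the module's code object
module_code = compile(MODULE_SOURCE, "<demo>", "exec")

# The bodies of greet and Greeter are nested code objects kept as constants
nested = [c.co_name for c in module_code.co_consts if isinstance(c, types.CodeType)]
print(nested)

# Executing the module code creates the actual function and class objects
namespace = {}
exec(module_code, namespace)
greet = namespace["greet"]

print(type(greet) is types.FunctionType)           # True: a function object...
print(isinstance(greet.__code__, types.CodeType))  # True: ...wrapping a code object
```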
Does this Code Object run directly in the interpreter? Actually, it is not.
When writing code, we all know that functions can call each other, and that functions, variables, and constants each have their own scope (execution space). Usually, things outside the current scope cannot be operated on directly. To achieve this, the interpreter needs to keep the different scopes separate. In Python, the structure that separates them is the frame.

Frames in Python

In Python, every entry into and exit from a function is accompanied by a frame operation. The Python interpreter maintains a call stack, which works like a traditional stack structure; you can think of it as a LIFO (Last In, First Out) queue. Whenever a function is called, a frame is created and pushed onto the call stack. Whenever a function returns, its frame is popped off the call stack. When you debug a Python program and execution stops at a breakpoint, the chain of function calls you see is in fact Python's call stack.
[Figure: call stack example]
Each level in the debugger represents a frame.
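The frames you see in a debugger are linked through each frame's f_back attribute. As a sketch (the helper name `caller_chain` is made up), the code below walks that chain from inside a nested call and collects the function name of every frame on the call stack.

```python
import inspect


def caller_chain():
    """Hypothetical helper: function names on the call stack, innermost first."""
    names = []
    # Start at the caller's frame, skipping caller_chain's own frame
    frame = inspect.currentframe().f_back
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # follow the link to the calling frame
    return names


def A():
    return caller_chain()


def B():
    return A()


def C():
    return B()


# Starts with 'A', 'B', 'C', followed by whatever frames called C()
print(C())
```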
The following is Python's official definition of a frame:
[Figure: the official definition of frame]
As the definition shows, a frame is like a container that holds the current running environment.
We can use the following code to view these attributes:

import inspect

def A():
    # Get the frame object of the currently running function (A)
    frame = inspect.currentframe()
    print(f"""
    f_back: {frame.f_back}
    f_builtins: {frame.f_builtins}
    f_code: {frame.f_code}
    f_globals: {frame.f_globals}
    f_lasti: {frame.f_lasti}
    f_lineno: {frame.f_lineno}
    f_locals: {frame.f_locals}
    f_trace: {frame.f_trace}
    """)
    return

def B():
    return A()

def C():
    return B()

C()

The running results are as follows:
[Figure: output showing the frame attributes]
It can be seen that when function A is called, A's frame records the previous frame in f_back. In other words, inside A we can use the current frame to reach the previous frame, that is, the frame of the function that called A. The frame also records some current execution information. f_lasti holds the index of the last bytecode instruction attempted, and f_lineno holds the current line number. Through f_code you can also find which file the code belongs to. One of the more interesting attributes is f_trace, which specifies the trace function of the frame.
The following is an example of setting a trace function:

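import sys

def trace_func(frame, event, arg):
    print(f"Trace event {event}, currently in function {frame.f_code.co_name}, arg: {arg}")
    # Returning the trace function keeps tracing the lines of this frame
    return trace_func

def add(a, b):
    print(f"Entered add, the result is: {a + b}")
    return a + b

# Install the trace function
sys.settrace(trace_func)

# Call the function
add(1, 2)

# Remove the trace function
sys.settrace(None)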

[Figure: trace function example output]
As can be seen from the figure, three kinds of events trigger the trace function. It fires once when the function is called (call), twice while the function executes, once per line of its body (line), and once when the function exits (return). The return event receives add's return value, 3, as its arg before the function actually returns.

Using this feature of trace functions, we can easily observe the running state of a program while debugging it. Many debuggers, including Python's own pdb, work on the same principle.
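As a small illustration of that principle, here is a sketch of a line recorder, the core idea behind simple coverage tools, built only on sys.settrace. The helper `record_lines` and the sample function `sign` are hypothetical names. Real tools such as pdb are far more complete.

```python
import sys


def record_lines(func, *args):
    """Hypothetical helper: run func(*args) and record which of its lines ran.

    Line numbers are reported relative to the line of the `def` statement.
    """
    first = func.__code__.co_firstlineno
    executed = []

    def tracer(frame, event, arg):
        # Only follow frames that execute the function we are measuring
        if frame.f_code is not func.__code__:
            return None
        if event == "line":
            executed.append(frame.f_lineno - first)
        return tracer  # keep receiving 'line' events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the trace function again
    return result, executed


def sign(x):
    if x >= 0:
        return "non-negative"
    return "negative"


print(record_lines(sign, 5))   # ('non-negative', [1, 2])
print(record_lines(sign, -5))  # ('negative', [1, 3])
```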

In f_builtins we can see many familiar names, such as len, id, print, and max. This explains why we can call these built-in functions anywhere. Every running frame has a dedicated namespace, f_builtins, that stores them.

If you look carefully, you will also find many familiar names in f_globals. For example, the value of __name__ is __main__, and there are three keys, A, B, and C, whose values are the function objects we defined (each of which wraps a code object). This attribute holds the global namespace of the current frame, the place where global variables, functions, and other module-level names are stored. If you have used Python's global keyword, you know what happens when a function assigns to a variable declared global. The value is stored in f_globals and becomes a global variable.

Where there is a global scope, there is naturally a local scope, and that is what f_locals is for: all local variables are stored in f_locals. Given what we have seen so far, the separation between local and global variables is also something of a gentlemen's agreement in Python's design. Different frames do have different f_locals, so for ordinary Python developers scope isolation is indeed achieved. In practice, however, we can use a trace function or other means to get hold of the frame we want and read whatever we need from it.

In other words, if you really want to, Python lets you inspect the variable values, execution state, and results of other running functions, for example the callers further up the call stack.
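Here is a minimal sketch of that idea, using a hypothetical helper `peek_caller_local`. A function reaches into its caller's frame through f_back and reads a local variable that the caller never passed to it. The `del frame` follows the inspect documentation's advice to drop frame references so that they do not create reference cycles.

```python
import inspect


def peek_caller_local(name):
    """Hypothetical helper: read a local variable of the calling function."""
    frame = inspect.currentframe()
    try:
        caller = frame.f_back  # the frame of whoever called us
        return caller.f_locals.get(name)
    finally:
        del frame  # drop the frame reference to avoid a reference cycle


def outer():
    secret = "only visible inside outer"
    return peek_caller_local("secret")


print(outer())  # only visible inside outer
```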

The last interesting attribute is f_code, which stores the code object being executed in the current frame, that is, the code object the interpreter generated during compilation, as described above.
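A quick check of that relationship, with a made-up function name `whoami`: the frame running a function executes exactly the code object stored on that function.

```python
import inspect


def whoami():
    frame = inspect.currentframe()
    code = frame.f_code
    # The frame runs the very same code object that the function holds
    return code is whoami.__code__, code.co_name


print(whoami())  # (True, 'whoami')
```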

Code Object

The official definition of the code object is as follows:
[Figure: the official definition of code object]
A code object can be understood as the unit of execution of the Python interpreter. Whenever the interpreter needs to execute code, run a function, create a class, and so on, it finds the corresponding code object and executes it.

Let's use a specific example to see what these attributes represent.
[Figure: code object attribute example]
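The screenshot with function D is not reproduced here. Judging from the bytecode walkthrough later in this section, D presumably looked roughly like the hypothetical reconstruction below. The exact values printed depend on where D is defined and on the Python version, particularly co_firstlineno, co_flags, and the contents of co_consts.

```python
# Hypothetical reconstruction of D, inferred from the bytecode walkthrough
def D(a, b, d):
    e = 0
    c = a + b
    c += d
    return c


code = D.__code__
print("co_argcount:", code.co_argcount)        # 3
print("co_consts:", code.co_consts)            # the article's screenshot showed None and 0
print("co_varnames:", code.co_varnames)        # parameters a, b, d first, then the locals
print("co_filename:", code.co_filename)
print("co_firstlineno:", code.co_firstlineno)
print("co_name:", code.co_name)                # D
print("co_flags:", code.co_flags)              # the article's screenshot showed 67
```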
From the output, we can see that at run time Python knows that function D has 3 parameters (co_argcount) and two constants, None and 0 (co_consts). The None is present by default. Through the three attributes co_filename, co_firstlineno, and co_name, we can tell which .py file D is defined in, that its first line is line 2 of that file, and that its name is D. The value of co_flags, however, is 67. The official
documentation defines co_flags in terms of flag bits, so why is it an integer here?
Just like the flags register in a CPU, co_flags is a bit field in which each bit is a separate flag. The 67 shown here is simply that bit field displayed in decimal for convenience; in binary it is 1000011. Counting from bit 0 on the right (the least significant bit) to bit 6 on the left, bits 0, 1, and 6 are set.
[Figure: co_flags flag bits]
According to the flag table, bit 0 (CO_OPTIMIZED, value 1) means the code object is optimized and uses fast locals, so its local variables are accessed with fast instructions such as LOAD_FAST when the bytecode runs. Bit 1 (CO_NEWLOCALS, value 2) means a new local namespace is created when the code object is executed. Bit 6 (CO_NOFREE, value 64) means the function has no free or cell variables, that is, it is not a closure. Other bits mark other properties: CO_NESTED (16) marks a nested function, CO_GENERATOR (32) a generator function, and CO_COROUTINE (128) a coroutine. The exact set of flags can differ between Python versions.

The bytecode mentioned here is the co_code attribute of the code object. Printing it directly is of little use, since it is just raw bytes. Fortunately, Python's standard library provides the dis module for viewing bytecode conveniently.

Using the same function D, we now view its bytecode with the dis module.
[Figure: viewing the bytecode with dis]
The bytecode corresponds closely to the source code: each source line maps to a short sequence of bytecode instructions.

First, LOAD_CONST pushes the constant 0 onto the evaluation stack, and STORE_FAST pops it off the top of the stack and stores it in the local variable e, whose name is recorded in co_varnames.
Next, LOAD_FAST pushes the two local variables a and b onto the stack, BINARY_ADD adds them, and STORE_FAST stores the sum in the local variable c. Then c and d are loaded onto the stack, INPLACE_ADD computes c += d, and the result is stored back into c. Finally, c is loaded onto the stack once more, and RETURN_VALUE returns it to the caller.
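To reproduce the walkthrough yourself, disassemble the same (hypothetically reconstructed) D. The exact opcode names depend on the Python version. Since Python 3.11, BINARY_ADD and INPLACE_ADD have been replaced by a single BINARY_OP instruction, and newer versions may merge or rename other instructions too.

```python
import dis


# Hypothetical reconstruction of D, as in the walkthrough above
def D(a, b, d):
    e = 0
    c = a + b
    c += d
    return c


# Print a human-readable disassembly of D's bytecode
dis.dis(D)

# The same information, as structured Instruction objects
opnames = [ins.opname for ins in dis.get_instructions(D)]
print(opnames)
```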

The bytecode flow above shows that these few simple lines of code involve many pushes onto and pops off the stack. For example, BINARY_ADD performs an addition, yet the instruction itself has no operands. It always pops the two values on top of the stack and pushes their sum, so there is no need to say which variables it uses. LOAD_FAST pushes the variables onto the top of the stack in advance, and the addition then operates on them directly.
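A toy stack machine helps show why an operand-free BINARY_ADD is enough. This is an illustration only, not how CPython is implemented: the evaluator `run_stack_machine` is hypothetical, and it uses variable names as LOAD_FAST/STORE_FAST arguments where CPython uses slot indices. It also uses BINARY_ADD in place of INPLACE_ADD, since both just add the top two stack values here.

```python
def run_stack_machine(instructions, consts, local_vars):
    """Toy evaluator for a tiny, CPython-like instruction set.

    instructions: list of (opname, arg) pairs
    consts:       list of constants, indexed by LOAD_CONST's arg
    local_vars:   dict of local variables, keyed by name
    """
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_CONST":
            stack.append(consts[arg])
        elif opname == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif opname == "STORE_FAST":
            local_vars[arg] = stack.pop()
        elif opname == "BINARY_ADD":
            # No operands: pop the top two values and push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError(f"unknown instruction {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


# Hand-written program mirroring D: e = 0; c = a + b; c += d; return c
program = [
    ("LOAD_CONST", 0),
    ("STORE_FAST", "e"),
    ("LOAD_FAST", "a"),
    ("LOAD_FAST", "b"),
    ("BINARY_ADD", None),
    ("STORE_FAST", "c"),
    ("LOAD_FAST", "c"),
    ("LOAD_FAST", "d"),
    ("BINARY_ADD", None),  # stands in for INPLACE_ADD in this toy
    ("STORE_FAST", "c"),
    ("LOAD_FAST", "c"),
    ("RETURN_VALUE", None),
]

print(run_stack_machine(program, [0], {"a": 1, "b": 2, "d": 3}))  # 6
```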

If you want to learn more about Python bytecode, see the description of the bytecode instructions in the official Python documentation:

https://docs.python.org/zh-cn/3/library/dis.html?highlight=dis#python-bytecode-instructions

Summary

Putting the above together, the running process of a Python program can be summarized as follows:

  • Start the Python interpreter, which initializes its own runtime environment
  • Read the .py file to be run and convert the human-readable source code into a code object through lexical analysis, syntax analysis, and compilation
  • Execute the code object, running its bytecode instruction by instruction
  • Return the result of the execution
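The four steps above can be mirrored, very roughly, for a source string. This is a sketch with a hypothetical helper `run_python_source`. Step 1 is already done because the sketch itself runs inside a started interpreter. A real run of `python file.py` does considerably more work around these steps.

```python
import ast


def run_python_source(source):
    """Hypothetical helper mirroring the summary steps for a source string.

    Returns the module namespace produced by running the code.
    """
    # Step 2: lexical and syntax analysis produce an AST...
    tree = ast.parse(source, filename="<demo>")
    # ...which is compiled into a code object
    code = compile(tree, "<demo>", "exec")
    # Step 3: execute the code object (its bytecode) in a fresh namespace
    namespace = {"__name__": "__main__"}
    exec(code, namespace)
    # Step 4: hand back the results of the run
    return namespace


result = run_python_source("def square(x):\n    return x * x\n\nanswer = square(7)\n")
print(result["answer"])  # 49
```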


Source: blog.csdn.net/weixin_45608294/article/details/131019141