Python Performance Optimization Guide
Execution speed is the most-criticized aspect of Python. Making Python programs run faster has long been a goal of the Python core team and the community, and as Python developers we can also follow certain principles and techniques to write better-performing code. This article takes an in-depth look at how to optimize the performance of Python programs.
Article directory

- Optimization principles
- Optimization tools
- Solve bottlenecks
  - Choose appropriate algorithms and data structures
  - Make good use of list comprehensions
  - Use fewer `.` lookups
  - Make good use of multiple assignments
  - Avoid using global variables
  - Use library methods whenever possible
  - Use `join()` to concatenate strings
  - Make good use of generators
  - Take advantage of acceleration tools
  - Use C/C++/Rust to implement core functions
  - Use the latest version of Python
Optimization principles
Some optimization principles apply to all programming languages, and Python is no exception. They are the "inner principles" of program optimization: every programmer should keep them in mind and apply them in daily development.
1. Avoid optimizing while developing
Don't think about possible optimizations while writing your program; focus instead on making the code clean, correct, and readable. If, once it is written, you find it too big or too slow, then think about how to optimize it. As Donald Knuth famously said:

"Premature optimization is the root of all evil."

This echoes Zeng Guofan's philosophy: "Deal with things as they come; do not anticipate the future, do not be distracted in the present, do not cling to the past."
2. Remember the 20/80 rule
In many fields, you can get 80% of the results with 20% of the effort (sometimes it may be the 10/90 rule). Whenever you want to optimize your code, first use profiling tools to find out where 80% of the execution time is spent, so you know where to focus your optimization efforts.
3. Be sure to compare performance before and after optimization
Without a before-and-after performance comparison, we cannot know whether an optimization actually had an effect. If the optimized code is only marginally faster than the original, undo the optimization and revert to the earlier version: a tiny performance gain is not worth sacrificing clear, tidy, readable code.
Please keep the above three optimization principles in mind. No matter what language you use in the future, please abide by these three rules when doing performance optimization.
Optimization tools
As mentioned in the second optimization principle, we need to focus optimization efforts on the most time-consuming areas. So how do you find the most time-consuming parts of your program? We need tools that collect data while the program runs and help us locate its bottlenecks. This process is called _profiling_. Python has several profiling tools, each with its own use cases and focus; we introduce them one by one below.
cProfile
Python ships with a profiling tool called `cProfile`. It is also the one I recommend, because it is the most powerful: it hooks into every function in the program and collects rich data, including:

- `ncalls`: the number of times the function was called
- `tottime`: the total time spent in the function itself (excluding time spent in sub-functions)
- `percall`: `tottime` divided by `ncalls`, i.e. the average time per call
- `cumtime`: the cumulative time spent in the function (including time spent in sub-functions); this figure is accurate even for recursive functions
- `percall`: `cumtime` divided by the number of primitive (non-recursive) calls
- `filename:lineno(function)`: identifies the function each row of data refers to

`cProfile` can be used directly from the command line:
$ python -m cProfile main.py
Suppose our `main.py` computes the sum of all prime numbers below 1,000,000:
```python
import math


def is_prime(n: int) -> bool:
    for i in range(2, math.floor(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True


def main():
    s = 0
    for i in range(2, 1000000):
        if is_prime(i):
            s += i
    print(s)


if __name__ == "__main__":
    main()
```
Running it under `cProfile`, the console output (abridged) shows that the whole program took 3.091 seconds, with 3,000,064 function calls in total, followed by a detailed row of data for each function. By default `cProfile` sorts the rows by function name. Since we usually care more about execution time, it is common to pass `-s time` and let `cProfile` sort the output by execution time:
$ python -m cProfile -s time .\main.py
From the time-sorted output we can see that the most time-consuming function is `is_prime`; if we want to optimize, `is_prime` is where to focus our efforts.
%%timeit and %timeit
`cProfile`, introduced above, is mainly used from the command line. In data analysis and machine learning, however, we often work in Jupyter, and in interactive environments such as Jupyter or IPython `cProfile` cannot be used. There we use the `%%timeit` and `%timeit` magics instead.

The difference between the two is that `%%timeit` applies to an entire code cell and measures the execution time of the whole cell, while `%timeit` applies to a single statement and measures the execution time of that line. Using the same prime-sum code, let's see how to measure its running time in Jupyter.
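A sketch of such a cell, reusing the prime-sum code from above with the magic on its first line:

```python
%%timeit
# %%timeit must be the very first line of the Jupyter cell
import math

def is_prime(n: int) -> bool:
    for i in range(2, math.floor(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True

s = 0
for i in range(2, 1000000):
    if is_prime(i):
        s += i
```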
With `%%timeit` added at the top of the cell, Jupyter measures the running time of the whole cell. `%%timeit` runs the code several times and reports the mean running time. The output looks like this:
3.87 s ± 151 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
timeit() method
Sometimes we only want to know how a particular function or a single statement performs. In that case `cProfile` is too heavyweight (it reports the execution of every function in the program). Instead, we can import `timeit` and use `timeit()` to wrap the function or statement we want to profile. For example:
```python
import timeit

timeit.timeit('list(itertools.repeat("a", 100))', 'import itertools', number=10000000)
```
The code above runs `list(itertools.repeat("a", 100))` 10,000,000 times and returns the total running time in seconds:
10.997665435877963
`timeit` can also be used from the command line. For example:
$ python -m timeit "'-'.join(str(n) for n in range(100))"
20000 loops, best of 5: 10.5 usec per loop
Third-party tool: line_profiler

The tools introduced above ship with Python or IPython, and what they report is mainly the running time of whole functions. When we need a deeper understanding of how a program executes, they are not enough, and we turn to line_profiler. line_profiler is a third-party Python library that performs line-by-line analysis of code on a per-function basis; with it we can analyze the time consumption of one or more target functions, which makes tuning much easier.
Since it is a third-party tool, line_profiler must be installed before use:
$ pip install line_profiler
After successful installation, we can use the `@profile` decorator and the `kernprof` command to collect runtime statistics. Here is the prime-sum example rewritten with `@profile` decorators:
```python
import math


@profile  # the @profile decorator is injected by kernprof; no import is needed
def is_prime(n: int) -> bool:
    for i in range(2, math.floor(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True


@profile
def main():
    s = 0
    for i in range(2, 1000000):
        if is_prime(i):
            s += i
    print(s)


if __name__ == "__main__":
    main()
```
Then run the `kernprof` command:
$ kernprof -lv main.py
The `-l` flag tells kernprof to use the line-by-line profiler instead of cProfile; it is required for the `@profile` decorator to take effect. The analysis results are saved to a `.lprof` file, and adding `-v` also displays the results on the console in addition to writing the file. You can view the saved results later with the `line_profiler` module:
$ python -m line_profiler <.lprof file>
Solve bottlenecks
Choose appropriate algorithms and data structures
Using profiling tools we can easily locate a program's bottlenecks; the next step is to fix them. By some accounts, 90% of program performance problems come down to algorithms and data structures, so choosing an appropriate algorithm matters most for performance. For example, to sort a list of thousands of elements, do not use bubble sort with its O(n²) time complexity; quicksort, with O(n log n) complexity, will be much faster.
The example above shows the impact of the algorithm; the choice of data structure matters just as much. Take searching a large data set: if the data is stored in a list, finding a given element takes O(n) time; stored in a balanced binary tree, the search improves to O(log n); stored in a hash table, it becomes O(1).
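To see the data-structure effect concretely, the following sketch (variable names are my own) uses `timeit` to compare membership tests on a list and a set:

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Worst case for the list: the element is absent, so the whole list is scanned.
list_time = timeit.timeit(lambda: -1 in data_list, number=100)
set_time = timeit.timeit(lambda: -1 in data_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

On a typical machine the set lookup is orders of magnitude faster, since `in` on a list is O(n) while on a set it is O(1) on average.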
When describing algorithmic complexity we usually use Big O notation, which gives an upper bound on the time an algorithm needs. Insertion sort, for example, takes linear time in the best case and quadratic time in the worst case, so we say its time complexity is O(n²): an upper bound that it will not exceed under any circumstances.
To this end, I have specially compiled the time complexity of common operations in Python for your reference.
In addition to choosing appropriate algorithms and data structures, there are a number of techniques in day-to-day Python development that can improve execution speed.
Make good use of list comprehensions
Use list comprehensions wherever possible. For example, to find the multiples of 3 within 10,000, we can write:
```python
l = []
for i in range(1, 10000):
    if i % 3 == 0:
        l.append(i)
```
It is better written as a list comprehension. Not only is the code more concise, it also performs better, because the comprehension avoids the repeated `append` method calls:

```python
l = [i for i in range(1, 10000) if i % 3 == 0]
```
Comparing the running times of the two versions with `%%timeit` (looping 100 times and averaging) shows that the list comprehension is faster than the `append` loop.
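Outside Jupyter, the same comparison can be reproduced with a short `timeit` script; the function names here are my own:

```python
import timeit

def with_append():
    l = []
    for i in range(1, 10000):
        if i % 3 == 0:
            l.append(i)
    return l

def with_comprehension():
    return [i for i in range(1, 10000) if i % 3 == 0]

assert with_append() == with_comprehension()  # both build the same list

loop_time = timeit.timeit(with_append, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)
print(f"append loop: {loop_time:.3f}s  comprehension: {comp_time:.3f}s")
```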
Use fewer `.` lookups

During development, try to avoid attribute access through the `.` operator. For example,
```python
import math

val = math.sqrt(60)
```
should be replaced by
```python
from math import sqrt

val = sqrt(60)
```
This is because calling a method through `.` first invokes `__getattribute__()` or `__getattr__()`, and both involve dictionary operations, which take time.
Timing the two versions shows that the one without the `.` lookup is noticeably faster. For this reason, it is common to import functions directly with `from module import function` and avoid calling them through the `.` operator.
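A quick way to check this claim yourself is the following sketch, which times both call styles with `timeit` (the measured numbers will vary by machine and Python version):

```python
import timeit

# Attribute lookup on the module object on every call
dotted = timeit.timeit("math.sqrt(60)", setup="import math", number=1_000_000)
# Direct reference to the function, no '.' lookup
direct = timeit.timeit("sqrt(60)", setup="from math import sqrt", number=1_000_000)

print(f"math.sqrt(60): {dotted:.3f}s  sqrt(60): {direct:.3f}s")
```

On recent CPython versions the adaptive interpreter caches attribute lookups, so the gap is smaller than it used to be, but the direct call avoids the lookup entirely.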
Make good use of multiple assignments
If you need to assign several variables in a row, such as
```python
a = 2
b = 3
c = 4
d = 5
```
It is recommended to write
```python
a, b, c, d = 2, 3, 4, 5
```
Avoid using global variables
Python provides the `global` keyword for declaring or binding global variables, but accessing a global variable takes longer than accessing a local one. Therefore, avoid global variables unless they are truly necessary.
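One common pattern, sketched below with my own names, is to hoist a global that a loop uses repeatedly into a local variable:

```python
import timeit

FACTOR = 3  # a module-level (global) variable

def use_global():
    total = 0
    for i in range(10_000):
        total += i * FACTOR      # global name lookup on every iteration
    return total

def use_local():
    factor = FACTOR              # look the global up once
    total = 0
    for i in range(10_000):
        total += i * factor      # fast local access inside the loop
    return total

assert use_global() == use_local()

g_time = timeit.timeit(use_global, number=1000)
l_time = timeit.timeit(use_local, number=1000)
print(f"global: {g_time:.3f}s  local: {l_time:.3f}s")
```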
Use library methods whenever possible
If a function is already provided by the Python standard library or a third-party library, use the library method instead of implementing it yourself. Library methods are highly optimized, and many are implemented in C under the hood; code we write ourselves is unlikely to beat them, and rewriting them amounts to reinventing the wheel.
Use `join()` to concatenate strings

Many languages concatenate strings with `+`. Python supports `+` concatenation too, but I prefer the `join()` method, because it is faster: each `+` creates a new string and copies the contents of the old strings into it, whereas `join()` does not.
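A sketch of the comparison (function names are mine):

```python
import timeit

parts = [str(n) for n in range(100)]

def concat_plus():
    s = ""
    for p in parts:
        s += p              # may build a new string on each iteration
    return s

def concat_join():
    return "".join(parts)   # computes the final size once, copies once

assert concat_plus() == concat_join()

plus_time = timeit.timeit(concat_plus, number=10_000)
join_time = timeit.timeit(concat_join, number=10_000)
print(f"+: {plus_time:.3f}s  join(): {join_time:.3f}s")
```

CPython can sometimes grow a string in place when it has only one reference, so `+=` is not always catastrophic, but `join()` is guaranteed linear in the total length and is the idiomatic choice.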
Make good use of generators
When we have to process large amounts of data, using generators is often faster and far lighter on memory. I wrote a dedicated article, "In-depth Understanding of Python Generators and Yield", explaining Python generators in detail and why they are faster for processing large files or large data sets.
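A minimal sketch of the difference, comparing the memory footprint of a list comprehension and the equivalent generator expression:

```python
import sys

squares_list = [n * n for n in range(1_000_000)]  # materializes all elements now
squares_gen = (n * n for n in range(1_000_000))   # produces elements lazily

print(sys.getsizeof(squares_list))  # megabytes
print(sys.getsizeof(squares_gen))   # a couple of hundred bytes

# Both can be consumed the same way:
assert sum(squares_gen) == sum(squares_list)
```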
Take advantage of acceleration tools
There are many projects dedicated to making Python faster by providing a better runtime environment or runtime optimizations. Mature options include PyPy and Numba.

PyPy is on average 4.5 times faster than CPython; for how to use PyPy to speed up Python, see the article "Accelerating Python Programs with PyPy".

Numba is a JIT compiler that works well with NumPy and compiles Python functions to machine code, greatly improving the speed of scientific computation. For how to use Numba to improve the running speed of Python, read "Using Numba: One Line of Code Increases the Running Speed of Python Programs by 100 Times".
So if conditions permit, you can use the above two tools to speed up Python code.
Use C/C++/Rust to implement core functions
C/C++/Rust are all much faster than Python, and part of Python's power is how easily it binds to other languages. For performance-sensitive functionality, we can therefore implement the core in C/C++/Rust and bind it to Python. Many Python libraries do exactly this, such as NumPy, SciPy, Pandas, and Polars.
For how to develop C extension modules in C language, please refer to "Make your Python program as fast as C language"
Use the latest version of Python
Python's core team also works tirelessly on performance, and each new release is more optimized and faster than the last. Not long ago Python released 3.11.0, whose performance improved substantially: 10% to 60% faster than 3.10, depending on the workload. So, where conditions permit, use a newer version of Python to obtain these performance improvements.