How to use the Python multiprocessing module

In this article [1], we will learn how to use a specific Python class (the Process class) from the multiprocessing module. I will give you a quick overview with examples.

What is the multiprocessing module?

What better way to describe a module than to quote its official documentation? "multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads."

The threading module is not the focus of this article, but in short, threading handles the execution of a small piece of code (lightweight and with shared memory), while multiprocessing handles the execution of whole programs (heavier and fully isolated).

In general, the multiprocessing module provides a variety of other classes, functions, and utilities for managing multiple processes running during a program's execution. The module is specifically designed to be the main point of interaction when a program needs to apply parallelism to its workflow. We will not discuss every class and utility in the multiprocessing module; instead, we will focus on one very specific class, the Process class.

What is the Process class?

In this section, we will try to give a better introduction to what a process is and how to identify, use, and manage one in Python. As the GNU C Library explains: "Processes are the primitive units for allocation of system resources. Each process has its own address space and (usually) one thread of control. A process executes a program; you can have multiple processes executing the same program, but each process has its own copy of the program within its own address space and executes it independently of the other copies."

But what does that look like in Python? So far we have a description of what a process is and of the difference between a process and a thread, but we have not touched any code yet. Let's change that and write a very simple process example in Python:

#!/usr/bin/env python
import os

# A very, very simple process.
if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

This will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144112

As you can see, any running Python script or program is a process in its own right.

Creating a child process

So what about creating child processes inside the parent process? To do that, we need the help of the Process class from the multiprocessing module. It looks like this:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    # We then start the process
    process.start()

    # And finally, we join the process. This makes our script block and
    # wait until the child process is done.
    process.join()

This will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 144078
Hi! I'm a child process 144079

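A very important note about the previous script: if you do not call process.join() to wait for the child process to run and finish, any code after that point executes right away, and it can become a little harder to synchronize your workflow.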

Consider the following example:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    # We then start the process
    process.start()

    # This time the join is commented out, so the script does not wait
    # for the child process before moving on.
    # process.join()

    print("AFTER CHILD EXECUTION! RIGHT?!")

This snippet will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 145489
AFTER CHILD EXECUTION! RIGHT?!
Hi! I'm a child process 145490

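Of course, it would not be correct to claim that the snippet above is wrong. It all depends on how you want to use the module and how your child processes are going to run. So use it wisely.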
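
If you want something between "always wait" and "never wait", join() also accepts a timeout, and is_alive() tells you whether the child is still running. The following is a minimal sketch; `slow_child` and the sleep durations are made up for this illustration.

```python
import multiprocessing
import time


def slow_child(seconds):
    """Pretend to do some work by sleeping."""
    time.sleep(seconds)


if __name__ == "__main__":
    process = multiprocessing.Process(target=slow_child, args=(1,))
    process.start()

    # Wait at most 0.1 seconds; join() returns even if the child is not done.
    process.join(timeout=0.1)
    print(f"Still running after the short wait? {process.is_alive()}")

    # The parent is free to do other work here, then wait for real.
    process.join()
    print(f"Still running after the full wait? {process.is_alive()}")
    print(f"Exit code: {process.exitcode}")
```

With these durations, the first check should normally report that the child is still alive and the second that it has finished.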

Creating multiple child processes

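If you want to spawn multiple processes, you can use a for loop (or any other kind of loop). A loop lets you create as many references to the processes you need as you like, and start/join them at a later stage.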

#!/usr/bin/env python
import os
import multiprocessing

def child_process(child_id):
    print(f"Hi! I'm a child process {os.getpid()} with id#{child_id}")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")
    list_of_processes = []

    # Loop through the numbers 0 to 9 and create a process for each one of
    # them.
    for i in range(0, 10):
        # Here we create a new instance of the Process class and assign our
        # `child_process` function to be executed. Note that we are now
        # using the `args` parameter, which lets us pass parameters down to
        # the function being executed as a child process.
        process = multiprocessing.Process(target=child_process, args=(i,))
        list_of_processes.append(process)

    for process in list_of_processes:
        # We then start the process
        process.start()

        # And finally, we join the process. Because join() is called right
        # after start(), each child finishes before the next one starts.
        process.join()

This will produce the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 146056
Hi! I'm a child process 146057 with id#0
Hi! I'm a child process 146058 with id#1
Hi! I'm a child process 146059 with id#2
Hi! I'm a child process 146060 with id#3
Hi! I'm a child process 146061 with id#4
Hi! I'm a child process 146062 with id#5
Hi! I'm a child process 146063 with id#6
Hi! I'm a child process 146064 with id#7
Hi! I'm a child process 146065 with id#8
Hi! I'm a child process 146066 with id#9
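
The listing above calls start() and join() in the same loop iteration, so the children run one after another, which is why the output is perfectly ordered. If the goal is to have the children run at the same time, a common variation is to start them all first and join them afterwards. Here is a minimal sketch; the short sleep and the count of four children are assumptions added only to make the overlap visible.

```python
import multiprocessing
import os
import time


def child_process(child_id):
    # Sleep briefly so the children visibly overlap in time.
    time.sleep(0.2)
    print(f"Hi! I'm a child process {os.getpid()} with id#{child_id}")


if __name__ == "__main__":
    processes = [
        multiprocessing.Process(target=child_process, args=(i,))
        for i in range(4)
    ]

    started = time.perf_counter()

    # First loop: start every child without waiting for any of them.
    for process in processes:
        process.start()

    # Second loop: only now wait for each child to finish.
    for process in processes:
        process.join()

    elapsed = time.perf_counter() - started
    print(f"All children finished in about {elapsed:.2f} seconds")
```

Because the children overlap, the total time should be closer to one sleep than to four, and the order of the printed lines can change from run to run.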

Communicating data

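In the previous section, I described adding a new parameter, args, to the multiprocessing.Process class constructor. This parameter lets you pass values to the child process for use inside the function. But do you know how to return data from a child process?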

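You might think that, to return data from a child, you would use a return statement inside it to retrieve the data. Processes are great for executing functions in isolation without interfering with shared resources, but that same isolation means the normal, common way of returning data from a function is not available here.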

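Instead, we can use the Queue class, which gives us an interface for communicating data between the parent process and its children. A queue, in this context, is a regular FIFO (first in, first out) with built-in mechanisms for working with multiprocessing.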

Consider the following example:

#!/usr/bin/env python
import os
import multiprocessing

def child_process(queue, number1, number2):
    print(f"Hi! I'm a child process {os.getpid()}. I do calculations.")
    # Named `total` to avoid shadowing the built-in sum().
    total = number1 + number2

    # Putting data into the queue
    queue.put(total)

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Defining a new Queue()
    queue = multiprocessing.Queue()

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed. The queue is passed through
    # the `args` parameter together with the two numbers to add, so the
    # child can put its result on it.
    process = multiprocessing.Process(target=child_process, args=(queue, 1, 2))

    # We then start the process
    process.start()

    # And finally, we join the process. This makes our script block and
    # wait until the child process is done.
    process.join()

    # Accessing the result from the queue.
    print(f"Got the result from child process as {queue.get()}")

It will give the following output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 149002
Hi! I'm a child process 149003. I do calculations.
Got the result from child process as 3
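
The same idea extends to several children sharing one queue. The sketch below is only an illustration: the `add` function and the list of number pairs are made up. It also reads from the queue before joining, which is the order the multiprocessing documentation recommends when children put data on a queue, since a child may not exit until its queued data has been consumed.

```python
import multiprocessing


def add(queue, number1, number2):
    """Put an (inputs, result) tuple on the queue instead of returning it."""
    queue.put(((number1, number2), number1 + number2))


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    pairs = [(1, 2), (3, 4), (5, 6)]

    processes = [
        multiprocessing.Process(target=add, args=(queue, a, b))
        for a, b in pairs
    ]
    for process in processes:
        process.start()

    # Read exactly one result per child. Results arrive in completion
    # order, which is why each one carries its inputs along.
    results = dict(queue.get() for _ in processes)

    # Join after draining the queue, so no child is left blocked while
    # flushing its data.
    for process in processes:
        process.join()

    for pair in pairs:
        print(f"{pair[0]} + {pair[1]} = {results[pair]}")
```

Tagging each result with its inputs is a design choice for this sketch: with several children writing to the same queue, nothing guarantees that results arrive in the order the processes were started.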

Exception handling

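Handling exceptions is a special and somewhat tricky task that we have to deal with from time to time when using the multiprocessing module. The reason is that, by default, any exception raised inside a child process is handled by the Process class that spawned it; it is not propagated to the parent.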

The code below raises an exception with a message:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    print(f"Hi! I'm a child process {os.getpid()}.")
    raise Exception("Oh no! :(")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    try:
        # We then start the process
        process.start()

        # And finally, we join the process. This makes our script block
        # and wait until the child process is done.
        process.join()

        print("AFTER CHILD EXECUTION! RIGHT?!")
    except Exception:
        print("Uhhh... It failed?")

Output:

[r0x0d@fedora ~]$ python /tmp/tmp.iuW2VAurGG/scratch.py
Hi! I'm process 149505
Hi! I'm a child process 149506.
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/tmp/tmp.iuW2VAurGG/scratch.py", line 7, in child_process
    raise Exception("Oh no! :(")
Exception: Oh no! :(
AFTER CHILD EXECUTION! RIGHT?!

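If you follow the code, you will notice a print statement carefully placed after the process.join() call, to show that the parent process keeps running even after an unhandled exception is raised in the child.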
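
Even though a try/except in the parent never sees the child's exception, the parent can still find out that something went wrong by looking at the exitcode attribute of the Process object after joining. A minimal sketch, reusing the failing child from above:

```python
import multiprocessing


def child_process():
    raise Exception("Oh no! :(")


if __name__ == "__main__":
    process = multiprocessing.Process(target=child_process)
    process.start()
    process.join()

    # exitcode is 0 on success and non-zero when the child died from an
    # unhandled exception (or was terminated).
    if process.exitcode != 0:
        print(f"Child failed with exit code {process.exitcode}")
    else:
        print("Child finished cleanly")
```

The child's traceback is still printed to the terminal; the exit code only tells the parent that the child failed, not why.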

One way to overcome this is to actually handle the exception inside the child process, like this:

#!/usr/bin/env python
import os
import multiprocessing

def child_process():
    try:
        print(f"Hi! I'm a child process {os.getpid()}.")
        raise Exception("Oh no! :(")
    except Exception:
        print("Uh, I think it's fine now...")

if __name__ == "__main__":
    print(f"Hi! I'm process {os.getpid()}")

    # Here we create a new instance of the Process class and assign our
    # `child_process` function to be executed.
    process = multiprocessing.Process(target=child_process)

    # We then start the process
    process.start()

    # And finally, we join the process. This makes our script block and
    # wait until the child process is done.
    process.join()

    print("AFTER CHILD EXECUTION! RIGHT?!")

Now your exception is handled inside the child process, which means you control what happens to it and what should be done in that case.
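
Handling the exception in the child can be combined with the Queue class from the previous section, so that the parent learns what happened. The following is a sketch under assumptions, not an established pattern from the module itself: the `risky_division` function and the ("ok" / "error") tuple convention are made up for this illustration.

```python
import multiprocessing


def risky_division(queue, numerator, denominator):
    """Report either ("ok", value) or ("error", message) to the parent."""
    try:
        queue.put(("ok", numerator / denominator))
    except ZeroDivisionError as error:
        queue.put(("error", repr(error)))


if __name__ == "__main__":
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(
        target=risky_division, args=(queue, 1, 0)
    )
    process.start()

    # Read the child's report first, then wait for it to exit.
    status, payload = queue.get()
    process.join()

    if status == "error":
        print(f"Child reported a problem: {payload}")
    else:
        print(f"Child result: {payload}")
```

Because the child always puts exactly one item on the queue, the parent's single queue.get() call cannot block forever in this sketch.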

Summary

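The multiprocessing module is very powerful when your work and implementation depend on solutions that execute in parallel, especially when used with the Process class. It adds the amazing possibility of executing any function in its own isolated process.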

Reference

[1] Source: https://developers.redhat.com/articles/2023/07/27/how-use-python-multiprocessing-module