Getting Started with RabbitMQ (2): Work Queues

  In the article Getting Started with RabbitMQ (1): Hello World, we wrote programs to send and receive messages through a named queue. In this article we will create a work queue (Work Queue) to distribute time-consuming tasks among multiple workers.
  The main idea behind a work queue (also known as a task queue) is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to a queue. A worker process running in the background will pop the task and eventually execute it. When you run multiple workers, the tasks will be shared among them.
  This concept is especially useful in web applications, where it is impossible to handle a complex task within the short window of an HTTP request.
  In the previous article we sent a message containing "Hello World!". Now we will send strings that stand for complex tasks. We don't have a real-world task like resizing an image or rendering a PDF, so let's fake it with the time.sleep() function. We will use dots (.) in the string to represent complexity; every dot accounts for one second of work. For example, a fake task described by Hello... will take three seconds.
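  A minimal sketch of this convention (note that pika delivers the message body as bytes, so the consumer counts b'.'):

body = b"Hello..."          # a fake task: three dots
seconds = body.count(b'.')  # -> 3, i.e. three seconds of "work"
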
  We need to slightly modify the send.py code from the previous article to allow arbitrary strings to be supplied on the command line. This program will schedule tasks to our work queue, so let's name it new_task.py:

import sys

# connection and channel are set up exactly as in the previous article
message = ' '.join(sys.argv[1:]) or "Hello World!"
channel.basic_publish(exchange='',
                      routing_key='hello',
                      body=message)
print(" [x] Sent %r" % message)

  Our old receive.py script also requires a change: it needs to fake one second of work for every dot in the message body. It will pop messages from the queue and perform the task, so let's call it worker.py:

import time

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    time.sleep(body.count(b'.'))  # fake one second of work per dot (body is bytes)
    print(" [x] Done")

Round-Robin Dispatching

  One of the advantages of using a work queue is the ability to parallelize work easily. If we are building up a backlog of work, we can just add more workers and scale out easily.
  First, let's run two worker.py scripts at the same time. They can both get messages from the queue, but how exactly does that work? Let's read on.
  You need three terminals open. Two of them will run the worker.py script; these terminals will be our two consumers, C1 and C2.

# shell 1
python worker.py
# => [*] Waiting for messages. To exit press CTRL+C
# shell 2
python worker.py
# => [*] Waiting for messages. To exit press CTRL+C

In the third terminal we will publish new tasks. Once you have started the consumers, you can send a few messages:

# shell 3
python new_task.py First message.
python new_task.py Second message..
python new_task.py Third message...
python new_task.py Fourth message....
python new_task.py Fifth message.....

Let's see what is delivered to our two workers:

# shell 1
python worker.py
# => [*] Waiting for messages. To exit press CTRL+C
# => [x] Received 'First message.'
# => [x] Received 'Third message...'
# => [x] Received 'Fifth message.....'
# shell 2
python worker.py
# => [*] Waiting for messages. To exit press CTRL+C
# => [x] Received 'Second message..'
# => [x] Received 'Fourth message....'

By default, RabbitMQ sends each message to the next consumer in sequence. On average, every consumer gets the same number of messages. This way of distributing messages is called round-robin. Try it out with three or more workers.

Message Acknowledgement

  Performing a task can take a few seconds. You may wonder what happens if a consumer starts a long task and dies partway through. With our current code, once RabbitMQ delivers a message to a consumer, it immediately marks the message for deletion. In this case, if you kill a worker, we lose the message it was just processing. We also lose all the messages that were dispatched to this particular worker but not yet handled.
  But we don't want to lose any tasks. If a worker dies, we would like its tasks to be delivered to another worker.
  To make sure a message is never lost, RabbitMQ supports message acknowledgements. An ack(nowledgement) is sent back by the consumer to tell RabbitMQ that this particular message has been received and processed and may be deleted.
  If a consumer dies (its channel is closed, its connection is closed, or the TCP connection is lost) without sending an ack, RabbitMQ will understand that the message wasn't processed fully and will re-queue it. If other consumers are online at the same time, RabbitMQ will quickly redeliver it to another consumer. That way we can be sure no message is lost, even if a worker occasionally dies.
  There are no message timeouts; RabbitMQ will redeliver a message only when the consumer dies. It's fine even if processing a message takes a very, very long time.
  Manual message acknowledgements are turned on by default. In the previous examples we explicitly turned them off with the auto_ack=True flag. It's time to remove this flag and have the worker send a proper acknowledgement once it has finished a task.

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    time.sleep(body.count(b'.'))  # body is bytes, so count b'.'
    print(" [x] Done")
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='hello', on_message_callback=callback)

Using this code, we can be sure that even if you kill a worker with CTRL+C while it is processing a message, nothing is lost. Soon after the worker dies, all unacknowledged messages are redelivered.
  Acknowledgements must be sent on the same channel that received the message. Attempting to acknowledge on a different channel results in a channel-level protocol exception.
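  To check whether acknowledgements are actually being sent, you can list the ready and unacknowledged message counts with rabbitmqctl:

# shell
sudo rabbitmqctl list_queues name messages_ready messages_unacknowledged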

Message Durability

  We have learned how to make sure tasks aren't lost even if a consumer dies. But our tasks will still be lost if the RabbitMQ server itself stops.
  When RabbitMQ quits or crashes, it forgets all queues and messages unless you tell it not to. Two things are required to make sure messages aren't lost: we need to mark both the queue and the messages as durable.
  First, we need to make sure the queue survives a RabbitMQ restart. To do so, we declare it as durable:

channel.queue_declare(queue='hello', durable=True)

Although this command is correct by itself, it won't work in our current setup. That's because we have already defined a queue called hello that is not durable. RabbitMQ doesn't allow you to redefine an existing queue with different parameters and will return an error to any program that tries. But there is a quick workaround: declare a queue with a different name, such as task_queue:

channel.queue_declare(queue='task_queue', durable=True)

This queue_declare change needs to be applied to both the producer and the consumer code.
  At this point we are sure that the task_queue queue won't lose messages even if RabbitMQ restarts. Now we need to mark our messages as persistent as well, by setting the delivery_mode property to 2:

channel.basic_publish(exchange='',
                      routing_key="task_queue",
                      body=message,
                      properties=pika.BasicProperties(
                         delivery_mode = 2, # make message persistent
                      ))

Fair Dispatch

  You might have noticed that the dispatching still doesn't work exactly as we'd hoped. Imagine, for example, two workers where all the odd messages are heavy and all the even messages are light: one worker will be constantly busy while the other does hardly any work. RabbitMQ knows nothing about this and will still dispatch messages evenly.
  This happens because RabbitMQ dispatches a message as soon as it enters the queue. It doesn't look at the number of unacknowledged messages a consumer has. It just blindly dispatches every n-th message to the n-th consumer.

  To defeat this, we can use the basic_qos method with prefetch_count=1. This tells RabbitMQ not to give more than one message to a worker at a time. In other words, don't dispatch a new message to a worker until it has processed and acknowledged the previous one. Instead, RabbitMQ will dispatch the message to the next worker that is not busy.

channel.basic_qos(prefetch_count=1)

Hands-On Example 1

  To get a better understanding of the examples above, let's write the full code and actually run it.
  The producer new_task.py:

# -*- coding: utf-8 -*-

import pika
import sys

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)

message = ' '.join(sys.argv[1:]) or "Hello World!"
channel.basic_publish(
    exchange='',
    routing_key='task_queue',
    body=message,
    properties=pika.BasicProperties(
        delivery_mode=2,  # make message persistent
    ))
print(" [x] Sent %r" % message)
connection.close()
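
  Usage is the same as before, for example:

# shell
python new_task.py A very long task.....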

  The complete consumer worker.py code is as follows:

# -*- coding: utf-8 -*-

import pika
import time

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)
print(' [*] Waiting for messages. To exit press CTRL+C')


def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    time.sleep(body.count(b'.'))
    print(" [x] Done")
    ch.basic_ack(delivery_tag=method.delivery_tag)


channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='task_queue', on_message_callback=callback)

channel.start_consuming()

  Open three terminals; sending and receiving messages looks like this:

[figure: messages sent by new_task.py and received by two workers]

  If we stop one of the workers, messages are received as follows:

[figure: output after one worker is stopped]

  As you can see, all messages sent are now received by the worker that is still running.

Hands-On Example 2

  Next, we will use a RabbitMQ work queue to insert data into a MySQL table.
  The database is orm_test and the table is exam_user, with the following structure:

[figure: structure of the exam_user table]

  Now we need to insert randomly generated data into this table. Using the third-party Python module pymysql and inserting one record per commit, we can insert 53,237 records in one minute.
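  For reference, a minimal sketch of that plain one-record-per-commit baseline might look like this (a hypothetical reconstruction; it assumes the same orm_test database and the exam_users table targeted by the consumer below):

# -*- coding: utf-8 -*-
# hypothetical sketch of the one-record-per-commit baseline, without RabbitMQ
import pymysql
from random import choice

names = ['Jack', 'Rose', 'Mark']               # abbreviated sample data
places = ['Beijing', 'Shanghai', 'Guangzhou']
types = ['DG001', 'DG002', 'DG003']

# open the database connection and create a cursor
db = pymysql.connect(host="localhost", port=3306, user="root",
                     password="", db="orm_test")
cursor = db.cursor()

for i in range(1, 20000001):
    sql = "insert into exam_users values(%s, %s, %s, %s)"
    cursor.execute(sql, (i, choice(names), choice(places), choice(types)))
    db.commit()  # one commit per record: this is what makes it slow

db.close()
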
  With RabbitMQ, our producer code is as follows:

# -*- coding: utf-8 -*-
# author: Jclian91
# place: Pudong Shanghai
# time: 2020-01-13 23:23
import pika
from random import choice

names = ['Jack', 'Rose', 'Mark', 'Hill', 'Docker', 'Lilei', 'Lee', 'Bruce', 'Dark',
         'Super', 'Cell', 'Fail', 'Suceess', 'Su', 'Alex', 'Bob', 'Cook', 'David',
         'Ella', 'Lake', 'Moon', 'Nake', 'Zoo']
places = ['Beijing', 'Shanghai', 'Guangzhou', 'Dalian', 'Qingdao']
types = ['DG001', 'DG002', 'DG003', 'DG004', 'DG005', 'DG006', 'DG007', 'DG008',
         'DG009', 'DG010', 'DG020']


connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)

for id in range(1, 20000001):  # 20 million records, one INSERT statement per message
    name = choice(names)
    place = choice(places)
    type2 = choice(types)
    message = "insert into exam_users values(%s, '%s', '%s', '%s');" % (id, name, place, type2)

    channel.basic_publish(
        exchange='',
        routing_key='task_queue',
        body=message,
        properties=pika.BasicProperties(
            delivery_mode=2,  # make message persistent
        ))
    print(" [x] Sent %r" % message)
connection.close()

  The consumer code is as follows:

# -*- coding: utf-8 -*-
# author: Jclian91
# place: Pudong Shanghai
# time: 2020-01-13 23:28
import pika
import time
import pymysql

# open the database connection
db = pymysql.connect(host="localhost", port=3306, user="root", password="", db="orm_test")

# create a cursor object with the cursor() method
cursor = db.cursor()

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)
print(' [*] Waiting for messages. To exit press CTRL+C')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    cursor.execute(body.decode())  # body is bytes; decode it to the SQL string
    db.commit()
    print(" [x] Insert successfully!")
    ch.basic_ack(delivery_tag=method.delivery_tag)


channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='task_queue', on_message_callback=callback)

channel.start_consuming()

We open 9 terminals: 1 producer and 8 consumers. The consumers are started first, then the producer imports the data as described above. This inserts 133,084 records in one minute, 2.50 times the plain approach. A big efficiency gain!
  Let's modify the producer and consumer code slightly so that each commit inserts multiple records, reducing the time spent committing one record at a time. The new producer code is as follows:

# -*- coding: utf-8 -*-
# author: Jclian91
# place: Pudong Shanghai
# time: 2020-01-13 23:23
import pika
from random import choice
import json

names = ['Jack', 'Rose', 'Mark', 'Hill', 'Docker', 'Lilei', 'Lee', 'Bruce', 'Dark',
         'Super', 'Cell', 'Fail', 'Suceess', 'Su', 'Alex', 'Bob', 'Cook', 'David',
         'Ella', 'Lake', 'Moon', 'Nake', 'Zoo']
places = ['Beijing', 'Shanghai', 'Guangzhou', 'Dalian', 'Qingdao']
types = ['DG001', 'DG002', 'DG003', 'DG004', 'DG005', 'DG006', 'DG007', 'DG008',
         'DG009', 'DG010', 'DG020']


connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)

for batch in range(200000):  # 200,000 batches of 100 records = 20 million rows
    values = []
    for i in range(100):
        name = choice(names)
        place = choice(places)
        type2 = choice(types)
        values.append([100 * batch + i + 1, name, place, type2])  # ids 1..20,000,000
    message = json.dumps(values)

    channel.basic_publish(
        exchange='',
        routing_key='task_queue',
        body=message,
        properties=pika.BasicProperties(
            delivery_mode=2,  # make message persistent
        ))
    print(" [x] Sent %r" % message)

connection.close()

  The new consumer code is as follows:

# -*- coding: utf-8 -*-
# author: Jclian91
# place: Pudong Shanghai
# time: 2020-01-13 23:28
import pika
import json
import time
import pymysql

# open the database connection
db = pymysql.connect(host="localhost", port=3306, user="root", password="", db="orm_test")

# create a cursor object with the cursor() method
cursor = db.cursor()

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()

channel.queue_declare(queue='task_queue', durable=True)
print(' [*] Waiting for messages. To exit press CTRL+C')

def callback(ch, method, properties, body):
    print(" [x] Received %r" % body)
    sql = 'insert into exam_users values(%s, %s, %s, %s)'
    cursor.executemany(sql, json.loads(body))  # insert the whole 100-record batch at once
    db.commit()
    print(" [x] Insert successfully!")
    ch.basic_ack(delivery_tag=method.delivery_tag)


channel.basic_qos(prefetch_count=1)
channel.basic_consume(queue='task_queue', on_message_callback=callback)

channel.start_consuming()

As before, we open 9 terminals: 1 producer and 8 consumers, start the consumers first, and then import the data the same way. This inserts 3,170,600 records in one minute, 59.56 times the plain approach and 23.82 times the earlier one-record-per-commit mode. That speed is remarkable!
  Of course, there are even more efficient ways to insert data; the method described here merely demonstrates RabbitMQ work queues and one way to speed up data insertion.

  That's all for this post. Thanks for reading!
