Microservice System Interview Notes II: Taking the Seckill System as an Example

16 Microservices and Cluster Deployment

16.1 Will a microservice generally be deployed in a cluster?

Whether a microservice is deployed in a cluster depends entirely on the specific business requirements and system scale. If a microservice has high access pressure or needs to provide high availability, cluster deployment is a common strategy. Through cluster deployment, service availability and fault tolerance can be improved to a certain extent, because when a node fails, other nodes can continue to provide services. At the same time, through load balancing technology, requests can be distributed to each node to improve the processing capacity of the system.

16.2 Why can every microservice instance provide both read and write services, rather than being deployed like a Raft cluster, where only the leader node handles reads and writes and the other nodes exist to ensure consistency and availability?

This is because all instances read and write the same database, so the consistency guarantee is provided by the independently deployed (possibly distributed) database itself.

In the microservice architecture, each microservice can provide read and write services, mainly because one of the design principles of the microservice architecture is service autonomy. Each microservice should be able to handle requests independently without depending on other services. For data storage, **generally speaking, each microservice will have its own independent database to avoid cross-service data dependencies.** In this way, even write operations can be handled by any microservice node. Of course, the data consistency guarantee is usually provided by the database itself.

As for the Raft cluster you mentioned, it is a strongly consistent replication protocol, usually used to implement state machine replication of distributed systems. In Raft, only the Leader node can process client requests, which is mainly to ensure data consistency. But in the microservice architecture, since each service has its own database, the consistency problem is mainly handled by the database, and the microservice only needs to focus on how to process business logic.

16.3 Each microservice will have its own independent database, but all instances of a microservice share this database, right?

Yes, all instances of a microservice usually share a single database. This is because in a microservices architecture, each microservice should be autonomous, meaning they should have their own data store and be able to handle requests independently.

For example, if you have an order service, you may have multiple instances to handle highly concurrent order requests. Although you have multiple instances of the order service, they will all share the same order database.

Such a design allows microservices to manage and control access to their own data and how to implement business logic, which is conducive to the independent development and rapid iteration of microservices. At the same time, since all instances of microservices access the same database, they can see the same data, ensuring data consistency.

However, it should also be noted that such a design may cause some problems, such as concurrent access control of the database, data consistency and isolation issues, etc., which requires us to properly handle it in the actual system design and development.

17 The order of the request (emphasis)

17.1 The order of requests in a single machine and its solution

17.1.1 The order in which two HTTP requests are sent by the same application may not match the order in which they arrive at the server, right? What harm is there if they are inconsistent? (A problem I ran into during a summer internship at Didi, and only now fully understand.)

Yes, two HTTP requests sent by the same app may arrive at the server in a different order than they were sent. This is due to network delays, server processing speed, and other factors that may cause this to happen. This can cause problems if your application logic relies on the order in which requests are sent.

For example, if one request is to update data and another request is to read that data, then if the read request arrives before the update request, then the old data will be read. Or, if you have two requests, one to create a resource and the other to delete it, and if the delete request arrives first, then you'll get an error that you can't delete a resource that doesn't exist.

In this case, a possible solution is to implement some form of request serialization on the server side, that is, to process requests according to certain rules (such as the timestamp of the request, or some business-related order). However, this may increase server complexity and processing delays.

Another solution is to let the client handle this situation. For example, if the client knows that one request depends on the outcome of another request, it can wait for the first request to complete before sending the second request. Alternatively, the client can include some information in the request to let the server know the relevance and order of the requests.

This problem is very common in distributed systems and is called the "request order problem". Dealing with this problem usually requires some complex design and programming skills.

17.1.2 Solution 1: Guarantee the order of requests on the client side (guaranteed at the business level)

Take the example of a user placing an order before paying in the seckill scenario. Before paying, you can query the Redis cache or the MySQL database to determine whether the order has been placed successfully. If it has, the payment operation is allowed; otherwise it is rejected.
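
A minimal sketch of this check, assuming the order service writes a key such as order:{userId}:{productId} to Redis once the order is persisted (the key layout and the Jedis usage here are illustrative assumptions, not the article's actual code):

import redis.clients.jedis.Jedis;

public class PaymentPreCheck {

    // Returns true only if the order already exists, so payment may proceed.
    public boolean canPay(Jedis jedis, long userId, long productId) {
        // The order service is assumed to write this key when the order is persisted.
        String orderId = jedis.get("order:" + userId + ":" + productId);
        if (orderId != null) {
            return true;   // order found, allow payment
        }
        // Optionally fall back to a MySQL lookup here before finally rejecting.
        return false;      // no order yet: reject, or ask the client to retry later
    }
}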

17.1.3 Solution 2: Use the message queue's FIFO property to make the server consume requests in a consistent order

A message queue is maintained for each user: all of that user's requests enter this queue first, and the server then processes them in queue order (FIFO). Taking Kafka as an example, a new topic is created for each user, and the number of partitions of this topic (equivalent to the number of queues) corresponds one-to-one with the number of consumers. When the client sends a message to the queue, it must specify a particular partition, and the order/payment consumer (these two must be bound together) consumes only that specified partition, so the order of consumption is guaranteed.

But there is another problem here: how do we ensure that the order in which produced messages arrive in the queue is consistent with the order in which they were sent?

Answer: use the stop-and-wait protocol to introduce synchronous blocking. The producer blocks after sending a message to the queue and does not send the next message until it receives the acknowledgement from the broker hosting the queue.

After this setting, message deduplication is also required, while keeping messages idempotent. Here we borrow from the TCP protocol to guarantee idempotence: on the sending side, the sequence number is incremented by 1 for every new message (a retransmitted request reuses the sequence number originally assigned to it). Kafka itself keeps a mapping from client id to sequence number, which stores the sequence number of the last message that client sent successfully. In other words, the sequence number of a resent request must be less than or equal to the sequence number stored for that client in the map, so by comparing the two we can tell whether a request is stale.

However, this solution is generally not recommended: as is well known, the stop-and-wait protocol is very slow, and doing this only wastes Kafka's consumption throughput.
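
For reference, the behavior described above (synchronous sending plus sequence-number de-duplication) corresponds roughly to a Kafka producer configured with idempotence enabled and at most one in-flight request. This is only a sketch; the topic name and key are assumptions, and keying by user id is used here instead of passing an explicit partition number:

import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class OrderedProducer {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                                // wait for the broker's acknowledgement
        props.put("enable.idempotence", "true");                 // broker de-duplicates by producer id + sequence number
        props.put("max.in.flight.requests.per.connection", "1"); // at most one unacknowledged message, preserving send order

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying both messages by the same user id routes them to the same partition,
            // so the order message is always consumed before the payment message.
            ProducerRecord<String, String> order = new ProducerRecord<>("user-requests", "user-42", "ORDER");
            ProducerRecord<String, String> pay = new ProducerRecord<>("user-requests", "user-42", "PAY");
            RecordMetadata m1 = producer.send(order).get();      // block until acknowledged (stop-and-wait style)
            RecordMetadata m2 = producer.send(pay).get();
            System.out.println(m1.offset() + ", " + m2.offset());
        }
    }
}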

You can refer to this article: How does Kafka achieve sequential consistency between the sender and receiver?

17.1.4 The real solution one: hard-code the order guarantee at the business layer

Re-order things at the business layer: typically, before payment, check the table to confirm that the item was successfully snapped up and the order was placed.

17.1.5 The real solution two: use Kafka

(1) The serious approach is to first recognize which scenarios actually require strict order preservation; for an operation such as inventory deduction, it does not matter which user's deduction request is processed first. (2) Messages that do need ordering are put into the consumption queue of the same consumption partition. Messages in a partition are consumed sequentially, so you only need to ensure that the order in which they are put into the partition queue conforms to the dependency.

17.2 Request ordering under multi-machine

17.2.1 A microservice is usually deployed as a cluster, and each instance provides the full set of functions. If the order request hits instance A but the payment request hits instance B, and B's write reaches the database before A's, how do we deal with the resulting inconsistency?

Instance B queries the Redis cache or the MySQL database before payment to determine whether the order has been created. If it has, the payment operation is allowed; otherwise the payment fails and is resent. However, the payment operation must be idempotent here (the result of executing the request multiple times is the same as the result state of the first execution); just refer to the TCP-style idempotence guarantee mentioned above.
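
Besides TCP-style sequence numbers, a common alternative is an idempotency key in Redis: only the first request to claim the key actually performs the payment, and later duplicates are ignored. The key format and TTL below are illustrative assumptions:

import redis.clients.jedis.Jedis;

public class IdempotentPayment {

    // Returns true if this call actually performed the payment, false if it was a duplicate.
    public boolean payOnce(Jedis jedis, String orderId) {
        // SETNX succeeds only for the first request carrying this order id.
        long firstClaim = jedis.setnx("pay:done:" + orderId, "1");
        if (firstClaim == 1) {
            jedis.expire("pay:done:" + orderId, 24 * 3600);   // keep the marker for a day (illustrative TTL)
            // ... perform the real payment here ...
            return true;
        }
        // Duplicate request: skip the payment and return the previously recorded result instead.
        return false;
    }
}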

17.3 Interview Scenario (Key Points)

17.3.4 Message fairness definition in seckill scenario

Message fairness can have different meanings in different contexts. In some cases, it may refer to the guarantee that messages are processed in the same order as they arrive (first come, first served, such as fair locks in AQS). In other contexts, it might refer to guaranteeing that every sender has a fair chance of getting their message processed, rather than giving priority to certain senders who came first.

In the flash sale scenario, message fairness usually pays more attention to the latter, that is, ensuring that all users have a fair chance to participate in the flash sale, rather than giving users whose requests arrive first a higher chance. This is usually achieved through random selection, per-person purchase limits, and equal-probability drawing.

As for the first-come-first-serve problem, due to differences in network delays and server processing speeds, it is difficult to implement strict first-come-first-serve in an actual network environment. At the same time, if only the users whose requests arrive first are given priority, users with good network conditions or fast hands may always take the first step, which is not conducive to fairness.

17.3.5 Interviewer: How do you ensure message fairness, that is, ensure that every request has a chance to obtain the flash-sale product? In other words, how do you decide which one hundred of these 10 million requests actually get the product?

Answer: Among the following four methods, 1 and 2 can only guarantee first-come, first-served fairness, and are not applicable to flash sales scenarios, while 3 and 4 are suitable for flash sales scenarios

Ideally, we want to handle all user requests completely fairly. In practice, however, to preserve the system's high-concurrency performance, the usual approach can only guarantee a certain degree of fairness when processing a large number of requests: most requests are guaranteed to be processed, but they cannot be guaranteed to be processed strictly in order.

In order to ensure fairness as much as possible, we can adopt the following strategies:

  1. Pre-reduction of inventory: When a user requests a flash sale, we can perform a pre-reduction of inventory in a cache such as Redis. If the pre-reduction is successful, the user's request is sent to the message queue, waiting for subsequent processing. This method can effectively prevent inventory from being oversold, but it cannot fully guarantee fairness.

  2. Use an ordered queue: If we want better fairness, we can consider using an ordered message queue, such as Kafka. This way, we can process the requests in the order they are requested by the user. However, this approach may reduce the concurrent processing capability of the system.

  3. Batch processing: We can also consider batching user requests. For example, we could collect the requests that arrive each second and then randomly select a portion of them to process (see the sketch after this list). This guarantees a certain degree of fairness while preserving the system's concurrent processing capability.

  4. (This may really answer how the winners are chosen from the 10 million requests.) Use a lottery system: for ultra-high-concurrency scenarios, we can even consider a lottery. When a user requests the flash sale, we let the user enter a draw, and only the users who are drawn are eligible for the seckill. This method can effectively prevent overselling of inventory and guarantees fairness to a certain extent.
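
A minimal sketch of the batch-plus-random-selection idea from strategy 3, assuming the requests of the current time slice have already been collected into an in-memory list (all names here are illustrative):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchSelector {

    // Shuffle the batch collected in the current time slice and keep only `quota` requests.
    public static List<String> pick(List<String> batch, int quota) {
        List<String> copy = new ArrayList<>(batch);
        Collections.shuffle(copy);   // every request in the batch gets an equal chance
        return copy.subList(0, Math.min(quota, copy.size()));
    }
}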

It should be noted that the various strategies mentioned above can only guarantee fairness to a certain extent, but cannot guarantee complete fairness. Because in a high-concurrency environment, complete fairness often means lower performance. Therefore, in actual system design, we need to find an appropriate balance between fairness and performance.

17.3.6 If a lottery system is used (it can be a separate microservice, or generated locally), how would you design it? Give a feasible general approach and the algorithm you would use

Designing a lottery system can usually be done according to the following steps:

  1. **User registration:** When a user clicks the seckill button, the user is not directly executed, but is marked as participating in the lottery. This can be achieved by creating a record for each user in the database, which contains information such as user ID and product ID.

  2. **Generate the lottery pool:** Before the seckill starts, select some users from all registered users as the users participating in the lottery, forming a lottery pool. There are many ways to generate the lottery pool, such as random selection, or screening based on the user's purchase history, credit rating and other factors.

  3. **Draw lots:** At the beginning of the seckill, a part of users are randomly selected from the lottery pool as the winning users, and these users have the right to perform seckill operations. There are also many ways to draw lots. For example, a simple random algorithm can be used, or a more complex weighted random algorithm can be used to give users with higher credit ratings a higher probability of winning the lottery.

  4. **Notify the winning users:** After the lottery is over, all users who participated in the lottery need to be notified of the result of the lottery. For the winning user, a notification containing the link of the seckill can be directly sent, and the user can directly perform the seckill operation after clicking the link.

The key to this system is how the lottery is drawn. The easiest way is to use a random algorithm: for example, use Java's Random class to generate random numbers and decide from them whether a given user is selected. For more complex scenarios, such as drawing lots based on factors like the user's credit rating, a weighted random algorithm can be used: random numbers are generated according to each user's weight, and users with larger weights cover a larger range of random numbers, so their probability of winning is also higher.

Note that the above is just a basic design scheme of a lottery system, and more details may need to be considered in the actual implementation, such as how to deal with network delays, how to ensure the fairness of the lottery, and how to deal with repeated registrations from users.

17.3.7 Introduce the implementation of weighted random method and java code

  1. Implementation of weighted random method

Weighted random method is a method of random selection based on weights. The specific implementation is as follows:

  • First, we need a list to store all users and their weights.
  • Then, we calculate the sum of all user weights.
  • Then, we generate a random number between 0 and the total weight.
  • Finally, we start from the first element of the list, accumulate the weight of each element, and return the current element as soon as the accumulated weight exceeds the random number.

The time complexity of this algorithm is O(n), and the space complexity is also O(n).

Here is the Java implementation of the method:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class WeightedRandom {

    // Holds one user together with the weight that determines their share of the draw.
    private static class Node {
        int weight;
        String user;

        Node(String user, int weight) {
            this.weight = weight;
            this.user = user;
        }
    }

    private List<Node> nodeList = new ArrayList<>();
    private int totalWeight = 0;

    public void addUser(String user, int weight) {
        nodeList.add(new Node(user, weight));
        totalWeight += weight;
    }

    public String getRandomUser() {
        // Draw a number in [0, totalWeight) and walk the list until the cumulative weight passes it.
        int randomWeight = new Random().nextInt(totalWeight);
        int weightSum = 0;
        for (Node node : nodeList) {
            weightSum += node.weight;
            if (randomWeight < weightSum) {
                return node.user;
            }
        }
        return null;
    }
}

In this code, we first define an inner class Node to store a user and their weight. In the main class WeightedRandom we then define a list nodeList to store all Node objects and a variable totalWeight to store the sum of all weights. Finally, we define two methods: addUser for adding a user, and getRandomUser for randomly selecting a user based on weight.
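
A hypothetical usage example (the user names and weights are made up):

WeightedRandom lottery = new WeightedRandom();
lottery.addUser("alice", 1);   // ordinary user
lottery.addUser("bob", 3);     // higher credit rating, three times the chance
lottery.addUser("carol", 1);
String winner = lottery.getRandomUser();   // returns "bob" with probability 3/5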

17.3.8 After using the lottery method, is it still necessary to use Redis to pre-decrement the inventory?

After using the lottery method, traffic can be sent directly to the message queue, but it is still good practice to use Redis to pre-decrement the inventory, which can further improve the response speed of the system and reduce the pressure on the database.

17.3.9 Considerations of lottery system in high concurrency environment

In a high-concurrency environment, the lottery system needs to consider concurrency control and performance optimization. For example, when a large number of users request the lottery at the same time, the system may come under heavy load. To solve this, a rate-limiting algorithm such as the leaky bucket or token bucket algorithm can be used to control the number of concurrent requests, and only one batch of requests is admitted to each draw.

17.3.10 How to deduplicate users who were not drawn (prevent repeated clicking)

For users who have not been drawn, a tag can be created for each user in the database or cache to mark whether the user has participated in the lottery. Then, when the user requests a lottery, first check this flag. If the user has already participated in the lottery, a prompt message is returned directly, telling the user not to click repeatedly. This method is very efficient and can effectively prevent users from continuously clicking.
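
A minimal sketch of such a flag in Redis, assuming one key per user per activity (the key format and TTL are assumptions):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class LotteryDedup {

    // Returns true only for the user's first click; later clicks are rejected.
    public boolean tryRegister(Jedis jedis, long userId, long activityId) {
        String key = "lottery:joined:" + activityId + ":" + userId;
        // SET key value NX EX 3600: set the flag only if it does not exist, with a TTL covering the activity.
        String result = jedis.set(key, "1", SetParams.setParams().nx().ex(3600));
        return "OK".equals(result);   // null means the flag already existed, i.e. a repeated click
    }
}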

18 About overselling

An article I wrote about the solution to the overselling problem

18.1 Solution to the oversold problem in the seckill system:

**The overselling problem is mainly due to the fact that the system cannot guarantee the atomicity of inventory deduction under concurrent conditions.** Common ways to solve the oversold problem are as follows:

  • Optimistic locking: Optimistic locking is really an idea: assume that most of the time the system will not have concurrency conflicts, and only check for conflicts when data is actually written. In the seckill system, you can give the product a version number (or timestamp) in the database; every time the inventory is updated, compare the version number (or timestamp), perform the update only when it matches the expected value, and increment the version number (or timestamp) by one (see the JDBC sketch after this list).

  • Pessimistic locking: In contrast to optimistic locking's optimistic assumption, pessimistic locking assumes that a concurrency conflict will occur every time data is accessed, so it locks on every access to ensure the data is not affected by other threads while it is being read and written.

  • Use distributed locks: such as Redis's SETNX command, or distributed coordination services such as ZooKeeper and etcd.

  • Database row-level locks: such as the row-level locks of the MySQL InnoDB engine.

  • Queue serialization: put requests into a message queue and process them one by one.
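
A minimal JDBC sketch of the optimistic-locking update from the first bullet. The table and column names (flash_sale_products, quantity_available, version) are assumptions; the update succeeds only if the version read earlier is still current and stock remains:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticStockDao {

    // Returns true if the deduction succeeded, false if another transaction changed the row first.
    public boolean deductStock(Connection conn, long productId, int expectedVersion) throws SQLException {
        String sql = "UPDATE flash_sale_products "
                   + "SET quantity_available = quantity_available - 1, version = version + 1 "
                   + "WHERE id = ? AND version = ? AND quantity_available > 0";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, productId);
            ps.setInt(2, expectedVersion);
            return ps.executeUpdate() == 1;   // 0 rows updated means a conflict: re-read and retry, or give up
        }
    }
}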

18.2 About the scheme of using an etcd distributed lock:

The etcd distributed lock can be used to solve the overselling problem in the seckill system. Before the request hits the database, each request tries to acquire a lock, and only the request that successfully acquires the lock can go to the database to deduct inventory. This can ensure that the operation of inventory deduction is serial, thereby avoiding the problem of overselling.

  • Advantages : It can effectively avoid overselling problems and ensure data consistency.

  • Disadvantage: Concurrency performance may decrease, because every request has to try to acquire the lock; if the request volume is very large, many requests may end up waiting for the lock, increasing the system's response time. At the same time, using distributed locks also requires managing the lock's lifecycle, such as avoiding deadlocks and automatically releasing locks when they expire.

A common optimization scheme is to combine the cache, the message queue and the database: first perform the inventory check and pre-decrement in the cache, then put the request into the message queue, and finally let a consumer thread deduct the real inventory in the database. This scheme avoids the overselling problem while improving the system's concurrent processing capability.
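
A rough sketch of the "pre-decrement in cache, then enqueue" step, using a small Lua script so that the check and the decrement are atomic in Redis. The key name and the queueing call are assumptions, and the real inventory is still deducted later by the message-queue consumer:

import redis.clients.jedis.Jedis;

public class StockPreDeduct {

    // Atomically: if stock > 0, decrement it and return the remaining stock; otherwise return -1.
    private static final String LUA =
        "local s = tonumber(redis.call('get', KEYS[1]) or '0') " +
        "if s > 0 then return redis.call('decr', KEYS[1]) else return -1 end";

    public boolean tryPreDeduct(Jedis jedis, long productId) {
        Object r = jedis.eval(LUA, 1, "seckill:stock:" + productId);
        if (((Long) r) >= 0) {
            // enqueueOrderMessage(productId, ...);   // hypothetical: hand the request over to the message queue
            return true;
        }
        return false;   // sold out at the cache layer: reject immediately without touching the database
    }
}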

19 The multi-level rate limiting problem of the seckill microservice system

19.1 What multi-level rate limiting measures does a microservice system take? What is the relationship between the levels?

Nginx, the API gateway, the RPC framework, and the interface itself may all perform rate limiting, and these rate limiters are allowed to exist at the same time.

19.2 How is Nginx's rate limiter implemented?

Reference: Nginx rate limiting practice under high concurrency
Nginx uses the Leaky Bucket algorithm for rate limiting. The basic idea is that all incoming requests are staged (like water entering a leaky bucket) and then processed and forwarded at a fixed rate (the size of the hole). If too many incoming requests cause the "bucket" to fill up, new requests are rejected or queued.

Nginx's rate limiting is usually configured in the Nginx configuration file, for example:

http {
    limit_req_zone $binary_remote_addr zone=mylimit:10m rate=1r/s;

    server {
        location /request/ {
            limit_req zone=mylimit burst=5;
        }
    }
}

In this example, limit_req_zone defines a limit zone called mylimit, keyed by client IP ($binary_remote_addr), with a rate of 1r/s (one request per second). burst=5 means that short bursts of traffic can be tolerated.

19.2.1 What dimensions does Nginx's rate limiting generally target?

Nginx's rate limiting is generally based on the number of requests or the request rate. Specifically, it can be applied along the following dimensions:

  1. IP address rate limiting: limit how many requests each IP address may send within a given period, or its request rate over that period.

  2. Location or URL rate limiting: limit based on a specific request path or URL. For example, different rate-limiting policies can be set for specific API paths.

  3. User identity or authentication rate limiting: if the system authenticates users, limit the number or rate of requests per user.

  4. Backend server rate limiting: sometimes you may want to limit the number or rate of requests forwarded to the backend server to prevent overload.

  5. HTTP method rate limiting: limit based on the HTTP request method (GET, POST, PUT, etc.).

  6. Specific User-Agent rate limiting: to limit requests from specific User-Agents, such as crawlers, use User-Agent as the limiting dimension.

  7. Referer rate limiting: sometimes you may want to limit based on the Referer header in the request to block malicious requests.

In short, Nginx's rate limiting is very flexible; different dimensions can be chosen according to different needs to protect the server from excessive requests and ensure system stability and availability.

19.3 How does an API gateway implement rate limiting? Taking Zuul as an example

Zuul is an API gateway from Netflix, commonly used in microservice architectures. Rate limiting can be implemented through its filters.

Pre filter: here, traffic can be limited according to different rules (such as IP address, request path, etc.).
Token bucket algorithm: usually used to implement efficient rate limiting.
A simple Zuul filter example (using Guava's RateLimiter) might look like this:


import com.google.common.util.concurrent.RateLimiter;
import com.netflix.zuul.ZuulFilter;
import org.springframework.stereotype.Component;

@Component
public class RateLimitFilter extends ZuulFilter {

    // Guava token-bucket limiter: 1000 permits (tokens) per second.
    private RateLimiter rateLimiter = RateLimiter.create(1000);

    @Override
    public String filterType() {
        return "pre";   // run before the request is routed
    }

    @Override
    public int filterOrder() {
        return -4;
    }

    @Override
    public boolean shouldFilter() {
        return true;    // apply to every request
    }

    @Override
    public Object run() {
        // Reject immediately when no token is available instead of queueing the request.
        if (!rateLimiter.tryAcquire()) {
            throw new RateLimitException();   // custom runtime exception, mapped to HTTP 429 elsewhere
        }
        return null;
    }
}

19.4 How does Dubbo implement service rate limiting and circuit breaking?

Reference: How Dubbo implements service rate limiting and circuit breaking

19.5 How does a custom rate limiter implement rate limiting?

Generally, there are three common algorithms: sliding window, token bucket, and leaky bucket. Choose one of them as the rate-limiting algorithm, then combine it with AOP pre-advice (or annotations) and Redis to limit traffic at different granularities, for different interfaces, and for different users.
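
A hedged sketch of the sliding-window variant backed by a Redis sorted set, keyed per interface and per user. The key layout, window size, and limit are assumptions, and in practice this check would sit inside an AOP advice or an annotation handler rather than being called directly:

import redis.clients.jedis.Jedis;

public class SlidingWindowLimiter {

    // Allow at most `limit` calls per `windowMillis` for the given interface/user pair.
    public boolean allow(Jedis jedis, String api, long userId, int limit, long windowMillis) {
        String key = "limit:" + api + ":" + userId;
        long now = System.currentTimeMillis();
        // Drop entries that have slid out of the window, then count what remains.
        jedis.zremrangeByScore(key, 0, now - windowMillis);
        long current = jedis.zcard(key);
        if (current >= limit) {
            return false;   // over the limit, reject this call
        }
        jedis.zadd(key, now, now + ":" + Math.random());    // record this call with a unique member
        jedis.expire(key, (int) (windowMillis / 1000) + 1); // let idle keys clean themselves up
        return true;
    }
}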

19.6 The granularity and methods of rate limiting and the trade-offs of the various methods

You can check point 19 of the following article: Summary of system design topics.

20 How to design the data tables of the seckill module

20.1 To avoid affecting normal shopping traffic, a separate set of seckill tables is created for the ongoing activity, including a seckill product table and a seckill order table. Why?

Flash sales usually bring a huge influx of traffic, which is a huge stress test for any e-commerce website or application. Designing a database table structure specifically for seckill activities has multiple benefits:

1. System performance

Lightning deals require very fast read and write operations. By creating separate seckill tables, specific optimizations can be performed on these tables, such as: cache, partition, etc., to improve performance.

2. Data isolation

Normal shopping traffic and Lightning Deal traffic usually have different business rules and data requirements. By isolating this data, you can reduce complexity and keep routine operations unaffected by Lightning Deals.

3. Scalability

A separate deal table means that you can more easily migrate deal-related data and logic to an independent service or database to improve the overall scalability of the system.

4. Simplify troubleshooting

When a problem arises, having dedicated seckill tables makes it easier to locate and fix it, since you only need to focus on a relatively small, specialized data set.

5. Improve security

Since flash sales are vulnerable to various attacks (e.g., order brushing, bots, etc.), a separate data table structure makes it easier to implement specific security measures.

20.2 What is the relationship between the seckill product and order form and the ordinary product order form?

The seckill product and order tables are related to ordinary product and order tables in terms of business logic and usage, but are usually designed as relatively independent data structures. Here is a possible relationship between them:

20.2.1 Data Model

  1. Shared product data: seckill products are usually a subset of the ordinary product table (Products). In this case, the seckill product table (FlashSaleProducts) might have a field, e.g. original_product_id, that acts as a foreign key pointing to the corresponding row in the ordinary product table.

    FlashSaleProducts
    - original_product_id (Foreign Key -> Products.id)
    
  2. Shared user data: Similarly, the seckill order table (FlashSaleOrders) and the normal order table (Orders) may share the same user table (Users).

    FlashSaleOrders
    - user_id (Foreign Key -> Users.id)
    
    Orders
    - user_id (Foreign Key -> Users.id)
    

20.2.2 Data Flow

  1. Synchronization of product information : If a product appears in both the general product table and the seckill product table, it is necessary to ensure that the product information (such as price, inventory, etc.) in the two tables can be synchronized or at least consistent.

  2. Order Conversion : In some cases, an order that started out as a Lightning Deal may end up converting to a regular order, or vice versa.

20.2.3 Data isolation

Although there is a certain correlation, it is generally recommended to separate data tables related to deals from normal shopping processes. Doing so helps:

  1. Performance optimization : perform specific performance optimization for flash sales scenarios.
  2. Reduced Complexity : Reduce the complexity of handling multiple business logics in a single table.
  3. Security: flash sales usually require additional security measures, such as anti-brushing and anti-bot protections.

Overall, although the seckill product and order tables may overlap in data and business logic with the normal Products and Orders tables, it is usually preferable to design them as separate tables. This gives more flexibility to respond to different performance, security, and business needs.

20.3 After deducting the product inventory in the seckill table, do we also need to deduct inventory in the ordinary product table? Similarly, does a seckill order also need to be synchronized to the normal order table?

20.3.1 Commodity Inventory

  1. Synchronization method : If the flash sale product and the normal product share the same inventory, then any deduction of the product inventory during the flash sale will indeed need to be correspondingly deducted in the normal product table.

  2. Separate Inventory : Another way is to allocate separate inventory for Lightning Deals. In this case, the inventory deduction in the seckill table will not affect the inventory in the ordinary product table.

Which method to choose depends on business needs. Shared inventory is more in line with real-world logistics, but may increase system complexity. Independent inventory is easier to manage, but may deviate from the actual logistics status.

20.3.2 Order Synchronization

  1. Asynchronous : Lightning deals and normal orders can be considered as completely separate entities, each maintained in its own table. Doing so simplifies the processing logic and allows specific optimizations for Lightning Deals.

  2. Synchronize to normal order table : After the seckill event ends, or when the order status changes (for example, from 'unpaid' to 'paid'), you can choose to sync the seckill order to the normal order table. Doing so unifies the order processing logic, but increases system complexity.

  3. Flag or status field : A field can also be added to the normal order table to identify whether it is a flash sale order. In this way, ordinary orders and flash sales orders can be managed in the same table, but this will also increase the complexity of query and management.

Whether you need synchronization depends on your specific needs, for example, whether you need to manage all types of orders in a unified interface, or whether it is necessary to integrate all order data into a single analysis system.

Overall, how to handle inventory and order synchronization is a business decision that depends on your specific needs and constraints. Regardless of which approach you choose, you need to ensure system consistency, scalability, and performance.

20.4 Here are a few related tables I created

1 Ordinary product table and seckill product table (screenshots omitted)

2 Ordinary order table and seckill order table (screenshots omitted)

20.5 Sample Data Table Design

Here's a very basic example describing a possible seckill product table and seckill order table:

20.5.1 Seckill product table (FlashSaleProducts)

  • id: primary key
  • product_name: product name
  • original_price: original price
  • flash_sale_price: seckill price
  • quantity_available: Quantity Available
  • start_time: seckill start time
  • end_time: seckill end time

20.5.2 Seckill order table (FlashSaleOrders)

  • id: primary key
  • user_id: User ID
  • flash_sale_product_id: seckill product ID (foreign key)
  • quantity: Purchase quantity
  • order_time: order time
  • status: order status (eg, completed, canceled, failed, etc.)

Such a design provides a basic framework that can be further customized and optimized according to specific needs.

21 JMeter pressure measurement

21.1 Talk about the calculation process between QPS and concurrency

QPS (Queries Per Second) is a commonly used indicator to measure database or system performance, which indicates the number of queries or requests that can be processed per second. Related indicators include concurrency (Concurrency) and response time (Response Time).

Calculating the relationship between QPS and concurrency usually involves the following factors:

  1. QPS (Queries Per Second) : QPS refers to the number of queries or requests that can be processed per second. It represents the processing capacity of the system, that is, how many requests can be processed in a unit of time.

  2. Concurrency : Concurrency refers to the number of concurrent requests that exist in the system at the same time. In the case of high concurrency, multiple requests are sent to the system at the same time, so there are multiple concurrent requests.

  3. Response Time (Response Time) : Response time refers to the time required for the system to process a request. It can be the execution time of a query, or the response time of an HTTP request.

The following formula can usually be used to calculate the relationship between QPS and concurrency:

QPS = number of concurrent requests / average response time (with the response time expressed in seconds)

In this formula, QPS is an indicator that needs to be calculated, while the number of concurrent requests and average response time are known parameters.

  • Concurrent Requests : This is the number of requests processed by the system at the same time. It can be determined according to the concurrency settings of the system, the length of the request queue, and so on. For example, if 100 requests are sent to the system at the same time, then the number of concurrent requests is 100.

  • Average Response Time: This is the average time it takes the system to process a request, usually expressed in milliseconds (ms). It can be obtained by measuring the system's response times. For example, if the system's average response time is 10 milliseconds, then this value is 10 ms (0.01 s).

By substituting the known number of concurrent requests and average response time into the above formula, the QPS of the system can be calculated. This calculation process can help evaluate the performance and load conditions of the system to ensure that the system can handle the expected request volume. It should be noted that QPS and concurrency are two key aspects of system performance, which need to be optimized and monitored according to specific conditions.
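
For example, plugging in the numbers above: with 100 concurrent requests and an average response time of 10 ms (0.01 s),

QPS = 100 / 0.01 = 10,000

and if the average response time rose to 100 ms, the same concurrency would only sustain 100 / 0.1 = 1,000 QPS.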

21.2 TPS and throughput: TPS refers to throughput

Yes, **TPS (Transactions Per Second) usually refers to the number of transactions processed per second, which is a measure of throughput.** TPS is used to measure the number of transactions that a system or application can successfully complete per unit of time. These transactions can be database transactions, network transactions, business transactions, and so on, depending on the nature of the application.

TPS is an important performance indicator, which is used to evaluate the processing power and performance of the system. A high TPS indicates that the system can process a large number of transactions in a short period of time, while a low TPS may indicate that the system can process fewer transactions per unit of time. Different applications and scenarios may have different TPS requirements, therefore, TPS is often used to determine whether the system can meet business needs.

It should be noted that TPS usually focuses on the number of successfully completed transactions and does not include failed transactions. It is different from QPS (Queries Per Second). QPS is usually used to measure the number of queries or requests, while TPS is used to measure the number of transactions.

21.2 The difference between throughput (TPS) and qps

The TPS loading process: the user initiates a request to the server, the server performs various internal processing, such as calling other services, querying data, network IO, and writing files, and then returns a response to the user; that whole round trip counts as one transaction (TPS). For example, on an e-commerce website, the complete process of a user buying a product can be regarded as one transaction, which includes multiple steps such as browsing the product, adding it to the shopping cart, checking out, and paying; this complete process counts as one transaction. If a transaction contains only one task, TPS and QPS are equivalent.

To give an example: accessing a page counts as one transaction (one TPS) from the request to the client receiving the response, but loading the images on that page may itself generate additional requests to the server, producing several extra QPS.

Throughput (Throughput) and QPS (Queries Per Second) are both indicators used to measure system performance, but they have some key differences:

  1. Definition :

    • Throughput : Throughput refers to the number of requests or transactions successfully completed by the system per unit of time. It can represent a measure of a system's processing power, including not only the number of queries or requests, but also successfully processed transactions. Throughput is often used to describe the overall performance of a system, including aspects such as data transfer, network bandwidth, and disk I/O .

    • QPS (Queries Per Second) : QPS refers to the number of queries or requests that the system can process within a unit of time. It is commonly used to measure the performance of processing queries or requests, such as database or application servers. QPS focuses primarily on the number of queries or requests, regardless of the success or failure of transactions or data transfers.

  2. Application fields :

    • Throughput : Throughput is usually used to describe the overall performance of the system. For example, the throughput of a web server can represent the number of HTTP requests processed per second, including successful responses and failed responses.

    • QPS : QPS is usually used in database or specific application server performance evaluation, it measures the number of queries or requests processed per second in a database engine or application.

  3. Success and failure :

    • Throughput : Throughput includes successful and failed requests or transactions. It represents the number of all operations completed by the system in unit time.

    • QPS : QPS usually only focuses on the number of successful queries or requests. It does not take into account failed queries or requests.

  4. Uses :

    • Throughput : Throughput is often used to measure the overall performance and stability of a system. It can be used for evaluating system capacity, load testing, and performance tuning.

    • QPS : QPS is mainly used to measure the performance of a specific component or service, for example, the query processing capability of a database engine or the request processing speed of a web server.

In summary, both throughput and QPS are important indicators for measuring system performance, but they focus on slightly different aspects. Throughput takes into account both successful and failed operations, while QPS usually only focuses on the number of successful queries or requests. The choice of which metric to use depends on which aspects of performance you care about and which system components you want to measure.

21.3 If there is only one task in a TPS, TPS and QPS are equivalent; right?

Yes, if a transaction (in TPS, Transactions Per Second) contains only one task or sub-operation, then TPS and QPS (Queries Per Second) are basically equivalent, because they both represent the number of operations or queries per second.

Specifically, TPS is a more general performance indicator that represents the number of transactions successfully completed per unit of time, and transactions can include various types of operations, including queries, calculations, file operations, and more. If a TPS contains only one query task, its quantity is basically equal to QPS, because this TPS only involves one type of operation, that is, query.

QPS is often used to measure the performance of a specific query or request operation, especially in the database field. But if only one specific query operation is involved in the performance test of the system or application, and this query operation is the main operation in the test, then this TPS can be equivalent to QPS because they both measure the same operation frequency.

In summary, the key difference between TPS and QPS is that TPS is more general and can include various types of operations, while QPS is more focused on the performance of queries or requests. In certain cases, if only one task is included in a TPS, it can be equivalent to QPS.

21.3 tps/qps test

Background: JMeter only provides TPS measurement, so according to the reasoning in 21.3, I use JMeter here to test a single operation (the interface that queries the database for product information), which is equivalent to performing a QPS test.

21.3.1 Under 1GB memory, single-core cpu, 10MB bandwidth, open 1000 threads, use cache, and test the product list page

15 QPS

21.3.2 Under 2GB memory, single-core cpu, and 10MB bandwidth, open 1000 threads, use cache, and test the product list page

60 QPS

21.3.3 Under 16GB memory, 10MB bandwidth, no cache, open 10,000 threads, use the same user account, and test an interface that returns user input information.

1400 QPS

21.3.4 Under 16GB memory, 10MB bandwidth, no cache, open 10,000 threads, use the same user account, test the product list page

Note: the product list page involves querying the database.
1100 QPS

Page optimization: Redis is used below to cache the product HTML pages.

After page cache optimization, TPS under Windows: 2340

Problem: the front end and back end are not separated, the entire page is sent to the front end, and the amount of data transferred is very large.

Re-optimization: separate the front end and the back end; the back end only sends the JSON data the front end needs, not the HTML page.

TPS after page staticization: 2560

21.3.5 1000 threads, loop 10 times, a total of 5w threads, test the product list page

Windows before optimization: 16GB memory, 8-core CPU, 10MB bandwidth, no cache, 1300 TPS
Linux before optimization: 2GB memory, single-core CPU, 10MB bandwidth, no cache, 207 TPS

21.3.6 Testing the seckill interface without any optimization, including no MQ and no Redis: 16GB memory, 10MB bandwidth, 10,000 threads (10,000 threads correspond to 10,000 users, tested 3 times), simulating different users performing the seckill (emphasis)

Windows before optimization: 785 TPS
Linux before optimization: 175 TPS

Windows, after Redis order-caching optimization: 1185 TPS

Redis order caching + page staticization, on Windows: 1375 TPS

Page staticization + Redis order caching + Redis pre-decremented inventory + RabbitMQ asynchronous ordering, on Windows: 2450 TPS

Page staticization + Redis order caching + multi-level inventory deduction cache (Redis pre-decrement + Java local cache) + RabbitMQ asynchronous ordering, on Windows: 2650 TPS

21.4 Why did adding 1GB of memory increase QPS several times?

In performance testing, increasing memory usually has a positive impact on system performance, especially when processing a large number of requests or a lot of data. In the test above, after adding 1GB of memory, QPS (Queries Per Second) increased from 15 to 60. There are several possible reasons for this increase:

  1. Larger memory cache : Increased memory provides more space to cache data and query results. When the memory is small, the system may need to frequently read data from the disk, but after increasing the memory, more data can be cached in the memory, reducing disk I/O operations, thereby speeding up query response.

  2. Reduced memory pressure : Smaller memory can cause the system to have to perform memory management and data scrubbing frequently to free up memory space. These operations may degrade system performance. With more memory, the system is more likely to keep data in memory, reducing these overheads.

  3. More concurrent processing capacity : An increase in memory can increase the concurrent processing capacity of the system, because more threads can work in memory at the same time without being limited by memory constraints. This can lead to more concurrent queries and higher QPS.

  4. Database performance optimization : With increased memory, the database server may be able to better utilize memory for caching query plans and data. This can speed up the execution of the query, thus increasing the QPS.

  5. Reduced memory swapping : When memory is low, the operating system may swap some data to disk, which is called memory swapping. Memory swapping can cause a significant performance hit because disk access is much slower than memory. Increasing memory can reduce the occurrence of memory swapping and improve system performance.

It should be noted that the impact of memory increase on system performance is complex, and the specific effect depends on multiple factors such as the nature of the application, database design, and hardware configuration. When performing performance testing, increasing memory is usually an effective means of performance optimization, but resource costs and performance requirements need to be weighed according to the actual situation.

21.4 Why does a page query initiated by 10 threads reach 90+ QPS, while the same query initiated by 1000 threads only reaches 200+ QPS rather than 100 times the original?

This situation may involve multiple factors, including system resource constraints, concurrency control, database performance, network latency, and more. Here are some reasons why this might happen:

  1. System resource limitation : When a large number of threads initiate queries at the same time, system resources (such as CPU, memory, and disk IO) may become a bottleneck. If system resources are insufficient, the system may not be able to efficiently handle requests from a large number of threads, resulting in degraded performance. This may cause QPS to not grow linearly.

  2. Concurrency control : Database systems usually use locks and transactions for concurrency control. When a large number of threads query the database at the same time, lock competition may occur, causing some queries to be blocked, thereby reducing QPS.

  3. Database performance : The performance of the database itself is also a key factor. If the database server has insufficient processing power or is not properly optimized, it may not be able to handle large numbers of concurrent queries, resulting in poor performance.

  4. Network Latency : In distributed systems, network latency can also become a performance bottleneck. When a large number of threads query at the same time, the overhead of network transmission may increase, thereby affecting QPS.

  5. Thread management overhead : Creating and managing large numbers of threads has its own overhead. Thread creation, destruction, and context switching can consume system resources.

21.5 Why is the TPS of the seckill interface only half of the TPS of the product list interface?

Because the seckill interface modifies and adds data in the database, while the product list interface only involves reading data.

21.6 Million-level concurrency

21.6.1 When we talk about million-level concurrency, do we mean that millions of requests arrive at the backend within one second?

Usually, when we talk about "millions of concurrency", we mean that the system handles millions of concurrent requests in a certain period of time (usually every second). This does not necessarily require precise millions of requests to arrive at the backend per second, but means that the system must process millions of requests within a short time window (usually 1 second).

Million-level concurrency is usually a very high load, which requires powerful hardware resources, an optimized system architecture, and efficient processing mechanisms to support. In fields that need to deal with large-scale users or data, such as Internet applications, high-performance computing, cloud computing, and financial trading systems, million-level concurrency is an important performance indicator and poses a major challenge to the scalability and stability of the system.

It should be noted that this number is usually an abstract concept, and the performance of the actual system will be affected by many factors, including hardware resources, network delay, load balancing, concurrency control, database performance, and so on. In practical applications, performance testing and optimization are required to ensure that the system can run stably under actual load.

21.6.2 Normally, when we talk about "million-level concurrency", we mean that the system handles millions of concurrent requests within a certain period of time (usually every second). The "processing" here should refer to the entire microservice system, right?

Yes, usually, when we talk about "million-level concurrency", we mean that the entire microservice system handles millions of concurrent requests within a certain period of time (usually every second). This concurrent request can cover all microservices and components in the system, including front-end interfaces, back-end services, database access, cache operations, and more. Overall system performance and throughput should be taken into consideration.

In a microservices architecture, a system usually consists of multiple microservices, each with its own responsibilities and functions. Therefore, million-level concurrency is a consideration of the overall performance of the entire system, including the response of the front-end interface, communication between microservices, access to databases and caches, and other related operations. The system needs to be able to handle this concurrent load efficiently to meet the demands of users and ensure high availability.

In practical applications, performance testing and optimization of each microservice and system component is required to ensure that the entire system can run stably under high concurrent load. This includes the allocation of hardware resources, load balancing, cache strategy, database optimization and other aspects of work.

21.6.3 This concurrent request can cover all microservices and components in the system, including front-end interfaces, back-end services, database access, cache operations, and so on. Overall system performance and throughput should be taken into consideration. Does it mean including external http requests and internal service call requests?

Yes, under normal circumstances, the scope of consideration for millions of concurrent requests includes all requests in the system, including but not limited to:

  1. External HTTP requests : This includes requests from clients, such as HTTP requests made by users through browsers or mobile applications. These requests are usually handled by front-end interfaces or API endpoints.

  2. Internal service calls : In a microservice architecture, a system usually consists of multiple microservices that need to communicate with each other to complete their respective tasks. Therefore, calls between internal services also need to be considered, including REST API calls between services, message queue communication, and so on.

  3. Database access : Database is one of the key components of the system, so database access also needs to be taken into consideration. This includes operations that read and write to the database.

  4. Caching operations : Caching is often used to improve system performance, so read and write operations that involve caching also need to be taken into account.

  5. Other system components : Depending on the complexity of the system, requests and operations of other components may also be involved, such as message queue producers and consumers, file system access, etc.

Taken together, millions of concurrent requests are usually a comprehensive consideration of the performance and throughput of the entire system, including various requests and operations inside and outside the system. The system needs to maintain high availability and performance under high concurrent load, so the performance and tuning of each component are very important.

21.6.4 The TPS of the seckill interface in 21.3.6 is only around 2600; how can it withstand millions of concurrent requests?

From 21.6.2 and 21.6.3 we can see that million-level concurrency is not aimed at a single machine, and its sources are diverse; it is the entire distributed system that withstands this level of concurrency. The machines of this system may include hundreds of service clusters, each cluster containing multiple instances.

Assume 2.6 million concurrent requests and that each machine can sustain a throughput of 2600 TPS. If the throughput of the microservice system scaled linearly with the number of machines, we would need 2,600,000 / 2600 = 1000 machines. But we all know that the RPC interactions between the instances of a microservice system cause overhead, so assuming the distributed system as a whole loses about 30% of its performance, we actually need 2,600,000 / (2600 × 0.7) ≈ 1429 machines.

21.7 Ali interview: how did you obtain your single-machine limit figures, and why can't the TPS be higher?

21.7.1 Have you done a stress test?

Answer: The TPS of the seckill interface increased from 200 to 2600, tested on a single machine.

21.7.2 Why 2600-plus? How was the stress test done?

Answer: 10,000 user threads, three rounds of testing.

21.7.3 Then why do you think 2600 is the most reasonable number, rather than, say, 3600? (Find time to test again to determine the real limit.)

Answer:

The number 2600 is considered reasonable based on our stress test results and the following factors:

  1. Resource utilization: During the stress test, we monitored the server's CPU, memory, disk IO, and network IO and made sure they were not reaching bottlenecks or exceeding safe thresholds. If resource usage approaches or reaches 100%, service instability may result.

  2. Response time: At 2600 TPS, the system response time we observed was still within an acceptable range, with no significant increase in response time due to overload.

  3. Error rate: Under this TPS, our service error rate stayed at a very low level. If TPS continued to increase, the error rate might rise, indicating that the system had exceeded its capacity.

  4. System stability: During stress testing, the system showed good stability without any crashes or unexpected interruptions.

  5. Consistency across three rounds of testing: We ran three rounds of stress testing and got similar results in each round, which increases our confidence in this number.

  6. Feedback from other system components: In addition to the main seckill interface, other system components, such as the database and cache, also showed good stability and performance under this load.

  7. Business needs and risk considerations: Based on our business forecast, 2600 TPS meets our expected peak demand, and there is still some buffer to deal with sudden traffic.

To sum up, although theoretically our system may support higher TPS, the number of 2600 is considered to be the most reasonable under the premise of ensuring system stability, response time and resource utilization.

21.7.4 Talk about the parameters of the JMeter summary report


Apache JMeter is a popular open source load and performance testing tool. After executing the test, JMeter can provide a detailed test report to help you analyze and interpret the test results. The following are some common parameters and their meanings in the JMeter summary report:

  1. Label - The name or label of the HTTP request in the test.
  2. # Samples - The total number of completed requests.
  3. Average - Average response time in milliseconds for all requests.
  4. Min - Minimum response time (in milliseconds) for all requests.
  5. Max - Maximum response time in milliseconds for all requests.
  6. Std. Dev. (Standard Deviation) - The standard deviation of the response time. It shows the fluctuation or variation of the response time, if this value is high, it means that the response time has a large uncertainty.
  7. Error % - The percentage of errors. This is the percentage of failed requests.
  8. Throughput - Measures the throughput of the server. It shows how many requests the server was able to handle in a certain time period.
  9. Received KB/sec - The amount of data (KB) received from the server per second.
  10. Sent KB/sec - Amount of data (KB) sent to the server per second.
  11. Median - The median of the response times. 50% of the samples have a response time smaller than it, and the other 50% of the samples have a response time larger than it.
  12. 90% Line - 90% of requests have a response time less than this value, and only 10% of requests have a longer response time. This is a good metric for determining the worst response times most users are likely to experience.
  13. 95% Line - 95% of requests have a response time less than this value.
  14. 99% Line - 99% of requests have a response time less than this value.

Through these parameters, you can conduct a comprehensive analysis and evaluation of the server's performance and responsiveness. If you see any unusual or unexpected values in the summary report, this may be a sign of a system performance problem or bottleneck.

Origin blog.csdn.net/yxg520s/article/details/132521094