2023 National Mathematical Modeling Competition, Question C: Automatic Pricing and Replenishment Decisions for Vegetable Commodities - Full Walkthrough with Multiple Approaches (Including Code)

Brief comment on the question: Question C appears to be the simplest of the three. It covers a fairly broad range of topics and focuses on data analysis. It involves prediction models and operations research optimization (linear programming), and it also includes an open-ended part, so it suits beginners while leaving plenty of room to go deeper.

Topic analysis and ideas:

Background: In fresh food supermarkets, most vegetable products have a short shelf life, and their quality deteriorates the longer they are on sale. Most varieties that are not sold on the day cannot be sold the next day. Supermarkets therefore usually restock every day based on the historical sales and demand of each product.

Because supermarkets sell many vegetable varieties from different origins, and wholesale purchasing usually takes place between 3:00 and 4:00 am, merchants must make the day's replenishment decision for each vegetable category without knowing exactly which items will be available or what they will cost. Vegetables are generally priced with the "cost-plus pricing" method, and supermarkets usually discount products that were damaged in transit or have deteriorated in quality. Reliable market demand analysis is therefore particularly important for both replenishment and pricing decisions. On the demand side, the sales volume of vegetable commodities is often correlated with time; on the supply side, vegetable varieties are relatively abundant from April to October, and the limited sales space of supermarkets makes a reasonable sales mix extremely important.

Attachment 1 gives the product information for the six vegetable categories sold by a supermarket;

Attachments 2 and 3 give, respectively, the sales details and the wholesale prices of each product in the supermarket from July 1, 2020 to June 30, 2023;

Attachment 4 gives the recent loss rate of each product.

Question 1: There may be certain correlations between different categories or single products of vegetable commodities. Please analyze the distribution patterns and interrelationships of the sales volume of various vegetable categories and single products.

Summary of ideas: This question asks us to analyze the data in Attachments 1 and 2: compute the correlation coefficients between the sales of the categories and between the sales of the single products to identify which are related, then apply a reasonable regression fit to pin down the specific relationships. The question also lends itself to rich visualization (heat maps, frequency charts, distribution plots, regression fit plots, and so on).

Detailed explanation of the idea: First, load Attachment 1 and Attachment 2 and take a preliminary look at the data (the code is Python, run in Jupyter):

import pandas as pd
import numpy as np

df1 = pd.read_excel('附件1.xlsx')  # Attachment 1: product information
df2 = pd.read_excel('附件2.xlsx')  # Attachment 2: sales details
print(df1)
print(df2)

Df1:

Df2:

For large DataFrames, check for missing values first.

print(df1.isnull().values.any())  # True if there is at least one missing value, otherwise False
print(df2.isnull().values.any())  # True if there is at least one missing value, otherwise False

The output is:

False,False

Therefore, there are no missing values and the analysis can proceed directly.

·First, based on the data in Attachment 2, analyze the distribution pattern of the sales data of each single product.

Since the data set lists individual transactions, the rows must first be aggregated by single product:

single_item = df2['单品编码'].unique().tolist()  # unique single-product codes

# Total sales per single product (sort=False keeps first-appearance order)
df_singlesales = (
    df2.groupby('单品编码', sort=False, as_index=False)['销量(千克)']
       .sum()
       .rename(columns={'销量(千克)': '销量'})
)
print(df_singlesales)

Directly visualize the sales distribution of each single product:

Because there are so many single products, it is hard to read a pattern directly from the raw totals. One option is to divide the range of the sales column into reasonable numerical intervals and count the number of single products in each interval, which summarizes how the sales data are distributed.

Then aggregate the single products into the categories given in Attachment 1 and repeat the analysis:
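The interval grouping described above could be sketched with `pd.cut`. The sales figures and bin edges below are made up for illustration; with the real data, `df_singlesales['销量']` would take the place of `sales`, and the edges should be chosen after inspecting its range.

```python
import pandas as pd

# Hypothetical total sales (kg) per single product; stands in for df_singlesales['销量'].
sales = pd.Series([12.5, 80.0, 950.0, 3200.0, 15000.0, 430.0, 60.0, 7.0])

# Assumed bin edges (left-closed intervals); adjust to the real data range.
bins = [0, 10, 100, 1000, 10000, float('inf')]
labels = ['0-10', '10-100', '100-1000', '1000-10000', '>10000']

# Assign each product to an interval, then count products per interval.
groups = pd.cut(sales, bins=bins, labels=labels, right=False)
counts = groups.value_counts(sort=False)
print(counts)
```

A bar chart of `counts` then gives the frequency diagram mentioned in the summary.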

·After exploring the distribution patterns of sales volume of each single product and category, it is also necessary to analyze the interrelationship between them.

More important is the correlation between the sales of the six categories. To analyze it, first build a daily sales time series for each category.
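A minimal sketch of that step, using tiny synthetic tables in place of the attachments. The column names `分类名称` and `销售日期` are assumptions about the attachment headers, and the category labels `A` and `B` are placeholders.

```python
import pandas as pd

# Tiny synthetic stand-ins for Attachment 1 (df1) and Attachment 2 (df2).
# '分类名称' and '销售日期' are assumed column names; check the real headers.
df1 = pd.DataFrame({'单品编码': [1, 2, 3],
                    '分类名称': ['A', 'A', 'B']})
df2 = pd.DataFrame({
    '销售日期': pd.to_datetime(['2023-06-01', '2023-06-01', '2023-06-01',
                              '2023-06-02', '2023-06-02',
                              '2023-06-03', '2023-06-03']),
    '单品编码': [1, 2, 3, 1, 3, 2, 3],
    '销量(千克)': [1.0, 2.0, 2.0, 5.0, 4.0, 7.0, 6.0],
})

# Attach the category to every sales row, then sum sales per day and category.
merged = df2.merge(df1, on='单品编码', how='left')
daily = merged.pivot_table(index='销售日期', columns='分类名称',
                           values='销量(千克)', aggfunc='sum', fill_value=0)

# Pearson correlation between the daily category series.
corr = daily.corr(method='pearson')
print(daily)
print(corr)
```

The `corr` matrix is what a heat map would display; passing `method='spearman'` instead is an option if the relationships look monotonic but not linear.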

Question 2: Considering that supermarkets make replenishment plans by category, analyze the relationship between the total sales volume of each vegetable category and cost-plus pricing, and give the total daily replenishment volume and pricing strategy for each vegetable category for the coming week (July 1-7, 2023) so as to maximize the supermarket's profit.

Summary of ideas: This question asks us to analyze the data in Attachment 2 and Attachment 3: merge the sales data by category and date, and analyze the relationship between sales volume and pricing (markup rate) within each category. Based on the results, choose an appropriate model (linear, quadratic, etc.) to fit, keeping in mind that different categories may need different fitted relationships. Once the relationship between sales volume and cost-plus pricing is known, predict next week's sales volume from the historical sales data (any time series prediction method can be used), then combine the predicted sales volume with the fitted sales-pricing relationship to formulate a pricing strategy that maximizes profit.

Detailed explanation of the idea: The cost-plus pricing formula is: pricing = basic cost × (1 + markup rate)...
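Rearranging the formula gives markup rate = pricing / basic cost - 1, which can then be related to sales volume. The sketch below fits a linear and a quadratic model with `np.polyfit` on made-up numbers for one category; the data and the `r_squared` helper are illustrative assumptions, not results from the attachments.

```python
import numpy as np

# Made-up daily data for one category: unit price, wholesale cost, sales (kg).
price = np.array([6.0, 6.6, 7.2, 7.8, 8.4, 9.0])
cost = np.array([6.0, 6.0, 6.0, 6.0, 6.0, 6.0])
sales = np.array([100.0, 90.0, 80.0, 70.0, 60.0, 50.0])

# Rearranged cost-plus formula: markup rate = price / cost - 1.
markup = price / cost - 1.0

# Fit sales as a linear and as a quadratic function of the markup rate.
lin = np.polyfit(markup, sales, deg=1)
quad = np.polyfit(markup, sales, deg=2)

def r_squared(coeffs, x, y):
    """Coefficient of determination of a polynomial fit (hypothetical helper)."""
    resid = y - np.polyval(coeffs, x)
    return 1.0 - resid.dot(resid) / ((y - y.mean()) ** 2).sum()

print('linear   :', lin, r_squared(lin, markup, sales))
print('quadratic:', quad, r_squared(quad, markup, sales))
```

Comparing the two R² values per category is one simple way to decide which functional form to keep for that category.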

Scatter plot

LSTM prediction
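Since any time series method is acceptable here, a much simpler baseline than an LSTM can be used as a sanity check. The sketch below is a weekday-average forecast (not the LSTM named above), run on made-up sales for one category over the four weeks ending June 30, 2023.

```python
import pandas as pd

# Made-up daily sales for one category: 28 days (4 full weeks) ending June 30, 2023.
dates = pd.date_range('2023-06-03', '2023-06-30', freq='D')
sales = pd.Series(100 + 10 * dates.dayofweek.to_numpy(), index=dates, dtype=float)

# Weekday-average baseline: forecast each future day with the mean of
# the same weekday (Monday=0 ... Sunday=6) over the history window.
weekday_mean = sales.groupby(sales.index.dayofweek).mean()
future = pd.date_range('2023-07-01', '2023-07-07', freq='D')
forecast = pd.Series(weekday_mean.reindex(future.dayofweek).to_numpy(), index=future)
print(forecast)
```

If a more elaborate model cannot beat this baseline on held-out weeks, the extra complexity is probably not paying off.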

Question 3: Because sales space for vegetable products is limited, the supermarket wants to go further and make a replenishment plan at the single-product level. The total number of single products on sale must be kept between 27 and 33, and the order quantity of each single product must meet the minimum display quantity of 2.5 kg. Based on the varieties available for sale from June 24 to 30, 2023, give the single-product replenishment volume and pricing strategy for July 1 so as to maximize the supermarket's profit while meeting market demand for each vegetable category as far as possible.

Summary of ideas: This question asks us to build a nonlinear programming model that maximizes profit under the given constraints. Question 3 supplies the constraints, and the sales-versus-markup functions obtained for each category in Question 2 serve as the function for computing actual sales under a given pricing strategy.

First, find the single products that were on sale from June 24 to June 30, 2023, recorded as the set C_available:

import pandas as pd

df2 = pd.read_excel('附件2.xlsx')
# Column names below are assumed to match the headers of Attachment 2.
in_window = (df2['销售日期'] >= '2023-06-24') & (df2['销售日期'] <= '2023-06-30')
is_sale = df2['销售类型'] == '销售'  # assumption: keep sale rows only, excluding returns
C_available = df2[in_window & is_sale]['单品编码'].unique().tolist()  # varieties on sale June 24-30
n = len(C_available)  # number of items available for sale

Where n is the number of elements in the set, that is, the number of product types that can be sold.

The result is n = 49, but the number of single products actually put on sale, m, is limited to

m ∈ [27, 33]    (1)

Let the purchase quantity of each single product be X_i, i = 1, 2, ..., m:

X_i ≥ 2.5    (2)

The code of each selected single product must belong to the available set:

ID_i ∈ C_available    (3)

(1), (2), and (3) constitute the constraints of this question.
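Constraints (1)-(3) can be checked mechanically for any candidate plan. The sketch below uses a hypothetical helper `is_feasible` and made-up item codes; it only tests feasibility and does not perform the optimization itself.

```python
def is_feasible(plan, available, m_min=27, m_max=33, min_qty=2.5):
    """Check a replenishment plan {item_code: quantity_kg} against (1)-(3)."""
    count_ok = m_min <= len(plan) <= m_max               # constraint (1)
    qty_ok = all(q >= min_qty for q in plan.values())    # constraint (2)
    items_ok = all(code in available for code in plan)   # constraint (3)
    return count_ok and qty_ok and items_ok

# Hypothetical data: 49 available item codes, matching the count above.
C_available = set(range(1, 50))
good_plan = {code: 3.0 for code in range(1, 31)}   # 30 items, 3 kg each
bad_plan = {code: 2.0 for code in range(1, 31)}    # violates the 2.5 kg minimum

print(is_feasible(good_plan, C_available))  # True
print(is_feasible(bad_plan, C_available))   # False
```

A check like this is handy as a guard inside a heuristic search, or for validating the solver's output before reporting it.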

Let the markup rate of each single product be α_i. For the six categories, the relationships between the markup rate and the maximum sales volume s obtained in Question 2 are:

F_1(s) = α_leafy vegetables

F_2(s) = α_cauliflower

F_3(s) = α_aquatic rhizomes

F_4(s) = α_nightshades

F_5(s) = α_peppers

F_6(s) = α_edible fungi

Let the actual sales volume be S_i; S_i is calculated as follows:

Code:......

Question 4: In order to better make replenishment and pricing decisions for vegetable commodities, what other relevant data do supermarkets need to collect? How can these data help solve the above problems? Please give your opinions and reasons.

Summary of ideas: This question asks us to go further and improve the replenishment and pricing strategies by bringing in additional data. Start with the data not used in the earlier questions, such as the loss rate (%) and discounts. In addition, the problem mentions limited sales space but gives no actual capacity figures (Question 3 only limits the number of single products, not the shelf or storage space actually available).
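As one illustration of how the loss rate from Attachment 4 could feed back into the earlier calculations: if a fraction of the purchased quantity is lost before it can be sold, the order would need to be scaled up to cover the predicted demand. The formula and the function name below are assumptions, not part of the problem statement, and the numbers are made up.

```python
def replenishment_with_loss(predicted_sales_kg, loss_rate_pct):
    """Order quantity whose sellable part covers the predicted sales.

    Assumes sellable = ordered * (1 - loss_rate), with loss_rate given in percent.
    """
    loss = loss_rate_pct / 100.0
    if not 0.0 <= loss < 1.0:
        raise ValueError('loss rate must be in [0, 100) percent')
    return predicted_sales_kg / (1.0 - loss)

# Made-up example: 90 kg predicted demand, 10% loss rate.
order = replenishment_with_loss(90.0, 10.0)
print(round(order, 2))
```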

You can start with the calculation process of questions 1 to 3 to see whether other available data can be used in each calculation process.

The complete code for these ideas has been organized; see the Baidu Pan link below. If the link has expired, please send a private message~

Link: https://pan.baidu.com/s/1_Rmh1UZS6uuM_ETvMVC5EA Extraction code: n64s

Origin blog.csdn.net/lichensun/article/details/132767687