2021-08-20

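Data Inference

Goals: understand the business scenarios where data inference applies, and use SQL to implement the inference logic for a key metric

  • Business background: a key metric is missing
  • Scenario: a restaurant wants to raise sales through data analysis and data mining, but its historical records lack one important dimension: the number of diners per order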

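Approach:
1. How can the number of diners be determined from the data?
- Split the menu items into a few classes and infer the number of diners from the quantity ordered in each class
1. For example: staple foods, drinks, snacks, sauces, main courses
- Define the inference rules:
1. 1 staple food counts as 1 person (1:1)
2. 1 main course counts as 1 person (1:1)
3. 1 drink counts as 1 person (1:1)
4. 2 snacks count as 1 person (each snack counts as 0.5)
5. Dipping sauces and condiments are not counted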
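
As a minimal sketch of how these rules could combine for a single order, the Python snippet below weights each class, takes the largest weighted count, rounds it down, and enforces a minimum of one diner, which mirrors what the SQL later in this post does with GREATEST and floor. The function name `infer_diners` and the sample quantities are hypothetical; the class labels match the food_type values used by the SQL.

```python
import math

# Per-item weight of each class, taken from the rules above.
# Condiments are absent from the table, so they contribute nothing.
WEIGHTS = {"主食": 1.0, "主菜": 1.0, "饮料": 1.0, "小食": 0.5}

def infer_diners(quantities):
    """Infer the diners for one order from {food_type: quantity ordered}."""
    weighted = [qty * WEIGHTS.get(food_type, 0.0)
                for food_type, qty in quantities.items()]
    largest = max(weighted, default=0.0)
    # Round down, but never report fewer than one diner.
    return max(math.floor(largest), 1)

# Hypothetical order: 2 staples, 3 mains, 1 drink, 3 snacks, 2 condiments.
print(infer_diners({"主食": 2, "主菜": 3, "饮料": 1, "小食": 3, "佐料": 2}))  # 3
# An order that contains only a condiment still counts as one diner.
print(infer_diners({"佐料": 1}))  # 1
```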
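[Figure: inference logic diagram]

Stepping up a level: the SQL processing flow
[Figure: business flow of SQL data inference]
A look at one business case
[Figure: data diagram]
[Figure: data table details]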

  • Menu table
    - Item name
    - Item price
    Item details

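  • The raw data has no category label for the menu items, so a category label must be added first
    - Method: match substrings with the locate function, branch on the conditions with a CASE WHEN expression, and create a new column food_category (the item category)
    - About locate: locate(substr, str) returns the position of the first occurrence of substr in str, counted from 1, or 0 if substr does not occur
    - substr: the string to search for
    - str: a column name or a string to search in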
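
To make the `> 0` tests in the query below concrete, here is a rough Python approximation of what locate returns. It is only a sketch: it assumes the column uses a case-insensitive collation (common in MySQL, but not confirmed for this dataset), and the item names are hypothetical.

```python
def locate(substr, s):
    """Approximate MySQL LOCATE(substr, str): 1-based position, 0 if absent."""
    # Assumption: case-insensitive collation, so both sides are lowercased.
    # str.find returns -1 when nothing is found, which becomes 0 after +1.
    return s.lower().find(substr.lower()) + 1

print(locate("Rice", "Pilau Rice"))  # 7
print(locate("rice", "Pilau Rice"))  # 7
print(locate("Naan", "Pilau Rice"))  # 0
```

A result greater than 0 therefore means "the keyword occurs somewhere in the item name".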

# Drop the food_category table if it already exists
DROP TABLE IF EXISTS food_category;
CREATE TABLE food_category AS SELECT   # the AS keywords in the select list below assign column aliases
Item_Name AS item_name,
`Product_Price` AS price,
Restaurant_id AS restaurant_id,
# Label each product with CASE WHEN (the first matching branch wins)
CASE
	WHEN locate( 'Dahi', Item_Name )> 0 THEN '酸奶'
	WHEN locate( 'wine', Item_Name )> 0
		OR locate( 'COBRA', Item_Name )> 0 THEN '酒'
	WHEN locate( 'water', Item_Name )> 0
		OR locate( 'Coke', Item_Name )> 0
		OR locate( 'Lemonade', Item_Name )> 0 THEN '饮料'
	WHEN locate( 'Rice', Item_Name )> 0 THEN '米饭'
	WHEN locate( 'Chapati', Item_Name )> 0
		OR locate( 'Paratha', Item_Name )> 0
		OR locate( 'Naan', Item_Name )> 0
		OR locate( 'roti', Item_Name )> 0
		OR locate( 'Papadum', Item_Name )> 0 THEN '饼'
	WHEN locate( 'Main', Item_Name )> 0 THEN '主菜'
	WHEN locate( 'Chaat', Item_Name )> 0
		OR locate( 'Muttar', Item_Name )> 0 THEN '小吃'
	WHEN locate( 'Chicken', Item_Name )> 0 THEN '鸡肉类'
	WHEN locate( 'Lamb', Item_Name )> 0 THEN '羊肉类'
	WHEN locate( 'Fish', Item_Name )> 0 THEN '鱼肉类'
	WHEN locate( 'Prawn', Item_Name )> 0
		OR locate( 'Jinga', Item_Name )> 0 THEN '虾类'
	WHEN locate( 'Pakora', Item_Name )> 0 THEN '炸素丸子'
	WHEN locate( 'Saag', Item_Name )> 0 THEN '绿叶菜糊糊'
	WHEN locate( 'Paneer', Item_Name )> 0 THEN '芝士菜'
	WHEN locate( 'Pickle', Item_Name )> 0
		OR locate( 'Chutney', Item_Name )> 0 THEN '腌菜'
	WHEN locate( 'Aloo', Item_Name )> 0 THEN '土豆类'
	WHEN locate( 'Salad', Item_Name )> 0 THEN  '沙拉'
	WHEN locate( 'Tikka', Item_Name )> 0 THEN '烤串'
	WHEN locate( 'Chana', Item_Name )> 0 THEN '豆类'
	WHEN locate( 'Dall', Item_Name )> 0
		OR locate( 'Hari Mirch', Item_Name )> 0 THEN '素菜'
	WHEN locate( 'Puree', Item_Name )> 0 THEN '糊糊'
	WHEN locate( 'Raitha', Item_Name )> 0
		OR locate( 'Raita', Item_Name )> 0 THEN '酸奶沙拉'
	WHEN locate( 'French Fries', Item_Name )> 0 THEN '炸薯条'
	WHEN locate( 'Samosa', Item_Name )> 0 THEN '咖喱角'
	WHEN locate( 'Kehab', Item_Name )> 0
		OR locate( 'Kebab', Item_Name )> 0 THEN '烤串(小食)'
	WHEN locate( 'Bhajee', Item_Name )> 0
		OR locate( 'Bhaji', Item_Name )> 0 THEN '油炸蔬菜团'
	WHEN locate( 'Mushroom', Item_Name )> 0
		OR locate( 'Vegetable', Item_Name )> 0 THEN '蔬菜'
	WHEN locate( 'Starter', Item_Name)> 0 THEN '开胃小吃'
	WHEN locate( 'Sauce', Item_Name)> 0 THEN '酱' ELSE '咖喱菜'
	END AS food_category
FROM restaurant_products_price;

The query above groups the products with CASE WHEN, which makes the later inference steps easier.
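
Because CASE WHEN stops at the first branch that matches, the order of the branches decides the label whenever an item name contains several keywords. The sketch below reproduces only four of the branches, in the same relative order as the query, to illustrate this; the item names are hypothetical and the case-insensitive matching is an assumption about the column's collation.

```python
# A few (keywords, category) pairs, in the same order as the CASE branches.
RULES = [
    (("Rice",), "米饭"),
    (("Chicken",), "鸡肉类"),
    (("Tikka",), "烤串"),
    (("Sauce",), "酱"),
]
DEFAULT = "咖喱菜"  # the ELSE branch of the query

def categorize(item_name):
    name = item_name.lower()
    for keywords, category in RULES:
        # The first rule with any matching keyword wins, like CASE WHEN.
        if any(keyword.lower() in name for keyword in keywords):
            return category
    return DEFAULT

print(categorize("Chicken Tikka"))  # 鸡肉类 (the Chicken branch comes first)
print(categorize("Tikka Masala"))   # 烤串
print(categorize("Korma"))          # 咖喱菜 (nothing matched, so ELSE applies)
```

One consequence worth keeping in mind is that any item whose name matches no keyword is labelled 咖喱菜, so the ELSE branch acts as a catch-all.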

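Next step: first look at the product detail table to see how the products in the database are currently categorized, and to find the differences from the classes that the inference rules use.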

SELECT
# Count the distinct item_name values in each category and alias the result
	count( DISTINCT item_name ) AS item_num,
	food_category  # the category column created in the previous step
FROM
	food_category
GROUP BY
	food_category
ORDER BY
	item_num DESC;
  • Add a new column to the product detail table that holds the product class used for inferring the number of diners
drop table if exists food_type;
create table food_type as
SELECT
item_name,
price,
restaurant_id,
food_category,
CASE
WHEN food_category IN ( '鸡肉类', '羊肉类', '虾类', '咖喱菜', '鱼肉类', '主菜', '芝士菜' ) THEN
'主菜'
WHEN food_category IN ( '饼', '米饭' ) THEN
'主食'
WHEN food_category IN ( '饮料', '酒', '酸奶' ) THEN
'饮料'
WHEN food_category IN ( '酱', '腌菜' ) THEN
'佐料' ELSE '小食'
END AS food_type
FROM
food_category;
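
The regrouping in this query is a plain lookup with a default. A small Python sketch of the same mapping (the names `FOOD_TYPE_BY_CATEGORY` and `food_type` are hypothetical) shows that every category not listed explicitly ends up as 小食 and is therefore weighted at 0.5 later on:

```python
# Same groups as the IN lists of the query above.
FOOD_TYPE_BY_CATEGORY = {
    **dict.fromkeys(["鸡肉类", "羊肉类", "虾类", "咖喱菜", "鱼肉类", "主菜", "芝士菜"], "主菜"),
    **dict.fromkeys(["饼", "米饭"], "主食"),
    **dict.fromkeys(["饮料", "酒", "酸奶"], "饮料"),
    **dict.fromkeys(["酱", "腌菜"], "佐料"),
}

def food_type(category):
    # Anything not listed falls through to the ELSE branch: 小食.
    return FOOD_TYPE_BY_CATEGORY.get(category, "小食")

print(food_type("米饭"))  # 主食
print(food_type("酒"))    # 饮料
print(food_type("沙拉"))  # 小食
print(food_type("蔬菜"))  # 小食
```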
  • Check how the products are distributed after the reclassification
SELECT
count( DISTINCT item_name ) AS item_num,
food_type,
food_category
FROM
food_type
GROUP BY
food_type,
food_category
ORDER BY
food_type,
food_category,
item_num DESC;
  • Join the order detail table with the product table defined above, adding the product class and price to each order line
SELECT
a.*,
b.food_type,
b.price
FROM
restaurant_orders a
JOIN food_type b ON a.`Item_Name` = b.item_name
AND a.Restaurant_Id = b.restaurant_id;
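
Note that the join key has two parts, the item name and the restaurant id, and that an inner JOIN silently drops order lines with no matching product. The following sketch imitates that behaviour with a dictionary lookup; all rows are hypothetical, and the exact string comparison is a simplification of how MySQL compares text.

```python
# Hypothetical product rows keyed by (item_name, restaurant_id).
products = {
    ("Pilau Rice", 1): {"food_type": "主食", "price": 2.95},
    ("Mango Chutney", 1): {"food_type": "佐料", "price": 0.50},
}

# Hypothetical order lines: (order_number, item_name, restaurant_id, quantity).
order_lines = [
    (1001, "Pilau Rice", 1, 2),
    (1001, "Mango Chutney", 1, 1),
    (1002, "Pilau Rice", 2, 1),  # same name, other restaurant: no match
]

# Inner join: keep a line only when both parts of the key match a product.
joined = [
    (order_number, item_name, restaurant_id, quantity,
     products[(item_name, restaurant_id)]["food_type"],
     products[(item_name, restaurant_id)]["price"])
    for order_number, item_name, restaurant_id, quantity in order_lines
    if (item_name, restaurant_id) in products
]

for row in joined:
    print(row)
print(len(order_lines) - len(joined), "order line(s) dropped by the join")
```

Comparing the row counts before and after the join is a cheap way to see whether any order lines were lost.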
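  • Each row is currently one product within one order, so an order with several products spans several rows. We want to collapse each order into a single row while applying the coefficients defined earlier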
select
a.`Order_Number`,a.`Order_Date`,a.restaurant_id,round(sum(a.Quantity*b.price),2) as total_amount,
sum(case when food_type='主食' then a.Quantity*1 else 0 end) as staple_food_count,
sum(case when food_type='主菜' then a.Quantity*1 else 0 end) as main_course_count,
sum(case when food_type='饮料' then a.Quantity*1 else 0 end) as drink_count,
sum(case when food_type='小食' then a.Quantity*0.5 else 0 end) as snack_count
from restaurant_orders a join food_type b
on a.`Item_Name`=b.item_name and a.Restaurant_Id=b.restaurant_id
group by a.`Order_Number`,a.`Order_Date`,a.Restaurant_Id;
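
The `sum(case when ...)` pattern pivots the order lines into one row per order with one weighted count per class. A hedged Python equivalent is sketched below; it groups by order number only (the query also groups by date and restaurant), and the rows and the `summarize` helper are hypothetical.

```python
from collections import defaultdict

WEIGHTS = {"主食": 1.0, "主菜": 1.0, "饮料": 1.0, "小食": 0.5}

# Hypothetical joined rows: (order_number, food_type, quantity, price).
rows = [
    (1001, "主菜", 2, 8.95),
    (1001, "主食", 2, 2.95),
    (1001, "小食", 3, 3.50),
    (1001, "佐料", 1, 0.50),
    (1002, "饮料", 1, 1.25),
]

def summarize(rows):
    """Collapse order lines into one dict of totals per order."""
    orders = defaultdict(lambda: {"total_amount": 0.0, **dict.fromkeys(WEIGHTS, 0.0)})
    for order_number, food_type, quantity, price in rows:
        order = orders[order_number]
        order["total_amount"] += quantity * price  # every line adds to the amount
        if food_type in WEIGHTS:                   # condiments add to the amount only
            order[food_type] += quantity * WEIGHTS[food_type]
    for order in orders.values():
        order["total_amount"] = round(order["total_amount"], 2)
    return dict(orders)

for order_number, summary in summarize(rows).items():
    print(order_number, summary)
```

In this sample, order 1001 ends up with 2.0 staples, 2.0 mains, 0.0 drinks and 1.5 weighted snacks, while the condiment only raises the total amount.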
  • Take the largest of the staple food, main course, drink, and snack counts
select
c.*,
GREATEST(c.staple_food_count,c.main_course_count,c.drink_count,c.snack_count) as max_count
from
(select
a.`Order_Number`,a.`Order_Date`,a.restaurant_id,round(sum(a.Quantity*b.price),2) as
total_amount,
sum(case when food_type='主食' then a.Quantity*1 else 0 end) as staple_food_count,
sum(case when food_type='主菜' then a.Quantity*1 else 0 end) as main_course_count,
sum(case when food_type='饮料' then a.Quantity*1 else 0 end) as drink_count,
sum(case when food_type='小食' then a.Quantity*0.5 else 0 end) as snack_count
from restaurant_orders a join food_type b
on a.`Item_Name`=b.item_name and a.Restaurant_Id=b.restaurant_id
group by a.`Order_Number`,a.`Order_Date`,a.Restaurant_Id) c;
  • Finally, round down and make sure the inferred number of diners is at least 1
select c.*,
GREATEST(floor(GREATEST(c.staple_food_count,c.main_course_count,c.drink_count,c.snack_count)),1) as customer_count from
(select
a.`Order_Number`,a.`Order_Date`,a.restaurant_id,round(sum(a.Quantity*b.price),2) as
total_amount,
sum(case when food_type='主食' then a.Quantity*1 else 0 end) as staple_food_count,
sum(case when food_type='主菜' then a.Quantity*1 else 0 end) as main_course_count,
sum(case when food_type='饮料' then a.Quantity*1 else 0 end) as drink_count,
sum(case when food_type='小食' then a.Quantity*0.5 else 0 end) as snack_count
from restaurant_orders a join food_type b
on a.`Item_Name`=b.item_name and a.Restaurant_Id=b.restaurant_id
group by a.`Order_Number`,a.`Order_Date`,a.Restaurant_Id) c;
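  • Create a new table that stores, for every order at the two restaurants, the order date, the amount, the inferred number of diners, and the staple food, main course, drink, and snack counts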
DROP TABLE IF EXISTS restaurants_orders_customer_count;
CREATE TABLE restaurants_orders_customer_count AS SELECT
c.*, GREATEST( floor( GREATEST( c.staple_food_count, c.main_course_count,
c.drink_count, c.snack_count )), 1 ) AS customer_count
FROM
(SELECT
a.`Order_Number`,
a.`Order_Date`,
a.restaurant_id,
round( sum( a.Quantity * b.price ), 2 ) AS total_amount,
sum( CASE WHEN food_type = '主食' THEN a.Quantity * 1 ELSE 0 END ) AS staple_food_count,
sum( CASE WHEN food_type = '主菜' THEN a.Quantity * 1 ELSE 0 END ) AS main_course_count,
sum( CASE WHEN food_type = '饮料' THEN a.Quantity * 1 ELSE 0 END ) AS drink_count,
sum( CASE WHEN food_type = '小食' THEN a.Quantity * 0.5 ELSE 0 END ) AS snack_count
FROM
restaurant_orders a
JOIN food_type b ON a.`Item_Name` = b.item_name
AND a.Restaurant_Id = b.restaurant_id
GROUP BY
a.`Order_Number`,
a.`Order_Date`,
a.Restaurant_Id
) c;
  • Check what share of orders could be distorted by the concern that too many drinks inflate the estimate
SELECT
count( CASE WHEN drink_count >= 5 THEN `Order_Number` ELSE NULL END ) AS outlier_count,
count(*) AS total_count,
round( count( CASE WHEN drink_count >= 5 THEN `Order_Number` ELSE NULL END )/ count(*), 5 ) AS outlier_rate
FROM
restaurants_orders_customer_count;

Query result:
outlier_count  total_count  outlier_rate
13             33055        0.0004
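
The reported counts (13 outlier orders out of 33055) can be checked by hand. The short calculation below suggests that the displayed rate of 0.0004 was rounded to four decimal places, whereas the query itself rounds to five; that reading of the display is an assumption.

```python
outlier_count = 13
total_count = 33055

rate = outlier_count / total_count
print(f"{rate:.7f}")    # 0.0003933
print(round(rate, 5))   # 0.00039, what round(..., 5) in the query yields
print(round(rate, 4))   # 0.0004, the figure shown in the result
print(f"{rate:.3%}")    # 0.039%
```

Either way, roughly 4 orders in 10,000 have five or more drinks, so the drink rule distorts very few estimates.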

  • With the number of diners available, further analysis becomes possible
SELECT
restaurant_id,
avg( customer_count ) AS avg_cc,
avg( total_amount ) AS ta,
avg( total_amount / customer_count ) AS avg_scc,
avg( staple_food_count / customer_count ) AS avg_staple,
avg( main_course_count / customer_count ) AS avg_main,
avg( drink_count / customer_count ) AS avg_drink,
avg( snack_count / customer_count ) AS avg_snack
FROM
restaurants_orders_customer_count
group by restaurant_id;
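
One detail of this query is easy to misread: `avg( total_amount / customer_count )` averages the per-order spend per diner, so every order carries the same weight regardless of its size. That is generally not the same number as total spend divided by total diners. A small sketch with two hypothetical orders shows the difference:

```python
from statistics import mean

# Hypothetical orders of one restaurant: (total_amount, customer_count).
orders = [(60.0, 4), (30.0, 1)]

# What avg(total_amount / customer_count) computes: the mean of per-order ratios.
avg_of_ratios = mean(amount / diners for amount, diners in orders)

# Spend per diner across all orders: avg(total_amount) / avg(customer_count).
ratio_of_avgs = mean(amount for amount, _ in orders) / mean(diners for _, diners in orders)

print(avg_of_ratios)  # 22.5
print(ratio_of_avgs)  # 18.0
```

Which of the two is appropriate depends on whether the analysis is about the typical order or the typical diner.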
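  • The value of data inference:
    Data is itself an asset, and as the cost of acquiring new data keeps rising, the value of data inference becomes more prominent
    Data inference derives "new" data from data assets that already exist, which amounts to growing the data assets at a relatively low cost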


Reposted from blog.csdn.net/qq_59472803/article/details/119828245