Copyright notice: do not reproduce without permission. https://blog.csdn.net/qq_36235275/article/details/82502533
Requirements
- Count the number of orders and the total sales amount per year, across all orders
- Find the sales amount of the largest single order in each year
- Find each year's best-selling item (the item whose sales amount `amount` is highest in a given year is that year's best seller)
First we connect to Hive from Scala, create the three tables, and load the data. Creating the tables and loading the data works exactly as in Hive itself: Spark SQL can fully take over Hive, so all of these operations can be done inside Spark. The key step is to place Hive's configuration file hive-site.xml into Spark's conf directory.
The main SQL statements are as follows:
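A minimal sketch of the setup described above, assuming a Hive-enabled build of Spark; the warehouse path, the data file path, and the tbStock column names are illustrative assumptions, not taken from the original post:

```scala
import org.apache.spark.sql.SparkSession

object SalesStatsSetup {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes Spark read hive-site.xml from its conf
    // directory and take over the Hive metastore.
    val spark = SparkSession.builder()
      .appName("SalesStats")
      .enableHiveSupport()
      .getOrCreate()

    // Create one of the three tables and load data, Hive-style.
    // Column names and the local file path are assumptions for illustration.
    spark.sql("create table if not exists tbStock(ordernumber string, locationid string, dateid string) " +
      "row format delimited fields terminated by ','")
    spark.sql("load data local inpath '/path/to/tbStock.txt' into table tbStock")

    spark.stop()
  }
}
```

The other two tables, tbStockDetail and tbDate, are created and loaded the same way.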
println("---------1. Number of orders and total sales amount per year------------")
spark.sql("select c.theyear,count(distinct a.ordernumber),sum(b.amount) " +
"from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
"join tbDate c on a.dateid=c.dateid group by c.theyear order by c.theyear").show()
println("---------2.1. First compute each order's sales amount and the date it occurred----------")
spark.sql("select a.dateid,a.ordernumber,sum(b.amount) as sumofamount " +
"from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
"group by a.dateid,a.ordernumber").show()
println("---------2.2. Sales amount of the largest order in each year----------------------")
spark.sql("select c.theyear,max(d.sumofamount) from tbDate c " +
"join (select a.dateid,a.ordernumber,sum(b.amount) as sumofamount " +
"from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
"group by a.dateid,a.ordernumber ) d on c.dateid=d.dateid " +
"group by c.theyear order by c.theyear").show()
println("---------3.1. Sales amount of each item in each year-----------------")
spark.sql("select c.theyear,b.itemid,sum(b.amount) as sumofamount " +
"from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
"join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid").show()
println("---------3.2. Maximum single-item sales amount in each year-----------------")
spark.sql("select d.theyear,max(d.sumofamount) as maxofamount " +
"from (select c.theyear,b.itemid,sum(b.amount) as sumofamount " +
"from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
"join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid) d " +
"group by d.theyear").show()
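Steps 3.1 and 3.2 yield each year's maximum per-item amount, but not which item achieved it, which is what the third requirement asks for. A sketch of the final join to recover the item id (the step numbering and aliases e/f are my own, not from the original post):

```scala
// Join the per-year/per-item totals (3.1) back to the per-year maxima (3.2),
// matching on year and amount, to recover each year's best-selling itemid.
spark.sql("select distinct e.theyear,e.itemid,f.maxofamount " +
  "from (select c.theyear,b.itemid,sum(b.amount) as sumofamount " +
  "from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
  "join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid) e " +
  "join (select d.theyear,max(d.sumofamount) as maxofamount " +
  "from (select c.theyear,b.itemid,sum(b.amount) as sumofamount " +
  "from tbStock a join tbStockDetail b on a.ordernumber=b.ordernumber " +
  "join tbDate c on a.dateid=c.dateid group by c.theyear,b.itemid) d " +
  "group by d.theyear) f " +
  "on e.theyear=f.theyear and e.sumofamount=f.maxofamount " +
  "order by e.theyear").show()
```

Note that if two items tie for the maximum amount in the same year, this join returns both rows.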