Contents

I. Distribution statistics
II. Time-series data visualization
- 1. Defining the time axis
- 2. Calendar chart

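I. Distribution statistics

This section visualizes how the values of a single feature are distributed.
Examining a feature's distribution reveals many properties of the data.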

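1. Counting value distributions with value_counts

Parameters of value_counts():

normalize : boolean, default False. If True, the returned object contains the relative frequencies of the unique values.
sort : boolean, default True. Sort by frequency.
ascending : boolean, default False. Sort in ascending order of frequency.
bins : integer, optional. Rather than counting individual values, group them into half-open bins; a convenience for pd.cut that only works with numeric data.
dropna : boolean, default True. Do not include counts of NaN.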
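To make the parameters above concrete, here is a minimal sketch on a made-up price series (the values are illustrative and are not taken from the shoe data used in this article):

```python
import pandas as pd

# Toy prices (made-up values); the last entry is missing.
prices = pd.Series([99.0, 99.0, 129.0, 199.0, 199.0, 199.0, None])

counts = prices.value_counts()                    # most frequent value first, NaN dropped
shares = prices.value_counts(normalize=True)      # relative frequencies instead of counts
with_nan = prices.value_counts(dropna=False)      # NaN gets its own row
binned = prices.value_counts(bins=2, sort=False)  # two half-open intervals instead of raw values

print(counts)
print(shares)
print(with_nan)
print(binned)
```

With bins set, the index of the result holds intervals rather than individual prices, which is what the later binned line chart relies on.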

Preprocess the data:

import json

import numpy as np
import pandas as pd

# Each line of the file holds one JSON object
li = []
with open(r'D:\python学习\数据分析与可视化数据\shoes.json', 'r', encoding='utf-8') as op1:
    for i in op1:
        li.append(json.loads(i))  # parse the JSON string into a dict
a = pd.json_normalize(li)  # turn the list of JSON records into a DataFrame

a.sales = a.sales.str.split("人", expand=True)[0]  # keep the number in front of "人"
a.sales = a.sales.astype(np.int64)  # convert the column to integers
a.price = a.price.astype(float)  # convert the column to floats
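The same preprocessing steps can be tried without the original file. The sketch below uses two made-up records; the field layout and the "人付款" suffix on the sales strings are assumptions about what shoes.json looks like, not something the article confirms.

```python
import json

import pandas as pd

# Made-up lines imitating the assumed layout: one JSON object per line,
# a nested "info" dict, and sales stored as text such as "120人付款".
lines = [
    '{"price": "199.00", "sales": "120人付款", "info": {"鞋面材质": "PU"}}',
    '{"price": "299.00", "sales": "35人付款", "info": {"鞋面材质": "人造革"}}',
]

records = [json.loads(line) for line in lines]
df = pd.json_normalize(records)  # nested keys are flattened to "info.鞋面材质"

# Keep the digits in front of "人", then convert both columns to numbers.
df["sales"] = df["sales"].str.split("人", expand=True)[0].astype("int64")
df["price"] = df["price"].astype(float)

print(df.dtypes)
print(df)
```

Flattening is what produces the dotted column name "info.鞋面材质" used later for grouping.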

Count and sort:

from pyecharts import options as opts
from pyecharts.charts import Line

p1 = a.price.value_counts()  # count the products at each price
p1

p1.sort_index(inplace=True)  # sort by price (the index); a line chart needs ordered x values

Once the data is ready, draw the chart:

# With the index sorted, use it as the x axis and the counts as the y axis
f1 = Line().add_xaxis(p1.index.tolist()).add_yaxis("price", p1.tolist(), is_smooth=True)
f1.set_global_opts(title_opts=opts.TitleOpts(title="意尔康 men's shoe analysis"),
                   xaxis_opts=opts.AxisOpts(type_="value"),
                   datazoom_opts=opts.DataZoomOpts(is_show=True))
f1.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
f1.render_notebook()

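The chart shows a clear round-number effect: prices cluster at multiples of 10. To see the overall shape of a distribution without this much detail, group the values with bins.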

p2 = a.sales.value_counts(bins=500, sort=False)  # bins=500 splits the range into 500 intervals

x = []
y = []
for i in p2.items():
    x.append(i[0].mid)  # an Interval's mid attribute gives its midpoint
    y.append(i[1])

f1 = Line().add_xaxis(x).add_yaxis("price", y, is_smooth=True)
f1.set_global_opts(title_opts=opts.TitleOpts(title="意尔康 men's shoe analysis"),
                   xaxis_opts=opts.AxisOpts(type_="value"),
                   datazoom_opts=opts.DataZoomOpts(is_show=True))
f1.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
f1.render_notebook()
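The binning step can be checked on a small made-up series. This sketch shows that the binned result is indexed by intervals and that the midpoints can be read in one step from the index instead of a loop; the numbers are illustrative only.

```python
import pandas as pd

sales = pd.Series([3, 5, 8, 40, 42, 95, 100])  # made-up sales figures

# sort=False keeps the bins in interval order rather than ordering them by count.
binned = sales.value_counts(bins=4, sort=False)

x = binned.index.mid.tolist()  # midpoint of each interval, used as the x value
y = binned.tolist()            # number of items that fall in each interval

for interval, count in binned.items():
    print(interval, interval.mid, count)
```

Keeping the bins in interval order matters for a line chart, because the x values have to be increasing for the curve to read as a distribution.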


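2. Weighted statistics with histogram

If we want to know not only how many products fall into a price range but also how well the products in that range sell, we can use sales as the weight and compute a weighted histogram.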
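First, the basic usage of np.histogram: it takes the values, the bin edges (or a number of bins) and optional weights, and returns the per-bin totals together with the bin edges. Without weights every value adds 1 to its bin; with weights it adds its weight instead.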
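As a small illustration of weighted counting, the sketch below runs np.histogram twice on made-up prices, once without and once with weights. The arrays are invented for the example.

```python
import numpy as np

prices = np.array([50, 120, 130, 260, 280])  # made-up prices
sales = np.array([10, 200, 50, 30, 5])       # made-up units sold per product
edges = np.array([0, 100, 200, 300])         # three bins: [0,100), [100,200), [200,300]

n_products, _ = np.histogram(prices, bins=edges)                 # products per bin
units_sold, _ = np.histogram(prices, bins=edges, weights=sales)  # summed sales per bin

print(n_products)
print(units_sold)
```

The middle bin holds two products but accounts for most of the units sold, which is exactly the kind of difference the weighted view is meant to expose.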
Example:

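y, x = np.histogram(a.price.values, np.linspace(a.price.min(), 700, 50), weights=a.sales.values)
# Use each product's sales as its weight, so every price bin sums the sales inside it
f1 = Line().add_xaxis(x.tolist()).add_yaxis("price", [0] + y.tolist(), is_smooth=True)
# histogram returns one more bin edge than bin totals, so a leading 0 is prepended to make the lengths match
f1.set_global_opts(title_opts=opts.TitleOpts(title="意尔康 men's shoe analysis"),
                   xaxis_opts=opts.AxisOpts(type_="value"),
                   datazoom_opts=opts.DataZoomOpts(is_show=True))
f1.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
f1.render_notebook()
# Comparing the peak near 200 with the peak near 300: the former has many more products,
# while the latter has fewer products at a higher price, which suggests much less competition around 300
# Sales of leather shoes priced around 30 look abnormal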
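An alternative to padding the totals with a leading 0 is to plot each total at the center of its bin. This is a sketch of that variation on made-up data, not the approach used in this article; it also shows that values beyond the last edge are left out of the totals.

```python
import numpy as np

prices = np.array([50.0, 120.0, 130.0, 260.0, 280.0, 900.0])  # made-up; 900 lies outside the bins
sales = np.array([10, 200, 50, 30, 5, 7])                     # made-up units sold

edges = np.linspace(0, 300, 4)  # 4 edges define 3 bins
totals, edges = np.histogram(prices, bins=edges, weights=sales)

centers = (edges[:-1] + edges[1:]) / 2  # one x value per bin
points = list(zip(centers.tolist(), totals.tolist()))

print(points)
print(totals.sum(), sales.sum())  # the out-of-range product is not counted
```

Bin centers give one x value per total, so the two lists have the same length without an artificial first point.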


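3. Visualizing quantiles with a box plot

apply runs a custom function on each group produced by groupby.
Boxplot expects its data in a particular shape: one list of raw values per category, which prepare_data then reduces to the five values a box needs (minimum, lower quartile, median, upper quartile, maximum).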

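c = a.groupby('info.鞋面材质').size()  # number of products per upper material
def c1(x):
    return x.price.values  # the raw prices in this group
p3 = a.groupby("info.鞋面材质").apply(c1)  # one price array per upper material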

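from pyecharts.charts import Boxplot

# PU, split cowhide, top-grain cowhide (both excluding suede), artificial leather
x = ["PU", "二层牛皮（除牛反绒）", "头层牛皮（除牛反绒）", "人造革"]
y = [p3[name] for name in x]
c = Boxplot()
c.add_xaxis(x).add_yaxis("A", c.prepare_data(y))
c.set_global_opts(title_opts=opts.TitleOpts(title="BoxPlot - basic example"))
c.render_notebook()
# Top-grain cowhide is the most expensive
# The poorer the material, the narrower the price range; better materials leave more room for pricing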
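To see roughly what a box encodes, the five summary values can be computed per group with NumPy. This is a sketch on made-up data with hypothetical column and group names; it uses NumPy's default linear interpolation, so the quartiles may differ slightly from the ones the charting library computes.

```python
import numpy as np
import pandas as pd

# Made-up data; "material" and "price" are illustrative column names.
df = pd.DataFrame({
    "material": ["PU", "PU", "PU", "leather", "leather", "leather"],
    "price": [59.0, 79.0, 99.0, 199.0, 299.0, 499.0],
})

# One array of raw prices per group.
raw = {name: grp["price"].to_numpy() for name, grp in df.groupby("material")}


def five_numbers(values):
    """Return [min, Q1, median, Q3, max] with linear interpolation."""
    q = np.percentile(values, [0, 25, 50, 75, 100])
    return [float(v) for v in q]


summary = {name: five_numbers(values) for name, values in raw.items()}
print(summary)
```

A wider gap between the lower and upper quartile corresponds to a taller box, which is how the chart conveys that one material has more pricing room than another.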


II. Time-series data visualization

1. Defining the time axis

Read the data:

r1 = pd.read_csv(
    r"D:\python学习\数据分析与可视化数据\t_alibaba_data3.txt",
    names=["user", "brand", "behavr", "date"],
    sep="\t",
    dtype={"behavr": int},
)
# pandas infers column types on its own, but sometimes a type has to be specified explicitly
r1.head()

Convert the date format:

r1.date="2011/"+r1.date
r1.date=pd.to_datetime(r1.date)
r1.head()
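The conversion can be tried in isolation. The sketch below assumes the raw column holds zero-padded "month/day" strings; the actual layout of the date column is not shown in the article, so treat that as an assumption.

```python
import pandas as pd

# Assumption: dates arrive as zero-padded "month/day" text without a year.
raw = pd.Series(["04/15", "04/16", "08/15"])

# Prepend the year, then parse with an explicit format.
dates = pd.to_datetime("2011/" + raw, format="%Y/%m/%d")

print(dates.dtype)
print(dates.dt.month.tolist())
```

Passing format explicitly avoids relying on pandas to guess the layout, which is helpful when day and month could be confused.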

Group by date:

t1 = r1.groupby(pd.Grouper(key="date")).size()  # number of records per date
t1.sort_index(inplace=True)
t1
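pd.Grouper becomes more useful once a frequency is given. The following sketch, on a made-up log, contrasts grouping on the exact date values with grouping into daily bins, where days without any record show up with a count of 0.

```python
import pandas as pd

# Made-up log: two records on 15 April, none on the 16th, one on the 17th.
log = pd.DataFrame({
    "user": [1, 2, 3],
    "date": pd.to_datetime(["2011-04-15", "2011-04-15", "2011-04-17"]),
})

per_value = log.groupby("date").size()                          # only dates that occur
per_day = log.groupby(pd.Grouper(key="date", freq="D")).size()  # every day in the range

print(per_value)
print(per_day)
```

Whether gaps should appear as zeros depends on the chart: on a time axis a missing day is simply skipped, while an explicit 0 makes the drop visible.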

Draw the chart:

# Dates go on the x axis, so the axis type is "time" rather than "value"
f1 = Line().add_xaxis(t1.index.tolist()).add_yaxis("price", t1.tolist(), is_smooth=True)
f1.set_global_opts(title_opts=opts.TitleOpts(title="意尔康 men's shoe analysis"),
                   xaxis_opts=opts.AxisOpts(type_="time"))
f1.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
f1.render_notebook()


2. Calendar chart

The calendar chart expects its data as a list of (date, value) pairs:

data = []
for i in t1.items():
    data.append(i)  # each item is a (date, count) pair, the format the calendar chart needs

data
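If plain Python types are preferred over pandas timestamps, the pairs can be built as date strings and ints. This is a sketch on a made-up daily series, not a required step.

```python
import pandas as pd

# Made-up daily counts indexed by date.
daily = pd.Series(
    [120, 95],
    index=pd.to_datetime(["2011-04-15", "2011-04-16"]),
)

# One [date string, value] pair per day.
data = [[day.strftime("%Y-%m-%d"), int(count)] for day, count in daily.items()]
print(data)
```

Plain strings and ints keep the list independent of how a charting library serializes pandas objects.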

Draw the chart:

from pyecharts import options as opts
from pyecharts.charts import Calendar

c1 = Calendar()
c1.add("", data, calendar_opts=opts.CalendarOpts(range_=["2011-04-15", "2011-08-15"]))
c1.set_global_opts(
    title_opts=opts.TitleOpts(title="Calendar - brand sales volume in 2011"),
    visualmap_opts=opts.VisualMapOpts(
        max_=3000,  # set from the range of the real data
        min_=40,
        orient="horizontal",
        is_piecewise=True,
        pos_top="230px",
        pos_left="100px",
    ),
)
c1.render_notebook()

