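The author is fascinated by digging value out of data and tries, in everyday work and study, to uncover the value and the secrets hidden in it. In the author's view, data is not valuable only to companies: individuals can appreciate its appeal too, using technology to decode the patterns behind behavior and letting big data give everyone a boost. Readers are welcome to follow my WeChat public account, where we can discuss the interesting things hidden in data together.
My public account: livandata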
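Website analytics is a routine skill for data analysts. By understanding the customers, the website and the business logic, the analyst builds a complete system of metrics that shows how the site is operating. This analysis helps product managers see how the site's features are used, helps marketers learn which kinds of customers prefer which products, and helps operations staff spot unreasonable steps in a process. In short, website analytics gives decision makers a full online dataset for understanding the business.
Website analytics can be divided into log analysis, traffic analysis, customer-segment analysis and so on.
Common metrics include:
1) time on page;
2) bounce rate / exit rate;
3) page depth;
4) unique pageviews;
5) visitors / unique visitors;
6) visit frequency / new vs. returning visitors;
7) impressions / impression target-attainment rate / unique impressions / impression frequency;
8) clicks / click target-attainment rate / unique clicks / click frequency;
9) click-through rate (clicks per impression);
10) conversion rate;
11) page views (PV) / unique visitors (UV).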
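A recent project needed conversion-rate analysis, so I built a set of conversion-rate code for constructing user funnels. This article covers conversion rate first. The remaining metrics will be analyzed in later posts.
The project began with the need to measure how users of an app convert from page to page. The usual approach is to compute PV and UV for each page first. You then compute the conversion rate between consecutive nodes of a page-visit flow:
PV conversion rate = PV(node i+1) / PV(node i)
UV conversion rate = UV(node i+1) / UV(node i)
The code below also reports an overall conversion rate for each node: node i+1 divided by node 1.
It was not settled which field UV should be counted on, so two UV metrics are built:
- UV counts distinct becif_no values.
- UV_M counts distinct mid values.

This does not change the approach. Later we simply compare which of the two is more accurate.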
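Here is a minimal sketch of the two formulas above, plus the overall rate relative to node 1 that the code reports. It assumes a funnel's node counts are already summed into a plain list ordered by step and are non-zero. The function name `conversion_rates` and the sample counts are made up for illustration.

```python
def conversion_rates(counts):
    """Return (step_rates, overall_rates) for node counts ordered by step.

    step_rates[i]    = counts[i + 1] / counts[i]   (node i+2 vs node i+1)
    overall_rates[i] = counts[i + 1] / counts[0]   (node i+2 vs node 1)
    """
    step_rates = [nxt / cur for cur, nxt in zip(counts, counts[1:])]
    overall_rates = [nxt / counts[0] for nxt in counts[1:]]
    return step_rates, overall_rates


# Hypothetical PV counts for a three-node funnel.
pv = [1000, 400, 100]
step, overall = conversion_rates(pv)
print(step)     # [0.4, 0.25]
print(overall)  # [0.4, 0.1]
```

The same function applies unchanged to a list of UV or UV_M counts.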
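During the project we hit a problem: the Python compute platform and the data platform could not reach each other over the network. After consulting senior colleagues at the company, we settled on a workaround:
1) the code assembles HQL statements;
2) the HQL is uploaded to the data platform by hand as a file;
3) the HQL runs every day and saves its results to a CSV file;
4) the results are moved manually to the compute platform, where conversion rates are calculated;
5) a conversion-rate monitoring dashboard is built with moving-average smoothing.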
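The project workflow is therefore:
1) collect the flows that need funnels and store them in an Excel file in a fixed format;
2) assemble HQL from the contents of the Excel file;
3) upload the generated HQL to the data platform and extract, on a schedule, the PV/UV values for every day, funnel and node;
4) move the data to the compute platform, aggregate by node, and compute the corresponding PV/UV conversion rates;
5) display the conversion rates as a funnel;
6) feed each day's conversion rates into the monitoring dashboard to track every flow's conversion rate day by day.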
The funnel template used is shown below. There is one row per funnel node, and empty cells are left blank.
funnel_name | funnel_time_start | funnel_time_end | funnel_node_name | step | event_type | event_name | action | page_id | page_name | product_id | inner_id | push_id | last_page_id | label_id | label | context_id | context | datetime |
login | 20190203 | 20190305 | login_page | 1 | page_level | 页面级别 | 5 | page_login_1 | login_page | | | | | | | | | 20190203 |
login | 20190203 | 20190305 | login_label | 2 | label_level | 区域级别 | 5 | page_login_1 | login_page | | | | | label_login_1 | login_label | | | 20190203 |
| 20190203 | 20190305 | print_page | 1 | page_level | 页面级别 | 5 | page_print_1 | print_page | | | | | | | | | 20190203 |
| 20190203 | 20190305 | print_page | 1 | page_level | 页面级别 | 5 | page_print_2 | print_page | | | | | | | | | 20190203 |
| 20190203 | 20190305 | print_page | 1 | page_level | 页面级别 | 5 | page_print_3 | print_page | | | | | | | | | 20190203 |
| 20190203 | 20190305 | print_label | 2 | label_level | 区域级别 | 5 | page_print_2 | print_page | | | | page_print_1 | label_print | print_label | | | 20190203 |
| 20190203 | 20190305 | print_label | 2 | label_level | 区域级别 | 5 | page_print_4 | print_page | | | | page_print_1 | label_print | print_label | | | 20190203 |
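Notes on the data:
1) A page flow has three levels: page level (page_*), region level (label_*) and event level (context_*).
2) A page contains several regions, and a region contains several events. A user can interact with any of these nodes:
- When a user interacts with a region, the page fields are filled in as well.
- When a user triggers an event, both the page and region fields are filled in.
3) The data volume is large, so the data platform partitions the data by day. Each day gets its own HQL query, and the daily results are then merged.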
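Points 2 and 3 determine how each query is filtered: a node filters on its own id plus the ids of every level above it. One query is produced per daily partition, and the queries are merged with UNION ALL. The sketch below assumes the `page_id`, `label_id`, `name_id` and `dt` columns of `mid.tracker_action_event` that the code below uses. The helpers `node_filter` and `daily_union` are hypothetical. Values are pasted in without escaping, which is only acceptable because they come from the analyst's own template.

```python
def node_filter(page_id=None, label_id=None, context_id=None):
    """WHERE fragments for a funnel node; deeper levels also filter on their parents."""
    conds = []
    if page_id:
        conds.append("t.page_id='%s'" % page_id)
        if label_id:                      # region level: page + region
            conds.append("t.label_id='%s'" % label_id)
            if context_id:                # event level: page + region + event
                conds.append("t.name_id='%s'" % context_id)
    return conds


def daily_union(select_sql, days, conds):
    """One query per daily partition (t.dt), merged with UNION ALL."""
    queries = []
    for day in days:
        where = " and ".join(["t.dt='%s'" % day] + conds)
        queries.append("%s\nWHERE %s" % (select_sql, where))
    return "\n\nUNION ALL\n".join(queries)


sql = daily_union("SELECT count(1) pv FROM mid.tracker_action_event t",
                  ["20190203", "20190204"],
                  node_filter("page_login_1", "label_login_1"))
print(sql)
```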
The corresponding code is as follows:
1) The Funnel_build file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime

import pandas as pd

# Template columns copied into every generated SELECT as quoted constant columns.
FIELDS = ['funnel_name', 'funnel_time_start', 'funnel_time_end', 'funnel_node_name',
          'step', 'event_type', 'event_name', 'action', 'page_id', 'page_name',
          'product_id', 'inner_id', 'push_id', 'last_page_id', 'label_id', 'label',
          'context_id', 'context']


# Funnel builder: turns the Excel funnel template into HQL.
class Funnel_build(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        # Read every cell as a Python object and mark empty cells with the string 'nan'.
        return pd.read_excel(self.path, dtype='O').fillna('nan')

    def get_sql(self, row, cond):
        # One SELECT per (node, day): the template values become constant columns,
        # followed by PV, UV (distinct becif_no) and UV_M (distinct mid).
        constants = ''.join("\n\t '%s' %s," % (row[field], field) for field in FIELDS)
        return ('SELECT' + constants +
                '\n\t t.dt datetime,'
                '\n\t count(1) pv,'
                '\n\t size(collect_set(t.becif_no)) uv,'
                '\n\t count(distinct t.mid) uv_m'
                '\nFROM mid.tracker_action_event t'
                '\nWHERE'
                '\n\t %s' % cond)

    def add_times(self, funnel_time_start, funnel_time_end):
        # Days from start (inclusive) to end (exclusive) as 'yyyymmdd' strings;
        # the end date itself is not queried.
        start = datetime.datetime.strptime(str(funnel_time_start), '%Y%m%d')
        end = datetime.datetime.strptime(str(funnel_time_end), '%Y%m%d')
        return [(start + datetime.timedelta(days=j)).strftime('%Y%m%d')
                for j in range((end - start).days)]

    def join_sqls(self, sql_list):
        return '\n\nunion all\n'.join(sql_list)

    def built_sql(self):
        sqls = []
        funnel_data = self.get_data()
        for _, row in funnel_data.iterrows():
            # Filters follow the page > region (label) > event (context) hierarchy:
            # a region node also filters on its page, an event node on both.
            cond = ''
            if row['page_id'] != 'nan':
                cond += "\nand\tt.page_id='%s'" % row['page_id']
                if row['label_id'] != 'nan':
                    cond += "\t\nand\tt.label_id='%s'" % row['label_id']
                    if row['context_id'] != 'nan':
                        cond += "\t\nand\tt.name_id='%s'" % row['context_id']
            # One query per day, because the source table is partitioned by dt.
            for day in self.add_times(row['funnel_time_start'], row['funnel_time_end']):
                sqls.append(self.get_sql(row, "t.dt='%s'\t" % day + cond))
        return self.join_sqls(sqls)
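`add_times` enumerates the days of the funnel window. The end date is excluded, because the loop stops before the day count. Whether the window should include `funnel_time_end` is an assumption to check against the project's intent. If it should, the following sketch with `pandas.date_range` could replace it, since `date_range` includes both endpoints by default. The helper name `days_between` is hypothetical.

```python
import pandas as pd


def days_between(start, end, include_end=True):
    """List 'yyyymmdd' strings from start to end; include_end controls the last day."""
    days = pd.date_range(pd.to_datetime(str(start), format='%Y%m%d'),
                         pd.to_datetime(str(end), format='%Y%m%d'),
                         freq='D')
    if not include_end:
        days = days[:-1]  # mimic add_times: drop the end date
    return days.strftime('%Y%m%d').tolist()


print(days_between(20190203, 20190206))         # ['20190203', '20190204', '20190205', '20190206']
print(days_between(20190203, 20190206, False))  # ['20190203', '20190204', '20190205']
```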
2) Common utility code: Funnel_utils
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from matplotlib.font_manager import FontProperties


def ch_dtype(df):
    # Cast result columns to explicit types (np.str and np.int were removed in
    # NumPy 1.24, so the built-in str and int are used instead).
    dtype = dict(funnel_id=str, idx=int, page_id=str,
                 page_name=str, pv=np.int64, date_=str,
                 funnel_name=str, uv=np.int64, dt=str,
                 product_id=str, push_id=str, inner_id=str)
    return df.astype(dtype)


def write(string, path):
    with open(path, 'w', encoding='utf-8') as f:
        f.write(string)


def write_excel(df, path):
    # The context manager saves and closes the workbook even if writing fails.
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer)


def set_ch_font():
    # Chinese font for matplotlib labels.
    return FontProperties(fname='../data/font/msyh.ttf')


def read_page_info():
    return pd.read_csv('info/page_info.csv', index_col=0).set_index('page_id')


def write_csv(df, path):
    df.to_csv(path)


def read_funnel_info():
    return pd.read_table('../data/funnel_info.txt').fillna('nan')


def read_funnel_info_xls():
    return pd.read_excel('../data/funnel_infos.xlsx').fillna('nan')


def join_sqls(sql_list):
    return '\n\nUNION ALL\n'.join(sql_list)


def format_cols(df):
    # Strip Hive's "table." prefix from column names, e.g. "t.pv" -> "pv".
    df.columns = [c.split('.')[1] for c in df.columns]


def product(x):
    # Cartesian product of a list of lists, e.g. [[1, 2], [3, 4]] ->
    # [[1, 3], [1, 4], [2, 3], [2, 4]]. Works on a copy, so x is not consumed.
    def _product(x, y):
        if not x:
            return y
        head = x.pop()
        if not y:
            return _product(x, head)
        z = []
        for i in head:
            for k in y:
                if isinstance(k, list):
                    z.append([i] + k)
                else:
                    z.append([i, k])
        return _product(x, z)
    return _product(list(x), [])


def flow_means(data):
    # Moving average for the monitoring band: element i is the mean of
    # data[i:i + 10], i.e. the current value and up to nine following values.
    ms = []
    for i in range(len(data)):
        window = data[i:i + 10]
        ms.append(sum(window) / len(window))
    return ms
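`flow_means` returns, for each position i, the mean of `data[i:i+10]`. This is a forward-looking window covering the current value and the next nine. The same numbers can be obtained with pandas by running a trailing `rolling` mean over the reversed series and reversing the result back. In the sketch below, `forward_means` is a hypothetical name, and the reference `flow_means` is a compact copy kept for comparison.

```python
import pandas as pd


def flow_means(data):
    # Reference version: mean of data[i:i + 10] for every i.
    return [sum(data[i:i + 10]) / len(data[i:i + 10]) for i in range(len(data))]


def forward_means(data, window=10):
    # A trailing window over the reversed series is a forward window over the original.
    s = pd.Series(data, dtype=float)
    return s.iloc[::-1].rolling(window, min_periods=1).mean().iloc[::-1].tolist()


rates = [0.50, 0.52, 0.47, 0.55, 0.51, 0.49, 0.53, 0.50, 0.48, 0.52, 0.60, 0.58]
print(forward_means(rates))
```

Both versions agree up to floating-point rounding. The vectorized one avoids the Python-level double loop when many series are checked every day.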
3) Data-processing code: Funnel_data
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd

MAX_STEPS = 7  # up to 7 nodes per funnel, i.e. 6 transitions
SINGLE_LABELS = ['1-->2', '2-->3', '3-->4', '4-->5', '5-->6', '6-->7']
TOTAL_LABELS = ['1-->2', '1-->3', '1-->4', '1-->5', '1-->6', '1-->7']


def step_rates(data, col, overall):
    # Rates for one funnel: node i+1 divided by node i (single-step) or by
    # node 1 (overall). A missing or zero node gives a rate of 0.
    rates = []
    for i in range(1, MAX_STEPS):
        num = data.loc[data['step'] == i + 1, col]
        den = data.loc[data['step'] == (1 if overall else i), col]
        if len(num) and len(den) and float(den.iloc[0]) != 0:
            rates.append(round(float(num.iloc[0]) / float(den.iloc[0]), 6))
        else:
            rates.append(0)
    return rates


class Funnel_data(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        return pd.read_excel(self.path, dtype='O').fillna('nan')

    def get_funnels_name(self):
        return self.get_data()['funnel_name'].drop_duplicates().tolist()

    def _rate_tables(self, overall):
        # Sum PV/UV/UV_M per funnel and node over all days, then build three
        # tables (PV, UV, UV_M): one row per transition, one column per funnel.
        data = self.get_data()
        sum_data = (data.groupby(['funnel_name', 'step'])[['pv', 'uv', 'uv_m']]
                    .sum().reset_index())
        labels = TOTAL_LABELS if overall else SINGLE_LABELS
        tables = []
        for col in ['pv', 'uv', 'uv_m']:
            rates = {name: step_rates(group, col, overall)
                     for name, group in sum_data.groupby('funnel_name')}
            table = pd.DataFrame(rates, index=labels)
            table.index.name = '环节'  # '环节' = transition
            tables.append(table.reset_index())
        return tuple(tables)

    # Single-step conversion rates: per funnel and node, summed over all days.
    def result_calculate_single(self):
        return self._rate_tables(overall=False)

    # Overall conversion rates (relative to node 1): per funnel and node, summed over all days.
    def result_calculate_total(self):
        return self._rate_tables(overall=True)

    # One funnel's single-step and overall rates side by side, for PV, UV and UV_M.
    def data_combine_funnel(self, funnel_name):
        totals = self.result_calculate_total()
        singles = self.result_calculate_single()
        combined = []
        for single, total in zip(singles, totals):
            df = pd.concat([single['环节'], single[funnel_name], total[funnel_name]], axis=1)
            # Columns: transition, single-step rate, overall rate
            df.columns = ['环节', '单一环节转化率', '总体转化率']
            combined.append(df)
        return tuple(combined)  # (funnel_pv, funnel_uv, funnel_uv_m)
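The per-step loops in `Funnel_data` can also be expressed with a single `groupby` and `unstack`. The sketch below assumes the result file has the `funnel_name`, `step` and PV/UV columns used above, with numeric values. Unlike the code above, a missing node yields NaN rather than 0. The helper name `funnel_rates` and the sample data are hypothetical.

```python
import pandas as pd


def funnel_rates(df, metric='pv'):
    """Return (single, overall) tables: rows are funnels, columns are the target step."""
    # Sum the metric over all days, one column per step.
    totals = (df.groupby(['funnel_name', 'step'])[metric].sum()
                .unstack('step').sort_index(axis=1).astype(float))
    single = totals.div(totals.shift(1, axis=1)).iloc[:, 1:]     # step i+1 / step i
    overall = totals.div(totals.iloc[:, 0], axis=0).iloc[:, 1:]  # step i+1 / step 1
    return single, overall


# Hypothetical daily results for two funnels.
df = pd.DataFrame({
    'funnel_name': ['login', 'login', 'login', 'login', 'pay', 'pay', 'pay'],
    'step':        [1, 2, 1, 2, 1, 2, 3],
    'datetime':    ['20190203', '20190203', '20190204', '20190204'] + ['20190203'] * 3,
    'pv':          [100, 40, 100, 60, 100, 50, 10],
})
single, overall = funnel_rates(df)
print(single)
print(overall)
```

Grouping by `['funnel_name', 'step', 'datetime']` instead gives the per-day rates that `Funnel_check` computes.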
4) Plotting code: Funnel_plot
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Uses the pyecharts 0.5.x API; pyecharts 1.x moved the charts to
# pyecharts.charts and changed their add() methods. Rendering to an image file
# in 0.5.x requires the pyecharts-snapshot plugin.
import os

from pyecharts import Funnel, Page, Line
from Funnel_livan import Funnel_utils

# Order of the six tables returned by Funnel_check.day_data_combine_funnel.
CHECK_METRICS = ['pv_total', 'uv_total', 'uv_m_total', 'pv_single', 'uv_single', 'uv_m_single']


class Funnel_plot(object):
    def __init__(self, name, data=None):
        self.name = name
        self.data = data if data is not None else []

    def draw_plot(self):
        # One funnel chart per rate table, with the rates shown as percentages.
        funnel_name = self.name
        os.makedirs(funnel_name, exist_ok=True)
        os.makedirs('plots', exist_ok=True)
        page = Page()
        for k, funnel in enumerate(self.data):
            funnel_list = funnel['环节'].tolist()                # transition labels
            funnel_values = (funnel.iloc[:, 1] * 100).tolist()  # second column = rate
            funnel_plot = Funnel(funnel_name, width=800, height=400, title_pos='center')
            funnel_plot.add(name=funnel_name,          # legend name
                            attr=funnel_list,          # category names
                            value=funnel_values,       # value of each category
                            is_label_show=True,        # show labels
                            label_formatter='{c}%',    # label format
                            label_pos='inside',        # label position
                            legend_orient='vertical',  # legend orientation
                            legend_pos='left',         # legend position
                            is_legend_show=True)       # show the legend
            funnel_plot.render(path='./%s/%s_%d.gif' % (funnel_name, funnel_name, k))
            page.add(funnel_plot)
        page.render('./plots/%s.html' % funnel_name)
        return page

    def check_plot(self):
        # For every metric table and every transition, plot the daily rate
        # together with a band of ±10% around its moving average.
        funnel_name = self.name
        os.makedirs(funnel_name, exist_ok=True)
        os.makedirs('check', exist_ok=True)
        page = Page()
        for metric, funnel in zip(CHECK_METRICS, self.data):
            daily = funnel.set_index('环节')        # remaining columns are the dates
            attr = [str(c) for c in daily.columns]  # x axis
            for transition, row in daily.iterrows():
                values = row.tolist()
                mov_mean = Funnel_utils.flow_means(values)
                series = [('lower band', [m * 0.9 for m in mov_mean]),
                          (transition, values),
                          ('upper band', [m * 1.1 for m in mov_mean])]
                line = Line('%s %s conversion rate %s' % (funnel_name, metric, transition))
                # is_stack is not set: stacking would plot the running sum of the
                # three series instead of the band and the rate themselves.
                for series_name, series_values in series:
                    line.add(series_name, attr, series_values,
                             yaxis_min=0,
                             yaxis_max='dataMax',
                             yaxis_name='conversion rate',
                             is_yaxis_show=True,
                             is_label_show=True)
                line.render(path='./%s/%s_%s.gif'
                            % (funnel_name, metric, transition.replace('-->', '_')))
                page.add(line)
        page.render('./check/%s.html' % funnel_name)
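`check_plot` draws a ±10% band around `Funnel_utils.flow_means`, whose window starts at the current day and runs into the following days. The band for the most recent day is therefore built from that day alone, so the newest value can never fall outside it.

For daily alerting, a common alternative is a trailing baseline built only from previous days. This is swapped in here and is not the author's method. The sketch below is a minimal version of that idea. The helper `flag_outliers`, the 10-day window and the 10% tolerance are assumptions mirroring the chart's parameters.

```python
import pandas as pd


def flag_outliers(rates, window=10, tol=0.1):
    """Compare each day's rate with the mean of the previous `window` days."""
    s = pd.Series(rates, dtype=float)
    baseline = s.shift(1).rolling(window, min_periods=1).mean()  # excludes today
    lower, upper = baseline * (1 - tol), baseline * (1 + tol)
    alert = (s < lower) | (s > upper)  # NaN baseline on day 0 -> no alert
    return pd.DataFrame({'rate': s, 'lower': lower, 'upper': upper, 'alert': alert})


daily = [0.50, 0.52, 0.49, 0.51, 0.38]  # hypothetical daily conversion rates
print(flag_outliers(daily))
```

With these numbers, only the last day is flagged: 0.38 is below 90% of the previous four days' mean of 0.505.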
5) Conversion-rate monitoring: Funnel_check
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd

# Turn a funnel's conversion rate on each day into one point of a trend chart:
# the input is the daily conversion rate, the x axis is the date and the y axis
# the conversion rate. One set of monitoring charts is built per funnel; a funnel
# has at most six transitions. A funnel's rates over a period give its trend lines.

MAX_STEPS = 7  # up to 7 nodes per funnel, i.e. 6 transitions
SINGLE_LABELS = ['1-->2', '2-->3', '3-->4', '4-->5', '5-->6', '6-->7']
TOTAL_LABELS = ['1-->2', '1-->3', '1-->4', '1-->5', '1-->6', '1-->7']


def step_rates(data, col, overall):
    # Rates for one funnel on one day: node i+1 divided by node i (single-step)
    # or by node 1 (overall). A missing or zero node gives a rate of 0.
    rates = []
    for i in range(1, MAX_STEPS):
        num = data.loc[data['step'] == i + 1, col]
        den = data.loc[data['step'] == (1 if overall else i), col]
        if len(num) and len(den) and float(den.iloc[0]) != 0:
            rates.append(round(float(num.iloc[0]) / float(den.iloc[0]), 6))
        else:
            rates.append(0)
    return rates


class Funnel_check(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        return pd.read_excel(self.path, dtype='O').fillna('nan')

    def get_funnels_name(self):
        return self.get_data()['funnel_name'].drop_duplicates().tolist()

    def _day_rate_tables(self, overall):
        # Sum PV/UV/UV_M per funnel, node and day, then build, for every funnel,
        # three tables (PV, UV, UV_M): one row per transition, one column per day.
        data = self.get_data()
        sum_data = (data.groupby(['funnel_name', 'step', 'datetime'])[['pv', 'uv', 'uv_m']]
                    .sum().reset_index())
        dates = sum_data['datetime'].drop_duplicates()
        labels = TOTAL_LABELS if overall else SINGLE_LABELS
        results = ({}, {}, {})
        for funnel_name, funnel in sum_data.groupby('funnel_name'):
            for result, col in zip(results, ['pv', 'uv', 'uv_m']):
                rates = {dt: step_rates(funnel[funnel['datetime'] == dt], col, overall)
                         for dt in dates}
                table = pd.DataFrame(rates, index=labels)
                table.index.name = '环节'  # '环节' = transition
                result[funnel_name] = table.reset_index()
        return results

    # Daily single-step conversion rates per funnel and node.
    def day_result_calculate_single(self):
        return self._day_rate_tables(overall=False)

    # Daily overall conversion rates (relative to node 1) per funnel and node.
    def day_result_calculate_total(self):
        return self._day_rate_tables(overall=True)

    # The six daily tables of one funnel, in this order: pv_total, uv_total,
    # uv_m_total, pv_single, uv_single, uv_m_single.
    def day_data_combine_funnel(self, funnel_name):
        totals = self.day_result_calculate_total()
        singles = self.day_result_calculate_single()
        return [table[funnel_name] for table in totals + singles]
6) Main program: main
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from Funnel_livan import Funnel_build, Funnel_plot, Funnel_data, Funnel_check

if __name__ == '__main__':
    # 1. Assemble the SQL.
    # path = '/Users/livan/PycharmProjects/offices/data/train_data.xlsx'
    # funnels = Funnel_build.Funnel_build(path=path)
    # sqls = funnels.built_sql()
    # with open('sqls.sql', 'w+', encoding='utf-8') as f:
    #     f.write(sqls)

    # 2. Load the results, compute conversion rates and draw each funnel over all days:
    # path = '/Users/livan/PycharmProjects/offices/data/result_data.xlsx'
    # result = Funnel_data.Funnel_data(path=path)
    # funnels_name = result.get_funnels_name()
    # for i in range(len(funnels_name)):
    #     funnel_pv, funnel_uv, funnel_uv_m = result.data_combine_funnel(funnels_name[i])
    #     # The previous step yields six funnels to plot:
    #     funnel_pv_s = funnel_pv[['环节', '单一环节转化率']]
    #     funnel_uv_s = funnel_uv[['环节', '单一环节转化率']]
    #     funnel_uv_m_s = funnel_uv_m[['环节', '单一环节转化率']]
    #     funnel_pv_t = funnel_pv[['环节', '总体转化率']]
    #     funnel_uv_t = funnel_uv[['环节', '总体转化率']]
    #     funnel_uv_m_t = funnel_uv_m[['环节', '总体转化率']]
    #     funnel_d = [funnel_pv_s,
    #                 funnel_uv_s,
    #                 funnel_uv_m_s,
    #                 funnel_pv_t,
    #                 funnel_uv_t,
    #                 funnel_uv_m_t]
    #     Funnel_plot.Funnel_plot(funnels_name[i], funnel_d).draw_plot()

    # 3. Conversion-rate trend analysis: compute each funnel day by day.
    path = '/Users/livan/PycharmProjects/offices/data/result_data.xlsx'
    result = Funnel_check.Funnel_check(path=path)
    for funnel_name in result.get_funnels_name():
        funnel_data = result.day_data_combine_funnel(funnel_name)
        Funnel_plot.Funnel_plot(funnel_name, funnel_data).check_plot()
The generated HQL looks like this:
SELECT
'login' funnel_name,'20190203' funnel_time_start,
'20190305' funnel_time_end,'login_page' funnel_node_name,'1' step,
'page_level' event_type,'页面级别' event_name,'5' action,
'page_login_1' page_id,'login_page' page_name,'nan' product_id,
'nan' inner_id,'nan' push_id,'nan' last_page_id,
'nan' label_id,'nan' label,'nan' context_id,'nan' context,
t.dt datetime,count(1) pv,
size(collect_set(t.becif_no)) uv,
count(distinct t.mid) uv_m
FROM mid.tracker_action_event t
WHERE
t.dt='20190203'
and t.page_id='page_login_1'
union all……
The charts in this article are drawn with pyecharts, which is convenient and easy to use. An introduction to it follows.