Website Analytics 01: Conversion Rate Analysis (Funnel Construction)

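I am a learner fascinated by digging value out of data. In my daily work and study I try to uncover that value and the secrets data holds. I believe the value of data is not limited to companies: individuals can appreciate it too. Use technology to explore the patterns behind behavior, and let big data give everyone a boost. Readers are welcome to follow my WeChat official account, where we can discuss the interesting things found in data.

My official account is: livandata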

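Website analysis is a common skill for data analysts. By understanding the customers, the website and the business logic, the analyst builds a complete system of metrics that shows how the website is operating. This analysis helps product managers learn how the different features are used, helps marketers learn which kinds of customers like which products, and helps operations staff spot unreasonable steps in a process. In short, website analysis currently gives decision makers a complete online data set for understanding the business.

Website analysis can be divided into log analysis, traffic analysis, customer segment analysis, and so on.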

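Commonly used metrics include:

1) Time on page;

2) Bounce rate / exit rate;

3) Page depth;

4) Unique pageviews;

5) Visitors / unique visitors;

6) Visit frequency / new versus returning customers;

7) Impressions / impression target rate / unique impressions / impression frequency;

8) Clicks / click target rate / unique clicks / click frequency;

9) Impression click-through rate;

10) Conversion rate;

11) Pageviews (PV) / unique visitors (UV).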

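For a recent project I built a set of conversion rate analysis code that is used to construct funnels. This article covers conversion rate first; the remaining metrics will be covered one by one in later articles.

The project started from the need to measure the conversion rate of users on each page of an APP. The usual approach is to compute the PV and UV of every page first, and then compute the conversion rate between the pages at successive nodes of a page flow.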

That is: PV conversion rate = PV(node i+1) / PV(node i).

       UV conversion rate = UV(node i+1) / UV(node i).
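
As a small worked example of the two formulas, the sketch below computes the rate between each pair of adjacent nodes from per-node counts. The function name `step_rates` and the sample counts are made up for illustration and are not taken from the project.

```python
def step_rates(counts):
    """Return counts[i+1] / counts[i] for each pair of adjacent funnel nodes."""
    rates = []
    for current, following in zip(counts, counts[1:]):
        # Guard against an empty node so the ratio is always defined.
        rates.append(following / current if current else 0.0)
    return rates


# Hypothetical PV and UV counts for a three-node funnel.
pv = [1000, 400, 100]
uv = [500, 250, 50]

pv_rates = step_rates(pv)  # [0.4, 0.25]
uv_rates = step_rates(uv)  # [0.5, 0.2]
print(pv_rates, uv_rates)
```

A funnel with n nodes yields n-1 rates, one per transition.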

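Because it was not yet clear which field should define UV, two metrics were built (UV and UV_M). This does not affect the approach; the two can be compared later to see which is more accurate.

The project ran into a problem: the Python computing platform and the data platform had no network connection to each other. After consulting senior colleagues, I settled on a workaround. The code assembles HQL statements, which are transferred manually as a file to the data platform. The HQL runs every day and stores its results in a CSV file, which is then transferred manually to the computing platform. There the conversion rates are computed, and a conversion rate monitoring platform is built on top of a moving average.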

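The project workflow is therefore:

1) Collect the flows that need a funnel and store them in an Excel file in a fixed format;

2) Assemble the HQL from the contents of the Excel file;

3) Transfer the generated HQL to the data platform and extract, on a schedule, the PV/UV values for every day, every funnel and every node;

4) Transfer the data to the computing platform, aggregate the nodes, and compute the corresponding PV/UV conversion rates;

5) Display the conversion rates in a funnel chart;

6) Send the daily conversion rates to the monitoring platform and monitor the conversion rate of each flow by day.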
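
Step 4 can be sketched with pandas. The sketch assumes the exported CSV carries the columns funnel_name, step, datetime, pv and uv; the inline CSV text stands in for a real export file, and all numbers are made up.

```python
import io

import pandas as pd

# Stand-in for the CSV exported from the data platform (hypothetical values).
csv_text = """funnel_name,step,datetime,pv,uv
login,1,20190203,100,80
login,2,20190203,40,20
login,1,20190204,200,100
login,2,20190204,50,40
"""

daily = pd.read_csv(io.StringIO(csv_text))

# Sum each node over all days. Summing daily UV follows the article's approach,
# so a user who returns on another day is counted again.
totals = (daily.groupby(["funnel_name", "step"], as_index=False)[["pv", "uv"]]
          .sum()
          .sort_values(["funnel_name", "step"]))

# Divide every step by the previous step of the same funnel.
previous = totals.groupby("funnel_name")[["pv", "uv"]].shift(1)
totals["pv_rate"] = totals["pv"] / previous["pv"]
totals["uv_rate"] = totals["uv"] / previous["uv"]
print(totals)
```

The first step of each funnel has no predecessor, so its rate is NaN rather than 0.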

The funnel template used is:

funnel_name funnel_time_start funnel_time_end funnel_node_name step event_type event_name action page_id page_name product_id inner_id push_id last_page_id label_id label context_id context datetime
login 20190203 20190305 login_page 1 page_level 页面级别 5 page_login_1 login_page                 20190203
login 20190203 20190305 login_label 2 label_level 区域级别 5 page_login_1 login_page         label_login_1 login_label     20190203
print 20190203 20190305 print_page 1 page_level 页面级别 5 page_print_1 print_page                 20190203
print 20190203 20190305 print_page 1 page_level 页面级别 5 page_print_2 print_page                 20190203
print 20190203 20190305 print_page 1 page_level 页面级别 5 page_print_3 print_page                 20190203
print 20190203 20190305 print_label 2 label_level 区域级别 5 page_print_2 print_page       page_print_1 label_print print_label     20190203
print 20190203 20190305 print_label 2 label_level 区域级别 5 page_print_4 print_page       page_print_1 label_print print_label     20190203

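Notes on reading the data:

1) A page flow has three levels: page level, region level and event level.

2) A page contains several regions, and a region contains several events. A user can operate any of these nodes. When the user operates a region, the page field is also populated; when the user operates an event, the page and region fields are both populated.

3) Because the data volume is large, the data platform partitions the data by day. One HQL statement is generated per day to compute the statistics, and the results are then merged.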
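
The rule in point 2 decides which columns a node is filtered on. A minimal sketch of that mapping follows; the function name `build_filter` is hypothetical, and the column names page_id, label_id and name_id are the ones used by the code in this article.

```python
def build_filter(page_id, label_id=None, context_id=None):
    """Return the WHERE conditions for a page, region or event node."""
    if not page_id:
        raise ValueError("every node needs a page_id")
    if context_id and not label_id:
        raise ValueError("an event node also needs its label_id")
    conditions = ["t.page_id='%s'" % page_id]
    if label_id:
        conditions.append("t.label_id='%s'" % label_id)
    if context_id:
        # The article's code filters events on a column named name_id.
        conditions.append("t.name_id='%s'" % context_id)
    return "\nAND ".join(conditions)


page_filter = build_filter("page_login_1")
region_filter = build_filter("page_login_1", "label_login_1")
print(region_filter)
```

Each deeper level only adds a condition, which is why a region row must carry its page and an event row must carry both.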

The corresponding code is:

1) The funnel_build file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd
import datetime
# Funnel builder class:
class Funnel_build(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        path = self.path
        funnel_data = pd.read_excel(path, dtype='O').fillna('nan')
        return funnel_data

    def get_sql(self,
                funnel_name,
                funnel_time_start,
                funnel_time_end,
                funnel_node_name,
                step,
                event_type,
                event_name,
                action,
                page_id,
                page_name,
                product_id,
                inner_id,
                push_id,
                last_page_id,
                label_id,
                label,
                context_id,
                context,
                cond):
        sql = 'SELECT' \
              '\n\t %s funnel_name,' \
              '\n\t %s funnel_time_start,' \
              '\n\t %s funnel_time_end,' \
              '\n\t %s funnel_node_name,' \
              '\n\t %s step,' \
              '\n\t %s event_type,' \
              '\n\t %s event_name,' \
              '\n\t %s action,' \
              '\n\t %s page_id,' \
              '\n\t %s page_name,' \
              '\n\t %s product_id,' \
              '\n\t %s inner_id,' \
              '\n\t %s push_id,' \
              '\n\t %s last_page_id,' \
              '\n\t %s label_id,' \
              '\n\t %s label,' \
              '\n\t %s context_id,' \
              '\n\t %s context,' \
              '\n\t t.dt datetime,' \
              '\n\t count(1) pv,' \
              '\n\t size(collect_set(t.becif_no)) uv,' \
              '\n\t count(distinct t.mid) uv_m' \
              '\nFROM mid.tracker_action_event t' \
              '\nWHERE' \
              '\n\t %s' \
              '\nGROUP BY t.dt'
        # Hive requires the non-aggregated column t.dt to appear in GROUP BY
        # when it is selected next to count() and collect_set().
        # Quote every funnel attribute as a string literal, in SELECT order;
        # the WHERE condition goes in last, unquoted.
        attributes = (funnel_name, funnel_time_start, funnel_time_end,
                      funnel_node_name, step, event_type, event_name, action,
                      page_id, page_name, product_id, inner_id, push_id,
                      last_page_id, label_id, label, context_id, context)
        values = tuple("'" + str(a) + "'" for a in attributes) + (cond,)
        return sql % values

    def add_times(self, funnel_time_start, funnel_time_end):
        # Parse the YYYYMMDD values into datetime objects.
        start = datetime.datetime.strptime(str(funnel_time_start), '%Y%m%d')
        end = datetime.datetime.strptime(str(funnel_time_end), '%Y%m%d')
        # One entry per day from the start date up to, but not including,
        # the end date. An equal start and end date gives an empty list.
        period = []
        for j in range((end - start).days):
            next_day = start + datetime.timedelta(days=j)
            period.append(next_day.strftime('%Y%m%d'))
        return period

    def join_sqls(self, sql_list):
        j0 = '\n\nunion all\n'
        return j0.join(sql_list)

    def built_sql(self):
        sqls = []
        funnel_data = self.get_data()
        # Template columns, in the order get_sql() expects them.
        columns = ['funnel_name', 'funnel_time_start', 'funnel_time_end',
                   'funnel_node_name', 'step', 'event_type', 'event_name',
                   'action', 'page_id', 'page_name', 'product_id', 'inner_id',
                   'push_id', 'last_page_id', 'label_id', 'label',
                   'context_id', 'context']
        for i in range(len(funnel_data)):
            # Empty cells were filled with the string 'nan' in get_data().
            has_page = funnel_data['page_id'][i] != 'nan'
            has_label = funnel_data['label_id'][i] != 'nan'
            has_context = funnel_data['context_id'][i] != 'nan'
            cond1 = "\nand\tt.page_id='%s'" % (funnel_data['page_id'][i])
            cond2 = "\nand\tt.label_id='%s'" % (funnel_data['label_id'][i])
            cond3 = "\nand\tt.name_id='%s'" % (funnel_data['context_id'][i])
            if has_page and not has_label and not has_context:
                # Page level: filter on the page only.
                cond = cond1
            elif has_page and has_label and not has_context:
                # Region level: filter on the page and the region.
                cond = cond1 + '\t' + cond2
            elif has_page and has_label and has_context:
                # Event level: filter on the page, the region and the event.
                cond = cond1 + '\t' + cond2 + '\t' + cond3
            else:
                # Any other combination cannot be mapped to a level.
                raise ValueError('row %d: unsupported combination of page_id, '
                                 'label_id and context_id' % i)
            cond_time = self.add_times(funnel_data['funnel_time_start'][i],
                                       funnel_data['funnel_time_end'][i])
            for day in cond_time:
                # One statement per template row and per day.
                conds = "t.dt='" + day + "'\t" + cond
                row = [funnel_data[c][i] for c in columns]
                sqls.append(self.get_sql(*(row + [conds])))
        # Join once, after every row and day has been processed.
        return self.join_sqls(sqls)

2) General utility code: Funnel_utils

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
from matplotlib.font_manager import FontProperties

def ch_dtype(df):
    # The np.str and np.int aliases were removed in NumPy 1.24;
    # the builtin str and int are equivalent.
    dtype = dict(funnel_id=str, idx=int, page_id=str,
                 page_name=str, pv=np.int64, date_=str,
                 funnel_name=str, uv=np.int64, dt=str,
                 product_id=str, push_id=str, inner_id=str)
    return df.astype(dtype)

def write(string, path):
    # Write as UTF-8 so Chinese literals in the HQL survive on any platform.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(string)

def write_excel(df, path):
    # The context manager closes the writer even if to_excel raises.
    with pd.ExcelWriter(path) as writer:
        df.to_excel(writer)

def set_ch_font():
    return FontProperties(fname='../data/font/msyh.ttf')

def read_page_info():
    # Read the page table and index it by page_id.
    return pd.read_csv('info/page_info.csv', index_col=0).set_index('page_id')

def write_csv(df, path):
    df.to_csv(path)

def read_funnel_info():
    return pd.read_table('../data/funnel_info.txt').fillna('nan')

def read_funnel_info_xls():
    return pd.read_excel('../data/funnel_infos.xlsx').fillna('nan')

def join_sqls(sql_list):
    j0 = '\n\nUNION ALL\n'
    return j0.join(sql_list)

def format_cols(df):
    df.columns = [c.split('.')[1] for c in df.columns]

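def product(x):
    # Cartesian product of a list of lists, built from the last list backwards.
    def _product(x, y):
        if x:
            if y:
                z = []
                for i in x.pop():
                    for k in y:
                        if isinstance(k, list):
                            # Prepend i to a combination that already exists.
                            ik = [i]
                            ik.extend(k)
                            z.append(ik)
                        else:
                            z.append([i, k])
                y = z
            else:
                y = x.pop()
            return _product(x, y)
        return y
    # Work on a copy so the caller's list is not emptied by pop().
    return _product(list(x), [])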

def flow_means(data):
    ms = []
    for i in range(len(data)):
        sums = 0
        for j in range(i, i + 10):
            if (j < len(data)):
                sums = sums + data[j]
            else:
                sums = sums + 0
        if (len(data) - i >= 10):
            means = sums / 10
        else:
            means = sums / (len(data) - i)
        ms.append(means)
    return ms
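
As a rough illustration of how the smoothed values can drive monitoring, the sketch below flags points that fall outside a band of plus or minus 10% around a forward-looking mean. The window of 10 and the 1.1 / 0.9 factors match the values used in this article's code; the names `forward_means` and `out_of_band` and the sample series are assumptions made for the example.

```python
def forward_means(values, window=10):
    """Mean of each point and the points after it, up to `window` points."""
    means = []
    for i in range(len(values)):
        chunk = values[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means


def out_of_band(values, window=10, tolerance=0.1):
    """Indices whose value lies outside mean * (1 +/- tolerance)."""
    flagged = []
    means = forward_means(values, window)
    for i, (value, mean) in enumerate(zip(values, means)):
        lower = mean * (1 - tolerance)
        upper = mean * (1 + tolerance)
        if not (lower <= value <= upper):
            flagged.append(i)
    return flagged


# Hypothetical daily conversion rates with one sharp drop at index 5.
daily_rates = [0.40] * 5 + [0.20] + [0.40] * 6
flagged_days = out_of_band(daily_rates)
print(flagged_days)
```

With a short window a single outlier also drags its neighbours' means, so the window length and the tolerance are best tuned together.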

3) Data preparation code: Funnel_data

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd

class Funnel_data(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        path = self.path
        result_data = pd.read_excel(path, dtype='O').fillna('nan')
        return result_data

    def get_funnels_name(self):
        path = self.path
        result_data = pd.read_excel(path, dtype='O').fillna('nan')
        funnel_names = result_data['funnel_name'].drop_duplicates().tolist()
        return funnel_names

    # Step-to-step conversion rate, based on each funnel node summed over all days.
    def result_calculate_single(self):
        result_data = self.get_data()
        funnel_data = result_data[['funnel_name', 'step', 'pv', 'uv', 'uv_m']]
        keys = [funnel_data['funnel_name'], funnel_data['step']]
        sum_pv = funnel_data['pv'].groupby(keys).sum()
        sum_uv = funnel_data['uv'].groupby(keys).sum()
        sum_uv_m = funnel_data['uv_m'].groupby(keys).sum()
        sum_data = pd.concat([sum_pv, sum_uv, sum_uv_m], axis=1).reset_index()
        funnel_names = sum_data['funnel_name'].drop_duplicates()
        # Row labels: '1-->2' is the rate from step 1 to step 2, and so on.
        labels = ['1-->2', '2-->3', '3-->4', '4-->5', '5-->6', '6-->7']
        results = []
        for metric in ['pv', 'uv', 'uv_m']:
            funnel_rates = {}
            for funnel_name in funnel_names:
                data = sum_data[sum_data['funnel_name'] == funnel_name]
                rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1][metric]
                    t2 = data[data['step'] == i][metric]
                    # A missing step or an empty previous step gives a rate of 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        rates.append(round(float(t1.iloc[0]) / float(t2.iloc[0]), 6))
                    else:
                        rates.append(0)
                funnel_rates[funnel_name] = rates
            frame = pd.DataFrame(funnel_rates, index=labels)
            # '环节' means "stage"; the other modules look this column up by name.
            frame.index.name = '环节'
            results.append(frame.reset_index())
        funnel_pv_single, funnel_uv_single, funnel_uv_m_single = results
        return funnel_pv_single, funnel_uv_single, funnel_uv_m_single
    # Overall conversion rate relative to step 1, based on each funnel node summed over all days.
    def result_calculate_total(self):
        result_data = self.get_data()
        funnel_data = result_data[['funnel_name', 'step', 'pv', 'uv', 'uv_m']]
        keys = [funnel_data['funnel_name'], funnel_data['step']]
        sum_pv = funnel_data['pv'].groupby(keys).sum()
        sum_uv = funnel_data['uv'].groupby(keys).sum()
        sum_uv_m = funnel_data['uv_m'].groupby(keys).sum()
        sum_data = pd.concat([sum_pv, sum_uv, sum_uv_m], axis=1).reset_index()
        funnel_names = sum_data['funnel_name'].drop_duplicates()
        # Row labels: '1-->3' is the rate from step 1 to step 3, and so on.
        labels = ['1-->2', '1-->3', '1-->4', '1-->5', '1-->6', '1-->7']
        results = []
        for metric in ['pv', 'uv', 'uv_m']:
            funnel_rates = {}
            for funnel_name in funnel_names:
                data = sum_data[sum_data['funnel_name'] == funnel_name]
                rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1][metric]
                    # The denominator is always the first step of the funnel.
                    t2 = data[data['step'] == 1][metric]
                    # A missing step or an empty first step gives a rate of 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        rates.append(round(float(t1.iloc[0]) / float(t2.iloc[0]), 6))
                    else:
                        rates.append(0)
                funnel_rates[funnel_name] = rates
            frame = pd.DataFrame(funnel_rates, index=labels)
            # '环节' means "stage"; the other modules look this column up by name.
            frame.index.name = '环节'
            results.append(frame.reset_index())
        funnel_pv_total, funnel_uv_total, funnel_uv_m_total = results
        return funnel_pv_total, funnel_uv_total, funnel_uv_m_total
    # Rates for a single funnel: step-to-step and overall, side by side.
    def data_combine_funnel(self, funnel_name):
        funnel_pv_total, funnel_uv_total, funnel_uv_m_total = self.result_calculate_total()
        funnel_pv_single, funnel_uv_single, funnel_uv_m_single = self.result_calculate_single()
        funnel_index = funnel_pv_single['环节']
        funnel_pv_t = funnel_pv_total[funnel_name]
        funnel_uv_t = funnel_uv_total[funnel_name]
        funnel_uv_m_t = funnel_uv_m_total[funnel_name]
        funnel_pv_s = funnel_pv_single[funnel_name]
        funnel_uv_s = funnel_uv_single[funnel_name]
        funnel_uv_m_s = funnel_uv_m_single[funnel_name]
        # Column names: stage, step-to-step conversion rate, overall conversion rate.
        # The single-step series must come before the total series in every
        # frame so that the data matches these names.
        columns = ['环节', '单一环节转化率', '总体转化率']
        funnel_pv = pd.concat([funnel_index, funnel_pv_s, funnel_pv_t], axis=1)
        funnel_pv.columns = columns
        funnel_uv = pd.concat([funnel_index, funnel_uv_s, funnel_uv_t], axis=1)
        funnel_uv.columns = columns
        funnel_uv_m = pd.concat([funnel_index, funnel_uv_m_s, funnel_uv_m_t], axis=1)
        funnel_uv_m.columns = columns
        return funnel_pv, funnel_uv, funnel_uv_m
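
To make the difference between the two kinds of rate concrete, here is a small self-contained sketch with made-up step totals. It also shows a useful sanity check: the overall rate up to a step equals the running product of the step-to-step rates. The variable names are chosen for the example only.

```python
from itertools import accumulate
from operator import mul

# Hypothetical summed PV per step for one funnel.
step_pv = [1000, 400, 100, 50]

# Step-to-step rate: step i+1 divided by step i.
single = [b / a for a, b in zip(step_pv, step_pv[1:])]
# Overall rate: step i+1 divided by step 1.
total = [b / step_pv[0] for b in step_pv[1:]]

# The overall rate is the running product of the step-to-step rates.
running = list(accumulate(single, mul))

for label, s, t in zip(["1-->2", "2-->3", "3-->4"], single, total):
    print(label, round(s, 6), round(t, 6))
```

If the two columns of a funnel table fail this check, the single and total series have probably been swapped or mislabeled.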

4) Plotting code: Funnel_plot

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Funnel, Page and Line are imported from the top-level package, as in pyecharts 0.5.x.
from pyecharts import Funnel, Page, Line
from Funnel_livan import Funnel_utils
import numpy as np
import os

class Funnel_plot(object):
    def __init__(self, name, data=None):
        self.name = name
        # Use a fresh list per instance instead of a shared mutable default.
        self.data = data if data is not None else []

    def draw_plot(self):
        funnel_name = self.name
        funnels = self.data
        page = Page()
        for funnel in funnels:
            # '环节' is the stage column.
            funnel_list = funnel['环节'].tolist()
            # Second column as percentages. DataFrame.ix was removed in
            # pandas 1.0, so the column is selected by position with iloc.
            funnel_l_total = (np.array(funnel.iloc[:, 1], dtype=float) * 100).tolist()
            funnel_plot = Funnel('%s' % funnel_name,
                                 width=800,
                                 height=400,
                                 title_pos='center')
            funnel_plot.add(name=funnel_name,  # legend name
                            attr=funnel_list,  # stage names
                            value=funnel_l_total,  # value of each funnel stage
                            is_label_show=True,  # whether to show labels
                            label_formatter='{c}%',  # label format
                            label_pos="inside",  # label position
                            legend_orient='vertical',  # legend orientation
                            legend_pos='left',  # legend position
                            is_legend_show=True)  # whether to show the legend
            has_files = os.path.exists(funnel_name)
            if not has_files:
                os.mkdir('./' + funnel_name)
            funnel_plot.render(path='./%s/%s.gif' % (funnel_name, funnel_name))
            page.add(funnel_plot)
        page.render("./plots/%s.html" % funnel_name)
        return page

    def check_plot(self):
        funnel_name = self.name
        funnels = self.data
        page = Page()
        # Chart titles, in the same order as the first six rows of each frame.
        titles = ['pv_total', 'uv_total', 'uv_m_total',
                  'pv_single', 'uv_single', 'uv_m_single']
        # Options shared by every series; '转化率' means "conversion rate".
        line_opts = dict(yaxis_min=0,
                         yaxis_max="dataMax",
                         yaxis_name='转化率',
                         is_yaxis_show=True,
                         is_stack=True,
                         is_label_show=True)
        for funnel in funnels:
            # '环节' is the stage column.
            funnel_n = funnel['环节'].tolist()
            # X axis: every column after the first one.
            attr = [str(c) for c in funnel.columns.tolist()[1:]]
            has_files = os.path.exists(funnel_name)
            if not has_files:
                os.mkdir('./' + funnel_name)
            for k, title in enumerate(titles):
                # Values of the k-th row, without the '环节' column.
                v = funnel[funnel['环节'] == funnel_n[k]][:].filter(regex="[^环节]").iloc[0, :].tolist()
                # Band of plus or minus 10% around the moving mean.
                mov_mean = Funnel_utils.flow_means(v)
                mov_mean_up = [m * 1.1 for m in mov_mean]
                mov_mean_down = [m * 0.9 for m in mov_mean]
                line = Line("%s %s转化率" % (funnel_name, title))
                # '平均下限' / '平均上限': lower / upper bound of the mean.
                line.add("平均下限", attr, mov_mean_down, **line_opts)
                line.add(funnel_n[k], attr, v, **line_opts)
                line.add("平均上限", attr, mov_mean_up, **line_opts)
                line.render(path='./%s/%s转化率.gif' % (funnel_name, title))
                page.add(line)
        page.render("./check/%s.html" % funnel_name)

5) Conversion rate monitoring platform: Funnel_check

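#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
# Turn the daily conversion rate of a funnel into one point per day, build a trend line from the points, and show it on a plot.
# The input is the conversion rate of each day; the x axis is time and the y axis is the conversion rate.
# Each funnel gets its own monitoring chart, and a funnel has at most six steps.
# Fetch the conversion rates of a funnel over a period of time and derive the corresponding trend chart.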
class Funnel_check(object):
    def __init__(self, path):
        self.path = path

    def get_data(self):
        path = self.path
        result_data = pd.read_excel(path, dtype='O').fillna('nan')
        return result_data

    def get_funnels_name(self):
        path = self.path
        result_data = pd.read_excel(path, dtype='O').fillna('nan')
        funnel_names = result_data['funnel_name'].drop_duplicates().tolist()
        return funnel_names

    # Compute the funnel for each funnel, each node and each day.
    # Step-to-step conversion rate, based on each funnel node summed per day.
    def day_result_calculate_single(self):
        result_data = self.get_data()
        funnel_data = result_data[['funnel_name', 'step', 'datetime', 'pv', 'uv', 'uv_m']]
        sum_pv = funnel_data['pv'].groupby([funnel_data['funnel_name'],
                                            funnel_data['step'],
                                            funnel_data['datetime']]).sum()
        sum_uv = funnel_data['uv'].groupby([funnel_data['funnel_name'],
                                            funnel_data['step'],
                                            funnel_data['datetime']]).sum()
        sum_uv_m = funnel_data['uv_m'].groupby([funnel_data['funnel_name'],
                                                funnel_data['step'],
                                                funnel_data['datetime']]).sum()
        sum_data = pd.DataFrame(pd.concat([sum_pv, sum_uv, sum_uv_m], axis=1)).reset_index()
        funnel_names = sum_data['funnel_name'].drop_duplicates()
        date_times = sum_data['datetime'].drop_duplicates()
        funnel_pv_rates = {}
        funnel_uv_rates = {}
        funnel_uv_m_rates = {}
        for funnel_name in funnel_names:
            day_funnel_pv_rates = {}
            day_funnel_uv_rates = {}
            day_funnel_uv_m_rates = {}
            for datetime in date_times:
                data = sum_data[(sum_data['funnel_name'] == funnel_name) & (sum_data['datetime'] == datetime)]
                # PV rates:
                pv_rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1]['pv']
                    t2 = data[data['step'] == i]['pv']
                    # Fall back to 0 when either step is missing or the denominator is 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        pv_rate = float(t1.iloc[0]) / float(t2.iloc[0])
                        pv_rates.append(round(pv_rate, 6))
                    else:
                        pv_rate = 0
                        pv_rates.append(pv_rate)
                day_funnel_pv_rates[datetime] = pv_rates
                # UV rates:
                uv_rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1]['uv']
                    t2 = data[data['step'] == i]['uv']
                    # Fall back to 0 when either step is missing or the denominator is 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        uv_rate = float(t1.iloc[0]) / float(t2.iloc[0])
                        uv_rates.append(round(uv_rate, 6))
                    else:
                        uv_rate = 0
                        uv_rates.append(uv_rate)
                day_funnel_uv_rates[datetime] = uv_rates
                # UV_M rates:
                uv_m_rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1]['uv_m']
                    t2 = data[data['step'] == i]['uv_m']
                    # Fall back to 0 when either step is missing or the denominator is 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        uv_m_rate = float(t1.iloc[0]) / float(t2.iloc[0])
                        uv_m_rates.append(round(uv_m_rate, 6))
                    else:
                        uv_m_rate = 0
                        uv_m_rates.append(uv_m_rate)
                day_funnel_uv_m_rates[datetime] = uv_m_rates
            funnel_pv_rates[funnel_name] = day_funnel_pv_rates
            funnel_uv_rates[funnel_name] = day_funnel_uv_rates
            funnel_uv_m_rates[funnel_name] = day_funnel_uv_m_rates
        funnel_pv_singles = {}
        funnel_uv_singles = {}
        funnel_uv_m_singles = {}
        for day_funnel_pv_rate in funnel_pv_rates.keys():
            funnel_pv = pd.DataFrame(funnel_pv_rates[day_funnel_pv_rate])
            funnel_pv_single = funnel_pv.rename(
                {0: '1-->2', 1: '2-->3', 2: '3-->4', 3: '4-->5', 4: '5-->6', 5: '6-->7'},
                axis='index')
            funnel_pv_single.index.name = '环节'
            funnel_pv_single = funnel_pv_single.reset_index()
            funnel_pv_singles[day_funnel_pv_rate] = funnel_pv_single
        for day_funnel_uv_rate in funnel_uv_rates.keys():
            funnel_uv = pd.DataFrame(funnel_uv_rates[day_funnel_uv_rate])
            funnel_uv_single = funnel_uv.rename(
                {0: '1-->2', 1: '2-->3', 2: '3-->4', 3: '4-->5', 4: '5-->6', 5: '6-->7'},
                axis='index')
            funnel_uv_single.index.name = '环节'
            funnel_uv_single = funnel_uv_single.reset_index()
            funnel_uv_singles[day_funnel_uv_rate] = funnel_uv_single
        for day_funnel_uv_m_rate in funnel_uv_m_rates.keys():
            funnel_uv_m = pd.DataFrame(funnel_uv_m_rates[day_funnel_uv_m_rate])
            funnel_uv_m_single = funnel_uv_m.rename(
                {0: '1-->2', 1: '2-->3', 2: '3-->4', 3: '4-->5', 4: '5-->6', 5: '6-->7'},
                axis='index')
            funnel_uv_m_single.index.name = '环节'
            funnel_uv_m_single = funnel_uv_m_single.reset_index()
            funnel_uv_m_singles[day_funnel_uv_m_rate] = funnel_uv_m_single
        return funnel_pv_singles, funnel_uv_singles, funnel_uv_m_singles

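    # Overall conversion rate: sum pv/uv/uv_m per funnel, step, and day,
    # then divide step i+1 by step 1.
    # Note: the row labels '1-->2', '2-->3', ... are reused from the step-wise
    # version; here row k holds step k+1 relative to step 1.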
    def day_result_calculate_total(self):
        result_data = self.get_data()
        funnel_data = result_data[['funnel_name', 'step', 'datetime', 'pv', 'uv', 'uv_m']]
        sum_pv = funnel_data['pv'].groupby([funnel_data['funnel_name'],
                                            funnel_data['step'],
                                            funnel_data['datetime']]).sum()
        sum_uv = funnel_data['uv'].groupby([funnel_data['funnel_name'],
                                            funnel_data['step'],
                                            funnel_data['datetime']]).sum()
        sum_uv_m = funnel_data['uv_m'].groupby([funnel_data['funnel_name'],
                                                funnel_data['step'],
                                                funnel_data['datetime']]).sum()
        sum_data = pd.DataFrame(pd.concat([sum_pv, sum_uv, sum_uv_m], axis=1)).reset_index()
        funnel_names = sum_data['funnel_name'].drop_duplicates()
        date_times = sum_data['datetime'].drop_duplicates()
        funnel_pv_total_rates = {}
        funnel_uv_total_rates = {}
        funnel_uv_m_total_rates = {}
        for funnel_name in funnel_names:
            day_funnel_pv_total_rates = {}
            day_funnel_uv_total_rates = {}
            day_funnel_uv_m_total_rates = {}
            for datetime in date_times:
                data = sum_data[(sum_data['funnel_name'] == funnel_name) & (sum_data['datetime'] == datetime)]
                # PV rates:
                pv_rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1]['pv']
                    t2 = data[data['step'] == 1]['pv']
                    # Fall back to 0 when either step is missing or the denominator is 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        pv_rate = float(t1.iloc[0]) / float(t2.iloc[0])
                        pv_rates.append(round(pv_rate, 6))
                    else:
                        pv_rate = 0
                        pv_rates.append(pv_rate)
                day_funnel_pv_total_rates[datetime] = pv_rates
                # UV rates:
                uv_rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1]['uv']
                    t2 = data[data['step'] == 1]['uv']
                    # Fall back to 0 when either step is missing or the denominator is 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        uv_rate = float(t1.iloc[0]) / float(t2.iloc[0])
                        uv_rates.append(round(uv_rate, 6))
                    else:
                        uv_rate = 0
                        uv_rates.append(uv_rate)
                day_funnel_uv_total_rates[datetime] = uv_rates
                # UV_M rates:
                uv_m_rates = []
                for i in range(1, 7):
                    t1 = data[data['step'] == i + 1]['uv_m']
                    t2 = data[data['step'] == 1]['uv_m']
                    # Fall back to 0 when either step is missing or the denominator is 0.
                    if len(t1) != 0 and len(t2) != 0 and float(t2.iloc[0]) != 0:
                        uv_m_rate = float(t1.iloc[0]) / float(t2.iloc[0])
                        uv_m_rates.append(round(uv_m_rate, 6))
                    else:
                        uv_m_rate = 0
                        uv_m_rates.append(uv_m_rate)
                day_funnel_uv_m_total_rates[datetime] = uv_m_rates
            funnel_pv_total_rates[funnel_name] = day_funnel_pv_total_rates
            funnel_uv_total_rates[funnel_name] = day_funnel_uv_total_rates
            funnel_uv_m_total_rates[funnel_name] = day_funnel_uv_m_total_rates
        funnel_pv_totals = {}
        funnel_uv_totals = {}
        funnel_uv_m_totals = {}
        for day_funnel_pv_rate in funnel_pv_total_rates.keys():
            funnel_pv = pd.DataFrame(funnel_pv_total_rates[day_funnel_pv_rate])
            funnel_pv_total = funnel_pv.rename(
                {0: '1-->2', 1: '2-->3', 2: '3-->4', 3: '4-->5', 4: '5-->6', 5: '6-->7'},
                axis='index')
            funnel_pv_total.index.name = '环节'
            funnel_pv_total = funnel_pv_total.reset_index()
            funnel_pv_totals[day_funnel_pv_rate] = funnel_pv_total

        for day_funnel_uv_rate in funnel_uv_total_rates.keys():
            funnel_uv = pd.DataFrame(funnel_uv_total_rates[day_funnel_uv_rate])
            funnel_uv_total = funnel_uv.rename(
                {0: '1-->2', 1: '2-->3', 2: '3-->4', 3: '4-->5', 4: '5-->6', 5: '6-->7'},
                axis='index')
            funnel_uv_total.index.name = '环节'
            funnel_uv_total = funnel_uv_total.reset_index()
            funnel_uv_totals[day_funnel_uv_rate] = funnel_uv_total

        for day_funnel_uv_m_rate in funnel_uv_m_total_rates.keys():
            funnel_uv_m = pd.DataFrame(funnel_uv_m_total_rates[day_funnel_uv_m_rate])
            funnel_uv_m_total = funnel_uv_m.rename(
                {0: '1-->2', 1: '2-->3', 2: '3-->4', 3: '4-->5', 4: '5-->6', 5: '6-->7'},
                axis='index')
            funnel_uv_m_total.index.name = '环节'
            funnel_uv_m_total = funnel_uv_m_total.reset_index()
            funnel_uv_m_totals[day_funnel_uv_m_rate] = funnel_uv_m_total
        return funnel_pv_totals, funnel_uv_totals, funnel_uv_m_totals

    # Data for a single funnel: collect the per-day overall and step-wise
    # rate tables (pv, uv, uv_m) for funnel_name.
    def day_data_combine_funnel(self, funnel_name):
        funnel_pv_totals, funnel_uv_totals, funnel_uv_m_totals = self.day_result_calculate_total()
        funnel_pv_singles, funnel_uv_singles, funnel_uv_m_singles = self.day_result_calculate_single()
        # The two calls above yield six dicts, each keyed by funnel name:
        funnel_data = []
        funnel_pv_total = funnel_pv_totals[funnel_name]
        funnel_uv_total = funnel_uv_totals[funnel_name]
        funnel_uv_m_total = funnel_uv_m_totals[funnel_name]
        funnel_pv_single = funnel_pv_singles[funnel_name]
        funnel_uv_single = funnel_uv_singles[funnel_name]
        funnel_uv_m_single = funnel_uv_m_singles[funnel_name]
        funnel_data.append(funnel_pv_total)
        funnel_data.append(funnel_uv_total)
        funnel_data.append(funnel_uv_m_total)
        funnel_data.append(funnel_pv_single)
        funnel_data.append(funnel_uv_single)
        funnel_data.append(funnel_uv_m_single)
        return funnel_data
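
To make the difference between the step-wise and the overall rate concrete, here is a minimal, dependency-free sketch. The function name `conversion_rates` and the sample counts are made up for illustration and are not part of the class above; it works on a plain dict instead of a pandas DataFrame.

```python
def conversion_rates(step_counts, max_step=7):
    """Return (step_wise, overall) rate lists for steps 1..max_step.

    step_counts maps step number -> count (PV or UV) for one funnel on one day.
    A missing step yields a rate of 0, as in the methods above; a zero
    denominator is also treated as 0 in this sketch.
    """
    step_wise, overall = [], []
    first = step_counts.get(1, 0)
    for i in range(1, max_step):
        cur = step_counts.get(i, 0)
        nxt = step_counts.get(i + 1)
        # Step-wise: step i+1 relative to step i.
        step_wise.append(round(nxt / cur, 6) if nxt is not None and cur else 0)
        # Overall: step i+1 relative to step 1.
        overall.append(round(nxt / first, 6) if nxt is not None and first else 0)
    return step_wise, overall


# Hypothetical counts for a three-step funnel on one day.
single, total = conversion_rates({1: 1000, 2: 400, 3: 100})
print(single)
print(total)
```

With these counts the step-wise rate for the second transition is 100/400, while the overall rate for the same row is 100/1000, which is why the two tables diverge from the second row onward.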

6) Main function (main):

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from Funnel_livan import Funnel_build, Funnel_plot, Funnel_data, Funnel_check

if __name__ == '__main__':
    # 1. Build the SQL (HQL) text.
    # path = '/Users/livan/PycharmProjects/offices/data/train_data.xlsx'
    # funnels = Funnel_build.Funnel_build(path=path)
    # sqls = funnels.built_sql()
    # with open('sqls.sql', 'w+') as f:
    #     f.write(sqls)

    # 2. Load the data, compute conversion rates, and draw the funnel over all days:
    # path = '/Users/livan/PycharmProjects/offices/data/result_data.xlsx'
    # result = Funnel_data.Funnel_data(path=path)
    # funnels_name = result.get_funnels_name()
    # for i in range(len(funnels_name)):
    #     funnel_pv, funnel_uv, funnel_uv_m = result.data_combine_funnel(funnels_name[i])
    #     # The step above feeds six funnel tables (step-wise and overall, for pv/uv/uv_m):
    #     funnel_pv_s = funnel_pv[['环节', '单一环节转化率']]
    #     funnel_uv_s = funnel_uv[['环节', '单一环节转化率']]
    #     funnel_uv_m_s = funnel_uv_m[['环节', '单一环节转化率']]
    #     funnel_pv_t = funnel_pv[['环节', '总体转化率']]
    #     funnel_uv_t = funnel_uv[['环节', '总体转化率']]
    #     funnel_uv_m_t = funnel_uv_m[['环节', '总体转化率']]
    #     funnel_d = [funnel_pv_s,
    #                 funnel_uv_s,
    #                 funnel_uv_m_s,
    #                 funnel_pv_t,
    #                 funnel_uv_t,
    #                 funnel_uv_m_t]
    #     Funnel_plot.Funnel_plot(funnels_name[i], funnel_d).draw_plot()

    # 3. Conversion-rate trend analysis: compute the funnel for each day:
    path = '/Users/livan/PycharmProjects/offices/data/result_data.xlsx'
    result = Funnel_check.Funnel_check(path=path)
    funnels_name = result.get_funnels_name()
    for i in range(len(funnels_name)):
        funnel_data = result.day_data_combine_funnel(funnels_name[i])
        Funnel_plot.Funnel_plot(funnels_name[i], funnel_data).check_plot()
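
The post mentions smoothing the daily rates with a moving average for monitoring, but that code is not shown here. The following is only a rough sketch of the idea; the function name `moving_average`, the trailing window, and the sample rates are all assumptions.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over what is available."""
    smoothed = []
    for i in range(len(values)):
        # Take up to `window` values ending at position i.
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(round(sum(chunk) / len(chunk), 6))
    return smoothed


# Hypothetical daily '1-->2' rates for one funnel; the third day dips sharply.
daily_rates = [0.40, 0.42, 0.20, 0.41, 0.43]
print(moving_average(daily_rates))
```

Comparing each day's raw rate with its smoothed value is one simple way to flag days that deviate from the recent trend.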

The generated HQL looks like this:

SELECT
  'login' funnel_name,'20190203' funnel_time_start,
  '20190305' funnel_time_end,'login_page' funnel_node_name,'1' step,
  'page_level' event_type,'页面级别' event_name,'5' action,
  'page_login_1' page_id,'login_page' page_name,'nan' product_id,
  'nan' inner_id,'nan' push_id,'nan' last_page_id,
  'nan' label_id,'nan' label,'nan' context_id,'nan' context,
  t.dt datetime,count(1) pv,
  size(collect_set(t.becif_no)) uv,
  count(distinct t.mid) uv_m
FROM mid.tracker_action_event t
WHERE
  t.dt='20190203' 
and  t.page_id='page_login_1'
union all……
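
As a rough illustration of how one such statement could be assembled from a single row of the funnel template, here is a minimal sketch. The helper name `build_select`, the choice to filter only on `page_id`, and the sample row are assumptions for illustration; the real generator may differ. The 'nan' literals in the output above presumably come from empty template cells, which the sketch imitates with a float NaN.

```python
# Literal columns copied from the funnel template into every SELECT.
TEMPLATE_COLUMNS = [
    'funnel_name', 'funnel_time_start', 'funnel_time_end', 'funnel_node_name',
    'step', 'event_type', 'event_name', 'action', 'page_id', 'page_name',
    'product_id', 'inner_id', 'push_id', 'last_page_id',
    'label_id', 'label', 'context_id', 'context',
]


def build_select(row, day):
    """Build one SELECT for a template row (hypothetical helper).

    Values are pasted into the statement unescaped, so this assumes the
    template is a trusted, hand-maintained file.
    """
    nan = float('nan')  # stands in for an empty template cell
    literals = ','.join(
        "'%s' %s" % (row.get(col, nan), col) for col in TEMPLATE_COLUMNS
    )
    return (
        "SELECT\n"
        "  %s,\n"
        "  t.dt datetime,count(1) pv,\n"
        "  size(collect_set(t.becif_no)) uv,\n"
        "  count(distinct t.mid) uv_m\n"
        "FROM mid.tracker_action_event t\n"
        "WHERE\n"
        "  t.dt='%s'\n"
        "and  t.page_id='%s'" % (literals, day, row['page_id'])
    )


# Sample row modeled on the first line of the template (other cells left empty).
row = {
    'funnel_name': 'login', 'funnel_time_start': '20190203',
    'funnel_time_end': '20190305', 'funnel_node_name': 'login_page',
    'step': 1, 'event_type': 'page_level', 'event_name': '页面级别',
    'action': 5, 'page_id': 'page_login_1', 'page_name': 'login_page',
}
print(build_select(row, '20190203'))
```

Statements for several rows and days could then be joined with `'\nunion all\n'.join(...)` before being written to the .sql file.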

The charts in this post are drawn with pyecharts, which is convenient and easy to use; it will be introduced in more detail later.

Reposted from blog.csdn.net/livan1234/article/details/88759317