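Case Study: Funnel Conversion Analysis
Key elements: the business goal, the path to reach it, the steps on that path, the number of users at each step, and the relative and absolute conversion rates between steps.
Every business has a core task and flow, and users can drop off at every step of that flow.
If you stack the steps together with their data (such as UV, unique visitors), the result is wide at the top and narrow at the bottom, like a funnel. This is the funnel model.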
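Funnel model examples:
Different business scenarios have different business paths. The events in a path have a fixed order, and an event can occur more than once.
Registration conversion funnel: launch app --> app registration page --> registration result --> submit order --> payment succeeded
Search-to-purchase conversion funnel: search for a product --> click the product --> add to cart --> submit order --> payment succeeded
Flash-sale conversion funnel: click the flash-sale campaign --> join the campaign --> take part in the flash sale --> flash sale succeeded --> payment succeeded
[Figure: e-commerce purchase conversion funnel model]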
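Processing steps:
Funnel name: purchase conversion funnel (购买转化漏斗)
Start event: viewing a product detail page
Target event: payment
Event chain of the business flow: detail page -> cart -> order page -> payment
[Decide whether there is a time limit between events, and whether other events may occur between two adjacent events in the chain]
Requirement: compute the conversion rates of the purchase conversion funnel (no time limit between events, and other events may occur between two adjacent steps)
1. The UV of each step
2. The relative conversion rate (UV of the next step / UV of the previous step) and the absolute conversion rate (UV of the current step / UV of the first step)
Events of interest: e1, e2, e4, e5 ==> their order must be preserved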
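The two rate definitions can be illustrated with a minimal Python sketch. It is not part of the Doris solution; the function name `conversion_rates` is made up for illustration, and the UV values are the step counts this walkthrough ends up with.

```python
# UV per funnel step, in funnel order (the step counts from this example's final result).
step_uv = [9, 9, 6, 1]


def conversion_rates(uv):
    """Return (relative, absolute) conversion rates for steps 2..N.

    relative[i] = UV of step i+2 / UV of step i+1  (next step / previous step)
    absolute[i] = UV of step i+2 / UV of step 1    (current step / first step)
    A zero denominator yields 0.0 instead of raising.
    """
    relative = [round(cur / prev, 2) if prev else 0.0
                for prev, cur in zip(uv, uv[1:])]
    absolute = [round(cur / uv[0], 2) if uv[0] else 0.0
                for cur in uv[1:]]
    return relative, absolute


relative, absolute = conversion_rates(step_uv)
print(relative)  # [1.0, 0.67, 0.17]
print(absolute)  # [1.0, 0.67, 0.11]
```

The relative rate shows where users drop off between two neighbouring steps, while the absolute rate shows how much of the initial traffic is left at each step.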
-- Prepare the data
user_id event_id event_action event_time
u001,e1,view_detail_page,2022-11-01 01:10:21
u001,e2,add_bag_page,2022-11-01 01:11:13
u001,e3,collect_goods_page,2022-11-01 02:07:11
u002,e3,collect_goods_page,2022-11-01 01:10:21
u002,e4,order_detail_page,2022-11-01 01:11:13
u002,e5,pay_detail_page,2022-11-01 02:07:11
u002,e6,click_adver_page,2022-11-01 13:07:23
u002,e7,home_page,2022-11-01 08:18:12
u002,e8,list_detail_page,2022-11-01 23:34:29
u002,e1,view_detail_page,2022-11-01 11:25:32
u002,e2,add_bag_page,2022-11-01 12:41:21
u002,e3,collect_goods_page,2022-11-01 16:21:15
u002,e4,order_detail_page,2022-11-01 21:41:12
u003,e5,pay_detail_page,2022-11-01 01:10:21
u003,e6,click_adver_page,2022-11-01 01:11:13
u003,e7,home_page,2022-11-01 02:07:11
u001,e4,order_detail_page,2022-11-01 13:07:23
u001,e5,pay_detail_page,2022-11-01 08:18:12
u001,e6,click_adver_page,2022-11-01 23:34:29
u001,e7,home_page,2022-11-01 11:25:32
u001,e8,list_detail_page,2022-11-01 12:41:21
u001,e1,view_detail_page,2022-11-01 16:21:15
u001,e2,add_bag_page,2022-11-01 21:41:12
u003,e8,list_detail_page,2022-11-01 13:07:23
u003,e1,view_detail_page,2022-11-01 08:18:12
u003,e2,add_bag_page,2022-11-01 23:34:29
u003,e3,collect_goods_page,2022-11-01 11:25:32
u003,e4,order_detail_page,2022-11-01 12:41:21
u003,e5,pay_detail_page,2022-11-01 16:21:15
u003,e6,click_adver_page,2022-11-01 21:41:12
u004,e7,home_page,2022-11-01 01:10:21
u004,e8,list_detail_page,2022-11-01 01:11:13
u004,e1,view_detail_page,2022-11-01 02:07:11
u004,e2,add_bag_page,2022-11-01 13:07:23
u004,e3,collect_goods_page,2022-11-01 08:18:12
u004,e4,order_detail_page,2022-11-01 23:34:29
u004,e5,pay_detail_page,2022-11-01 11:25:32
u004,e6,click_adver_page,2022-11-01 12:41:21
u004,e7,home_page,2022-11-01 16:21:15
u004,e8,list_detail_page,2022-11-01 21:41:12
u005,e1,view_detail_page,2022-11-01 01:10:21
u005,e2,add_bag_page,2022-11-01 01:11:13
u005,e3,collect_goods_page,2022-11-01 02:07:11
u005,e4,order_detail_page,2022-11-01 13:07:23
u005,e5,pay_detail_page,2022-11-01 08:18:12
u005,e6,click_adver_page,2022-11-01 23:34:29
u005,e7,home_page,2022-11-01 11:25:32
u005,e8,list_detail_page,2022-11-01 12:41:21
u005,e1,view_detail_page,2022-11-01 16:21:15
u005,e2,add_bag_page,2022-11-01 21:41:12
u005,e3,collect_goods_page,2022-11-01 01:10:21
u006,e4,order_detail_page,2022-11-01 01:11:13
u006,e5,pay_detail_page,2022-11-01 02:07:11
u006,e6,click_adver_page,2022-11-01 13:07:23
u006,e7,home_page,2022-11-01 08:18:12
u006,e8,list_detail_page,2022-11-01 23:34:29
u006,e1,view_detail_page,2022-11-01 11:25:32
u006,e2,add_bag_page,2022-11-01 12:41:21
u006,e3,collect_goods_page,2022-11-01 16:21:15
u006,e4,order_detail_page,2022-11-01 21:41:12
u006,e5,pay_detail_page,2022-11-01 23:10:21
u006,e6,click_adver_page,2022-11-01 01:11:13
u007,e7,home_page,2022-11-01 02:07:11
u007,e8,list_detail_page,2022-11-01 13:07:23
u007,e1,view_detail_page,2022-11-01 08:18:12
u007,e2,add_bag_page,2022-11-01 23:34:29
u007,e3,collect_goods_page,2022-11-01 11:25:32
u007,e4,order_detail_page,2022-11-01 12:41:21
u007,e5,pay_detail_page,2022-11-01 16:21:15
u007,e6,click_adver_page,2022-11-01 21:41:12
u007,e7,home_page,2022-11-01 01:10:21
u008,e8,list_detail_page,2022-11-01 01:11:13
u008,e1,view_detail_page,2022-11-01 02:07:11
u008,e2,add_bag_page,2022-11-01 13:07:23
u008,e3,collect_goods_page,2022-11-01 08:18:12
u008,e4,order_detail_page,2022-11-01 23:34:29
u008,e5,pay_detail_page,2022-11-01 11:25:32
u008,e6,click_adver_page,2022-11-01 12:41:21
u008,e7,home_page,2022-11-01 16:21:15
u008,e8,list_detail_page,2022-11-01 21:41:12
u008,e1,view_detail_page,2022-11-01 01:10:21
u009,e2,add_bag_page,2022-11-01 01:11:13
u009,e3,collect_goods_page,2022-11-01 02:07:11
u009,e4,order_detail_page,2022-11-01 13:07:23
u009,e5,pay_detail_page,2022-11-01 08:18:12
u009,e6,click_adver_page,2022-11-01 23:34:29
u009,e7,home_page,2022-11-01 11:25:32
u009,e8,list_detail_page,2022-11-01 12:41:21
u009,e1,view_detail_page,2022-11-01 16:21:15
u009,e2,add_bag_page,2022-11-01 21:41:12
u009,e3,collect_goods_page,2022-11-01 01:10:21
u010,e4,order_detail_page,2022-11-01 01:11:13
u010,e5,pay_detail_page,2022-11-01 02:07:11
u010,e6,click_adver_page,2022-11-01 13:07:23
u010,e7,home_page,2022-11-01 08:18:12
u010,e8,list_detail_page,2022-11-01 23:34:29
u010,e5,pay_detail_page,2022-11-01 11:25:32
u010,e6,click_adver_page,2022-11-01 12:41:21
u010,e7,home_page,2022-11-01 16:21:15
u010,e8,list_detail_page,2022-11-01 21:41:12
-- Create the table
drop table if exists event_info_log;
create table event_info_log
(
user_id varchar(20),
event_id varchar(20),
event_action varchar(20),
event_time datetime
)
DUPLICATE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1;
-- Load the data from a local file (Stream Load)
curl \
-u root: \
-H "label:event_info_log" \
-H "column_separator:," \
-T /root/data/event_log.txt \
http://linux01:8040/api/test/event_info_log/_stream_load
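Logic:
-- 1. Filter each user's events by the funnel's conditions and keep only the events that qualify
-- 2. Collect each user's qualifying event IDs, sort them by time, and concatenate them into one string
-- 3. Match the concatenated string against the regular expression that represents the funnel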
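Before turning to SQL, the three steps can be sketched in Python for a single user. This only illustrates the idea; the helper `event_sequence` is hypothetical, and the rows are u007's rows from the sample data.

```python
import re
from datetime import datetime

# (user_id, event_id, event_time) rows for user u007, copied from the sample data.
rows = [
    ("u007", "e7", "2022-11-01 02:07:11"),
    ("u007", "e8", "2022-11-01 13:07:23"),
    ("u007", "e1", "2022-11-01 08:18:12"),
    ("u007", "e2", "2022-11-01 23:34:29"),
    ("u007", "e3", "2022-11-01 11:25:32"),
    ("u007", "e4", "2022-11-01 12:41:21"),
    ("u007", "e5", "2022-11-01 16:21:15"),
]

FUNNEL_EVENTS = {"e1", "e2", "e4", "e5"}


def event_sequence(rows):
    """Steps 1 and 2: keep funnel events, sort by time, join the event IDs."""
    kept = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), event_id)
            for _, event_id, ts in rows
            if event_id in FUNNEL_EVENTS]
    kept.sort()  # chronological order
    return ", ".join(event_id for _, event_id in kept)


seq = event_sequence(rows)
print(seq)  # e1, e4, e5, e2

# Step 3: match the string against the pattern for the complete funnel.
completed = re.search(r"e1.*e2.*e4.*e5", seq) is not None
print(completed)  # False: e2 happened after e4 and e5
```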
1. Filter by date and build each user's event sequence
select
user_id,
max(event_ll) as event_seq
from
(
select
user_id,
group_concat(event_id) over(partition by user_id order by event_time) as event_ll
from
(
select
user_id,event_id,event_time
from event_info_log
where event_id in ('e1','e2','e4','e5')
and to_date(event_time) = '2022-11-01'
order by user_id,event_time
) as temp
) as temp2
group by user_id;
+---------+------------------------+
| user_id | event_seq              |
+---------+------------------------+
| u006    | e4, e5, e1, e2, e4, e5 |
| u007    | e1, e4, e5, e2         |
| u005    | e1, e2, e5, e4, e1, e2 |
| u004    | e1, e5, e2, e4         |
| u010    | e4, e5, e5             |
| u001    | e1, e2, e5, e4, e1, e2 |
| u003    | e5, e1, e4, e5, e2     |
| u002    | e4, e5, e1, e2, e4     |
| u008    | e1, e1, e5, e2, e4     |
| u009    | e2, e5, e4, e1, e2     |
+---------+------------------------+
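Why does taking max() of the windowed group_concat give the full sequence? As the query is written, the window function appears to produce a running concatenation per row, so every value is a prefix of the next one, and a prefix always sorts before the longer string that extends it. The Python sketch below models that behaviour under this assumption; it does not call Doris.

```python
from itertools import accumulate

# u007's funnel events in time order.
events = ["e1", "e4", "e5", "e2"]

# Model of a running group_concat: one growing string per row.
running = list(accumulate(events, lambda acc, ev: acc + ", " + ev))
for value in running:
    print(value)
# e1
# e1, e4
# e1, e4, e5
# e1, e4, e5, e2

# Each string is a prefix of the next, so the lexicographic maximum
# is the longest one, which is the complete sequence.
full_sequence = max(running)
print(full_sequence)  # e1, e4, e5, e2
```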
2. Define the matching rules for the funnel
select
user_id,
'购买转化漏斗' as funnel_name ,
case
-- regex match: e1 was triggered, then e2, then e4, then e5
when event_seq rlike('e1.*e2.*e4.*e5') then 4
-- regex match: e1 was triggered, then e2, then e4
when event_seq rlike('e1.*e2.*e4') then 3
-- regex match: e1 was triggered, then e2
when event_seq rlike('e1.*e2') then 2
-- regex match: only e1 was triggered
when event_seq rlike('e1') then 1
else 0 end step
from
(
select
user_id,
max(event_ll) as event_seq
from
(
select
user_id,
group_concat(event_id) over(partition by user_id order by event_time) as event_ll
from
(
select
user_id,event_id,event_time
from event_info_log
where event_id in ('e1','e2','e4','e5')
and to_date(event_time) = '2022-11-01'
order by user_id,event_time
) as temp
) as temp2
group by user_id
) as tmp3;
+---------+--------------------+------+
| user_id | funnel_name        | step |
+---------+--------------------+------+
| u006    | 购买转化漏斗       |    4 |
| u007    | 购买转化漏斗       |    2 |
| u005    | 购买转化漏斗       |    3 |
| u004    | 购买转化漏斗       |    3 |
| u010    | 购买转化漏斗       |    0 |
| u001    | 购买转化漏斗       |    3 |
| u003    | 购买转化漏斗       |    2 |
| u002    | 购买转化漏斗       |    3 |
| u008    | 购买转化漏斗       |    3 |
| u009    | 购买转化漏斗       |    2 |
+---------+--------------------+------+
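The CASE ladder tests the longest pattern first and falls through to shorter ones. A rough Python equivalent, using Python's `re` module in place of Doris's rlike and a hypothetical helper `funnel_step`, reproduces three of the rows above:

```python
import re

# Patterns ordered from the deepest step to the shallowest, like the CASE branches.
PATTERNS = [
    (4, r"e1.*e2.*e4.*e5"),
    (3, r"e1.*e2.*e4"),
    (2, r"e1.*e2"),
    (1, r"e1"),
]


def funnel_step(event_seq):
    """Return the deepest funnel step whose pattern matches, or 0 if none does."""
    for step, pattern in PATTERNS:
        if re.search(pattern, event_seq):
            return step
    return 0


# Event sequences taken from the result of the previous query.
sequences = {
    "u006": "e4, e5, e1, e2, e4, e5",
    "u007": "e1, e4, e5, e2",
    "u010": "e4, e5, e5",
}
steps = {user: funnel_step(seq) for user, seq in sequences.items()}
print(steps)  # {'u006': 4, 'u007': 2, 'u010': 0}
```

One caveat with this style of pattern: an ID such as e10 would also match the pattern e1. The sample data only uses e1 to e8, so it does not matter here, but longer ID lists would need delimiters in the pattern.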
-- Finally, compute the conversion rates
select
funnel_name,
sum(if(step >= 1 ,1,0)) as step1,
sum(if(step >= 2 ,1,0)) as step2,
sum(if(step >= 3 ,1,0)) as step3,
sum(if(step >= 4 ,1,0)) as step4,
round(sum(if(step >= 2 ,1,0))/sum(if(step >= 1 ,1,0)),2) as 'step1->step2_ratio',
round(sum(if(step >= 3 ,1,0))/sum(if(step >= 2 ,1,0)),2) as 'step2->step3_ratio',
round(sum(if(step >= 4 ,1,0))/sum(if(step >= 3 ,1,0)),2) as 'step3->step4_ratio'
from
(
select
'购买转化漏斗' as funnel_name ,
case
-- regex match: e1 was triggered, then e2, then e4, then e5
when event_seq regexp('e1.*e2.*e4.*e5') then 4
-- regex match: e1 was triggered, then e2, then e4
when event_seq regexp('e1.*e2.*e4') then 3
-- regex match: e1 was triggered, then e2
when event_seq regexp('e1.*e2') then 2
-- regex match: only e1 was triggered
when event_seq regexp('e1') then 1
else 0 end step
from
(
select
user_id,
max(event_seq) as event_seq
from
-- Doris 1.1 does not support arrays yet, so the events cannot be sorted in an array before they are concatenated
(
select
user_id,
-- use a window function ordered by time ascending, so that the events are concatenated in time order
group_concat(concat(event_time,'_',event_id),'|') over(partition by user_id order by event_time) as event_seq
from event_info_log
where to_date(event_time) = '2022-11-01'
and event_id in('e1','e4','e5','e2')
) as tmp
group by user_id
) as t1
) as t2
group by funnel_name;
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
| funnel_name        | step1 | step2 | step3 | step4 | step1->step2_ratio | step2->step3_ratio | step3->step4_ratio |
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
| 购买转化漏斗       |     9 |     9 |     6 |     1 |                  1 |               0.67 |               0.17 |
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
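The sum(if(step >= k, 1, 0)) expressions count cumulatively: a user whose deepest step is 3 also passed steps 1 and 2, so the comparison is `>=` rather than `=`. A small Python sketch with the per-user steps from the earlier result (the helper `reached` is made up for illustration):

```python
# Deepest funnel step reached by each user, from the earlier per-user result.
steps = {
    "u001": 3, "u002": 3, "u003": 2, "u004": 3, "u005": 3,
    "u006": 4, "u007": 2, "u008": 3, "u009": 2, "u010": 0,
}


def reached(steps, k):
    """Number of users whose deepest step is at least k."""
    return sum(1 for s in steps.values() if s >= k)


counts = [reached(steps, k) for k in range(1, 5)]
print(counts)  # [9, 9, 6, 1]

# Counting with equality instead would give how many users stopped at each
# step, which is not a funnel: the numbers would not be monotonically decreasing.
stopped_at = [sum(1 for s in steps.values() if s == k) for k in range(1, 5)]
print(stopped_at)  # [0, 3, 5, 1]
```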
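The funnel analysis function window_funnel
It encapsulates the logic above. Its inputs are the time range, the timestamp that events are ordered by, and the funnel's event chain.
Syntax:
window_funnel(window, mode, timestamp_column, event1, event2, ... , eventN)
The function searches the sliding time window for the longest chain of events that occurred in order, and returns the length of that chain.
-- window : size of the sliding time window, in seconds.
-- mode : reserved; currently only default is supported. -- No time-gap requirement between two adjacent events, and other events may occur between them
-- timestamp_column : the time column, of type DATETIME; the sliding window works along this column.
-- eventN : a boolean expression that identifies the event.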
select
user_id,
window_funnel(3600*24, 'default', event_time, event_id='e1', event_id='e2' , event_id='e4', event_id='e5') as step
from event_info_log
group by user_id;
+---------+------+
| user_id | step |
+---------+------+
| u006    |    4 |
| u007    |    2 |
| u005    |    3 |
| u004    |    3 |
| u010    |    0 |
| u001    |    3 |
| u003    |    2 |
| u002    |    3 |
| u008    |    3 |
| u009    |    2 |
+---------+------+
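To make the role of the window parameter concrete, here is a simplified Python model of the idea. It is not Doris's implementation: `window_funnel_sketch` is a hypothetical name, and measuring the window from the first event of the chain with an inclusive bound is an assumption. The events are u006's funnel rows from the sample data.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def window_funnel_sketch(window_seconds, events, chain):
    """Longest prefix of `chain` completed in order within the window.

    events: list of (event_time, event_id); chain: ordered, non-empty list of event IDs.
    Assumption: the window starts at each occurrence of chain[0].
    """
    parsed = sorted((datetime.strptime(ts, FMT), ev) for ts, ev in events)
    window = timedelta(seconds=window_seconds)
    best = 0
    for i, (start_ts, ev) in enumerate(parsed):
        if ev != chain[0]:
            continue
        depth = 1  # chain[0] matched at start_ts
        for ts, later_ev in parsed[i + 1:]:
            if ts - start_ts > window:
                break  # outside the window that began at start_ts
            if depth < len(chain) and later_ev == chain[depth]:
                depth += 1
        best = max(best, depth)
    return best


u006 = [
    ("2022-11-01 01:11:13", "e4"),
    ("2022-11-01 02:07:11", "e5"),
    ("2022-11-01 11:25:32", "e1"),
    ("2022-11-01 12:41:21", "e2"),
    ("2022-11-01 21:41:12", "e4"),
    ("2022-11-01 23:10:21", "e5"),
]
chain = ["e1", "e2", "e4", "e5"]

print(window_funnel_sketch(3600 * 24, u006, chain))  # 4, as in the table above
print(window_funnel_sketch(3600, u006, chain))       # 1: e2 comes more than an hour after e1
```

With the 24-hour window the whole day fits, which is why the results agree with the regex approach; a smaller window would cut chains short.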
-- Compute the conversion rate of each level
select
'购买转化漏斗' as funnel_name,
sum(if(step >= 1 ,1,0)) as step1,
sum(if(step >= 2 ,1,0)) as step2,
sum(if(step >= 3 ,1,0)) as step3,
sum(if(step >= 4 ,1,0)) as step4,
round(sum(if(step >= 2 ,1,0))/sum(if(step >= 1 ,1,0)),2) as 'step1->step2_ratio',
round(sum(if(step >= 3 ,1,0))/sum(if(step >= 2 ,1,0)),2) as 'step2->step3_ratio',
round(sum(if(step >= 4 ,1,0))/sum(if(step >= 3 ,1,0)),2) as 'step3->step4_ratio'
from
(
select
user_id,
window_funnel(3600*24, 'default', event_time, event_id='e1', event_id='e2' , event_id='e4', event_id='e5') as step
from event_info_log
where to_date(event_time) = '2022-11-01'
and event_id in('e1','e4','e5','e2')
group by user_id
) as t1;
-- Result
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
| funnel_name        | step1 | step2 | step3 | step4 | step1->step2_ratio | step2->step3_ratio | step3->step4_ratio |
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+
| 购买转化漏斗       |     9 |     9 |     6 |     1 |                  1 |               0.67 |               0.17 |
+--------------------+-------+-------+-------+-------+--------------------+--------------------+--------------------+