Real-time Player Behavior Analysis for the Gaming Industry
Build a real-time data analytics platform on Alibaba Cloud big data products that processes and analyzes player behavior logs in real time and visualizes the results with Quick BI. This topic describes the procedure.
Background information
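This solution combines several Alibaba Cloud big data products into a real-time data analytics platform. The platform collects player behavior logs, processes and analyzes them in real time, and presents the results to business users as charts. DLF (Data Lake Formation) provides the underlying metadata management and table read/write capabilities, EMR Serverless StarRocks performs real-time data processing and analysis, and Quick BI visualizes the results.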
Prerequisites
A DLF 2.0 data catalog has been created. If you have not created one, see Create a data catalog.
Note: If you use a RAM user, grant the required resource permissions before performing any data operations. For details, see Authorization management.
A Serverless StarRocks instance of version 3.2 or later has been created. If you have not created one, see Create an instance.
A DataWorks workspace has been created and bound to the computing resources of the Serverless StarRocks instance.
Procedure
Step 1: Load the Notebook sample
Find the card for this sample and click Load Sample on the card.
Select the workspace and instance to load the sample into and click OK. The DataWorks Data Development page opens.
Step 2: Initialize parameters
# Initialize parameters
# 1. Create a catalog in the DLF console and copy its ID into [your_dlf_catalog_id]
# DLF console: https://dlf-next.console.aliyun.com/
DLF_CATALOG_ID="[your_dlf_catalog_id]"
# 2. Replace [your-region] with the region of this demo, for example cn-beijing, cn-hangzhou, cn-shanghai, or cn-shenzhen
REGION="[your-region]"
# Important: run this cell so that the variables take effect.
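The SQL cells in later steps reference these variables as ${REGION} and ${DLF_CATALOG_ID}. As a minimal sketch, assuming the notebook substitutes the values into the SQL text as-is, the hypothetical helper below checks that the placeholders were actually replaced and shows the OSS paths and internal endpoint that Step 4 resolves to. The helper name and the region pattern (mainland China IDs such as cn-hangzhou) are illustrative assumptions, not part of DataWorks.

```python
import re

# Assumption: region IDs look like cn-hangzhou; widen the pattern for other regions.
REGION_PATTERN = re.compile(r"^cn-[a-z]+$")


def resolve_demo_locations(region: str, catalog_id: str) -> dict:
    """Hypothetical helper: validate notebook parameters and derive Step 4's OSS locations."""
    if not catalog_id or "[" in catalog_id:
        raise ValueError("DLF_CATALOG_ID still contains the placeholder")
    if not REGION_PATTERN.match(region):
        raise ValueError(f"unexpected region ID: {region!r}")
    bucket = f"emr-starrocks-benchmark-resource-{region}"
    return {
        "user_profile_path": f"oss://{bucket}/sr_game_demo/user_profile/*",
        "user_event_path": f"oss://{bucket}/sr_game_demo/user_event/*",
        "endpoint": f"oss-{region}-internal.aliyuncs.com",
    }


# "my-catalog-id" is a made-up value for illustration.
print(resolve_demo_locations("cn-hangzhou", "my-catalog-id")["endpoint"])
```

Running the check before the SQL cells surfaces an unreplaced placeholder early instead of as a failed load job.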
Step 3: Create StarRocks tables to receive the data imported from OSS
Run the following SQL to create the user profile table (ods_user_profile) and the user event table (ods_user_event).
CREATE DATABASE IF NOT EXISTS game_db;
use game_db;
-- User profile table (Primary Key table: a loaded row with an existing user_id replaces the stored row)
CREATE TABLE IF NOT EXISTS ods_user_profile (
user_id INT NOT NULL,
registration_date DATE NOT NULL,
last_login_date DATE,
age_group VARCHAR(20),
gender VARCHAR(10),
location VARCHAR(50),
game_hours INT,
favorite_game_mode VARCHAR(20),
play_frequency VARCHAR(20),
device_type VARCHAR(20),
os_version VARCHAR(20),
current_level INT,
total_deaths INT,
active_time VARCHAR(20),
language_preference VARCHAR(10)
)
PRIMARY KEY (user_id)
DISTRIBUTED BY HASH(user_id)
PROPERTIES (
"replication_num" = "1"
);
-- User event table (no key specified, so StarRocks creates a Duplicate Key table that keeps every row)
CREATE TABLE IF NOT EXISTS ods_user_event (
`user_id` INT,
`event_type` STRING,
`timestamp` datetime,
`location` STRING,
`level` INT,
`event_details` STRING
)
DISTRIBUTED BY HASH(user_id)
PROPERTIES (
"replication_num" = "1"
);
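The two tables use different StarRocks table types. ods_user_profile declares PRIMARY KEY (user_id), so loading a row whose user_id already exists replaces the stored row, while ods_user_event declares no key and keeps every event. The Python sketch below models only that visible effect with a dict and a list; it is an illustration, not how StarRocks stores data, and the sample rows are hypothetical.

```python
def load_primary_key(table: dict, rows: list, key: str) -> dict:
    """Upsert semantics: the latest row for each key value wins."""
    for row in rows:
        table[row[key]] = row
    return table


def load_duplicate_key(table: list, rows: list) -> list:
    """Append semantics: every row is kept, duplicates included."""
    table.extend(rows)
    return table


# Hypothetical sample rows.
profiles = load_primary_key({}, [
    {"user_id": 1, "current_level": 5},
    {"user_id": 1, "current_level": 6},  # same key: replaces the previous row
], key="user_id")
events = load_duplicate_key([], [
    {"user_id": 1, "event_type": "login"},
    {"user_id": 1, "event_type": "login"},  # kept as a second row
])
print(len(profiles), len(events))
```

This is why the profile table suits "latest state per player" data, while the event table suits an append-only behavior log.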
Step 4: Use Broker Load to import OSS data into the StarRocks tables
Run the following SQL to import the data.
use game_db;
-- Import new data. A load label must be unique within the database; change the label before re-running.
LOAD LABEL game_db.user_profile_20240902_22
(
DATA INFILE("oss://emr-starrocks-benchmark-resource-${REGION}/sr_game_demo/user_profile/*")
INTO TABLE ods_user_profile
FORMAT AS "parquet"
)
WITH BROKER
(
"fs.oss.endpoint" = "oss-${REGION}-internal.aliyuncs.com"
)
PROPERTIES
(
"timeout" = "3600"
);
LOAD LABEL game_db.user_event_20240902_22
(
DATA INFILE("oss://emr-starrocks-benchmark-resource-${REGION}/sr_game_demo/user_event/*")
INTO TABLE ods_user_event
FORMAT AS "parquet"
)
WITH BROKER
(
"fs.oss.endpoint" = "oss-${REGION}-internal.aliyuncs.com"
)
PROPERTIES
(
"timeout" = "3600"
);
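Because a label cannot be reused in the same database, re-running the statements above with the fixed labels user_profile_20240902_22 and user_event_20240902_22 is rejected. A minimal sketch, assuming you template the statement from a Python cell, appends a timestamp to the label; the function name and label format are hypothetical. After submitting a job, you can check its state with SHOW LOAD WHERE LABEL = '<label>'.

```python
from datetime import datetime


def build_broker_load(table: str, dataset: str, region: str, now: datetime) -> str:
    """Hypothetical helper: return a Broker Load statement with a time-stamped label."""
    label = f"game_db.{dataset}_{now:%Y%m%d_%H%M%S}"
    path = f"oss://emr-starrocks-benchmark-resource-{region}/sr_game_demo/{dataset}/*"
    return (
        f"LOAD LABEL {label}\n"
        f'(\n    DATA INFILE("{path}")\n    INTO TABLE {table}\n    FORMAT AS "parquet"\n)\n'
        f'WITH BROKER\n(\n    "fs.oss.endpoint" = "oss-{region}-internal.aliyuncs.com"\n)\n'
        f'PROPERTIES\n(\n    "timeout" = "3600"\n);'
    )


stmt = build_broker_load("ods_user_event", "user_event", "cn-hangzhou",
                         datetime(2024, 9, 2, 22, 0, 0))
print(stmt.splitlines()[0])
```

Second-level timestamps are enough for a manually run notebook; a scheduled job that may fire twice in one second would need a stronger suffix.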
Step 5: Analyze player retention with ad hoc queries
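StarRocks is a high-performance query engine for the lakehouse architecture and answers queries over large volumes of ODS-layer data quickly, so in some scenarios you can run ad hoc queries directly against the ODS tables for day-to-day analysis. The following query computes next-day retention for players who registered in the last 30 days.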
USE game_db;
WITH daily_new_users AS (
SELECT
user_id,
registration_date
FROM
ods_user_profile
WHERE
registration_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE()
),
daily_login_events AS (
SELECT
user_id,
DATE(timestamp) AS login_date
FROM
ods_user_event
WHERE
timestamp >= DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY) AND timestamp < DATE_ADD(CURRENT_DATE(), INTERVAL 1 DAY) -- include all of today's events
),
retention AS (
SELECT
n.user_id,
n.registration_date,
l.login_date
FROM
daily_new_users n
LEFT JOIN
daily_login_events l ON n.user_id = l.user_id AND l.login_date = DATE_ADD(n.registration_date, INTERVAL 1 DAY)
)
SELECT
registration_date,
COUNT(DISTINCT user_id) AS new_users,
COUNT(DISTINCT CASE WHEN login_date IS NOT NULL THEN user_id END) AS retained_users,
COUNT(DISTINCT CASE WHEN login_date IS NOT NULL THEN user_id END) / COUNT(DISTINCT user_id) * 100.0 AS retention_rate
FROM
retention
GROUP BY
registration_date
ORDER BY
registration_date;
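The query computes next-day (D1) retention: for each registration date, the share of new users who have at least one event on the following day (the CTE treats any event in ods_user_event as a login). The Python sketch below reproduces the same calculation on a few hypothetical in-memory rows, without the 30-day window, which can help when sanity-checking the SQL result.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def next_day_retention(profiles, events):
    """profiles: (user_id, registration_date); events: (user_id, event_datetime).

    Returns {registration_date: (new_users, retained_users, retention_rate_pct)}.
    """
    active_days = defaultdict(set)  # user_id -> dates with at least one event
    for user_id, ts in events:
        active_days[user_id].add(ts.date())

    cohorts = defaultdict(set)  # registration_date -> user_ids
    for user_id, reg_date in profiles:
        cohorts[reg_date].add(user_id)

    result = {}
    for reg_date in sorted(cohorts):
        users = cohorts[reg_date]
        retained = {u for u in users if reg_date + timedelta(days=1) in active_days[u]}
        result[reg_date] = (len(users), len(retained), len(retained) / len(users) * 100.0)
    return result


# Hypothetical sample data.
profiles = [(1, date(2024, 9, 1)), (2, date(2024, 9, 1)), (3, date(2024, 9, 2))]
events = [(1, datetime(2024, 9, 2, 10, 30)), (3, datetime(2024, 9, 2, 8, 0))]
print(next_day_retention(profiles, events))
```

User 3 has an event only on the registration day itself, so it does not count as retained, matching the join condition login_date = registration_date + 1 day in the SQL.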
Step 6: Use StarRocks materialized views to build the DWD and ADS layers of the data warehouse automatically
Build the DWD layer
For simplicity, the DWD materialized views below select all ODS data unchanged. In a real project, apply the additional business logic your data needs at this layer.
use game_db;
DROP MATERIALIZED VIEW IF EXISTS dwd_mv_user_profile;
CREATE MATERIALIZED VIEW IF NOT EXISTS dwd_mv_user_profile
DISTRIBUTED BY RANDOM
REFRESH ASYNC EVERY(INTERVAL 1 HOUR) -- refresh once every hour
AS
SELECT * FROM ods_user_profile;
DROP MATERIALIZED VIEW IF EXISTS dwd_mv_user_event;
CREATE MATERIALIZED VIEW IF NOT EXISTS dwd_mv_user_event
DISTRIBUTED BY RANDOM
REFRESH ASYNC EVERY(INTERVAL 1 HOUR) -- refresh once every hour
AS
SELECT * FROM ods_user_event;
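With REFRESH ASYNC EVERY(INTERVAL 1 HOUR), data loaded into the ODS tables shows up in the DWD views only at the next scheduled refresh. A rough sketch, assuming refreshes start at fixed hourly points from a known first refresh and complete instantly (real refreshes take time, and their schedule depends on when each view was created), computes when a load becomes visible; the function is hypothetical.

```python
from datetime import datetime, timedelta


def next_visible_at(load_time: datetime, first_refresh: datetime,
                    interval: timedelta = timedelta(hours=1)) -> datetime:
    """Return the first scheduled refresh at or after load_time (instant refreshes assumed)."""
    if load_time <= first_refresh:
        return first_refresh
    elapsed = load_time - first_refresh
    periods = -(-elapsed // interval)  # ceiling division: timedelta // timedelta is an int
    return first_refresh + periods * interval


first = datetime(2024, 9, 2, 10, 0)
visible = next_visible_at(datetime(2024, 9, 2, 10, 5), first)
print(visible, visible - datetime(2024, 9, 2, 10, 5))
```

Under these assumptions a load can wait up to just under one interval before it reaches DWD, and views that read DWD on their own hourly schedule may lag further.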
Build the ADS layer
use game_db;
-- 1. Create ADS_MV_USER_RETENTION (user retention)
CREATE MATERIALIZED VIEW IF NOT EXISTS ADS_MV_USER_RETENTION
DISTRIBUTED BY RANDOM
REFRESH ASYNC EVERY(INTERVAL 1 HOUR)
AS
SELECT
DATE_TRUNC('day', registration_date) AS registration_day,
DATE_TRUNC('day', last_login_date) AS last_login_day,
COUNT(DISTINCT user_id) AS users_retained
FROM dwd_mv_user_profile
GROUP BY
DATE_TRUNC('day', registration_date),
DATE_TRUNC('day', last_login_date);
-- 2. ADS_MV_USER_GEOGRAPHIC_DISTRIBUTION (geographic distribution of users)
CREATE MATERIALIZED VIEW IF NOT EXISTS ADS_MV_USER_GEOGRAPHIC_DISTRIBUTION
DISTRIBUTED BY RANDOM
REFRESH ASYNC EVERY(INTERVAL 1 HOUR)
AS
SELECT
location AS geographic_location,
COUNT(DISTINCT user_id) AS total_users
FROM dwd_mv_user_profile
GROUP BY
location;
-- 3. ADS_MV_USER_DEVICE_PREFERENCE (device usage preferences)
CREATE MATERIALIZED VIEW IF NOT EXISTS ADS_MV_USER_DEVICE_PREFERENCE
DISTRIBUTED BY RANDOM
REFRESH ASYNC EVERY(INTERVAL 1 HOUR)
AS
SELECT
device_type,
COUNT(DISTINCT user_id) AS total_users
FROM dwd_mv_user_profile
GROUP BY
device_type;
-- 4. ADS_MV_USER_PURCHASE_TRENDS (user purchase trends)
-- This view tracks how the number of player purchases changes from day to day
CREATE MATERIALIZED VIEW IF NOT EXISTS ADS_MV_USER_PURCHASE_TRENDS
DISTRIBUTED BY RANDOM
REFRESH ASYNC EVERY(INTERVAL 1 HOUR)
AS
SELECT
DATE(timestamp) AS purchase_date,
COUNT(user_id) AS daily_purchase_events
FROM dwd_mv_user_event
WHERE event_type = '购买' -- '购买' means "purchase"
GROUP BY
DATE(timestamp); -- row order in a materialized view is not guaranteed; sort when querying
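The views above form a small dependency graph: the ODS tables feed the DWD views, which feed the ADS views. If you refresh views manually (for example with REFRESH MATERIALIZED VIEW), refreshing upstream views before the views that read them avoids building ADS results from stale DWD data. The sketch below uses Python's standard graphlib module (Python 3.9 or later) to derive such an order from the definitions in this step; it only illustrates the layering and does not connect to StarRocks.

```python
from graphlib import TopologicalSorter

# Each view maps to the objects it reads from, as written in the SQL above.
DEPENDENCIES = {
    "dwd_mv_user_profile": {"ods_user_profile"},
    "dwd_mv_user_event": {"ods_user_event"},
    "ADS_MV_USER_RETENTION": {"dwd_mv_user_profile"},
    "ADS_MV_USER_GEOGRAPHIC_DISTRIBUTION": {"dwd_mv_user_profile"},
    "ADS_MV_USER_DEVICE_PREFERENCE": {"dwd_mv_user_profile"},
    "ADS_MV_USER_PURCHASE_TRENDS": {"dwd_mv_user_event"},
}


def refresh_order(deps: dict) -> list:
    """Return materialized views ordered so that upstream views come first."""
    order = TopologicalSorter(deps).static_order()
    # ODS base tables are loaded rather than refreshed, so drop them.
    return [name for name in order if name in deps]


for mv in refresh_order(DEPENDENCIES):
    print(f"REFRESH MATERIALIZED VIEW game_db.{mv};")
```

Adding a new ADS view only requires a new entry in the mapping; the ordering follows automatically.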
Step 7: Write data to the data lake (Paimon format)
Create an external catalog in StarRocks.
-- Adjust the catalog name myfirstcatalog as needed.
-- DROP CATALOG `myfirstcatalog`;
CREATE EXTERNAL CATALOG `myfirstcatalog`
PROPERTIES (
"type" = "paimon",
"paimon.catalog.type" = "dlf-paimon",
"dlf.catalog.id" = "${DLF_CATALOG_ID}"
);
-- If you see "Unexpected exception: Catalog 'myfirstcatalog' doesn't exist", make sure the DROP CATALOG `myfirstcatalog`; line is commented out and run the statement again.
dlf.catalog.id: the ID of the catalog that you created in the Data Lake Formation (DLF) console.
Write data to the data lake (Paimon format).
CREATE DATABASE IF NOT EXISTS myfirstcatalog.game_db;
CREATE TABLE IF NOT EXISTS myfirstcatalog.game_db.ADS_USER_PURCHASE_TRENDS (
purchase_date DATE COMMENT 'Purchase date',
daily_purchase_events BIGINT COMMENT 'Number of purchase events per day' -- BIGINT matches the type returned by COUNT()
);
-- ADS: write the processed ETL results to the lake
INSERT INTO myfirstcatalog.game_db.ADS_USER_PURCHASE_TRENDS
SELECT purchase_date, daily_purchase_events FROM game_db.ADS_MV_USER_PURCHASE_TRENDS;
Step 8: Build and display reports with Quick BI
Quick BI can query the final ADS-layer data in StarRocks directly and display it on report pages.
Log on to the Quick BI console.
Configure a StarRocks data source. For details, see Alibaba Cloud StarRocks data source.
Create datasets and analyze the data. For details, see Create and manage datasets.
Dataset SQL 1:
select * from game_db.ADS_MV_USER_RETENTION;
Dataset SQL 2:
select * from game_db.ADS_MV_USER_GEOGRAPHIC_DISTRIBUTION;
Dataset SQL 3:
select * from game_db.ADS_MV_USER_DEVICE_PREFERENCE;
Dataset SQL 4:
select * from game_db.ADS_MV_USER_PURCHASE_TRENDS order by purchase_date;