TPC-H test
This topic describes the design, procedure, and results of the TPC-H test on PolarDB-X.
Background
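TPC-H is a widely used industry benchmark, developed and published by the TPC (Transaction Processing Performance Council), for evaluating the analytical query capabilities of databases. It consists of 8 tables and 22 complex SQL queries, most of which involve multi-table joins, subqueries, and GROUP BY aggregation.
The TPC-H implementation in this topic is derived from the TPC-H benchmark but does not meet all of its requirements. Its results therefore cannot be compared with published TPC-H benchmark results.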
Test design
Data volume
The test uses 100 GB of data (Scale Factor = 100). The row counts of the main tables are:
LINEITEM: about 600 million rows
ORDERS: 150 million rows
PARTSUPP: 80 million rows
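Where these numbers come from: in the TPC-H specification, ORDERS and PARTSUPP scale exactly with the scale factor SF, while each order receives a uniformly random number of line items between 1 and 7, so LINEITEM is only about four times the size of ORDERS. A worked example for SF = 100:

```latex
\begin{aligned}
|\mathrm{ORDERS}|   &= 1{,}500{,}000 \times SF = 1.5 \times 10^{8} \\
|\mathrm{PARTSUPP}| &= 800{,}000 \times SF = 8 \times 10^{7} \\
|\mathrm{LINEITEM}| &\approx \bar{k} \times |\mathrm{ORDERS}|, \quad \bar{k} = \tfrac{1 + 7}{2} = 4 \\
                    &\approx 4 \times 1.5 \times 10^{8} = 6 \times 10^{8}
\end{aligned}
```

The 600,037,902 LINEITEM rows reported in the data integrity check are consistent with this estimate.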
Instance specifications
Node specification | Number of nodes | Dataset size |
8C64G (8 cores, 64 GB memory) | 6 | 100 GB |
Load-generator specifications
ecs.g7.4xlarge (16 vCPUs, 64 GB memory, storage disk larger than 200 GB)
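Test procedure
Prepare the load-generator ECS instance
Prepare an ECS instance with a storage disk larger than 200 GB, which is needed to hold the CSV data set generated by the tools. All subsequent steps, including data preparation and running the test, are performed on this ECS instance.
Note: The ECS instance used for the test must be deployed in a VPC. Record the name and ID of this VPC; all subsequent instances will be deployed in it.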
Prepare the PolarDB-X instance for the test
Create a PolarDB-X instance. For details, see Create an instance.
Note: Make sure that the ECS instance and the PolarDB-X instance are in the same VPC.
In the instance, create the database to be tested (named tpch_100g in this test). For details, see Create a database.
CREATE DATABASE tpch_100g;
In the tpch_100g database, create the tables as follows:
CREATE TABLE `customer` (
  `c_custkey` int(11) NOT NULL,
  `c_name` varchar(25) NOT NULL,
  `c_address` varchar(40) NOT NULL,
  `c_nationkey` int(11) NOT NULL,
  `c_phone` varchar(15) NOT NULL,
  `c_acctbal` decimal(15,2) NOT NULL,
  `c_mktsegment` varchar(10) NOT NULL,
  `c_comment` varchar(117) NOT NULL,
  PRIMARY KEY (`c_custkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
  dbpartition by hash(`c_custkey`) tbpartition by hash(`c_custkey`) tbpartitions 4;

-- LINEITEM and ORDERS use the same RIGHT_SHIFT function on the order key,
-- so rows with the same order key are co-partitioned.
CREATE TABLE `lineitem` (
  `l_orderkey` bigint(20) NOT NULL,
  `l_partkey` int(11) NOT NULL,
  `l_suppkey` int(11) NOT NULL,
  `l_linenumber` bigint(20) NOT NULL,
  `l_quantity` decimal(15,2) NOT NULL,
  `l_extendedprice` decimal(15,2) NOT NULL,
  `l_discount` decimal(15,2) NOT NULL,
  `l_tax` decimal(15,2) NOT NULL,
  `l_returnflag` varchar(1) NOT NULL,
  `l_linestatus` varchar(1) NOT NULL,
  `l_shipdate` date NOT NULL,
  `l_commitdate` date NOT NULL,
  `l_receiptdate` date NOT NULL,
  `l_shipinstruct` varchar(25) NOT NULL,
  `l_shipmode` varchar(10) NOT NULL,
  `l_comment` varchar(44) NOT NULL,
  KEY `IDX_LINEITEM_SUPPKEY` (`l_suppkey`),
  KEY `IDX_LINEITEM_PARTKEY` (`l_partkey`),
  KEY `IDX_LINEITEM_SHIPDATE` (`l_shipdate`),
  PRIMARY KEY (`l_orderkey`,`l_linenumber`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
  dbpartition by RIGHT_SHIFT(`l_orderkey`,6) tbpartition by RIGHT_SHIFT(`l_orderkey`,6) tbpartitions 4;

CREATE TABLE `orders` (
  `o_orderkey` bigint(20) NOT NULL,
  `o_custkey` int(11) NOT NULL,
  `o_orderstatus` varchar(1) NOT NULL,
  `o_totalprice` decimal(15,2) NOT NULL,
  `o_orderdate` date NOT NULL,
  `o_orderpriority` varchar(15) NOT NULL,
  `o_clerk` varchar(15) NOT NULL,
  `o_shippriority` bigint(20) NOT NULL,
  `o_comment` varchar(79) NOT NULL,
  PRIMARY KEY (`o_orderkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
  dbpartition by RIGHT_SHIFT(`o_orderkey`,6) tbpartition by RIGHT_SHIFT(`o_orderkey`,6) tbpartitions 4;

CREATE TABLE `part` (
  `p_partkey` int(11) NOT NULL,
  `p_name` varchar(55) NOT NULL,
  `p_mfgr` varchar(25) NOT NULL,
  `p_brand` varchar(10) NOT NULL,
  `p_type` varchar(25) NOT NULL,
  `p_size` int(11) NOT NULL,
  `p_container` varchar(10) NOT NULL,
  `p_retailprice` decimal(15,2) NOT NULL,
  `p_comment` varchar(23) NOT NULL,
  PRIMARY KEY (`p_partkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
  dbpartition by hash(`p_partkey`) tbpartition by hash(`p_partkey`) tbpartitions 4;

CREATE TABLE `partsupp` (
  `ps_partkey` int(11) NOT NULL,
  `ps_suppkey` int(11) NOT NULL,
  `ps_availqty` int(11) NOT NULL,
  `ps_supplycost` decimal(15,2) NOT NULL,
  `ps_comment` varchar(199) NOT NULL,
  KEY `IDX_PARTSUPP_SUPPKEY` (`ps_suppkey`),
  PRIMARY KEY (`ps_partkey`,`ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
  dbpartition by hash(`ps_partkey`) tbpartition by hash(`ps_partkey`) tbpartitions 4;

CREATE TABLE `supplier` (
  `s_suppkey` int(11) NOT NULL,
  `s_name` varchar(25) NOT NULL,
  `s_address` varchar(40) NOT NULL,
  `s_nationkey` int(11) NOT NULL,
  `s_phone` varchar(15) NOT NULL,
  `s_acctbal` decimal(15,2) NOT NULL,
  `s_comment` varchar(101) NOT NULL,
  PRIMARY KEY (`s_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
  dbpartition by hash(`s_suppkey`) tbpartition by hash(`s_suppkey`) tbpartitions 4;

-- Broadcast tables: the small NATION and REGION tables are replicated in full to every shard.
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_name` varchar(25) NOT NULL,
  `n_regionkey` int(11) NOT NULL,
  `n_comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`n_nationkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 broadcast;

CREATE TABLE `region` (
  `r_regionkey` int(11) NOT NULL,
  `r_name` varchar(25) NOT NULL,
  `r_comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`r_regionkey`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 broadcast;
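ORDERS and LINEITEM are both partitioned with RIGHT_SHIFT(..., 6) on the order key. The shell sketch below only illustrates the arithmetic, under the assumption that routing shifts the key right by 6 bits and then takes it modulo a shard count; the shard count of 16 is hypothetical, and the exact way PolarDB-X combines database-level and table-level routing is not modeled. The point it shows is that a given order key always yields the same shard, so an order and its line items sit together and joins on the order key can be evaluated within a shard.

```shell
#!/usr/bin/env bash
# Illustration only. Assumption: a row is routed by
#   shard = (order_key >> SHIFT) % SHARDS
# SHARDS=16 is a hypothetical shard count, not the layout used in this test.
SHIFT=6
SHARDS=16

shard_of() {
  echo $(( ($1 >> SHIFT) % SHARDS ))
}

# The same function is applied to o_orderkey (ORDERS) and l_orderkey
# (LINEITEM), so an order and all of its line items map to the same shard.
# Every run of 2^6 = 64 consecutive key values shares one shifted value.
for key in 1 63 64 127 128 1024; do
  printf 'orderkey=%-5s shifted=%-3s shard=%s\n' \
    "$key" "$(( key >> SHIFT ))" "$(shard_of "$key")"
done
```

A side effect of shifting before the modulo is that nearby order keys are stored together rather than scattered one by one across shards.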
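Tune instance parameters
Note: To achieve the best performance in this stress-testing scenario, adjust the following parameters of the PolarDB-X compute layer.
Set the parameters XPROTO_MAX_DN_CONCURRENT and XPROTO_MAX_DN_WAIT_CONNECTION to 4000. For details, see Parameter settings.
Connect to the PolarDB-X instance from the command line and run the following SQL statements in the same session. They disable SQL logging and CPU profiling and adjust the parallel (HTAP/MPP) execution settings: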
SET GLOBAL RECORD_SQL = false;
SET GLOBAL ENABLE_HTAP = true;
SET GLOBAL ENABLE_MASTER_MPP = true;
SET GLOBAL MPP_METRIC_LEVEL = 0;
SET GLOBAL ENABLE_CPU_PROFILE = false;
SET GLOBAL ENABLE_SORT_AGG = false;
SET GLOBAL MPP_PARALLELISM = 192;
SET GLOBAL GROUP_PARALLELISM = 8;
Data preparation
Download the script package tpchData.tar.gz to the load-generator ECS instance and extract it:
tar xzvf tpchData.tar.gz
cd tpchData/
vi params.conf
Edit the params.conf configuration file and enter the connection information of the PolarDB-X instance:
#!/bin/bash
### remote generating directory
export remoteGenDir=./
### target path
export targetPath=../tpch/tpchRaw
### cores per worker, default value is 1
export coresPerWorker=`cat /proc/cpuinfo| grep "processor"| wc -l`
### threads per worker, default value is 1
export threadsPerWorker=`cat /proc/cpuinfo| grep "processor"| wc -l`
#export threadsPerWorker=1
export hint=""
export insertMysql="mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} -Ac --local-infile tpch_100g -e"
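Replace the following placeholders:
{HOST}: host name
{PORT}: port number
{USER}: username
{PASSWORD}: password
To generate data faster, you can increase threadsPerWorker in the script, for example to the number of CPU cores on the load-generator ECS instance.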
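Instead of editing params.conf by hand, the placeholders can be substituted with sed. A minimal sketch, assuming GNU sed (for in-place `-i`) and connection values that contain none of the characters `|`, `&`, or `\`, which sed would treat specially in the replacement; the function names and example values are hypothetical. The optional `set_threads` helper uses `nproc` (GNU coreutils), which prints the number of processing units available to the current process and usually matches the /proc/cpuinfo count already used in params.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the {HOST}/{PORT}/{USER}/{PASSWORD} placeholders.
# Assumes GNU sed and values free of '|', '&' and '\'.
fill_params() {
  local conf=$1 host=$2 port=$3 user=$4 password=$5
  sed -i \
    -e "s|{HOST}|${host}|g" \
    -e "s|{PORT}|${port}|g" \
    -e "s|{USER}|${user}|g" \
    -e "s|{PASSWORD}|${password}|g" \
    "$conf"
}

# Hypothetical helper: rewrite only the active threadsPerWorker line
# (the commented-out "#export threadsPerWorker=1" does not match ^export).
set_threads() {
  local conf=$1 threads=$2
  sed -i "s|^export threadsPerWorker=.*|export threadsPerWorker=${threads}|" "$conf"
}

# Example with hypothetical connection values:
# fill_params params.conf pxc-example.polarx.rds.aliyuncs.com 3306 tpch_user 'secret'
# set_threads params.conf "$(nproc)"
```

Keeping a pristine copy of params.conf before running the substitution makes it easy to re-run with different connection values.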
Run the script to generate 100 GB of data with multiple processes:
cd datagen
sh generateTPCH.sh 100
The generated data is in the tpch/tpchRaw/SF100/ directory:
ls ../tpch/tpchRaw/SF100/
customer  lineitem  nation  orders  part  partsupp  region  supplier
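Before importing, a quick check that an entry exists for each of the eight tables can avoid a failed load. A minimal sketch: the function name is hypothetical, and it only tests that each entry exists, because whether the generator writes a file or a directory per table depends on the tpchData scripts.

```shell
#!/usr/bin/env bash
# Hypothetical check: verify that the generator produced an entry for every
# TPC-H table under the given directory. File contents are not inspected.
check_generated() {
  local dir=$1 missing=0 t
  for t in customer lineitem nation orders part partsupp region supplier; do
    if [ ! -e "$dir/$t" ]; then
      echo "missing: $dir/$t" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, run from the datagen directory:
# check_generated ../tpch/tpchRaw/SF100 && du -sh ../tpch/tpchRaw/SF100/*
```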
Import the data into the PolarDB-X instance:
cd ../loadTpch
sh loadTpch.sh 100
Verify data integrity
Connect to the PolarDB-X instance from the command line and check that the row count of each table is as expected:
MySQL [tpch_100g]> select
    (select count(*) from customer) as customer_cnt,
    (select count(*) from lineitem) as lineitem_cnt,
    (select count(*) from nation) as nation_cnt,
    (select count(*) from orders) as order_cnt,
    (select count(*) from part) as part_cnt,
    (select count(*) from partsupp) as partsupp_cnt,
    (select count(*) from region) as region_cnt,
    (select count(*) from supplier) as supplier_cnt;
+--------------+--------------+------------+-----------+----------+--------------+------------+--------------+
| customer_cnt | lineitem_cnt | nation_cnt | order_cnt | part_cnt | partsupp_cnt | region_cnt | supplier_cnt |
+--------------+--------------+------------+-----------+----------+--------------+------------+--------------+
|     15000000 |    600037902 |         25 | 150000000 | 20000000 |     80000000 |          5 |      1000000 |
+--------------+--------------+------------+-----------+----------+--------------+------------+--------------+
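The expected counts follow from the scale factor: per the TPC-H specification, every table except NATION and REGION scales linearly with SF, and LINEITEM is only approximately 6,000,000 × SF. A sketch that checks the counts automatically; the function name is hypothetical, it reads one "table count" pair per line (for example from the mysql client with `-N -B`, which drops the header and prints tab-separated rows), and the 1% tolerance for LINEITEM is an arbitrary choice of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical checker: compare "table count" lines on stdin with the row
# counts TPC-H expects at scale factor $1. Exits non-zero on any mismatch
# or missing table. LINEITEM gets a 1% tolerance (arbitrary choice).
check_counts() {
  awk -v sf="$1" '
    BEGIN {
      # rows per unit of SF; NATION and REGION have a fixed size
      base["customer"] = 150000;  base["orders"]   = 1500000
      base["part"]     = 200000;  base["partsupp"] = 800000
      base["supplier"] = 10000;   base["lineitem"] = 6000000
      fixed["nation"]  = 25;      fixed["region"]  = 5
      bad = 0
    }
    {
      t = tolower($1); n = $2 + 0; seen[t] = 1
      want = (t in fixed) ? fixed[t] : base[t] * sf
      tol = (t == "lineitem") ? 0.01 * want : 0
      diff = (n > want) ? n - want : want - n
      ok = (diff <= tol && want > 0)
      if (!ok) bad = 1
      printf "%-9s actual=%.0f expected=%.0f %s\n", t, n, want, (ok ? "OK" : "MISMATCH")
    }
    END {
      for (k in base)  if (!(k in seen)) { print "missing: " k; bad = 1 }
      for (k in fixed) if (!(k in seen)) { print "missing: " k; bad = 1 }
      exit bad
    }
  '
}

# Usage (placeholders as elsewhere in this document):
# mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} -N -B tpch_100g -e "
#   select 'customer', count(*) from customer union all
#   select 'lineitem', count(*) from lineitem union all
#   select 'nation',   count(*) from nation   union all
#   select 'orders',   count(*) from orders   union all
#   select 'part',     count(*) from part     union all
#   select 'partsupp', count(*) from partsupp union all
#   select 'region',   count(*) from region   union all
#   select 'supplier', count(*) from supplier" | check_counts 100
```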
Collect statistics
Connect to the PolarDB-X instance from the command line and run ANALYZE TABLE to collect statistics for each table:
analyze table customer;
analyze table lineitem;
analyze table nation;
analyze table orders;
analyze table part;
analyze table partsupp;
analyze table region;
analyze table supplier;
Run the test
Download the test script package tpch-queries.tar.gz and extract it:
tar xzvf tpch-queries.tar.gz
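Run the script to execute the queries and measure the elapsed time. Replace the placeholders with the connection information of the instance ({DB} is the database name, tpch_100g in this test). Quoting 'time' makes bash run the external GNU time command instead of its built-in time keyword, and -f "%e" prints the elapsed wall-clock time in seconds: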
cd tpch-queries
'time' -f "%e" sh all_query.sh {HOST} {USER} {PASSWORD} {DB} {PORT}
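The time wrapper measures the run as a whole. Per-query times, like those in the result tables, could be collected with a loop along these lines. This is a hypothetical sketch, not the contents of all_query.sh: it assumes query files named 01.sql to 22.sql in the current directory (matching the SQL column of the result tables), GNU date for the `%N` nanosecond field, and a CLIENT variable (introduced here) that defaults to the mysql client.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run each NN.sql file and print "NN.sql | seconds |".
# CLIENT is left unquoted on purpose so it can hold a command plus options.
run_timed() {
  local f start end
  for f in [0-9][0-9].sql; do
    [ -e "$f" ] || continue            # glob matched nothing
    start=$(date +%s.%N)               # GNU date: seconds.nanoseconds
    ${CLIENT:-mysql} < "$f" > /dev/null || echo "failed: $f" >&2
    end=$(date +%s.%N)
    awk -v f="$f" -v s="$start" -v e="$end" 'BEGIN { printf "%s | %.2f |\n", f, e - s }'
  done
}

# Usage (placeholders as elsewhere in this document):
# CLIENT="mysql -h{HOST} -P{PORT} -u{USER} -p{PASSWORD} {DB}" run_timed
```

Timing each file separately in the client process includes connection setup for every query, so the numbers are slightly higher than server-side execution time.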
Test results
Engine version: MySQL 5.7
Version: polardb-2.4.0_5.4.19-20240718_xcluster5.4.19-20240630. For details, see Release notes.
The SQL column refers to the corresponding SQL file in tpch-queries.tar.gz.
SQL | Execution time (s) |
01.sql | 38.93 |
02.sql | 1.57 |
03.sql | 11.83 |
04.sql | 2.63 |
05.sql | 7.07 |
06.sql | 7.49 |
07.sql | 24.43 |
08.sql | 9.22 |
09.sql | 38.88 |
10.sql | 6.78 |
11.sql | 2.93 |
12.sql | 10.2 |
13.sql | 3.02 |
14.sql | 1.67 |
15.sql | 5.1 |
16.sql | 1.59 |
17.sql | 1.71 |
18.sql | 13.78 |
19.sql | 2.82 |
20.sql | 9.29 |
21.sql | 14.54 |
22.sql | 2.41 |
Total | 217.89 |
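The total row is the plain sum of the 22 per-query times, that is, one serial run of all queries. A small sketch (the function name is hypothetical) that recomputes it from table rows in the "NN.sql | seconds |" format used here; header and total rows are skipped because their first field is not a query file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the seconds column of "NN.sql | seconds |" rows
# read on stdin and print the total with two decimals.
sum_times() {
  awk -F'|' '$1 ~ /^[0-9][0-9]\.sql/ { total += $2 } END { printf "%.2f\n", total }'
}
```

Piping the 22 rows of the MySQL 5.7 table into sum_times reproduces the 217.89 s total.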
Engine version: MySQL 8.0
Version: polardb-2.4.0_5.4.19-20240718_xcluster8.4.19-20240630. For details, see Release notes.
The SQL column refers to the corresponding SQL file in tpch-queries.tar.gz.
SQL | Execution time (s) |
01.sql | 35.34 |
02.sql | 1.92 |
03.sql | 12.82 |
04.sql | 17.11 |
05.sql | 15.6 |
06.sql | 9.07 |
07.sql | 22.04 |
08.sql | 10.92 |
09.sql | 28.65 |
10.sql | 12.14 |
11.sql | 3.14 |
12.sql | 9.62 |
13.sql | 2.87 |
14.sql | 1.57 |
15.sql | 4.77 |
16.sql | 3.7 |
17.sql | 1.54 |
18.sql | 22.1 |
19.sql | 3.11 |
20.sql | 11.07 |
21.sql | 13.76 |
22.sql | 2.09 |
Total | 244.95 |