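Bloom Filter for HashJoin Pushdown
When a query has to read a large amount of data, data access in the engine layer and the interaction and computation in the SQL layer all incur significant overhead. PolarDB for MySQL pushes the Bloom filter down to the engine layer for evaluation, which substantially reduces this overhead and improves query performance.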
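Prerequisites
The cluster must run PolarDB for MySQL 8.0 with revision version 8.0.2.2.3 or later. For how to check the cluster version, see Query the version number.
Currently, only columns of the INT type support Bloom Filter for HashJoin pushdown.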
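Background information
A Bloom filter is a well-established technique for reducing storage access and improving computational efficiency. PolarDB for MySQL uses Bloom filters to accelerate hash joins. For joins over large data volumes, the optimizer decides based on cost whether to create a Bloom filter while building the hash table. The filter is then pushed down to the engine on the probe side, where it is used during the probe to discard rows that the SQL layer does not need. This substantially reduces both the data conversion between the engine layer and the SQL layer and the computation in the SQL layer, which improves query performance.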
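The build and probe flow described above can be sketched in a few lines of Python. This is a conceptual illustration only, not PolarDB code: the `BloomFilter` class, the `hash_join_with_bloom` function, the bit-array size, and the hash scheme are all assumptions made for the example. The filter check in the probe loop stands in for the storage engine, which would drop non-matching rows before handing them to the SQL layer.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter over integer keys (illustrative, hypothetical sizes)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted digests of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def hash_join_with_bloom(build_rows, probe_rows, build_key, probe_key):
    """Join two lists of dicts; return (joined_rows, rows_skipped_by_filter)."""
    hash_table = {}
    bloom = BloomFilter()
    # Build phase: fill the hash table and the Bloom filter together.
    for row in build_rows:
        hash_table.setdefault(row[build_key], []).append(row)
        bloom.add(row[build_key])

    joined, skipped = [], 0
    # Probe phase: rows rejected by the filter never reach the hash table lookup.
    for row in probe_rows:
        if not bloom.might_contain(row[probe_key]):
            skipped += 1
            continue
        for match in hash_table.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined, skipped
```

Because a Bloom filter can return false positives but never false negatives, the join result is unchanged; the filter only reduces how many probe rows need a hash table lookup.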
Usage
You can enable the Bloom filter optimization with the loose_bloom_filter_enabled parameter. For the procedure, see Set cluster parameters and node parameters.
Parameter | Level | Description |
loose_bloom_filter_enabled | Global, Session | Switch for the Bloom filter optimization. Valid values:
|
Examples
This topic uses the TPC-H schema without primary keys or indexes as an example. The execution plans of TPC-H Q3, Q11, and Q16 are shown below. In each plan, the Extra column contains hash join with bloom filter.
Q3:
EXPLAIN SELECT l_orderkey, SUM(l_extendedprice * (1 - l_discount)) AS revenue, o_orderdate, o_shippriority
FROM customer, orders, lineitem
WHERE c_mktsegment = 'MACHINERY'
  AND c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND o_orderdate < '1995-03-10'
  AND l_shipdate > '1995-03-10'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 148463
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: orders
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1486962
     filtered: 3.33
        Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: lineitem
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 5948979
     filtered: 3.33
        Extra: Using where; Using join buffer (hash join with bloom filter)
Q11:
EXPLAIN SELECT ps_partkey, SUM(ps_supplycost * ps_availqty) AS value
FROM partsupp, supplier, nation
WHERE ps_suppkey = s_suppkey
  AND s_nationkey = n_nationkey
  AND n_name = 'INDIA'
GROUP BY ps_partkey
HAVING SUM(ps_supplycost * ps_availqty) > (
    SELECT SUM(ps_supplycost * ps_availqty) * 0.0001000000
    FROM partsupp, supplier, nation
    WHERE ps_suppkey = s_suppkey
      AND s_nationkey = n_nationkey
      AND n_name = 'INDIA'
)
ORDER BY value DESC\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: nation
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25
     filtered: 10.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: supplier
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
     filtered: 10.00
        Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: partsupp
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 791815
     filtered: 10.00
        Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 4. row ***************************
           id: 2
  select_type: SUBQUERY
        table: nation
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25
     filtered: 10.00
        Extra: Using where
*************************** 5. row ***************************
           id: 2
  select_type: SUBQUERY
        table: supplier
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
     filtered: 10.00
        Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 6. row ***************************
           id: 2
  select_type: SUBQUERY
        table: partsupp
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 791815
     filtered: 10.00
        Extra: Using where; Using join buffer (hash join with bloom filter)
Q16:
EXPLAIN SELECT p_brand, p_type, p_size, COUNT(DISTINCT ps_suppkey) AS supplier_cnt
FROM partsupp, part
WHERE p_partkey = ps_partkey
  AND p_brand <> 'Brand#33'
  AND p_type NOT LIKE 'PROMO POLISHED%'
  AND p_size IN (34, 45, 33, 42, 9, 24, 26, 7)
  AND ps_suppkey NOT IN (
    SELECT s_suppkey
    FROM supplier
    WHERE s_comment LIKE '%Customer%Complaints%'
  )
GROUP BY p_brand, p_type, p_size
ORDER BY supplier_cnt DESC, p_brand, p_type, p_size\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: part
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 198116
     filtered: 40.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: partsupp
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 791815
     filtered: 10.00
        Extra: Using where; Using join buffer (hash join with bloom filter)
*************************** 3. row ***************************
           id: 2
  select_type: SUBQUERY
        table: supplier
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
     filtered: 11.11
        Extra: Using where
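For plans with many rows, it can help to check the Extra field programmatically. The following is a minimal sketch, assuming the plan text is in the vertical format that the mysql client prints for `\G` (one field per line). The helper name `tables_using_bloom_filter` is hypothetical and not part of any PolarDB tooling.

```python
import re

# Separator line printed by the mysql client between rows in \G output.
ROW_SEPARATOR = re.compile(r"\*+ \d+\. row \*+")
MARKER = "hash join with bloom filter"


def tables_using_bloom_filter(explain_text):
    """Return the table names whose Extra field mentions the Bloom filter hash join."""
    tables = []
    # The first chunk is whatever precedes row 1 (for example, the query), so skip it.
    for block in ROW_SEPARATOR.split(explain_text)[1:]:
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if MARKER in fields.get("Extra", ""):
            tables.append(fields.get("table"))
    return tables
```

Applied to the Q16 plan above, this would report only partsupp, since part is the build side and the supplier subquery is not joined with a hash join.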
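Performance
The following comparison uses the TPC-H schema without primary keys or indexes, on scale factor 1 data, for the Q3, Q11, and Q16 queries shown above. It compares performance with the Bloom filter feature enabled and disabled: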