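Window functions
Window functions are commonly used for complex calculations such as ranking within groups, moving averages, and running totals. This topic describes how to use window functions in AnalyticDB for MySQL, a cloud-native data warehouse, and provides examples.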
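Ranking functions
CUME_DIST: returns the cumulative distribution of each value in a set of values.
RANK: returns the rank of each value in a dataset.
DENSE_RANK: returns the rank of each value in a set of values, without gaps in the ranking.
NTILE: distributes the rows of each window partition into n buckets numbered from 1 to n.
ROW_NUMBER: returns a unique sequential row number for each row, based on the order of the rows within the window partition. Row numbers start at 1.
PERCENT_RANK: returns the percentage rank of each value in a dataset, computed as (r - 1) / (n - 1), where r is the rank of the current row as computed by RANK() and n is the total number of rows in the current window partition.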
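Value functions
FIRST_VALUE: returns the value of the first row in the window partition.
LAST_VALUE: returns the value of the last row in the window partition.
LAG: returns the value at offset rows before the current row in the window.
LEAD: returns the value at offset rows after the current row in the window.
NTH_VALUE: returns the value at the specified offset in the window. Offsets start at 1.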
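Overview
Window functions perform calculations over the rows of a query result. They are evaluated after the HAVING clause and before the ORDER BY clause. A window function is invoked with the OVER clause, which specifies the window.
AnalyticDB for MySQL supports three types of window functions: aggregate functions, ranking functions, and value functions.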
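Syntax
function OVER ([PARTITION BY a] ORDER BY b RANGE|ROWS BETWEEN start AND end)
A window function consists of the following three parts.
Partition specification (optional): distributes the input rows into different partitions, in a way similar to the GROUP BY clause.
Ordering specification: determines the order in which the window function processes the input rows.
Window frame: specifies the boundaries of the rows used in the calculation.
The window frame supports two modes, RANGE and ROWS. RANGE defines the frame by a range of values of the ordering column. ROWS defines the frame by a number of rows. In both modes, BETWEEN start AND end specifies the frame boundaries. start and end can take the following values:
CURRENT ROW: the current row.
N PRECEDING: the nth row before the current row.
UNBOUNDED PRECEDING: the first row of the partition.
N FOLLOWING: the nth row after the current row.
UNBOUNDED FOLLOWING: the last row of the partition.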
For example, the following query computes a running sum of profit: for each row, it sums the rows from the start of the partition up to the current row.
select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
+------+---------+--------+-------------+
| year | country | profit | slidewindow |
+------+---------+--------+-------------+
| 2001 | USA     | 50     | 50          |
| 2001 | USA     | 1500   | 1550        |
| 2000 | Germany | 75     | 75          |
| 2000 | Germany | 75     | 150         |
| 2001 | Germany | 79     | 229         |
| 2000 | Finland | 1500   | 1500        |
| 2001 | Finland | 10     | 1510        |
+------+---------+--------+-------------+
By contrast, the following query, which specifies neither an ordering nor a frame, can only compute the total profit of each partition.
select country,sum(profit) over (partition by country) from testwindow;
+---------+-----------------------------------------+
| country | sum(profit) OVER (PARTITION BY country) |
+---------+-----------------------------------------+
| Germany | 229                                     |
| Germany | 229                                     |
| Germany | 229                                     |
| USA     | 1550                                    |
| USA     | 1550                                    |
| Finland | 1510                                    |
| Finland | 1510                                    |
+---------+-----------------------------------------+
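The difference between the ROWS and RANGE frame modes is easiest to see when the ordering column contains ties. The following is a minimal sketch that runs both frames side by side on the three Germany rows of the sample data. It assumes SQLite (3.25 or later, through Python's sqlite3 module) as a stand-in engine; it is not run against AnalyticDB for MySQL, and the aliases rows_sum and range_sum are illustrative.

```python
import sqlite3

# Assumption: SQLite is used here only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testwindow(year INT, country TEXT, product TEXT, profit INT)")
conn.executemany(
    "INSERT INTO testwindow VALUES (?, ?, ?, ?)",
    [
        (2000, "Germany", "Calculator", 75),
        (2000, "Germany", "Calculator", 75),
        (2001, "Germany", "Calculator", 79),
    ],
)

query = """
SELECT year, profit,
       SUM(profit) OVER (ORDER BY year
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
       SUM(profit) OVER (ORDER BY year
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
FROM testwindow
ORDER BY year, rows_sum
"""
result = conn.execute(query).fetchall()

# ROWS counts physical rows, so the two year-2000 rows get 75 and 150.
# RANGE treats rows with the same year as peers, so both get 150.
for row in result:
    print(row)
```

With ROWS, the frame ends at the current physical row; with RANGE, it ends at the last row that has the same ordering value as the current row.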
Usage notes
The frame boundaries are subject to the following requirements:
start cannot be UNBOUNDED FOLLOWING. Otherwise, the error Window frame start cannot be UNBOUNDED FOLLOWING is returned.
end cannot be UNBOUNDED PRECEDING. Otherwise, the error Window frame end cannot be UNBOUNDED PRECEDING is returned.
If start is CURRENT ROW and end is N PRECEDING, the error Window frame starting from CURRENT ROW cannot end with PRECEDING is returned.
If start is N FOLLOWING and end is N PRECEDING, the error Window frame starting from FOLLOWING cannot end with PRECEDING is returned.
If start is N FOLLOWING and end is CURRENT ROW, the error Window frame starting from FOLLOWING cannot end with CURRENT ROW is returned.
When the mode is RANGE:
If start or end is N PRECEDING, the error Window frame RANGE PRECEDING is only supported with UNBOUNDED is returned.
If start or end is N FOLLOWING, the error Window frame RANGE FOLLOWING is only supported with UNBOUNDED is returned.
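The boundary rules above can be summarized as a small checker. The following is only a sketch: check_frame is a hypothetical helper, not part of AnalyticDB for MySQL, and the order in which the rules are tested is an assumption. The error messages are the ones listed above.

```python
def check_frame(mode, start, end):
    """Return the error message for an invalid frame, or None if the frame is allowed.

    Hypothetical helper for illustration; the rule order is an assumption.
    """
    def kind(bound):
        # "3 PRECEDING" -> "PRECEDING"; "UNBOUNDED PRECEDING" and "CURRENT ROW" stay as-is.
        parts = bound.upper().split()
        return " ".join(parts[1:]) if parts[0].isdigit() else " ".join(parts)

    s, e = kind(start), kind(end)
    if s == "UNBOUNDED FOLLOWING":
        return "Window frame start cannot be UNBOUNDED FOLLOWING"
    if e == "UNBOUNDED PRECEDING":
        return "Window frame end cannot be UNBOUNDED PRECEDING"
    if s == "CURRENT ROW" and e == "PRECEDING":
        return "Window frame starting from CURRENT ROW cannot end with PRECEDING"
    if s == "FOLLOWING" and e == "PRECEDING":
        return "Window frame starting from FOLLOWING cannot end with PRECEDING"
    if s == "FOLLOWING" and e == "CURRENT ROW":
        return "Window frame starting from FOLLOWING cannot end with CURRENT ROW"
    if mode.upper() == "RANGE":
        # In RANGE mode, a numeric offset is rejected in either boundary.
        if "PRECEDING" in (s, e):
            return "Window frame RANGE PRECEDING is only supported with UNBOUNDED"
        if "FOLLOWING" in (s, e):
            return "Window frame RANGE FOLLOWING is only supported with UNBOUNDED"
    return None


print(check_frame("ROWS", "UNBOUNDED PRECEDING", "CURRENT ROW"))  # allowed frame
print(check_frame("RANGE", "1 PRECEDING", "CURRENT ROW"))         # rejected in RANGE mode
```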
Preparations
All window function examples in this topic use the testwindow table as test data.
create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);
insert into testwindow values (2000,'Finland','Computer',1500);
insert into testwindow values (2001,'Finland','Phone',10);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2000,'Germany','Calculator',75);
insert into testwindow values (2001,'Germany','Calculator',79);
insert into testwindow values (2001,'USA','Calculator',50);
insert into testwindow values (2001,'USA','Computer',1500);
SELECT * FROM testwindow;
+------+---------+------------+--------+
| year | country | product    | profit |
+------+---------+------------+--------+
| 2000 | Finland | Computer   | 1500   |
| 2001 | Finland | Phone      | 10     |
| 2000 | Germany | Calculator | 75     |
| 2000 | Germany | Calculator | 75     |
| 2001 | Germany | Calculator | 79     |
| 2001 | USA     | Calculator | 50     |
| 2001 | USA     | Computer   | 1500   |
+------+---------+------------+--------+
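Aggregate functions
Any aggregate function can be used as a window function by adding an OVER clause. The aggregate is computed for each row over the rows in that row's window frame.
For example, the following query returns a rolling sum of order prices by day for each clerk.
SELECT clerk, orderdate, orderkey, totalprice, sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey;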
CUME_DIST
CUME_DIST()
Description: returns the cumulative distribution of each value in a set of values.
Return value: the number of rows at or before the current row in the ordered window partition, divided by the total number of rows in the partition. Tied values in the ordering evaluate to the same distribution value.
Return type: DOUBLE.
Example:
select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow;
+------+---------+------------+--------+--------------------+
| year | country | product    | profit | cume_dist          |
+------+---------+------------+--------+--------------------+
| 2001 | USA     | Calculator | 50     | 0.5                |
| 2001 | USA     | Computer   | 1500   | 1.0                |
| 2001 | Finland | Phone      | 10     | 0.5                |
| 2000 | Finland | Computer   | 1500   | 1.0                |
| 2000 | Germany | Calculator | 75     | 0.6666666666666666 |
| 2000 | Germany | Calculator | 75     | 0.6666666666666666 |
| 2001 | Germany | Calculator | 79     | 1.0                |
+------+---------+------------+--------+--------------------+
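To make the arithmetic behind the Germany rows concrete, here is a minimal pure-Python sketch of the cumulative distribution for one partition. The function name cume_dist and the list-based input are illustrative assumptions, not an AnalyticDB for MySQL API.

```python
def cume_dist(values):
    """Cumulative distribution of each value within one partition (ascending order)."""
    n = len(values)
    # Rows at or before the current row, ties included, divided by the partition size.
    return [sum(1 for other in values if other <= v) / n for v in values]


germany = [75, 75, 79]
print(cume_dist(germany))  # the two tied 75 rows share the value 2/3
```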
RANK
RANK()
Description: returns the rank of each value in a dataset.
The rank is 1 plus the number of rows that precede the current row, not counting the current row or its ties. As a result, tied values in the ordering can produce gaps in the sequence. The ranking is computed separately for each window partition.
Return type: BIGINT.
Example:
select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow;
+------+---------+------------+--------+------+
| year | country | product    | profit | rank |
+------+---------+------------+--------+------+
| 2001 | Finland | Phone      | 10     | 1    |
| 2000 | Finland | Computer   | 1500   | 2    |
| 2001 | USA     | Calculator | 50     | 1    |
| 2001 | USA     | Computer   | 1500   | 2    |
| 2000 | Germany | Calculator | 75     | 1    |
| 2000 | Germany | Calculator | 75     | 1    |
| 2001 | Germany | Calculator | 79     | 3    |
+------+---------+------------+--------+------+
DENSE_RANK
DENSE_RANK()
Description: returns the rank of each value in a set of values.
DENSE_RANK() is similar to RANK(), except that tied values do not produce gaps in the sequence.
Return type: BIGINT.
Example:
select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow;
+------+---------+------------+--------+------------+
| year | country | product    | profit | dense_rank |
+------+---------+------------+--------+------------+
| 2001 | Finland | Phone      | 10     | 1          |
| 2000 | Finland | Computer   | 1500   | 2          |
| 2001 | USA     | Calculator | 50     | 1          |
| 2001 | USA     | Computer   | 1500   | 2          |
| 2000 | Germany | Calculator | 75     | 1          |
| 2000 | Germany | Calculator | 75     | 1          |
| 2001 | Germany | Calculator | 79     | 2          |
+------+---------+------------+--------+------------+
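The difference between RANK() and DENSE_RANK() comes down to how ties are counted. The following pure-Python sketch reproduces both rankings for a single partition; the function names and list-based input are illustrative assumptions.

```python
def rank(values):
    """RANK(): 1 plus the number of rows strictly before the current row's value."""
    return [1 + sum(1 for other in values if other < v) for v in values]


def dense_rank(values):
    """DENSE_RANK(): position of the value among the distinct values, so no gaps."""
    distinct = sorted(set(values))
    return [distinct.index(v) + 1 for v in values]


germany = [75, 75, 79]
print(rank(germany))        # the row after the tie jumps to 3
print(dense_rank(germany))  # the row after the tie gets 2
```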
NTILE
NTILE(n)
Description: distributes the rows of each window partition into n buckets numbered from 1 to n. Bucket sizes differ by at most 1. If the rows in the window partition cannot be divided evenly among the buckets, the remaining rows are assigned one per bucket, starting from bucket 1. For example, with 6 rows and 4 buckets, the bucket numbers are 1 1 2 2 3 4.
Return type: BIGINT.
Example:
select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow;
+------+---------+------------+--------+--------+
| year | country | product    | profit | ntile2 |
+------+---------+------------+--------+--------+
| 2001 | USA     | Calculator | 50     | 1      |
| 2001 | USA     | Computer   | 1500   | 2      |
| 2001 | Finland | Phone      | 10     | 1      |
| 2000 | Finland | Computer   | 1500   | 2      |
| 2000 | Germany | Calculator | 75     | 1      |
| 2000 | Germany | Calculator | 75     | 1      |
| 2001 | Germany | Calculator | 79     | 2      |
+------+---------+------------+--------+--------+
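The bucket assignment described for NTILE can be sketched in a few lines of Python. This is an illustration of the distribution rule only; the function name ntile and its parameters are assumptions.

```python
def ntile(row_count, n):
    """Bucket number for each of row_count ordered rows split into n buckets."""
    base, extra = divmod(row_count, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `extra` buckets each take one of the leftover rows.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


print(ntile(6, 4))  # 6 rows, 4 buckets
print(ntile(3, 2))  # the three Germany rows with ntile(2)
```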
ROW_NUMBER
ROW_NUMBER()
Description: returns a unique sequential row number for each row, based on the order of the rows within the window partition. Row numbers start at 1.
Return type: BIGINT.
Example (because the OVER clause has no ORDER BY, the order in which rows are numbered within a partition is not guaranteed):
SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow;
+------+---------+------------+--------+----------+
| year | country | product    | profit | row_num1 |
+------+---------+------------+--------+----------+
| 2001 | USA     | Calculator | 50     | 1        |
| 2001 | USA     | Computer   | 1500   | 2        |
| 2000 | Germany | Calculator | 75     | 1        |
| 2000 | Germany | Calculator | 75     | 2        |
| 2001 | Germany | Calculator | 79     | 3        |
| 2000 | Finland | Computer   | 1500   | 1        |
| 2001 | Finland | Phone      | 10     | 2        |
+------+---------+------------+--------+----------+
PERCENT_RANK
PERCENT_RANK()
Description: returns the percentage rank of each value in a dataset, computed as (r - 1) / (n - 1), where r is the rank of the current row as computed by RANK() and n is the total number of rows in the current window partition.
Return type: DOUBLE.
Example:
select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as percent_rank from testwindow;
+------+---------+------------+--------+--------------+
| year | country | product    | profit | percent_rank |
+------+---------+------------+--------+--------------+
| 2001 | Finland | Phone      | 10     | 0.0          |
| 2000 | Finland | Computer   | 1500   | 1.0          |
| 2001 | USA     | Calculator | 50     | 0.0          |
| 2001 | USA     | Computer   | 1500   | 1.0          |
| 2000 | Germany | Calculator | 75     | 0.0          |
| 2000 | Germany | Calculator | 75     | 0.0          |
| 2001 | Germany | Calculator | 79     | 1.0          |
+------+---------+------------+--------+--------------+
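The formula (r - 1) / (n - 1) can be checked by hand against the Germany rows. The following pure-Python sketch does so; the function name is illustrative, and returning 0.0 for a single-row partition (where n - 1 is 0) is an assumption based on common SQL engine behavior rather than something this topic states.

```python
def percent_rank(values):
    """(r - 1) / (n - 1) for each value in one partition, with r from RANK()."""
    n = len(values)
    if n == 1:
        # Assumption: a single-row partition yields 0.0 instead of dividing by zero.
        return [0.0]
    # r - 1 equals the number of rows strictly before the current row's value.
    return [sum(1 for other in values if other < v) / (n - 1) for v in values]


print(percent_rank([75, 75, 79]))  # ranks 1, 1, 3 with n = 3
print(percent_rank([10, 1500]))    # ranks 1, 2 with n = 2
```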
FIRST_VALUE
FIRST_VALUE(x)
Description: returns the value of the first row in the window partition.
Return type: same as the type of the input parameter.
Example:
select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow;
+------+---------+------------+--------+------------+
| year | country | product    | profit | firstValue |
+------+---------+------------+--------+------------+
| 2000 | Germany | Calculator | 75     | 75         |
| 2000 | Germany | Calculator | 75     | 75         |
| 2001 | Germany | Calculator | 79     | 75         |
| 2001 | USA     | Calculator | 50     | 50         |
| 2001 | USA     | Computer   | 1500   | 50         |
| 2001 | Finland | Phone      | 10     | 10         |
| 2000 | Finland | Computer   | 1500   | 10         |
+------+---------+------------+--------+------------+
LAST_VALUE
LAST_VALUE(x)
Description: returns the value of the last row in the window partition. By default, LAST_VALUE uses the frame rows between unbounded preceding and current row, so it only considers the current row and the rows before it. To return the last row of the whole partition on every row, in the same way that FIRST_VALUE returns the first row, append rows between unbounded preceding and unbounded following after the order by clause.
Return type: same as the type of the input parameter.
Example 1:
select year,country,product,profit,last_value(profit) over (partition by country order by profit) as lastValue from testwindow;
+------+---------+------------+--------+-----------+
| year | country | product    | profit | lastValue |
+------+---------+------------+--------+-----------+
| 2001 | USA     | Calculator | 50     | 50        |
| 2001 | USA     | Computer   | 1500   | 1500      |
| 2001 | Finland | Phone      | 10     | 10        |
| 2000 | Finland | Computer   | 1500   | 1500      |
| 2000 | Germany | Calculator | 75     | 75        |
| 2000 | Germany | Calculator | 75     | 75        |
| 2001 | Germany | Calculator | 79     | 79        |
+------+---------+------------+--------+-----------+
Example 2:
select year,country,product,profit,last_value(profit) over (partition by country order by profit rows between unbounded preceding and unbounded following) as lastValue from testwindow;
+------+---------+------------+--------+-----------+
| year | country | product    | profit | lastValue |
+------+---------+------------+--------+-----------+
| 2001 | Finland | Phone      | 10     | 1500      |
| 2000 | Finland | Computer   | 1500   | 1500      |
| 2000 | Germany | Calculator | 75     | 79        |
| 2000 | Germany | Calculator | 75     | 79        |
| 2001 | Germany | Calculator | 79     | 79        |
| 2001 | USA     | Calculator | 50     | 1500      |
| 2001 | USA     | Computer   | 1500   | 1500      |
+------+---------+------------+--------+-----------+
LAG
LAG(x[, offset[, default_value]])
Description: returns the value at offset rows before the current row in the window. Offsets start at 0, which refers to the current row. The offset can be any scalar expression; the default offset is 1. If the offset is null or greater than the window size, default_value is returned; if default_value is not specified, null is returned.
Return type: same as the type of the input parameter.
Example:
select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow;
+------+---------+------------+--------+------+
| year | country | product    | profit | lag  |
+------+---------+------------+--------+------+
| 2001 | USA     | Calculator | 50     | NULL |
| 2001 | USA     | Computer   | 1500   | 50   |
| 2000 | Germany | Calculator | 75     | NULL |
| 2000 | Germany | Calculator | 75     | 75   |
| 2001 | Germany | Calculator | 79     | 75   |
| 2001 | Finland | Phone      | 10     | NULL |
| 2000 | Finland | Computer   | 1500   | 10   |
+------+---------+------------+--------+------+
LEAD
LEAD(x[, offset[, default_value]])
Description: returns the value at offset rows after the current row in the window. Offsets start at 0, which refers to the current row. The offset can be any scalar expression; the default offset is 1. If the offset is null or greater than the window size, default_value is returned; if default_value is not specified, null is returned.
Return type: same as the type of the input parameter.
Example:
select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow;
+------+---------+------------+--------+------+
| year | country | product    | profit | lead |
+------+---------+------------+--------+------+
| 2000 | Germany | Calculator | 75     | 75   |
| 2000 | Germany | Calculator | 75     | 79   |
| 2001 | Germany | Calculator | 79     | NULL |
| 2001 | Finland | Phone      | 10     | 1500 |
| 2000 | Finland | Computer   | 1500   | NULL |
| 2001 | USA     | Calculator | 50     | 1500 |
| 2001 | USA     | Computer   | 1500   | NULL |
+------+---------+------------+--------+------+
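The examples for LAG and LEAD use only the default offset of 1. To illustrate how the optional offset and default_value parameters behave, here is a minimal pure-Python sketch over one ordered partition. The function names mirror the SQL functions, but they are illustrative helpers, with Python's None standing in for SQL null.

```python
def lag(values, offset=1, default=None):
    """Value `offset` rows before each row, or `default` when that row does not exist."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value `offset` rows after each row, or `default` when that row does not exist."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


germany = [75, 75, 79]          # the Germany partition ordered by profit
print(lag(germany))             # default offset 1, no default_value
print(lead(germany))
print(lag(germany, 2, 0))       # offset 2, with 0 as default_value
print(lag(germany, 0))          # offset 0 is the current row itself
```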
NTH_VALUE
NTH_VALUE(x, offset)
Description: returns the value at the specified offset in the window. Offsets start at 1. If offset is null or greater than the number of values in the window, null is returned. If offset is 0 or negative, an error is returned.
Return type: same as the type of the input parameter.
Example:
select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow;
+------+---------+------------+--------+-----------+
| year | country | product    | profit | nth_value |
+------+---------+------------+--------+-----------+
| 2001 | Finland | Phone      | 10     | 10        |
| 2000 | Finland | Computer   | 1500   | 10        |
| 2001 | USA     | Calculator | 50     | 50        |
| 2001 | USA     | Computer   | 1500   | 50        |
| 2000 | Germany | Calculator | 75     | 75        |
| 2000 | Germany | Calculator | 75     | 75        |
| 2001 | Germany | Calculator | 79     | 75        |
+------+---------+------------+--------+-----------+
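With an offset of 1, NTH_VALUE behaves like FIRST_VALUE, so the example above does not show the null case. The following sketch uses an offset of 2 on the two Finland rows. It assumes SQLite (3.25 or later, through Python's sqlite3 module) as a stand-in engine rather than AnalyticDB for MySQL; the aliases second_so_far and second_overall are illustrative.

```python
import sqlite3

# Assumption: SQLite is used here only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testwindow(year INT, country TEXT, product TEXT, profit INT)")
conn.executemany(
    "INSERT INTO testwindow VALUES (?, ?, ?, ?)",
    [
        (2000, "Finland", "Computer", 1500),
        (2001, "Finland", "Phone", 10),
    ],
)

query = """
SELECT profit,
       NTH_VALUE(profit, 2) OVER (ORDER BY profit) AS second_so_far,
       NTH_VALUE(profit, 2) OVER (ORDER BY profit
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS second_overall
FROM testwindow
ORDER BY profit
"""
result = conn.execute(query).fetchall()

# On the first row the default frame holds only one value, so the 2nd value is NULL (None).
for row in result:
    print(row)
```

As with LAST_VALUE, extending the frame to unbounded following makes the function see the whole partition on every row.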