對于經常需要對數據進行多維度的聚合分析的場景,您既需要對A列做聚合,也要對B列做聚合,同時要對A、B兩列做聚合,因此需要多次使用union all
。您可以使用grouping sets
快速解決此類問題。本文為您介紹如何使用grouping sets
進行多維聚合。
功能介紹
grouping sets
是對select
語句中group by
子句的擴展,允許您采用多種方式對結果分組,而不必使用多個select
語句再union all
來實現。這樣能夠使MaxCompute的引擎給出更有效的執行計劃,從而提高執行性能。
與grouping sets
相關聯的語法如下。
類型 | 說明 |
| 特殊的
|
| 特殊的
|
|
|
| 接受一個或多個列名作為參數。結果是將參數列的 |
|
說明 MaxCompute和Hive 2.3.0及以上版本兼容該函數,在Hive 2.3.0以下版本中該函數輸出不一致,因此并不推薦您使用此函數。 |
GROUPING SETS使用示例
grouping sets
使用示例如下:
準備數據。
create table requests lifecycle 20 as select * from values (1, 'windows', 'PC', 'Beijing'), (2, 'windows', 'PC', 'Shijiazhuang'), (3, 'linux', 'Phone', 'Beijing'), (4, 'windows', 'PC', 'Beijing'), (5, 'ios', 'Phone', 'Shijiazhuang'), (6, 'linux', 'PC', 'Beijing'), (7, 'windows', 'Phone', 'Shijiazhuang') as t(id, os, device, city);
對數據進行分組。您可以通過如下兩種方式進行分組:
使用多個
select
語句進行分組。select NULL, NULL, NULL, count(*) from requests union all select os, device, NULL, count(*) from requests group by os, device union all select null, null, city, count(*) from requests group by city;
使用
grouping sets
進行分組。select os,device, city ,count(*) from requests group by grouping sets((os, device), (city), ());
返回結果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | ios | Phone | NULL | 1 | | linux | PC | NULL | 1 | | linux | Phone | NULL | 1 | | windows | PC | NULL | 3 | | windows | Phone | NULL | 1 | +------------+------------+------------+------------+
說明分組集中不使用的表達式,會使用NULL充當占位符,使得這些結果集可以做操作。例如結果第4~8行的city列。
CUBE | ROLLUP使用示例
基于grouping sets
示例表,cube
及rollup
使用示例如下:
示例1:通過
cube
枚舉os、device、city的所有可能列為grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by cube (os, device, city); --等效于如下語句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());
返回結果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | NULL | Beijing | 4 | | NULL | NULL | Shijiazhuang | 3 | | NULL | PC | NULL | 4 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | NULL | 3 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | NULL | NULL | 1 | | ios | NULL | Shijiazhuang | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | NULL | Beijing | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
示例2:通過
cube
枚舉(os, device),(device, city)
所有可能列為grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by cube ((os, device), (device, city)); --等效于如下語句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(device,city),());
返回結果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | NULL | PC | Beijing | 3 | | NULL | PC | Shijiazhuang | 1 | | NULL | Phone | Beijing | 1 | | NULL | Phone | Shijiazhuang | 2 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
示例3:通過
rollup
對os、device、city以按層級聚合的方式產生grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by rollup (os, device, city); --等效于如下語句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
返回結果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
示例4:通過
rollup
對os, (os,device), city
以按層級聚合的方式產生grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by rollup (os, (os,device), city); --等效于如下語句。 select os,device, city, count(*) from requests group by grouping sets ((os, device, city),(os, device),(os),());
返回結果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | NULL | NULL | NULL | 7 | | ios | NULL | NULL | 1 | | ios | Phone | NULL | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | NULL | 2 | | linux | PC | NULL | 1 | | linux | PC | Beijing | 1 | | linux | Phone | NULL | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | NULL | 4 | | windows | PC | NULL | 3 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | NULL | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
示例5:通過
group by
、cube
、grouping sets
組合產生grouping sets
。命令示例如下:select os,device, city, count(*) from requests group by os, cube(os,device), grouping sets(city); --等效于如下語句。 select os,device, city, count(*) from requests group by grouping sets((os,device,city),(os,city),(os,device,city));
返回結果如下:
+------------+------------+------------+------------+ | os | device | city | _c3 | +------------+------------+------------+------------+ | ios | NULL | Shijiazhuang | 1 | | ios | Phone | Shijiazhuang | 1 | | linux | NULL | Beijing | 2 | | linux | PC | Beijing | 1 | | linux | Phone | Beijing | 1 | | windows | NULL | Beijing | 2 | | windows | NULL | Shijiazhuang | 2 | | windows | PC | Beijing | 2 | | windows | PC | Shijiazhuang | 1 | | windows | Phone | Shijiazhuang | 1 | +------------+------------+------------+------------+
GROUPING | GROUPING_ID使用示例
grouping
及grouping_id
使用示例如下:
select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);
返回結果如下:
+------------+------------+------------+------------+------------+------------+------------+------------+
| a | b | c | _c3 | ga | gb | gc | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL | NULL | NULL | 1 | 1 | 1 | 1 | 7 |
| NULL | NULL | 3 | 1 | 1 | 1 | 0 | 6 |
| NULL | 2 | NULL | 1 | 1 | 0 | 1 | 5 |
| NULL | 2 | 3 | 1 | 1 | 0 | 0 | 4 |
| 1 | NULL | NULL | 1 | 0 | 1 | 1 | 3 |
| 1 | NULL | 3 | 1 | 0 | 1 | 0 | 2 |
| 1 | 2 | NULL | 1 | 0 | 0 | 1 | 1 |
| 1 | 2 | 3 | 1 | 0 | 0 | 0 | 0 |
+------------+------------+------------+------------+------------+------------+------------+------------+
默認情況,group by
列表中不被使用的列,會被填充為NULL。您可以通過grouping
輸出更有實際意義的值。基于grouping sets
示例表,命令示例如下:
select
if(grouping(os) == 0, os, 'ALL') as os,
if(grouping(device) == 0, device, 'ALL') as device,
if(grouping(city) == 0, city, 'ALL') as city,
count(*) as count
from requests
group by os, device, city grouping sets((os, device), (city), ());
返回結果如下:
+------------+------------+------------+------------+
| os | device | city | count |
+------------+------------+------------+------------+
| ALL | ALL | ALL | 7 |
| ALL | ALL | Beijing | 4 |
| ALL | ALL | Shijiazhuang | 3 |
| ios | Phone | ALL | 1 |
| linux | PC | ALL | 1 |
| linux | Phone | ALL | 1 |
| windows | PC | ALL | 3 |
| windows | Phone | ALL | 1 |
+------------+------------+------------+------------+
GROUPING__ID使用示例
grouping__id
不帶參數的命令示例如下:
set odps.sql.hive.compatible=true;
select
a, b, c, count(*), grouping__id
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
--等效于如下語句。
select
a, b, c, count(*), grouping_id(a,b,c)
from values (1,2,3) as t(a,b,c)
group by a, b, c grouping sets ((a,b,c), (a));
返回結果如下:
+------------+------------+------------+------------+------------+
| a | b | c | _c3 | _c4 |
+------------+------------+------------+------------+------------+
| 1 | NULL | NULL | 1 | 3 |
| 1 | 2 | 3 | 1 | 0 |
+------------+------------+------------+------------+------------+