Data Lake Metadata Management
Databricks DataInsight clusters running DBR 7.3 (Spark 3.0.1, Scala 2.12) or later support data lake metadata as the Hive metadata database when you select the metadata type during cluster creation. Data lake metadata is a managed, highly available, and scalable metadata store. You do not need to purchase a separate metadata database to run multiple compute engines against the same metadata, for example Databricks DataInsight and E-MapReduce together. Multiple Databricks DataInsight clusters can also share one unified metadata store.
Prerequisites
Data Lake Formation is activated in the Data Lake Formation console.
Data lake metadata is currently available only in three regions: China North 2 (Beijing), China East 2 (Shanghai), and China East 1 (Hangzhou).
In the RAM console, a custom policy named AliyunDDIAccessingDLFRolePolicy is attached to the AliyunDDIAccessingOSSRole role. The policy content is as follows:
{
"Version": "1",
"Statement": [
{
"Action": [
"dlf:BatchCreatePartitions",
"dlf:BatchCreateTables",
"dlf:BatchDeletePartitions",
"dlf:BatchDeleteTables",
"dlf:BatchGetPartitions",
"dlf:BatchGetTables",
"dlf:BatchUpdatePartitions",
"dlf:BatchUpdateTables",
"dlf:CreateDatabase",
"dlf:CreateFunction",
"dlf:CreatePartition",
"dlf:CreateTable",
"dlf:DeleteDatabase",
"dlf:DeleteFunction",
"dlf:DeletePartition",
"dlf:DeleteTable",
"dlf:GetDatabase",
"dlf:GetFunction",
"dlf:GetPartition",
"dlf:GetTable",
"dlf:ListCatalogs",
"dlf:ListDatabases",
"dlf:ListFunctionNames",
"dlf:ListFunctions",
"dlf:ListPartitionNames",
"dlf:ListPartitions",
"dlf:ListPartitionsByExpr",
"dlf:ListPartitionsByFilter",
"dlf:ListTableNames",
"dlf:ListTables",
"dlf:RenamePartition",
"dlf:RenameTable",
"dlf:UpdateDatabase",
"dlf:UpdateFunction",
"dlf:UpdateTable",
"dlf:UpdateTableColumnStatistics",
"dlf:GetTableColumnStatistics",
"dlf:DeleteTableColumnStatistics",
"dlf:UpdatePartitionColumnStatistics",
"dlf:GetPartitionColumnStatistics",
"dlf:DeletePartitionColumnStatistics",
"dlf:BatchGetPartitionColumnStatistics"
],
"Resource": "*",
"Effect": "Allow"
}
]
}
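Before pasting a policy like the one above into the RAM console, it can help to sanity-check it locally. The following is a minimal sketch in Python that uses only the standard library. The helper name validate_dlf_policy and the shortened action list are illustrative assumptions, not part of any Alibaba Cloud SDK, and the script does not call RAM. It checks that the text is valid JSON, that Version is "1", and that every statement allows only dlf: actions without duplicates.

```python
import json

# Shortened copy of the policy for illustration only;
# in practice, paste the full policy document here.
POLICY_TEXT = """
{
  "Version": "1",
  "Statement": [
    {
      "Action": [
        "dlf:GetDatabase",
        "dlf:GetTable",
        "dlf:ListDatabases",
        "dlf:ListTables"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
"""


def validate_dlf_policy(policy_text):
    """Return a list of problems found in the policy text (empty if none).

    Hypothetical helper for local checks; it never contacts RAM.
    """
    try:
        policy = json.loads(policy_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(policy, dict):
        return ["top level is not a JSON object"]

    problems = []
    if policy.get("Version") != "1":
        problems.append('Version should be "1"')

    statements = policy.get("Statement", [])
    if not statements:
        problems.append("Statement list is empty")

    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            problems.append(f"statement {index}: Effect is not Allow")
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # a single action may be given as a string
        for action in actions:
            if not action.startswith("dlf:"):
                problems.append(f"statement {index}: unexpected action {action}")
        # Report each repeated action once.
        for action in sorted({a for a in actions if actions.count(a) > 1}):
            problems.append(f"statement {index}: duplicate action {action}")
    return problems


if __name__ == "__main__":
    issues = validate_dlf_policy(POLICY_TEXT)
    print("policy looks OK" if not issues else "\n".join(issues))
```

An empty result only means the document is structurally consistent; whether the role can actually reach Data Lake Formation still depends on the policy being attached in RAM.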
Background information
Data lake metadata has been adapted to work with Spark SQL in Databricks DataInsight.
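Because the metadata is reachable through Spark SQL, one simple way to confirm that a cluster is using it is to run a few DDL statements and check that the objects appear. The sketch below is a hedged illustration in Python: the database and table names are hypothetical, run_statements accepts any object that has a sql method (for example a PySpark SparkSession in a notebook), and the statements are standard Spark SQL rather than anything specific to Databricks DataInsight.

```python
def smoke_test_statements(database, table):
    """Build standard Spark SQL statements that create and inspect one table.

    The database and table names are placeholders chosen by the caller.
    """
    qualified = f"{database}.{table}"
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE TABLE IF NOT EXISTS {qualified} (id INT, name STRING) USING parquet",
        f"SHOW TABLES IN {database}",
        f"DESCRIBE TABLE {qualified}",
    ]


def run_statements(session, statements):
    """Run each statement through session.sql and collect the results in order."""
    return [session.sql(statement) for statement in statements]


if __name__ == "__main__":
    # Only print the statements here. In a notebook with an active
    # SparkSession named spark, one could instead call:
    #   run_statements(spark, smoke_test_statements("dlf_demo_db", "dlf_demo_table"))
    for statement in smoke_test_statements("dlf_demo_db", "dlf_demo_table"):
        print(statement)
```

If a second cluster that uses the same data lake metadata lists the same database and table, the metadata is being shared as intended. Storage location settings for the database are outside the scope of this sketch.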
Scenarios
Data lake metadata is highly available and easy to maintain, which makes it suitable for the following scenarios:
Production environments for Databricks DataInsight clusters, where you do not need to maintain a separate metadata database.
Using multiple big data compute engines side by side, such as Databricks DataInsight, MaxCompute, and EMR, with metadata managed centrally.
Multiple Databricks DataInsight clusters whose metadata is managed in one place.
Create a cluster
When you create a Databricks DataInsight cluster, set the metadata type to data lake metadata, as shown in the figure. For details about cluster creation, see Create a cluster.
If you need to migrate metadata from an existing database, submit a ticket.