This topic describes how to use Terraform to manage Prometheus Monitoring configurations, including ServiceMonitors, PodMonitors, custom jobs, and health check Probes.
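Prerequisites
A Prometheus instance for Container Service or a Prometheus instance for ECS is created. For more information, see Manage Prometheus instances by using Terraform.
Terraform is installed.
Cloud Shell comes with Terraform preinstalled and your Alibaba Cloud account information preconfigured, so no additional configuration is required.
If you do not use Cloud Shell, see Install and configure Terraform locally for installation instructions.
Note: Make sure that the Terraform version is v0.12.28 or later. You can run the terraform --version command to check the version.
Resource Orchestration Service (ROS) provides managed Terraform capabilities. You can create Terraform templates to define Alibaba Cloud, AWS, or Azure resources, and to configure resource parameters and the dependencies between resources. For more information, see Create a Terraform template and Create a Terraform stack.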
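Alibaba Cloud account information is configured. You can use one of the following two methods:
Method 1: Create environment variables to store the identity authentication information.

export ALICLOUD_ACCESS_KEY="************"
export ALICLOUD_SECRET_KEY="************"
export ALICLOUD_REGION="cn-beijing"

Note: Replace the value of ALICLOUD_REGION with the region that you actually use.
Method 2: Specify the identity authentication information in the provider block of the configuration file.

provider "alicloud" {
  access_key = "************"
  secret_key = "************"
  region     = "cn-beijing"
}

Note: Replace the value of region with the region that you actually use.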
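The two methods can also be combined so that no keys are written into the configuration file. The following is a minimal sketch under that assumption: the keys come from the ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY environment variables of Method 1, and only the region is set in the provider block. The variable name region is hypothetical and is not part of the original procedure.

```hcl
# Hypothetical variable; override it with -var or a .tfvars file as needed.
variable "region" {
  type        = string
  description = "Region in which the Prometheus instance resides"
  default     = "cn-beijing"
}

provider "alicloud" {
  # access_key and secret_key are intentionally omitted here.
  # The provider falls back to ALICLOUD_ACCESS_KEY and ALICLOUD_SECRET_KEY
  # from the environment, as set in Method 1.
  region = var.region
}
```

Keeping the keys out of main.tf means the file can be committed to version control without exposing credentials.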
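Limits
Prometheus instances for Container Service support ServiceMonitors, PodMonitors, custom jobs, and health check Probes.
Prometheus instances for ECS support only custom jobs and health check Probes because of the restrictions of this instance type.
Health check Probes:
The status parameter cannot be set.
The Probe name must use the format custom-name-{tcp/http/ping}-blackbox. For example, a TCP health check is named xxx-tcp-blackbox.
A Prometheus instance for ECS is fully managed, so the Probe namespace must be left empty or set to the fixed value vpcId-userId, for example vpc-0jl4q1q2of2tagvwxxxx-11032353609xxxx.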
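The Probe naming rule above can be checked before anything is sent to the service. The following is a minimal sketch, assuming Terraform 0.13 or later, where validation blocks on variables are generally available. The variable name probe_name is hypothetical, and the pattern checks only the suffix, because this topic does not state any constraints on the custom part of the name.

```hcl
# Hypothetical variable that holds the Probe name used in config_yaml.
variable "probe_name" {
  type        = string
  description = "Probe name in the format <custom-name>-{tcp|http|ping}-blackbox"
  default     = "name1-tcp-blackbox"

  validation {
    # can() turns a failed regex match into false instead of an error.
    condition     = can(regex("^.+-(tcp|http|ping)-blackbox$", var.probe_name))
    error_message = "The Probe name must use the format <custom-name>-{tcp|http|ping}-blackbox, for example name1-tcp-blackbox."
  }
}
```

With this in place, terraform plan fails early on a name such as name1-udp-blackbox. The validated value could then be interpolated into the Probe manifest, for example name: ${var.probe_name} inside the config_yaml heredoc.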
Add Monitoring configurations to a Prometheus instance
Add a ServiceMonitor
Create a working directory, and create a configuration file named main.tf in the directory.

provider "alicloud" {
}
Run the following command to initialize the Terraform runtime environment.
terraform init
Expected output:

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
...

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Define the Monitoring resource.
Add the Monitoring resource to the main.tf file.

# ServiceMonitor configuration for the Prometheus instance.
resource "alicloud_arms_prometheus_monitoring" "myServiceMonitor1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # ID of the Prometheus instance
  status      = "run"                             # Status of the ServiceMonitor
  type        = "serviceMonitor"
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: tomcat-demo          # Name of the ServiceMonitor
      namespace: default         # Namespace of the ServiceMonitor
    spec:
      endpoints:
        - interval: 30s          # Scrape interval
          path: /metrics         # Scrape path
          port: tomcat-monitor   # Name of the scrape port
      namespaceSelector:
        any: true                # Namespace selector for Services
      selector:
        matchLabels:
          app: tomcat            # Label selector for Services
  EOT
}
Run the following command to generate an execution plan.
terraform plan
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myServiceMonitor1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myServiceMonitor1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + status          = "run"
      + type            = "serviceMonitor"
      + config_yaml     = <<-EOT
            apiVersion: monitoring.coreos.com/v1
            kind: ServiceMonitor
            metadata:
              name: tomcat-demo
              namespace: default
            spec:
              endpoints:
                - interval: 30s
                  path: /metrics
                  port: tomcat-monitor
              namespaceSelector:
                any: true
              selector:
                matchLabels:
                  app: tomcat
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.
Run the following command to create the ServiceMonitor.
terraform apply
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myServiceMonitor1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myServiceMonitor1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + status          = "run"
      + type            = "serviceMonitor"
      + config_yaml     = <<-EOT
            apiVersion: monitoring.coreos.com/v1
            kind: ServiceMonitor
            metadata:
              name: tomcat-demo
              namespace: default
            spec:
              endpoints:
                - interval: 30s
                  path: /metrics
                  port: tomcat-monitor
              namespaceSelector:
                any: true
              selector:
                matchLabels:
                  app: tomcat
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

At the Enter a value prompt, enter yes to confirm. When the command then finishes without errors, the ServiceMonitor configuration of the Prometheus instance has been created.
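The plan output lists id and monitoring_name as (known after apply). To read these values once the resource exists, output blocks can be added to main.tf. This is a minimal sketch: the output names are hypothetical, and the resource address is the ServiceMonitor defined in this procedure.

```hcl
# Hypothetical output names; both attributes appear in the plan output above.
output "service_monitor_id" {
  description = "Resource ID assigned after the ServiceMonitor is created"
  value       = alicloud_arms_prometheus_monitoring.myServiceMonitor1.id
}

output "service_monitor_name" {
  description = "Monitoring name reported by the service"
  value       = alicloud_arms_prometheus_monitoring.myServiceMonitor1.monitoring_name
}
```

After the next terraform apply, the values are printed at the end of the run and can be shown again with terraform output.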
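Verify the result
You can log on to the Managed Service for Prometheus console and view the created ServiceMonitor configuration on the Integration Center page of the Prometheus instance. Perform the following steps:
Log on to the ARMS console.
In the left-side navigation pane, go to the instance list page of Managed Service for Prometheus.
Click the name of the target Prometheus instance to open the Integration Center page.
In the Installed section, click the Custom component card. In the panel that appears, click the Service Discovery Configurations tab to view the created ServiceMonitor configuration.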
Add a PodMonitor
Create a working directory, and create a configuration file named main.tf in the directory.

provider "alicloud" {
}
Run the following command to initialize the Terraform runtime environment.
terraform init
Expected output:

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
...

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Define the Monitoring resource.
Add the Monitoring resource to the main.tf file.

# PodMonitor configuration for the Prometheus instance.
resource "alicloud_arms_prometheus_monitoring" "myPodMonitor1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # ID of the Prometheus instance
  status      = "run"                             # Status of the PodMonitor
  type        = "podMonitor"
  config_yaml = <<-EOT
    apiVersion: "monitoring.coreos.com/v1"
    kind: "PodMonitor"
    metadata:
      name: "podmonitor-demo"      # Name of the PodMonitor
      namespace: "default"         # Namespace of the PodMonitor
    spec:
      namespaceSelector:
        any: true                  # Namespace selector for pods
      podMetricsEndpoints:
        - interval: "30s"          # Scrape interval
          path: "/metrics"         # Scrape path
          port: "tomcat-monitor"   # Name of the scrape port
      selector:
        matchLabels:
          app: "nginx2-exporter"   # Label selector for pods
  EOT
}
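As an alternative to an inline heredoc, the manifest can live in its own file so that it can be linted and reviewed as ordinary YAML. The following is a minimal sketch of the same resource under two assumptions that are not part of the original procedure: a hypothetical file named podmonitor.yaml next to main.tf that contains the PodMonitor manifest shown above, and a hypothetical variable prometheus_cluster_id for the instance ID. It replaces the resource block above rather than being added alongside it.

```hcl
# Hypothetical variable that holds the ID of the Prometheus instance.
variable "prometheus_cluster_id" {
  type        = string
  description = "ID of the Prometheus instance"
}

resource "alicloud_arms_prometheus_monitoring" "myPodMonitor1" {
  cluster_id = var.prometheus_cluster_id
  status     = "run"
  type       = "podMonitor"

  # file() reads the manifest as a plain string at plan time.
  # podmonitor.yaml is a hypothetical file name.
  config_yaml = file("${path.module}/podmonitor.yaml")
}
```

The instance ID can then be supplied at run time, for example with terraform plan -var 'prometheus_cluster_id=c77e1106f429e4b46b0ee1720cxxxxx'.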
Run the following command to generate an execution plan.
terraform plan
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myPodMonitor1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myPodMonitor1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + status          = "run"
      + type            = "podMonitor"
      + config_yaml     = <<-EOT
            apiVersion: "monitoring.coreos.com/v1"
            kind: "PodMonitor"
            metadata:
              name: "podmonitor-demo"
              namespace: "default"
            spec:
              namespaceSelector:
                any: true
              podMetricsEndpoints:
                - interval: "30s"
                  path: "/metrics"
                  port: "tomcat-monitor"
              selector:
                matchLabels:
                  app: "nginx2-exporter"
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.
Run the following command to create the PodMonitor.
terraform apply
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myPodMonitor1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myPodMonitor1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + status          = "run"
      + type            = "podMonitor"
      + config_yaml     = <<-EOT
            apiVersion: "monitoring.coreos.com/v1"
            kind: "PodMonitor"
            metadata:
              name: "podmonitor-demo"
              namespace: "default"
            spec:
              namespaceSelector:
                any: true
              podMetricsEndpoints:
                - interval: "30s"
                  path: "/metrics"
                  port: "tomcat-monitor"
              selector:
                matchLabels:
                  app: "nginx2-exporter"
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

At the Enter a value prompt, enter yes to confirm. When the command then finishes without errors, the PodMonitor configuration of the Prometheus instance has been created.
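Verify the result
You can log on to the Managed Service for Prometheus console and view the created PodMonitor configuration on the Integration Center page of the Prometheus instance. Perform the following steps:
Log on to the ARMS console.
In the left-side navigation pane, go to the instance list page of Managed Service for Prometheus.
Click the name of the target Prometheus instance to open the Integration Center page.
In the Installed section, click the Custom component card. In the panel that appears, click the Service Discovery Configurations tab to view the created PodMonitor configuration.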
Add a custom job (CustomJob)
Create a working directory, and create a configuration file named main.tf in the directory.

provider "alicloud" {
}
Run the following command to initialize the Terraform runtime environment.
terraform init
Expected output:

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
...

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Define the Monitoring resource.
Add the Monitoring resource to the main.tf file.

# Custom job configuration for the Prometheus instance.
resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # ID of the Prometheus instance
  status      = "run"                             # Status of the custom job
  type        = "customJob"
  config_yaml = <<-EOT
    scrape_configs:
      - job_name: prometheus1   # Name of the custom job
        honor_timestamps: false
        honor_labels: false
        scheme: http
        metrics_path: /metric
        static_configs:
          - targets:
              - 127.0.0.1:9090
  EOT
}
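When the target list changes often, the scrape configuration can be generated from Terraform values instead of being edited as text. The following is a minimal sketch that builds the same custom job with the built-in yamlencode function. The variable scrape_targets is hypothetical. yamlencode emits equivalent YAML with its own key order and quoting, and whether the service treats that text as identical to a hand-written manifest is an assumption that should be confirmed with terraform plan. The block is an alternative to the resource above, not an addition to it.

```hcl
# Hypothetical variable that lists the scrape targets of the custom job.
variable "scrape_targets" {
  type    = list(string)
  default = ["127.0.0.1:9090"]
}

resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
  cluster_id = "c77e1106f429e4b46b0ee1720cxxxxx" # ID of the Prometheus instance
  status     = "run"
  type       = "customJob"

  # yamlencode serializes the HCL object below into a YAML string.
  config_yaml = yamlencode({
    scrape_configs = [
      {
        job_name         = "prometheus1"
        honor_timestamps = false
        honor_labels     = false
        scheme           = "http"
        metrics_path     = "/metric"
        static_configs = [
          { targets = var.scrape_targets }
        ]
      }
    ]
  })
}
```

Adding a target then becomes a change to the scrape_targets value rather than an edit inside the YAML text.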
Run the following command to generate an execution plan.
terraform plan
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myCustomJob1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + status          = "run"
      + type            = "customJob"
      + config_yaml     = <<-EOT
            scrape_configs:
              - job_name: prometheus1
                honor_timestamps: false
                honor_labels: false
                scheme: http
                metrics_path: /metric
                static_configs:
                  - targets:
                      - 127.0.0.1:9090
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.
Run the following command to create the custom job.
terraform apply
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myCustomJob1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + status          = "run"
      + type            = "customJob"
      + config_yaml     = <<-EOT
            scrape_configs:
              - job_name: prometheus1
                honor_timestamps: false
                honor_labels: false
                scheme: http
                metrics_path: /metric
                static_configs:
                  - targets:
                      - 127.0.0.1:9090
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

At the Enter a value prompt, enter yes to confirm. When the command then finishes without errors, the custom job configuration of the Prometheus instance has been created.
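Verify the result
You can log on to the Managed Service for Prometheus console and view the created custom job configuration on the Integration Center page of the Prometheus instance. Perform the following steps:
Log on to the ARMS console.
In the left-side navigation pane, go to the instance list page of Managed Service for Prometheus.
Click the name of the target Prometheus instance to open the Integration Center page.
In the Installed section, click the Custom component card. In the panel that appears, click the Service Discovery Configurations tab to view the created custom job configuration.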
Add a health check Probe
Create a working directory, and create a configuration file named main.tf in the directory.

provider "alicloud" {
}
Run the following command to initialize the Terraform runtime environment.
terraform init
Expected output:

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "alicloud" (hashicorp/alicloud) 1.90.1...
...

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Define the Monitoring resource.
Add the Monitoring resource to the main.tf file.

# Probe configuration for the Prometheus instance.
resource "alicloud_arms_prometheus_monitoring" "myProbe1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # ID of the Prometheus instance
  type        = "probe"
  config_yaml = <<-EOT
    apiVersion: monitoring.coreos.com/v1
    kind: Probe
    metadata:
      name: name1-tcp-blackbox   # Probe name, in the format xxx-{tcp/http/ping}-blackbox
      namespace: arms-prom       # Optional
    spec:
      interval: 30s              # Health check interval
      jobName: blackbox          # Fixed value
      module: tcp_connect
      prober:                    # Prober configuration, fixed values
        path: /blackbox/probe
        scheme: http
        url: 'localhost:9335'
      targets:
        staticConfig:
          static:
            - 'arms-prom-admin.arms-prom:9335'   # Health check target address
  EOT
}
Run the following command to generate an execution plan.
terraform plan
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myProbe1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myProbe1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + type            = "probe"
      + config_yaml     = <<-EOT
            apiVersion: monitoring.coreos.com/v1
            kind: Probe
            metadata:
              name: name1-tcp-blackbox
              namespace: arms-prom
            spec:
              interval: 30s
              jobName: blackbox
              module: tcp_connect
              prober:
                path: /blackbox/probe
                scheme: http
                url: 'localhost:9335'
              targets:
                staticConfig:
                  static:
                    - 'arms-prom-admin.arms-prom:9335'
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.
Run the following command to create the health check Probe.
terraform apply
Expected output:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # alicloud_arms_prometheus_monitoring.myProbe1 will be created
  + resource "alicloud_arms_prometheus_monitoring" "myProbe1" {
      + cluster_id      = "c77e1106f429e4b46b0ee1720cxxxxx"
      + id              = (known after apply)
      + monitoring_name = (known after apply)
      + type            = "probe"
      + config_yaml     = <<-EOT
            apiVersion: monitoring.coreos.com/v1
            kind: Probe
            metadata:
              name: name1-tcp-blackbox
              namespace: arms-prom
            spec:
              interval: 30s
              jobName: blackbox
              module: tcp_connect
              prober:
                path: /blackbox/probe
                scheme: http
                url: 'localhost:9335'
              targets:
                staticConfig:
                  static:
                    - 'arms-prom-admin.arms-prom:9335'
        EOT
    }

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

At the Enter a value prompt, enter yes to confirm. When the command then finishes without errors, the health check Probe configuration of the Prometheus instance has been created.
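Verify the result
You can log on to the Managed Service for Prometheus console and view the created health check Probe configuration on the Integration Center page of the Prometheus instance. Perform the following steps:
Log on to the ARMS console.
In the left-side navigation pane, go to the instance list page of Managed Service for Prometheus.
Click the name of the target Prometheus instance to open the Integration Center page.
In the Installed section, click the Health Check component card. On the Health Check tab, view the created health check Probe configuration.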
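Delete Monitoring configurations from a Prometheus instance
Procedure
Run the following command to delete the Monitoring configuration that was created by using Terraform.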
terraform destroy
Expected output:
...
Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.
Enter a value: yes
...
Destroy complete! Resources: 1 destroyed.
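terraform destroy removes every resource tracked in the working directory, not only the one most recently added. Where a particular Monitoring configuration must not be deleted by accident, Terraform's lifecycle meta-argument can act as a guard. The following is a minimal sketch applied to the custom job from this topic; using it here is an illustration and not part of the original procedure.

```hcl
resource "alicloud_arms_prometheus_monitoring" "myCustomJob1" {
  cluster_id  = "c77e1106f429e4b46b0ee1720cxxxxx" # ID of the Prometheus instance
  status      = "run"
  type        = "customJob"
  config_yaml = <<-EOT
    scrape_configs:
      - job_name: prometheus1
        honor_timestamps: false
        honor_labels: false
        scheme: http
        metrics_path: /metric
        static_configs:
          - targets:
              - 127.0.0.1:9090
  EOT

  lifecycle {
    # Terraform rejects any plan that would destroy this resource,
    # including terraform destroy, until this setting is removed.
    prevent_destroy = true
  }
}
```

To delete the configuration later, remove the lifecycle block or set prevent_destroy to false, and then run terraform destroy again.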
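Verify the result
You can log on to the Managed Service for Prometheus console and confirm on the Integration Center page of the Prometheus instance that the Monitoring configuration has been deleted.
Log on to the ARMS console.
In the left-side navigation pane, go to the instance list page of Managed Service for Prometheus.
Click the name of the target Prometheus instance to open the Integration Center page.
In the Installed section, click the Custom or Health Check component card, and then open the Service Discovery Configurations tab or the Health Check tab. If the target Monitoring configuration is no longer listed, it has been deleted.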