日本熟妇hd丰满老熟妇,中文字幕一区二区三区在线不卡 ,亚洲成片在线观看,免费女同在线一区二区

Apache Impala(CDH6)查詢OSS數(shù)據(jù)

CDHCloudera提供的包含Apache Hadoop核心組件的企業(yè)級大數(shù)據(jù)發(fā)行版,已支持Hadoop 3.0.0。本文將詳解如何配置CDH6環(huán)境下的Hadoop、Hive、Spark、Impala等組件,以實現(xiàn)對接阿里云OSS存儲服務(wù)進行數(shù)據(jù)查詢操作。

前提條件

已搭建CDH6 集群。具體操作,請參見安裝指南。本文以CDH6.0.1版本為例。

步驟一:增加OSS配置

  1. 通過集群管理工具CM來增加配置。

    若沒有CM管理的集群,可以修改core-site.xml。以CM為例,需要增加如下配置:

    配置項

    說明

    fs.oss.endpoint

    填寫需要連接的OSSEndpoint,示例值為oss-cn-zhangjiakou-internal.aliyuncs.com。

    fs.oss.accessKeyId

    填寫訪問密鑰中的AccessKeyId,訪問密鑰要求擁有訪問OSS的權(quán)限。

    fs.oss.accessKeySecret

    填寫訪問密鑰中的AccessKeySecret,訪問密鑰要求擁有訪問OSS的權(quán)限。

    fs.oss.impl

    Hadoop OSS文件系統(tǒng)實現(xiàn)類。固定值為org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem

    fs.oss.buffer.dir

    填寫臨時文件目錄。 建議值為/tmp/oss

    fs.oss.connection.secure.enabled

    是否開啟 HTTPS。開啟HTTPS可能會影響性能。 建議值為false

    fs.oss.connection.maximum

    OSS的連接數(shù)。建議值為2048

    更多參數(shù)解釋,請參見 Hadoop-Aliyun module

  2. 根據(jù)CM提示重啟集群。

  3. 測試讀寫OSS。

    • 測試讀:

      hadoop fs -ls oss://${your-bucket-name}/
    • 測試寫:

      hadoop fs -mkdir oss://${your-bucket-name}/hadoop-test

      若測試可以讀寫OSS,則配置成功。若無法讀寫OSS,請檢查配置。

      說明

      本文中所有 ${} 的內(nèi)容為環(huán)境變量,請根據(jù)您實際的環(huán)境修改。

步驟二:配置Apache Impala

為了使CDH6版本中的Impala能夠支持OSS,您需要手動將OSS的相關(guān)JAR包添加到所有Impala節(jié)點的CLASSPATH環(huán)境變量中。具體操作步驟如下:

  1. 進入${CDH_HOME}/lib/impala目錄,創(chuàng)建符號鏈接:

    [root@cdh-master impala]# cd lib/
    [root@cdh-master lib]# ln -s ../../../jars/hadoop-aliyun-3.0.0-cdh6.0.1.jar hadoop-aliyun.jar
    [root@cdh-master lib]# ln -s ../../../jars/aliyun-sdk-oss-2.8.3.jar aliyun-sdk-oss-2.8.3.jar
    [root@cdh-master lib]# ln -s ../../../jars/jdom-1.1.jar jdom-1.1.jar
  2. 進入${CDH_HOME}/bin目錄,修改impaladstatestoredcatalogd三個文件,在文件最后一行exec命令前增加如下內(nèi)容:

    export CLASSPATH=${CLASSPATH}:${IMPALA_HOME}/lib/hadoop-aliyun.jar:${IMPALA_HOME}/lib/aliyun-sdk-oss-2.8.3.jar:${IMPALA_HOME}/lib/jdom-1.1.jar
  3. 重啟所有節(jié)點的Impala相關(guān)進程。

步驟三:驗證配置

嘗試使用Apache Impala查詢OSS,并且運行TPC-DS的查詢語句。更多信息,請參見Apache Impala介紹TPC-DS查詢

說明

TPC-DS設(shè)計了一系列基于Hive QL的復(fù)雜查詢場景。考慮到Impala SQLHive QL之間具有較高的兼容性,大部分TPC-DS查詢可以直接在Impala上執(zhí)行。然而,由于兩者間仍存在一些細微差異,部分TPC-DS查詢可能因兼容性問題無法直接在Impala環(huán)境中運行。

以下以選取大量能在Impala中順利執(zhí)行的TPC-DS查詢作為實驗案例,通過這些案例來觀察并分析Impala在大規(guī)模數(shù)據(jù)處理上的性能表現(xiàn)。

  1. 生成sample數(shù)據(jù)到OSS。

    [root@cdh-master ~]# git clone https://github.com/hortonworks/hive-testbench.git
    [root@cdh-master ~]# cd hive-testbench
    [root@cdh-master hive-testbench]# git checkout hive14
    [root@cdh-master hive-testbench]# ./tpcds-build.sh
    [root@cdh-master hive-testbench]# FORMAT=textfile ./tpcds-setup.sh 50 oss://{your-bucket-name}/
  2. 建表。

    Benchmark的建表語句與Apache Impala的建表語句兼容。您只需要復(fù)制ddl-tpcds/text/alltables.sql中的建表語句,然后修改${LOCATION}即可。

    [root@cdh-master hive-testbench]# impala-shell -i cdh-slave01 -d default
    Starting Impala Shell without Kerberos authentication
    Connected to cdh-slave01:21000
    Server version: impalad version 3.0.0-cdh6.0.1 RELEASE(build9a74a5053de5f7b8dd983802e6d75e58d31472db)
    ***********************************************************************************
    Welcome to the Impala shell.
    (Impala Shell v3.0.0-cdh6.0.1(9a74a50) built on Wed Sep 1911:27:37 PDT 2018)
    
    Want to know what version of Impala you're connected to? Run the VERSION command to
    find out!
    ***********************************************************************************
    Query: use `default`
    [cdh-slave01:21000] default> create external table call_center(
                               >       cc_call_center_sk         bigint
                               > ,     cc_call_center_id         string
                               > ,     cc_rec_start_date        string
                               > ,     cc_rec_end_date          string
                               > ,     cc_closed_date_sk         bigint
                               > ,     cc_open_date_sk           bigint
                               > ,     cc_name                   string
                               > ,     cc_class                  string
                               > ,     cc_employees              int
                               > ,     cc_sq_ft                  int
                               > ,     cc_hours                  string
                               > ,     cc_manager                string
                               > ,     cc_mkt_id                 int
                               > ,     cc_mkt_class              string
                               > ,     cc_mkt_desc               string
                               > ,     cc_market_manager         string
                               > ,     cc_division               int
                               > ,     cc_division_name          string
                               > ,     cc_company                int
                               > ,     cc_company_name           string
                               > ,     cc_street_number          string
                               > ,     cc_street_name            string
                               > ,     cc_street_type            string
                               > ,     cc_suite_number           string
                               > ,     cc_city                   string
                               > ,     cc_county                 string
                               > ,     cc_state                  string
                               > ,     cc_zip                    string
                               > ,     cc_country                string
                               > ,     cc_gmt_offset             double
                               > ,     cc_tax_percentage         double
                               > )
                               > row format delimited fields terminated by '|'
                               > location 'oss://{your-bucket-name}/50/call_center';
    Query: create external table call_center(
          cc_call_center_sk         bigint
    ,     cc_call_center_id         string
    ,     cc_rec_start_date        string
    ,     cc_rec_end_date          string
    ,     cc_closed_date_sk         bigint
    ,     cc_open_date_sk           bigint
    ,     cc_name                   string
    ,     cc_class                  string
    ,     cc_employees              int
    ,     cc_sq_ft                  int
    ,     cc_hours                  string
    ,     cc_manager                string
    ,     cc_mkt_id                 int
    ,     cc_mkt_class              string
    ,     cc_mkt_desc               string
    ,     cc_market_manager         string
    ,     cc_division               int
    ,     cc_division_name          string
    ,     cc_company                int
    ,     cc_company_name           string
    ,     cc_street_number          string
    ,     cc_street_name            string
    ,     cc_street_type            string
    ,     cc_suite_number           string
    ,     cc_city                   string
    ,     cc_county                 string
    ,     cc_state                  string
    ,     cc_zip                    string
    ,     cc_country                string
    ,     cc_gmt_offset             double
    ,     cc_tax_percentage         double
    )
    row format delimited fields terminated by '|'
    location 'oss://{your-bucket-name}/50/call_center'
    +-------------------------+
    | summary                 |
    +-------------------------+
    | Table has been created. |
    +-------------------------+
    Fetched 1 row(s) in 4.10s
  3. 檢查建表內(nèi)容。

    [cdh-slave01:21000] default> show tables;
    Query: show tables
    +------------------------+
    | name                   |
    +------------------------+
    | call_center            |
    | catalog_page           |
    | catalog_returns        |
    | catalog_sales          |
    | customer               |
    | customer_address       |
    | customer_demographics  |
    | date_dim               |
    | household_demographics |
    | income_band            |
    | inventory              |
    | item                   |
    | promotion              |
    | reason                 |
    | ship_mode              |
    | store                  |
    | store_returns          |
    | store_sales            |
    | time_dim               |
    | warehouse              |
    | web_page               |
    | web_returns            |
    | web_sales              |
    | web_site               |
    +------------------------+
    Fetched 24 row(s) in 0.03s
  4. Impala上執(zhí)行TPC-DS查詢,已經(jīng)可以正常查詢OSS的數(shù)據(jù)。

    1. 進入待查詢的SQL所在的 sample-queries-tpcds目錄。

      [root@cdh-master hive-testbench]# ls sample-queries-tpcds
      query12.sql  query20.sql  query27.sql  query39.sql  query46.sql  query54.sql  query64.sql  query71.sql  query7.sql   query87.sql  query93.sql  README.md
      query13.sql  query21.sql  query28.sql  query3.sql   query48.sql  query55.sql  query65.sql  query72.sql  query80.sql  query88.sql  query94.sql  testbench.settings
      query15.sql  query22.sql  query29.sql  query40.sql  query49.sql  query56.sql  query66.sql  query73.sql  query82.sql  query89.sql  query95.sql  testbench-withATS.settings
      query17.sql  query24.sql  query31.sql  query42.sql  query50.sql  query58.sql  query67.sql  query75.sql  query83.sql  query90.sql  query96.sql
      query18.sql  query25.sql  query32.sql  query43.sql  query51.sql  query60.sql  query68.sql  query76.sql  query84.sql  query91.sql  query97.sql
      query19.sql  query26.sql  query34.sql  query45.sql  query52.sql  query63.sql  query70.sql  query79.sql  query85.sql  query92.sql  query98.sql
    2. 執(zhí)行TPC-DS查詢 query13.sql

      [root@cdh-master hive-testbench]# impala-shell -i cdh-slave01 -d default -f sample-queries-tpcds/query13.sql
      Starting Impala Shell without Kerberos authentication
      Connected to cdh-slave01:21000
      Server version: impalad version 3.0.0-cdh6.0.1 RELEASE(build9a74a5053de5f7b8dd983802e6d75e58d31472db)
      Query: use`default`Query: selectavg(ss_quantity)
             ,avg(ss_ext_sales_price)
             ,avg(ss_ext_wholesale_cost)
             ,sum(ss_ext_wholesale_cost)
       from store_sales
           ,store
           ,customer_demographics
           ,household_demographics
           ,customer_address
           ,date_dim
       where store.s_store_sk = store_sales.ss_store_sk
       and  store_sales.ss_sold_date_sk = date_dim.d_date_sk and date_dim.d_year = 2001and((store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'M'and customer_demographics.cd_education_status = '4 yr Degree'and store_sales.ss_sales_price between100.00and150.00and household_demographics.hd_dep_count = 3
           )or
          (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'D'and customer_demographics.cd_education_status = 'Primary'and store_sales.ss_sales_price between50.00and100.00and household_demographics.hd_dep_count = 1
           ) or
          (store_sales.ss_hdemo_sk=household_demographics.hd_demo_sk
        and customer_demographics.cd_demo_sk = ss_cdemo_sk
        and customer_demographics.cd_marital_status = 'U'and customer_demographics.cd_education_status = 'Advanced Degree'and store_sales.ss_sales_price between150.00and200.00and household_demographics.hd_dep_count = 1
           ))
       and((store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'and customer_address.ca_state in('KY', 'GA', 'NM')
        and store_sales.ss_net_profit between100and200
           ) or
          (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'and customer_address.ca_state in('MT', 'OR', 'IN')
        and store_sales.ss_net_profit between150and300
           ) or
          (store_sales.ss_addr_sk = customer_address.ca_address_sk
        and customer_address.ca_country = 'United States'and customer_address.ca_state in('WI', 'MO', 'WV')
        and store_sales.ss_net_profit between50and250
           ))
      Query submitted at: 2018-10-3011:44:47(Coordinator: http://cdh-slave01:25000)
      Query progress can be monitored at: http://cdh-slave01:25000/query_plan?query_id=ff4b3157eddfc3c4:8a31c6500000000
      +-------------------+-------------------------+----------------------------+----------------------------+
      | avg(ss_quantity)  | avg(ss_ext_sales_price) | avg(ss_ext_wholesale_cost) | sum(ss_ext_wholesale_cost) |
      +-------------------+-------------------------+----------------------------+----------------------------+
      | 30.87106918238994 | 2352.642327044025       | 2162.600911949685          | 687707.09                  |
      +-------------------+-------------------------+----------------------------+----------------------------+
      Fetched 1 row(s) in 353.16s

    本次查詢涉及到6張表:storestore_salescustomer_demographicshousehold_demographicscustomer_addressdate_dim

    [cdh-slave01:21000] default> showtable STATS store;
    Query: showtable STATS store
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-------------------------------------------+
    | #Rows | #Files | Size    | Bytes Cached | CacheReplication | Format | Incremental stats | Location                                  |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-------------------------------------------+
    | -1    | 1      | 37.56KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | oss://{your-bucket-name}/50/store |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-------------------------------------------+
    Fetched 1 row(s) in 0.01s
    [cdh-slave01:21000] default> showtable STATS store_sales;
    Query: showtable STATS store_sales
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-------------------------------------------------+
    | #Rows | #Files | Size    | Bytes Cached | CacheReplication | Format | Incremental stats | Location                                        |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-------------------------------------------------+
    | -1    | 50     | 18.75GB | NOT CACHED   | NOT CACHED        | TEXT   | false             | oss://{your-bucket-name}/50/store_sales |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-------------------------------------------------+
    Fetched 1 row(s) in 0.01s
    [cdh-slave01:21000] default> showtable STATS customer_demographics;
    Query: showtable STATS customer_demographics
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
    | #Rows | #Files | Size    | Bytes Cached | CacheReplication | Format | Incremental stats | Location                                                  |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
    | -1    | 50     | 76.92MB | NOT CACHED   | NOT CACHED        | TEXT   | false             | oss://{your-bucket-name}/50/customer_demographics |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
    Fetched 1 row(s) in 0.01s
    [cdh-slave01:21000] default> showtable STATS household_demographics;
    Query: showtable STATS household_demographics
    +-------+--------+----------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
    | #Rows | #Files | Size     | Bytes Cached | CacheReplication | Format | Incremental stats | Location                                                   |
    +-------+--------+----------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
    | -1    | 1      | 148.10KB | NOT CACHED   | NOT CACHED        | TEXT   | false             | oss://{your-bucket-name}/50/household_demographics |
    +-------+--------+----------+--------------+-------------------+--------+-------------------+------------------------------------------------------------+
    Fetched 1 row(s) in 0.01s
    [cdh-slave01:21000] default> showtable STATS customer_address;
    Query: showtable STATS customer_address
    +-------+--------+---------+--------------+-------------------+--------+-------------------+------------------------------------------------------+
    | #Rows | #Files | Size    | Bytes Cached | CacheReplication | Format | Incremental stats | Location                                             |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+------------------------------------------------------+
    | -1    | 1      | 40.54MB | NOT CACHED   | NOT CACHED        | TEXT   | false             | oss://{your-bucket-name}/50/customer_address |
    +-------+--------+---------+--------------+-------------------+--------+-------------------+------------------------------------------------------+
    Fetched 1 row(s) in 0.01s
    [cdh-slave01:21000] default> showtable STATS date_dim;
    Query: showtable STATS date_dim
    +-------+--------+--------+--------------+-------------------+--------+-------------------+----------------------------------------------+
    | #Rows | #Files | Size   | Bytes Cached | CacheReplication | Format | Incremental stats | Location                                     |
    +-------+--------+--------+--------------+-------------------+--------+-------------------+----------------------------------------------+
    | -1    | 1      | 9.84MB | NOT CACHED   | NOT CACHED        | TEXT   | false             | oss://{your-bucket-name}/50/date_dim |
    +-------+--------+--------+--------------+-------------------+--------+-------------------+----------------------------------------------+
    Fetched 1 row(s) in 0.01s

相關(guān)文檔