
Migrate Hive tables and partition data to the OSS-HDFS service


This topic describes how to use the JindoTable MoveTo command to migrate Hive tables and partition data to the OSS-HDFS service.

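Prerequisites

  • A cluster of EMR-3.36.0 or later (excluding 3.39.x), or of EMR-5.2.0 or later (excluding 5.5.x), has been created. For details, see Create a cluster.

  • A partitioned table has been created with Hive commands and contains data. This tutorial uses a table named test_table with a partition column named dt and a partition value of value as an example.

  • The OSS-HDFS service has been activated and access to it has been authorized. For details, see the quick start for connecting an EMR cluster to the OSS-HDFS service.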

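Background

After it finishes copying the underlying data, the MoveTo command automatically updates the metadata, so that the data of tables and partitions is fully migrated to the new path. You can use a filter condition to copy a large number of partitions in a single run. During migration, the command also applies several safeguards to protect data integrity.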

Procedure

Important

Only one MoveTo process can run on a cluster at a time. If a MoveTo process is already running, a new MoveTo process fails to acquire the configuration lock, exits, and reports the process that is currently running. In this case, you can either terminate the running MoveTo process and start a new one, or wait for the running process to finish.
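
Before starting a new run, the process list can be searched for an existing MoveTo process. The following is a minimal sketch, not part of JindoTable: moveto_running is a hypothetical helper name, and it assumes that a running MoveTo process shows the -moveTo flag in its command line, which is not documented behavior.

```shell
# Hypothetical helper: reads a process listing on stdin and succeeds if any
# command line contains the -moveTo flag. That a running MoveTo process shows
# this flag in its command line is an assumption.
moveto_running() {
  # The bracket expression [-] keeps this grep from matching its own entry
  # when the input comes from a live process listing.
  grep -q -- '[-]moveTo'
}

# Example usage (not executed here):
#   ps -eo pid,args | moveto_running && echo "A MoveTo process is already running"
```

If the check succeeds, wait for that process to finish or terminate it before starting a new MoveTo run.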

  1. Log on to the cluster by using SSH. For details, see Log on to a cluster.
  2. Run the following command to view the help information.

    sudo jindo table -help moveTo

    The help information is as follows.

    <dbName.tableName>      The table to move.
    <destination path>      The destination base directory which is always at the
                              same level of a 'table location', where the moved
                              partitions or un-partitioned data would located in.
    <condition>/-fullTable  A filter condition to determine which partitions should
                              be moved, supporting common operators (like '>') and
                              built-in UDFs (like to_date) (UDFs not supported
                              yet...), while -fullTable means that all partitions (or
                              a whole un-partitioned table) should be moved. One but
                              only one option must be specified among -c
                              "<condition>" and -fullTable.
    <before days>           Optional, saying that table/partitions should be moved
                              only when they are created (not updated or modified)
                              more than some days before from now.
    <parallelism>           The maximum concurrency when copying partitions, 1 by
                              default.
    <OSS storage policy>    Storage policy for OSS destination, which can be Standard
                              (by default), IA, Archive, or ColdArchive. Not applicable for destinations other
                              than OSS. NOTE: if you are willing to use ColdArchive storage policy, please
                              make sure that Cold Archive has been enabled for your OSS bucket.
    
    -o/-overWrite           Overwriting the final paths where the data would be moved.
                              For partitioned tables this overwrites partitions locations
                              which are subdirectories of <destination path>; for
                              un-partitioned table this overwrites the <destination path>
                              itself.
    -r/-removeSource        Let the source data be removed when the corresponding
                              table/partition is successfully moved to the new destination.
                              Otherwise (by default), the source data would be left as it
                              was.
    -skipTrash              Applicable only when [-r/-removeSource] is enabled. If
                              present, source data would be immediately deleted from the
                              file system, bypassing the trash.
    -e/-explain             If present, the command would not really move data, but only
                              prints the table/partitions that would be moved for given
                              conditions.
    <log directory>         A directory to locate log files, '/tmp/<current user>/' by
                              default.
    • Command syntax

      sudo jindo table -moveTo \
        -t <dbName.tableName> \
        -d <destination path> \
        [-c "<condition>" | -fullTable] \
        [-b/-before <before days>] \
        [-p/-parallel <parallelism>] \
        [-s/-storagePolicy <OSS storage policy>] \
        [-o/-overWrite] \
        [-r/-removeSource] \
        [-skipTrash] \
        [-e/-explain] \
        [-l/-logDir <log directory>]
    • Parameter description

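      Parameter

      Required

      Description

      -t <dbName.tableName>

      The table to move, in the format database name.table name.

      The database name and the table name are separated by a period (.). The table can be partitioned or non-partitioned.

      -d <destination path>

      The destination location. Whether you move partitions or an entire non-partitioned table, this location corresponds to the table level. When you move partitions, the full path of a partition is this path plus the partition name, for example <destination path>/p1=v1/p2=v2/.

      -c "<condition>" | -fullTable

      Exactly one of the two must be specified: either -c "<condition>" or -fullTable.

      • -fullTable moves the entire table, which can be partitioned or non-partitioned.
      • -c "<condition>" provides a filter condition that selects the partitions to move. Common operators are supported, for example the greater-than sign (>).

        For example, for a partition column ds of type String, to select the partitions whose value is greater than 'd', use -c " ds > 'd' ".

      -b/-before <before days>

      Only tables or partitions that were created more than the specified number of days ago are moved.

      -p/-parallel <parallelism>

      The parallelism of the migration.

      -s/-storagePolicy <OSS storage policy>

      Not supported by the OSS-HDFS service.

      -o/-overWrite

      Specifies whether to forcibly overwrite the destination path. For a partitioned table, only the paths of the partitions being moved are cleared, not the entire table path.

      -r/-removeSource

      Specifies whether to clean up the source path after the move completes and the metadata is updated. For a partitioned table, only the source paths of successfully moved partitions are cleaned up.

      -skipTrash

      Specifies whether to bypass the trash when the source path is cleaned up.

      Note

      This option must be used together with -r/-removeSource.

      -e/-explain

      If present, the command runs in explain mode: it only lists the partitions to be moved and does not move any data.

      -l/-logDir <log directory>

      The directory for log files.

      Default value: /tmp/<current user>/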

  3. Migrate partition data to the OSS-HDFS service.

    1. Check whether the partitions to be migrated are as expected.

      With the -e option, the command only lists the partitions to be migrated and does not run the migration.

      sudo jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table -c " dt > 'v' " -e

      The following result is returned:

      Found 1 partitions to move:
            dt=value-2
      MoveTo finished for table tdb.test_table to destination oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table with condition " dt > 'v' " (explain only).
    2. Migrate the partitions to the OSS-HDFS service.

      sudo jindotable -moveTo -t tdb.test_table -d oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table  -c " dt > 'v' " 

      The following result is returned:

      Found 1 partitions in total, and all are successfully moved.
      Successfully moved partitions:
          dt=value-2
      No failed partition.
      MoveTo finished for table tdb.test_table to destination oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table with condition " dt > 'v' ".
    3. Check the Location property to verify that the partition was migrated.

      hive> desc formatted test_table partition (dt='value-2');

      The following result is returned:

      OK
      # col_name              data_type               comment
      id                      int
      content                 string
      
      # Partition Information
      # col_name              data_type               comment
      dt                      string
      
      # Detailed Partition Information
      Partition Value:        [value-2]
      Database:               tdb
      Table:                  test_table
      CreateTime:             UNKNOWN
      LastAccessTime:         UNKNOWN
      Location:               oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table/dt=value-2
    4. Optional: Migrate the partitions from OSS-HDFS back to HDFS.

      sudo jindotable -moveTo -t tdb.test_table -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table  -c " dt > 'v' "

      The following result is returned:

      No successfully moved partition.
      Failed partitions:
          dt=value-2    New location is not empty but -overWrite is not enabled.
      MoveTo finished for table tdb.test_table to destination hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table with condition -c " dt > 'v' ".

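      The output reports No successfully moved partition. because the HDFS destination directory is not empty. If you are sure that the destination directory can be discarded, use the -overWrite option to forcibly overwrite it, so that the partition is migrated from OSS-HDFS to HDFS.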

      sudo jindotable -moveTo -t tdb.test_table -d hdfs://<hdfs-path>/user/hive/warehouse/tdb.db/test_table  -c " dt > 'v' " -overWrite

      After the migration succeeds, the following result is returned:

      Found 1 partitions in total, and all are successfully moved.
      Successfully moved partitions:
          dt=value-2
      No failed partition.
      MoveTo finished for table tdb.test_table to destination hdfs:///user/hive/warehouse/tdb.db/test_table with condition " dt > 'v' ", overwriting new locations.
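
The preview-then-migrate pattern used in step 3 can be wrapped in a small script so that the real run always uses the same table, destination, and condition that were previewed. The following is a minimal sketch under stated assumptions: preview_then_move is a hypothetical helper name, the jindotable options are the ones shown in the steps above, and it is assumed (not confirmed by this topic) that jindotable exits with a non-zero status when the explain run fails.

```shell
# Hypothetical helper: preview a MoveTo run with -e, ask for confirmation,
# then run the same command without -e.
# Usage: preview_then_move <dbName.tableName> <destination path> "<condition>" [extra options]
preview_then_move() {
  local table="$1" dest="$2" cond="$3"
  shift 3   # any remaining arguments (for example -overWrite) are passed through

  # Explain mode: only lists the partitions that would be moved.
  sudo jindotable -moveTo -t "$table" -d "$dest" -c "$cond" "$@" -e || return 1

  # Prompt on stderr so that stdout contains only the command output.
  printf 'Move the partitions listed above? [y/N] ' >&2
  local answer
  read -r answer
  if [ "$answer" != "y" ]; then
    echo "Aborted." >&2
    return 1
  fi

  # Real run with exactly the same arguments, minus -e.
  sudo jindotable -moveTo -t "$table" -d "$dest" -c "$cond" "$@"
}

# Example usage (not executed here):
#   preview_then_move tdb.test_table \
#     oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table \
#     " dt > 'v' "
```

Because the function passes the same arguments to both runs, the partitions listed by the explain run are selected by the same condition that the real run uses.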

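Troubleshooting

If migrating a table or partition fails with the message Conflicts found, resolve the issue as follows.

  • Make sure that no other command, such as the distributed copy commands DistCp or JindoDistCp, is migrating data to the same destination path at the same time.

  • Delete the destination directory. For a non-partitioned table, delete the table-level directory. For a partitioned table, delete the partition-level directories that have conflicts.

  • Do not delete the source directory.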
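
The cleanup of a conflicting destination directory can be sketched as follows. This is an illustration under stated assumptions, not an official procedure: remove_conflicting_dir and the DRY_RUN variable are hypothetical names, the partition directory is built as <destination path>/<partition name> as described for the -d parameter, and it is assumed that the cluster's hadoop fs command can access the destination file system. By default the helper only prints the command so that the path can be checked before anything is deleted.

```shell
# Hypothetical helper: print (default) or run the command that removes a
# conflicting destination directory before MoveTo is retried.
# Usage: remove_conflicting_dir <destination path> [<partition name>]
remove_conflicting_dir() {
  local dest="${1%/}"     # table-level destination path, one trailing slash stripped
  local partition="$2"    # for example dt=value-2; leave empty for a non-partitioned table
  local target="$dest"

  # For a partitioned table, remove only the conflicting partition directory.
  if [ -n "$partition" ]; then
    target="$dest/$partition"
  fi

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show what would be deleted.
    echo "hadoop fs -rm -r $target"
  else
    hadoop fs -rm -r "$target"
  fi
}

# Example usage (not executed here):
#   remove_conflicting_dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table dt=value-2
#   DRY_RUN=0 remove_conflicting_dir oss://examplebucket.cn-hangzhou.oss-dls.aliyuncs.com/data/tdb.test_table dt=value-2
```

Pass only destination paths to this helper; the source directory must be left in place.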