
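PyODPS supports basic operations on MaxCompute tables, including creating a table, creating a table schema, synchronizing table updates, reading table data, deleting a table, and managing table partitions. It can also convert a table into a DataFrame object.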

Background information

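PyODPS provides the following methods for basic operations on MaxCompute tables.

Operation	Description
Basic operations	List all tables in a project, check whether a table exists, get a table, and perform other basic operations.
Create a table schema	Use PyODPS to create a table schema.
Create a table	Use PyODPS to create a table.
Synchronize table updates	Use PyODPS to synchronize updates to a table.
Write data to a table	Use PyODPS to write data to a table.
Insert a record into a table	Use PyODPS to insert a record into a table.
Read table data	Use PyODPS to read data from a table.
Delete a table	Use PyODPS to delete a table.
Convert a table into a DataFrame	Use PyODPS to convert a table into a DataFrame.
Table partitions	Use PyODPS to check whether a table is partitioned, iterate over all partitions of a table, check whether a partition exists, create partitions, and more.
Data upload and download channel	Use PyODPS to operate Tunnel and upload data to or download data from MaxCompute.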

Note

For more information about PyODPS methods, see the Python SDK method reference.

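Prerequisites: prepare the runtime environment

PyODPS can run in a PyODPS node in DataWorks or in a local PC environment. Before you run it, choose a runtime tool and prepare the runtime environment.

Use DataWorks: create a PyODPS 2 node or a PyODPS 3 node. For more information, see Use PyODPS in DataWorks.
Use a local PC environment: install PyODPS and initialize the ODPS entry object. The examples in this topic refer to the entry object as o.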

Basic operations

Table operations in the current project

List all tables in a project:
The o.list_tables() method lists all tables in the project.
```
for table in o.list_tables():
    print(table)
```
Use the prefix parameter to list only the tables whose names start with a given prefix:
```
for table in o.list_tables(prefix="table_prefix"):
    print(table.name)
```
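Table objects returned by this method load only the table name. Accessing other properties, such as table_schema or creation_time, may trigger additional requests and add latency. To read these properties while listing tables, pass extended=True to list_tables in PyODPS 0.11.5 and later: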
```
for table in o.list_tables(extended=True):
    print(table.name, table.creation_time)
```
To list tables of a specific type, specify the type parameter:
```
managed_tables = list(o.list_tables(type="managed_table"))  # List managed (internal) tables
external_tables = list(o.list_tables(type="external_table"))  # List external tables
virtual_views = list(o.list_tables(type="virtual_view"))  # List views
materialized_views = list(o.list_tables(type="materialized_view"))  # List materialized views
```

Check whether a table exists:

The o.exist_table() method checks whether a table exists.

```
print(o.exist_table('pyodps_iris'))
# True means that the table pyodps_iris exists.
```

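Get a table:

The o.get_table() method of the entry object gets a table.

Get the schema of a table.

```
t = o.get_table('pyodps_iris')
print(t.schema)  # Get the schema of the table pyodps_iris
```

Sample output:

```
odps.Schema {
  sepallength           double      # sepal length (cm)
  sepalwidth            double      # sepal width (cm)
  petallength           double      # petal length (cm)
  petalwidth            double      # petal width (cm)
  name                  string      # species
}
```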

Get the columns of a table.

```
t = o.get_table('pyodps_iris')
print(t.schema.columns)  # Get the columns in the schema of pyodps_iris
```

Sample output:

```
[<column sepallength, type double>,
 <column sepalwidth, type double>,
 <column petallength, type double>,
 <column petalwidth, type double>,
 <column name, type string>]
```

Get a specific column of a table.

```
t = o.get_table('pyodps_iris')
print(t.schema['sepallength'])  # Get the sepallength column of pyodps_iris
```

Sample output:

```
<column sepallength, type double>
```

Get the comment of a specific column.

```
t = o.get_table('pyodps_iris')
print(t.schema['sepallength'].comment)  # Get the comment of the sepallength column of pyodps_iris
```

Sample output:

```
sepal length (cm)
```

Get the lifecycle of a table.

```
t = o.get_table('pyodps_iris')
print(t.lifecycle)  # Get the lifecycle of pyodps_iris
```

Sample output:

```
-1
```

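Get the creation time of a table.

```
t = o.get_table('pyodps_iris')
print(t.creation_time)  # Get the creation time of pyodps_iris
```

Check whether a table is a virtual view.

```
t = o.get_table('pyodps_iris')
print(t.is_virtual_view)  # Returns False, which means pyodps_iris is not a virtual view.
```

Similarly, use t.size and t.comment to get the size and the comment of a table.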

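Cross-project table operations

Use the project parameter to get a table from another project.

```
t = o.get_table('table_name', project='other_project')
```

Here, other_project is the other project, and table_name is the name of the table to get from that project.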

Create a table schema

You can initialize a schema in either of the following ways:

Initialize the schema from the table columns and, optionally, the partitions.

```
from odps.models import Schema, Column, Partition
columns = [
    Column(name='num', type='bigint', comment='the column'),
    Column(name='num2', type='double', comment='the column2'),
]
partitions = [Partition(name='pt', type='string', comment='the partition')]
schema = Schema(columns=columns, partitions=partitions)
```

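After initialization, you can get the column and partition information.

Get all columns.

```
print(schema.columns)
```

Sample output:

```
[<column num, type bigint>,
 <column num2, type double>,
 <partition pt, type string>]
```

Get the partition columns.

```
print(schema.partitions)
```

Sample output:

```
[<partition pt, type string>]
```

Get the names of the non-partition columns.
```
print(schema.names)
```
Sample output:
```
['num', 'num2']
```
Get the types of the non-partition columns.
```
print(schema.types)
```
Sample output:
```
[bigint, double]
```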

Use the Schema.from_lists() method. This method is easier to call, but you cannot set comments for columns and partitions directly.

```
from odps.models import Schema
schema = Schema.from_lists(['num', 'num2'], ['bigint', 'double'], ['pt'], ['string'])
print(schema.columns)
```

Sample output:

```
[<column num, type bigint>,
 <column num2, type double>,
 <partition pt, type string>]
```

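Create a table

You can use the o.create_table() method to create a table in two ways: from a table schema, or from column names and column types. The data types of table columns are subject to certain restrictions, as described below.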

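Create a table from a schema

To create a table from a schema, first create the table schema, and then use it to create the table.

```
# Create the table schema.
from odps.models import Schema
schema = Schema.from_lists(['num', 'num2'], ['bigint', 'double'], ['pt'], ['string'])

# Create the table from the schema.
table = o.create_table('my_new_table', schema)

# Alternatively, create the table only if it does not already exist.
table = o.create_table('my_new_table', schema, if_not_exists=True)

# Alternatively, set the lifecycle (in days) when creating the table.
table = o.create_table('my_new_table', schema, lifecycle=7)
```

After the table is created, run print(o.exist_table('my_new_table')) to verify it. True indicates that the table was created.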

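Create a table from column names and types

```
# Create the partitioned table my_new_table by passing a tuple of (column definitions, partition definitions).
table = o.create_table('my_new_table', ('num bigint, num2 double', 'pt string'), if_not_exists=True)

# Create the non-partitioned table my_new_table02.
table = o.create_table('my_new_table02', 'num bigint, num2 double', if_not_exists=True)
```

After the table is created, run print(o.exist_table('my_new_table')) to verify it. True indicates that the table was created.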
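If the column names and types come from elsewhere, such as a list of tuples, you may need to assemble the string arguments that o.create_table() accepts. The following is a minimal sketch; build_table_spec is a hypothetical helper, not part of PyODPS, and it assumes the comma-separated "name type" format shown above.

```python
def build_table_spec(columns, partitions=None):
    """Hypothetical helper: build create_table() arguments from (name, type) pairs.

    Returns a 'name type, ...' string for a non-partitioned table, or a
    (columns_string, partitions_string) tuple when partitions are given,
    mirroring the two call forms shown above.
    """
    def join(pairs):
        return ', '.join('%s %s' % (name, col_type) for name, col_type in pairs)

    if not columns:
        raise ValueError('a table needs at least one column')
    column_spec = join(columns)
    if partitions:
        return column_spec, join(partitions)
    return column_spec


print(build_table_spec([('num', 'bigint'), ('num2', 'double')]))
# prints: num bigint, num2 double
print(build_table_spec([('num', 'bigint'), ('num2', 'double')], [('pt', 'string')]))
# prints: ('num bigint, num2 double', 'pt string')
```

The result could then be passed as the second argument, for example o.create_table('my_new_table', build_table_spec(cols, parts), if_not_exists=True).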

Create a table from column names and types: new data types

When the new data type switch is off (the default), table columns can only use the BIGINT, DOUBLE, DECIMAL, STRING, DATETIME, BOOLEAN, MAP, and ARRAY types. To create a table with columns of new data types, such as TINYINT and STRUCT, set options.sql.use_odps2_extension = True. Example:

```
from odps import options
options.sql.use_odps2_extension = True
table = o.create_table('my_new_table', 'cat smallint, content struct<title:varchar(100), body:string>')
```
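Before creating a table, you may want to check whether a column type falls outside the types allowed when the switch is off. The sketch below is a hypothetical helper, not a PyODPS API: it extracts type names from a type string (skipping STRUCT field names, which are followed by a colon) and compares them with the list above. It is a rough heuristic and does not validate type syntax.

```python
import re

# Types allowed when options.sql.use_odps2_extension is off (the default),
# as listed above.
LEGACY_TYPES = {'bigint', 'double', 'decimal', 'string', 'datetime',
                'boolean', 'map', 'array'}


def needs_odps2_extension(type_str):
    """Hypothetical check: True if type_str uses a type outside LEGACY_TYPES.

    Identifiers followed by ':' are STRUCT field names, not types, so they
    are skipped. Numeric parameters such as varchar(100) are ignored because
    the pattern only matches tokens that start with a letter or underscore.
    """
    type_names = {
        name.lower()
        for name, colon in re.findall(r'([A-Za-z_]\w*)\s*(:?)', type_str)
        if not colon
    }
    return bool(type_names - LEGACY_TYPES)


print(needs_odps2_extension('bigint'))                                   # False
print(needs_odps2_extension('smallint'))                                 # True
print(needs_odps2_extension('struct<title:varchar(100), body:string>'))  # True
print(needs_odps2_extension('map<string,bigint>'))                       # False
```

If any column returns True, the switch shown above would need to be turned on before calling o.create_table().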

Synchronize table updates

If a table is updated by another program, for example if its schema is changed, call the reload() method to synchronize the update.

```
t = o.get_table('my_new_table')

# ... another program changes the table, for example by adding a column ...

# Reload the table metadata so that t reflects the latest schema.
t.reload()
print(t.schema)
```

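Write data to a table

Use the write_table() method of the entry object to write data.
Important
For a partitioned table, if the partition does not exist, set create_partition=True to create it.
```
records = [[111, 1.0],  # Each record can be a list.
           [222, 2.0],
           [333, 3.0],
           [444, 4.0]]
o.write_table('my_new_table', records, partition='pt=test', create_partition=True)  # Create partition pt=test and write the data
```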
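Note
- Each call to write_table() generates a file on the MaxCompute server. The call is time-consuming, and too many files reduce the efficiency of subsequent queries. Therefore, write multiple records in a single call, or pass in a generator object.
- write_table() appends data to the existing data. PyODPS does not provide an option to overwrite data. To overwrite data, clear the existing data manually: for a non-partitioned table, call table.truncate(); for a partitioned table, delete the partition and then create it again.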
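Because each write_table() call creates a server-side file, one approach is to produce all rows from a generator and hand it to a single call. A minimal sketch, assuming the rows are derived from some iterable source; generate_records and the sample data are hypothetical.

```python
def generate_records(source):
    """Hypothetical generator: yield one [num, num2] row per input value.

    Rows are produced lazily, so a large dataset does not have to be held
    in memory before it is written.
    """
    for value in source:
        yield [value, float(value) / 2]


# Inspect the rows locally (this consumes a separate generator):
print(list(generate_records(range(1, 5))))  # [[1, 0.5], [2, 1.0], [3, 1.5], [4, 2.0]]

# With a configured entry object o, pass a fresh generator to a single call:
# o.write_table('my_new_table', generate_records(range(1, 5)),
#               partition='pt=test', create_partition=True)
```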

Call the open_writer() method of a table object to write data.

```
t = o.get_table('my_new_table')
with t.open_writer(partition='pt=test02', create_partition=True) as writer:  # Create partition pt=test02 and write the data
    records = [[1, 1.0],  # Each record can be a list.
               [2, 2.0],
               [3, 3.0],
               [4, 4.0]]
    writer.write(records)  # records can be any iterable.
```

For a table with multi-level partitions, write data as follows.

```
t = o.get_table('test_table')
with t.open_writer(partition='pt1=test1,pt2=test2') as writer:  # Multi-level partition spec.
    records = [t.new_record([111, 'aaa', True]),  # Records can also be Record objects.
               t.new_record([222, 'bbb', False]),
               t.new_record([333, 'ccc', True]),
               t.new_record([444, '中文', False])]
    writer.write(records)
```
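A multi-level partition spec is a comma-separated list of key=value pairs, such as 'pt1=test1,pt2=test2'. When partition values come from variables, a small helper can build and parse such strings. format_partition_spec and parse_partition_spec are hypothetical helpers for illustration; they assume keys and values contain no commas, equals signs, or quotes.

```python
def format_partition_spec(partitions):
    """Hypothetical helper: {'pt1': 'test1', 'pt2': 'test2'} -> 'pt1=test1,pt2=test2'.

    Dict order is preserved (Python 3.7+), so list the partition keys from
    the top level down.
    """
    return ','.join('%s=%s' % (key, value) for key, value in partitions.items())


def parse_partition_spec(spec):
    """Hypothetical helper: 'pt1=test1,pt2=test2' -> {'pt1': 'test1', 'pt2': 'test2'}.

    Surrounding quotes on values, as in pt1='test1', are stripped.
    """
    result = {}
    for part in spec.split(','):
        key, value = part.split('=', 1)
        result[key.strip()] = value.strip().strip("'\"")
    return result


spec = format_partition_spec({'pt1': 'test1', 'pt2': 'test2'})
print(spec)  # pt1=test1,pt2=test2
# The spec could then be passed on, for example t.open_writer(partition=spec).
print(parse_partition_spec("pt1='test1',pt2=test2"))  # {'pt1': 'test1', 'pt2': 'test2'}
```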

Write data in parallel by using multiple processes.

All processes share the same session ID but write to different block IDs. Each block corresponds to one file on the server. The main process commits the session to complete the upload.

```
import random
from multiprocessing import Pool
from odps.tunnel import TableTunnel

def write_records(tunnel, table, session_id, block_id):
    # Attach to the existing upload session by its ID.
    local_session = tunnel.create_upload_session(table.name, upload_id=session_id)
    # Specify the block ID when creating the writer.
    with local_session.open_record_writer(block_id) as writer:
        for i in range(5):
            # Generate a record and write it to this block.
            record = table.new_record([random.randint(1, 100), random.random()])
            writer.write(record)

if __name__ == '__main__':
    N_WORKERS = 3

    table = o.create_table('my_new_table', 'num bigint, num2 double', if_not_exists=True)
    tunnel = TableTunnel(o)
    upload_session = tunnel.create_upload_session(table.name)

    # All processes use the same session ID.
    session_id = upload_session.id

    pool = Pool(processes=N_WORKERS)
    futures = []
    block_ids = []
    for i in range(N_WORKERS):
        futures.append(pool.apply_async(write_records, (tunnel, table, session_id, i)))
        block_ids.append(i)
    # Wait for all workers; get() re-raises any exception raised in a worker.
    for f in futures:
        f.get()
    pool.close()
    pool.join()

    # Finally, commit and specify all blocks.
    upload_session.commit(block_ids)
```
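In the example above, each worker generates its own random rows. When the rows already exist, one way to distribute them is to give each block ID a contiguous slice and then commit every block ID that was used. assign_blocks is a hypothetical helper; it only plans the work and does not communicate with MaxCompute.

```python
def assign_blocks(rows, n_workers):
    """Hypothetical helper: split rows into contiguous chunks keyed by block ID.

    Block IDs are 0..k-1, one per non-empty chunk, so the same list of IDs
    can later be passed to upload_session.commit().
    """
    if n_workers < 1:
        raise ValueError('n_workers must be at least 1')
    chunk_size = -(-len(rows) // n_workers)  # ceiling division
    blocks = {}
    for block_id, start in enumerate(range(0, len(rows), max(chunk_size, 1))):
        blocks[block_id] = rows[start:start + chunk_size]
    return blocks


rows = [[i, i * 0.1] for i in range(7)]
blocks = assign_blocks(rows, 3)
print({block_id: len(chunk) for block_id, chunk in blocks.items()})  # {0: 3, 1: 3, 2: 1}
print(sorted(blocks))  # [0, 1, 2], the block IDs to commit
```

Each worker would then receive its block ID together with blocks[block_id] and write only that chunk through open_record_writer(block_id).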

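Insert a record into a table

A Record represents one row of a table. Call the new_record() method of a table object to create a new Record.

```
t = o.get_table('test_table')
r = t.new_record(['val0', 'val1'])  # The number of values must equal the number of columns in the table schema.
r2 = t.new_record()  # You can also create an empty record.
r2[0] = 'val0'  # Set a value by position.
r2['field1'] = 'val1'  # Set a value by column name.
r2.field1 = 'val1'  # Set a value by attribute.

# The following reads assume that record is a Record whose table has columns such as
# c_int_a and c_double_a, for example a record returned by o.read_table().
print(record[0])  # Get the value at position 0.
print(record['c_double_a'])  # Get a value by column name.
print(record.c_double_a)  # Get a value by attribute.
print(record[0: 3])  # Slice.
print(record[0, 2, 3])  # Get the values at multiple positions.
print(record['c_int_a', 'c_double_a'])  # Get values by multiple column names.
```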
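Because new_record() takes values in schema order, and their count must equal the number of columns, it can help to convert a dict keyed by column name into an ordered list first. row_from_dict is a hypothetical helper; in practice, the column names could come from t.schema.names, which returns the non-partition column names as shown earlier.

```python
def row_from_dict(column_names, values):
    """Hypothetical helper: order a {column: value} dict by column_names.

    Missing columns become None; unknown keys raise an error so that typos in
    column names are not silently dropped.
    """
    unknown = set(values) - set(column_names)
    if unknown:
        raise KeyError('unknown columns: %s' % ', '.join(sorted(unknown)))
    return [values.get(name) for name in column_names]


names = ['num', 'num2']  # for example, t.schema.names for my_new_table
print(row_from_dict(names, {'num2': 2.0, 'num': 222}))  # [222, 2.0]
print(row_from_dict(names, {'num': 333}))               # [333, None]
# With a Table object t: r = t.new_record(row_from_dict(t.schema.names, data))
```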

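Read table data

There are several ways to read table data. The common methods are as follows:

Use the read_table() method of the entry object.

```
# Process each record.
for record in o.read_table('my_new_table', partition='pt=test'):
    print(record)
```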

If you only need to view the first records of a table (fewer than 10,000), call the head() method of the table object.
```
t = o.get_table('my_new_table')
# Process each Record object.
for record in t.head(3):
    print(record)
```

Call the open_reader() method to read data.

With a with statement:

```
t = o.get_table('my_new_table')
with t.open_reader(partition='pt=test') as reader:
    count = reader.count
    for record in reader[5:10]:  # Repeat until all count records are read; this step can be parallelized.
        print(record)  # Process one record, for example by printing it.
```

Without a with statement:

```
reader = t.open_reader(partition='pt=test')
count = reader.count
for record in reader[5:10]:  # Repeat until all count records are read; this step can be parallelized.
    print(record)  # Process one record, for example by printing it.
```
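The comments above note that slicing can be repeated until all count records are read. A minimal sketch of computing the slice boundaries, assuming a fixed chunk size; slice_ranges is a hypothetical helper, and each (start, end) pair would be used as reader[start:end], possibly in a separate worker.

```python
def slice_ranges(count, chunk_size):
    """Hypothetical helper: split [0, count) into (start, end) pairs of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be at least 1')
    return [(start, min(start + chunk_size, count))
            for start in range(0, count, chunk_size)]


print(slice_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]

# With an open reader, every record is visited exactly once:
# for start, end in slice_ranges(reader.count, 1000):
#     for record in reader[start:end]:
#         print(record)
```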

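Delete a table

Use the delete_table() method to delete an existing table.

```
o.delete_table('my_table_name', if_exists=True)  # Delete the table only if it exists.
t.drop()  # If you already have a Table object t, call its drop() method directly.
```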

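Convert a table into a DataFrame

PyODPS provides a DataFrame framework for querying and manipulating MaxCompute data more conveniently. Call the to_df() method to convert a table into a DataFrame object.

```
table = o.get_table('my_table_name')
df = table.to_df()
```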

Table partitions

Check whether a table is partitioned.

```
table = o.get_table('my_new_table')
if table.schema.partitions:
    print('Table %s is partitioned.' % table.name)
```

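Iterate over all partitions of a table.

```
table = o.get_table('my_new_table')
for partition in table.partitions:  # Iterate over all partitions.
    print(partition.name)  # Process each partition; here, print its name.
for partition in table.iterate_partitions(spec='pt=test'):  # Iterate over the second-level partitions under pt=test.
    print(partition.name)  # Process each partition; here, print its name.
for partition in table.iterate_partitions(spec='dt>20230119'):  # Iterate over the partitions that satisfy dt>20230119.
    print(partition.name)  # Process each partition; here, print its name.
```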

Important

Starting from PyODPS 0.11.3, you can pass a logical expression to iterate_partitions, such as dt>20230119 in the preceding example.
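On PyODPS versions earlier than 0.11.3, one possible workaround is to iterate over table.partitions and filter the partition names on the client. The sketch below covers only a single "greater than" comparison on one partition key. It assumes that partition names look like dt=20230120 or dt='20230120', and it compares values as strings, which works for fixed-width dates such as yyyymmdd. partition_value and filter_after are hypothetical helpers.

```python
def partition_value(name, key):
    """Hypothetical helper: return the value of key in a name like "dt='20230120',hh=01"."""
    for part in name.split(','):
        k, _, v = part.partition('=')
        if k.strip() == key:
            return v.strip().strip("'\"")
    return None


def filter_after(names, key, threshold):
    """Hypothetical helper: keep names whose key value is greater than threshold (string comparison)."""
    result = []
    for name in names:
        value = partition_value(name, key)
        if value is not None and value > threshold:
            result.append(name)
    return result


names = ["dt='20230118'", "dt='20230120'", "dt='20230121',hh='01'"]
print(filter_after(names, 'dt', '20230119'))
# ["dt='20230120'", "dt='20230121',hh='01'"]

# With a real table, assuming partition.name yields strings in this form:
# selected = filter_after([p.name for p in table.partitions], 'dt', '20230119')
```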

Check whether a partition exists.

```
table = o.get_table('my_new_table')
print(table.exist_partition('pt=test,sub=2015'))  # Multi-level partition spec; True if the partition exists.
```

Get a partition.

```
table = o.get_table('my_new_table')
partition = table.get_partition('pt=test')
print(partition.creation_time)
print(partition.size)
```

Create a partition.

```
t = o.get_table('my_new_table')
t.create_partition('pt=test', if_not_exists=True)  # With if_not_exists=True, the partition is created only if it does not exist.
```

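Delete a partition.

```
t = o.get_table('my_new_table')
t.delete_partition('pt=test', if_exists=True)  # With if_exists=True, the partition is deleted only if it exists.
# Alternatively, if you already have a Partition object, call its drop() method directly:
# partition.drop()
```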

Data upload and download channel

Tunnel is the data channel of MaxCompute. You can use Tunnel to upload data to or download data from MaxCompute.

Upload example

```
from odps.tunnel import TableTunnel

table = o.get_table('my_table')

tunnel = TableTunnel(o)
upload_session = tunnel.create_upload_session(table.name, partition_spec='pt=test')

with upload_session.open_record_writer(0) as writer:
    record = table.new_record()
    record[0] = 'test1'
    record[1] = 'id1'
    writer.write(record)

    record = table.new_record(['test2', 'id2'])
    writer.write(record)

# Commit outside the with block. Committing inside it would run before the data is fully written and cause an error.
upload_session.commit([0])
```

Download example

```
from odps.tunnel import TableTunnel

tunnel = TableTunnel(o)
download_session = tunnel.create_download_session('my_table', partition_spec='pt=test')
# Process each record.
with download_session.open_record_reader(0, download_session.count) as reader:
    for record in reader:
        print(record)  # Process each record; here, print the record object.
```

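Note

PyODPS does not support uploading data to external tables, such as OSS and OTS tables.
We recommend that you use the read and write interfaces of table objects rather than calling the Tunnel interfaces directly.
If Cython is installed, C code is compiled when PyODPS is installed, which speeds up Tunnel uploads and downloads.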
