
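The select transform syntax lets you start a specified child process, pass input data to it in a given format through standard input, and obtain output data by parsing the child process's standard output. With select transform, MaxCompute SQL can run scripts written in other languages without requiring you to write a UDF.

Overview

select transform and UDTFs perform differently in different scenarios. In comparative tests across multiple scenarios, select transform had the advantage in most cases when the data volume was small, whereas UDTFs had the advantage when the data volume was large. select transform is simpler to develop with and is better suited to ad hoc queries.

select transform is more than an extension of language support. For simple tasks, AWK, Python, Perl, and Shell all let you write the script directly in the command, so you do not need to create a script file or upload resources, which simplifies development. For more complex tasks, you can upload a script file and run it. See Example: calling a Python script and Example: calling a Java script.

The following table compares select transform with UDTFs.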

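Item	`select transform`	UDTF
Data types	The child process exchanges data through standard input and standard output, and all data is handled as STRING. `select transform` therefore requires one more type conversion step than a UDTF.	The input parameters and output results of a UDTF support multiple data types.
Data transfer	Data transfer relies on operating system pipes. The pipe buffer is only 4 KB and cannot be configured, and the process is suspended when `select transform` reads from an empty pipe or writes to a full one. Data is read and written through lower-level system calls, which is more efficient than Java.	No pipe buffer limit.
Constant parameters	Constant parameters must be transferred.	Constant parameters do not need to be transferred.
Threads	The child process and the parent process are two separate processes. When computation dominates and data throughput is low, `select transform` can take advantage of the server's multiple cores.	Single-threaded.
Performance	Some tools that `select transform` supports, such as AWK, are implemented in native code. In theory, `select transform` therefore has a performance advantage over Java.	Lower performance.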

Limits

PHP and Ruby are not deployed on MaxCompute compute clusters, so scripts in these two languages cannot be called.

Syntax

select transform(<arg1>, <arg2> ...) 
[(row format delimited (fields terminated by <field_delimiter> (escaped by <character_escape>)) (null defined as <null_value>))]
using '<unix_command_line>' 
(resources '<res_name>' (',' '<res_name>')*)
[(as <col1>, <col2> ...)]
(row format delimited (fields terminated by <field_delimiter> (escaped by <character_escape>)) (null defined as <null_value>))

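select transform keyword: required. It can be replaced with the map or reduce keyword, with exactly the same semantics. For clearer syntax, select transform is recommended.
arg1,arg2...: required. Specifies the input data, in the same form as the select list of a select statement. In the default format, the result of each expression is implicitly converted to STRING, and the results are joined with \t and passed to the child process.
row format clause: optional. Lets you customize the input and output formats.
The syntax contains two row format clauses: the first specifies the format of the input data and the second specifies the format of the output data. By default, \t is the column delimiter, \n is the row delimiter, and \N represents NULL.
Note
- field_delimiter and character_escape accept only a single character. If a string is specified, its first character is used.
- MaxCompute supports the Hive syntax for specifying formats, such as inputRecordReader, outputRecordReader, and SerDe, but you must enable Hive-compatible mode to use it. To enable it, add the statement set odps.sql.hive.compatible=true; before the SQL statement. For details about the syntax that Hive supports, see the Hive documentation.
- Using Hive custom classes such as inputRecordReader and outputRecordReader may reduce execution performance.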
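using clause: required. Specifies the command that starts the child process.
- In most MaxCompute SQL commands, the using clause specifies resources. Here it specifies the command that starts the child process. The using clause is used for compatibility with Hive syntax.
- The format of the using clause resembles Shell syntax, but no Shell is actually started; the child process is created directly from the command. As a result, many Shell features are unavailable, such as input and output redirection, pipes, and loops. If you need them, you can use Shell itself as the child process command.
resources clause: optional. Specifies the resources that the child process can access. Resources can be specified in either of two ways:
- Use the resources clause. For example, using 'sh foo.sh bar.txt' resources 'foo.sh','bar.txt'.
- Use a MaxCompute property. Add set odps.sql.session.resources=foo.sh,bar.txt; before the SQL statement.
  This setting is global: every select transform in the SQL statement can access these resources. Separate multiple resource files with commas (,).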
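as clause: optional. Specifies the output columns. For example, as(col1 bigint, col2 boolean).
- Output columns can be declared without data types, in which case they default to STRING. For example, as(col1, col2).
- Because the output data is obtained by parsing the child process's standard output, if a declared type is not STRING, the system implicitly calls the cast function to convert the value, and the conversion may fail at run time.
- Data types cannot be specified for only some of the output columns. For example, as(col1, col2 bigint) is not supported.
- The as keyword can be omitted. In that case, the field before the first \t in the standard output is treated as the key and everything after it as the value, which is equivalent to as(key, value).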
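
The default row format and the implicit cast on output can be pictured with a small local sketch. The helper names `encode_row` and `decode_row` are hypothetical and are not part of MaxCompute; they only mimic the behavior described above (columns joined with \t, \N for NULL, every value a string until it is cast).

```python
# Hypothetical helpers that mimic the default row format described above;
# they are not part of any MaxCompute SDK.
NULL_MARKER = r"\N"  # default NULL representation


def encode_row(values, field_delimiter="\t"):
    """Join one input row into the line the child process reads on stdin."""
    return field_delimiter.join(
        NULL_MARKER if v is None else str(v) for v in values
    )


def decode_row(line, casts, field_delimiter="\t"):
    """Parse one stdout line into typed columns, like as(col1 bigint, ...)."""
    fields = line.rstrip("\n").split(field_delimiter)
    row = []
    for text, cast in zip(fields, casts):
        # The cast may raise, which mirrors the run-time failure noted above.
        row.append(None if text == NULL_MARKER else cast(text))
    return row


print(repr(encode_row([1, None, "abc"])))
print(decode_row("42\t\\N\n", [int, str]))
```

Calling `decode_row("abc\n", [int])` raises a ValueError, which is the local analogue of a failed implicit cast when the declared column type does not match what the child process printed.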

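Example: calling a Shell command

Suppose you want to use a Shell command to generate 50 rows of data with values from 1 to 50, output as the data column. The Shell command is passed directly as the transform input. Sample statements: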

select transform(script) using 'sh' as (data) 
from (
        select  'for i in `seq 1 50`; do echo $i; done' as script
      ) t
;
--Equivalent to the following statement.
select transform('for i in `seq 1 50`; do echo $i; done') using 'sh' as (data);

The following result is returned:

+------------+
| data       |
+------------+
| 1          |
| 2          |
| 3          |
| 4          |
| 5          |
| 6          |
| 7          |
| 8          |
| 9          |
| 10         |
| 11         |
| 12         |
| 13         |
| 14         |
| 15         |
| 16         |
| 17         |
| 18         |
| 19         |
| 20         |
| 21         |
| 22         |
| 23         |
| 24         |
| 25         |
| 26         |
| 27         |
| 28         |
| 29         |
| 30         |
| 31         |
| 32         |
| 33         |
| 34         |
| 35         |
| 36         |
| 37         |
| 38         |
| 39         |
| 40         |
| 41         |
| 42         |
| 43         |
| 44         |
| 45         |
| 46         |
| 47         |
| 48         |
| 49         |
| 50         |
+------------+

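Example: calling a Python command

Suppose you want to use a Python command to generate 50 rows of data with values from 1 to 50, output as the data column. The Python command is passed directly as the transform input. Sample statements: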

select transform(script) using 'python' as (data) 
from (
        select  'for i in xrange(1, 51):  print i;' as script
      ) t
;
--Equivalent to the following statement.
select transform('for i in xrange(1, 51):  print i;') using 'python' as (data);

The following result is returned:

+------------+
| data       |
+------------+
| 1          |
| 2          |
| 3          |
| 4          |
| 5          |
| 6          |
| 7          |
| 8          |
| 9          |
| 10         |
| 11         |
| 12         |
| 13         |
| 14         |
| 15         |
| 16         |
| 17         |
| 18         |
| 19         |
| 20         |
| 21         |
| 22         |
| 23         |
| 24         |
| 25         |
| 26         |
| 27         |
| 28         |
| 29         |
| 30         |
| 31         |
| 32         |
| 33         |
| 34         |
| 35         |
| 36         |
| 37         |
| 38         |
| 39         |
| 40         |
| 41         |
| 42         |
| 43         |
| 44         |
| 45         |
| 46         |
| 47         |
| 48         |
| 49         |
| 50         |
+------------+
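
To see the mechanism outside MaxCompute, the following local sketch plays the role of the parent process: it starts an interpreter as a child process, writes the script to its standard input, and parses its standard output into rows. It is an illustration only. The SQL example above uses Python 2 syntax (`xrange`, `print i`), while this sketch assumes a local Python 3 interpreter and therefore uses the Python 3 equivalent of the script.

```python
import subprocess
import sys

# Python 3 equivalent of the script passed through transform(...).
script = "for i in range(1, 51): print(i)\n"

# Start the interpreter as a child process. The "-" argument tells it to
# read the program from standard input, so the script travels over the pipe.
result = subprocess.run(
    [sys.executable, "-"],
    input=script,
    capture_output=True,
    text=True,
    check=True,
)

# Each line of standard output becomes one row of the data column.
data = result.stdout.splitlines()
print(len(data), data[0], data[-1])
```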

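Example: calling an AWK command

Create a test table, and suppose you want to use an AWK command to output the second column of the table unchanged as the data column. The AWK command is passed directly as the transform input. Sample statements:

--Create a test table.
create table testdata(c1 bigint,c2 bigint);
--Insert test data into the table.
insert into table testdata values (1,4),(2,5),(3,6);
--Run the select transform statement.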
select transform(*) using "awk '//{print $2}'" as (data) from testdata;

The following result is returned:

+------------+
| data       |
+------------+
| 4          |
| 5          |
| 6          |
+------------+

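Example: calling a Perl command

Create a test table, and suppose you want to use a Perl command to output the data of the table unchanged. Because the as clause is omitted, the output columns are named key and value. The Perl command is passed directly as the transform input. Sample statements:

--Create a test table.
create table testdata(c1 bigint,c2 bigint);
--Insert test data into the table.
insert into table testdata values (1,4),(2,5),(3,6);
--Run the select transform statement.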
select transform(testdata.c1, testdata.c2) using "perl -e 'while($input = <STDIN>){print $input;}'" from testdata;

The following result is returned:

+------------+------------+
| key        | value      |
+------------+------------+
| 1          | 4          |
| 2          | 5          |
| 3          | 6          |
+------------+------------+
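
The key and value columns above come from the rule for an omitted as clause: each output line is split at the first \t. A minimal sketch of that rule follows; `split_key_value` is a hypothetical name, and returning None for a line without any tab is an assumption rather than documented behavior.

```python
def split_key_value(line):
    """Split a stdout line at the first tab: key before it, value after it."""
    key, sep, value = line.rstrip("\n").partition("\t")
    # Assumption: a line that contains no tab yields a NULL (None) value.
    return key, (value if sep else None)


print(split_key_value("1\t4\n"))     # ('1', '4')
print(split_key_value("a\tb\tc\n"))  # ('a', 'b\tc')
```

Only the first tab separates the two columns, so any further tabs stay inside the value.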

Example: calling a Python script

Prepare a Python script file named myplus.py. Sample code:

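#!/usr/bin/env python
# Python 2 syntax (print statement).
import sys
line = sys.stdin.readline()
while line:
    # Split the tab-delimited columns; token[1] still ends with '\n'.
    token = line.split('\t')
    if (token[0] == '\\N') or (token[1] == '\\N'):
        print '\\N'
    else:
        # print appends its own newline after the '\n' already in token[1].
        print str(token[0]) +'\t' + str(token[1])
    line = sys.stdin.readline()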

Add the Python script file as a MaxCompute resource.
```
add py ./myplus.py -f;
```
Note
You can also add the resource in the DataWorks console. See Create and use MaxCompute resources.

Call the resource with the select transform syntax.

--Create a test table.
create table testdata(c1 bigint,c2 bigint);
--Insert test data into the table.
insert into table testdata values (1,4),(2,5),(3,6);
--Run the select transform statement.
select 
transform (testdata.c1, testdata.c2) 
using 'python myplus.py' resources 'myplus.py' 
as (result1,result2) 
from testdata;
--Equivalent to the following statements.
set odps.sql.session.resources=myplus.py;
select transform (testdata.c1, testdata.c2) 
using 'python myplus.py' 
as (result1,result2) 
from testdata;

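The following result is returned. The script does not strip the trailing newline from each input line, so print emits an extra blank line after every row, and each blank line is parsed as an empty result1 and a NULL result2: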

+------------+------------+
| result1    | result2    |
+------------+------------+
| 1          | 4          |
|            | NULL       |
| 2          | 5          |
|            | NULL       |
| 3          | 6          |
|            | NULL       |
+------------+------------+
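
If one output row per input row is wanted, the script can strip the newline before splitting. The following is a minimal Python 3 sketch of that variant, not the original myplus.py. It assumes a Python 3 interpreter is available to run it, and it also treats a row with fewer than two columns as NULL, which is a design choice of this sketch.

```python
import io
import sys

NULL_MARKER = r"\N"  # default NULL representation


def transform_line(line):
    """Return the output line for one tab-delimited input line."""
    # Strip the trailing newline so that no blank line is emitted.
    tokens = line.rstrip("\n").split("\t")
    if len(tokens) < 2 or NULL_MARKER in tokens[:2]:
        return NULL_MARKER
    return tokens[0] + "\t" + tokens[1]


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Copy rows from stdin to stdout, one output line per input line."""
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


# Local demonstration with the three test rows instead of real stdin.
out = io.StringIO()
main(io.StringIO("1\t4\n2\t5\n3\t6\n"), out)
print(out.getvalue(), end="")
```

When used as a script resource, the demonstration lines would be replaced by a plain `main()` call so that the rows are read from standard input.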

Example: calling a Java script

Prepare a JAR file named Sum.jar. Sample Java code:

package com.aliyun.odps.test;
import java.util.Scanner;
public class Sum {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        // Read tab-delimited rows from standard input, one row per line.
        while (sc.hasNextLine()) {
            String s = sc.nextLine();
            String[] tokens = s.split("\t");
            if (tokens.length < 2) {
                throw new RuntimeException("illegal input");
            }
            // \N is the default NULL marker: emit NULL and skip the addition,
            // because Long.parseLong would throw on "\N".
            if (tokens[0].equals("\\N") || tokens[1].equals("\\N")) {
                System.out.println("\\N");
                continue;
            }
            System.out.println(Long.parseLong(tokens[0]) + Long.parseLong(tokens[1]));
        }
    }
}

Add the JAR file as a MaxCompute resource.
```
add jar ./Sum.jar -f;
```

Call the resource with the select transform syntax.

--Create a test table.
create table testdata(c1 bigint,c2 bigint);
--Insert test data into the table.
insert into table testdata values (1,4),(2,5),(3,6);
--Run the select transform statement.
select transform(testdata.c1, testdata.c2) using 'java -cp Sum.jar com.aliyun.odps.test.Sum' resources 'Sum.jar' as cnt from testdata;
--Equivalent to the following statements.
set odps.sql.session.resources=Sum.jar; 
select transform(testdata.c1, testdata.c2) using 'java -cp Sum.jar com.aliyun.odps.test.Sum' as cnt from testdata;

The following result is returned:

+-----+
| cnt |
+-----+
| 5   |
| 7   |
| 9   |
+-----+

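Note

Although ready-made UDTF frameworks exist for Java and Python, select transform is simpler to write: it needs no extra dependencies, imposes no format requirements, and can even run offline scripts directly. The actual paths of Java and Python for offline scripts can be obtained from the JAVA_HOME and PYTHON_HOME environment variables.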

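Example: chaining select transform

select transform statements can also be chained. For example, you can use distribute by and sort by to preprocess the input data. Sample statements: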

select transform(key, value) using '<cmd2>' from 
(
    select transform(*) using '<cmd1>' from 
    (
        select * from testdata distribute by c2 sort by c1 
    ) t distribute by key sort by value 
) t2;

cmd1 and cmd2 are the commands that start the child processes.

Alternatively, use the map and reduce keywords.

@a := select * from data distribute by col2 sort by col1;
@b := map * using 'cmd1' distribute by col1 sort by col2 from @a;
reduce * using 'cmd2' from @b;
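
As an illustration of what a reduce-side command might look like, here is a minimal Python sketch of a script that could stand in for cmd2. The function name `reduce_by_key` and the choice to sum an integer value per key are hypothetical. The sketch assumes that the upstream distribute by sends every row with a given key to the same child process, so a per-process total is complete for that key, and it does not rely on the sort order.

```python
import io


def reduce_by_key(lines):
    """Sum the integer value of every key seen on the input lines."""
    totals = {}
    for line in lines:
        # Each line is "key<TAB>value", the default as(key, value) layout.
        key, value = line.rstrip("\n").split("\t", 1)
        totals[key] = totals.get(key, 0) + int(value)
    return totals


# Sample input standing in for the rows one child process might receive.
sample = io.StringIO("a\t1\nb\t2\na\t3\n")
for key, total in reduce_by_key(sample).items():
    print(f"{key}\t{total}")
```

In a real script the sample would be replaced by `sys.stdin`, and the printed lines would be parsed back into columns by the as clause of the outer statement.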
