
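This topic describes how to replace strings using regular expressions with a Java UDF and with a Python UDF.

Command description

This example registers a user-defined function named UDF_REPLACE_BY_REGEXP. The command syntax and input parameters are described below.

Syntax: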

string UDF_REPLACE_BY_REGEXP(string <s>, string <regex>, string <replacement>)

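Description:
Replaces every substring of s that matches the regular expression regex with the string replacement. Unlike the MaxCompute built-in function REGEXP_REPLACE, this function accepts a variable as the regular expression.

Parameters:
- s: the source string. STRING type. Required.
- regex: the regular expression. STRING type. Required.
- replacement: the string that replaces each match of regex in s. STRING type. Required.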
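To illustrate what a variable regular expression means in practice, the following minimal Python sketch applies a different pattern to each row, as the UDF would when regex comes from a column rather than a constant. The row data and the replace_by_regexp helper are hypothetical; they only mirror the documented semantics using the standard re module and are not part of MaxCompute.

```python
import re

def replace_by_regexp(s, regex, replacement):
    """Hypothetical local helper: replace every match of regex in s."""
    return re.sub(regex, replacement, s)

# Hypothetical rows of (s, regex, replacement); the regex differs per row.
rows = [
    ("abc 123 def 456", r"\d+", "#"),
    ("2024-01-15", r"-", "/"),
    ("a  b   c", r"\s+", " "),
]

results = [replace_by_regexp(s, regex, repl) for s, regex, repl in rows]
for value in results:
    print(value)
```

Each row is processed with its own pattern, which is the behavior the UDF is meant to provide inside a SQL query.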

Development and usage steps

1. Code development

Java UDF code example

package com.aliyun.rewrite; // Package name; change it to suit your project.
import com.aliyun.odps.udf.UDF;
import com.aliyun.odps.udf.annotation.UdfProperty;

import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@UdfProperty(isDeterministic=true)
public class ReplaceByRegExp extends UDF {
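    /**
     * The regular expression from the previous call, cached to avoid recompiling it for every row.
     */
    private String lastRegex = "";
    private Pattern pattern = null;

    /**
     * @param s source string
     * @param regex regular expression
     * @param replacement replacement string
     */
    public String evaluate(String s, String regex, String replacement) {
        Objects.requireNonNull(s, "source string must not be null");
        Objects.requireNonNull(regex, "regular expression must not be null");
        Objects.requireNonNull(replacement, "replacement string must not be null");

        // Compile on first use, and recompile whenever the regular expression changes.
        if (pattern == null || !regex.equals(lastRegex)) {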
            lastRegex = regex;
            pattern = Pattern.compile(regex);
        }
        Matcher m = pattern.matcher(s);
        StringBuffer sb = new StringBuffer();

        // Replace every match, then append the remaining text.
        while (m.find()) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }
}

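A UDF written in Java must extend the UDF class. In this example, the evaluate method takes three STRING parameters and returns a STRING; the parameter and return types form the signature of the UDF in SQL statements. For other coding conventions and requirements, see UDF Development Specifications and General Process (Java).

Python 3 UDF code example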

from odps.udf import annotate
import re

@annotate("string,string,string->string")
class ReplaceByRegExp(object):
    def __init__(self):
        self.lastRegex = ""
        self.pattern = None

    def evaluate(self, s, regex, replacement):
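        # Reject only None; empty strings are valid (for example, an empty replacement deletes matches).
        if s is None or regex is None or replacement is None:
            raise ValueError("Arguments with None")
        # Compile on first use, and recompile whenever the regular expression changes.
        if self.pattern is None or regex != self.lastRegex: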
            self.lastRegex = regex
            self.pattern = re.compile(regex)
        result = self.pattern.sub(replacement, s)
        return result

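MaxCompute uses Python 2 by default. To enable Python 3, run the command set odps.sql.python.version=cp37 at the session level. For more Python 3 UDF specifications, see UDF Development Specifications and General Process (Python3).

Python 2 UDF code example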
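Before uploading, the caching logic can be checked locally without the MaxCompute runtime. The sketch below is a stand-in rather than the UDF itself: the class name ReplaceByRegExpLocal and the compile_count counter are hypothetical, the odps.udf decorator is omitted so the code runs with only the standard library, and the None check is written with `is None` so that an empty replacement string is accepted.

```python
import re

class ReplaceByRegExpLocal(object):
    """Hypothetical local stand-in mirroring the UDF logic, without odps.udf."""

    def __init__(self):
        self.lastRegex = ""
        self.pattern = None
        self.compile_count = 0  # illustration only: counts calls to re.compile

    def evaluate(self, s, regex, replacement):
        if s is None or regex is None or replacement is None:
            raise ValueError("Arguments with None")
        # Compile on first use, and recompile only when the regex changes.
        if self.pattern is None or regex != self.lastRegex:
            self.lastRegex = regex
            self.pattern = re.compile(regex)
            self.compile_count += 1
        return self.pattern.sub(replacement, s)

udf = ReplaceByRegExpLocal()
out1 = udf.evaluate("abc 123", r"\d+", "#")  # first call compiles the pattern
out2 = udf.evaluate("def 456", r"\d+", "#")  # same regex, cached pattern reused
out3 = udf.evaluate("a-b", r"-", "")         # new regex triggers a recompile
print(out1, out2, out3, udf.compile_count)
```

Consecutive rows that share a regex reuse the compiled pattern, so the cache pays off most when the regex column changes rarely.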

#coding:utf-8
from odps.udf import annotate
import re

@annotate("string,string,string->string")
class ReplaceByRegExp(object):
    def __init__(self):
        self.lastRegex = ""
        self.pattern = None

    def evaluate(self, s, regex, replacement):
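        # Reject only None; empty strings are valid (for example, an empty replacement deletes matches).
        if s is None or regex is None or replacement is None:
            raise ValueError("Arguments with None")
        # Compile on first use, and recompile whenever the regular expression changes.
        if self.pattern is None or regex != self.lastRegex: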
            self.lastRegex = regex
            self.pattern = re.compile(regex)
        result = self.pattern.sub(replacement, s)
        return result

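If Python 2 code contains Chinese characters, the program fails at run time unless an encoding declaration is added at the top of the file. The declaration must be written as #coding:utf-8 or # -*- coding: utf-8 -*-; the two forms are equivalent. For more Python 2 UDF specifications, see UDF Development Specifications and General Process (Python2).

2. Upload the resource and register the function

After developing and debugging the UDF code, upload the resource to MaxCompute and register the function. This example registers the function under the name UDF_REPLACE_BY_REGEXP. For the detailed steps for a Java UDF, see Package, Upload, and Register; for a Python UDF, see Upload and Register.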
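One point to keep in mind when choosing between the Java and Python versions: each one hands replacement to its own regex library, so group references inside the replacement string are written differently. Java's Matcher.appendReplacement reads $1, while Python's re module reads \1 or \g<1>. The sketch below uses made-up input and only the standard re module to show the Python side.

```python
import re

text = "abc 123"

# Python-style backreferences are expanded in the replacement string.
wrapped = re.sub(r"(\d+)", r"<\1>", text)
explicit = re.sub(r"(\d+)", r"<\g<1>>", text)

# Java-style "$1" has no special meaning in Python and is inserted literally.
literal = re.sub(r"(\d+)", "$1", text)

print(wrapped)   # abc <123>
print(explicit)  # abc <123>
print(literal)   # abc $1
```

A replacement string that relies on group references may therefore need to be adapted when the same query is moved between the Java and Python UDFs.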

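3. Usage example

After the UDF is registered, run the following command to replace each run of digits in the string with "#".

set odps.sql.python.version=cp37; -- Needed only for a Python 3 UDF: enables Python 3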
SELECT UDF_REPLACE_BY_REGEXP('abc 123 def 456', '\\d+', '#');

The result is as follows:

+--------------+
| _c0          |
+--------------+
| abc # def #  |
+--------------+
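As a quick cross-check, the same substitution can be reproduced locally with Python's standard re module. This sketch assumes that the SQL literal '\\d+' reaches the UDF as the regex \d+, which is what the result shown above implies.

```python
import re

# Assumption: the SQL literal '\\d+' is unescaped to the regex \d+ before the UDF sees it.
result = re.sub(r"\d+", "#", "abc 123 def 456")
print(result)  # abc # def #
```

Because the pattern is \d+ rather than \d, each run of digits collapses to a single "#".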


UDF example: replacing strings with a regular expression