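As big data platforms have matured, they can now process many types of unstructured and semi-structured data. Converting an IP address to its geographic location is a common scenario. This topic describes how to use a MaxCompute user-defined function (UDF) to convert an IPv4 or IPv6 address to its location.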

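Prerequisites

Make sure that the tools used in the following steps are available: the MaxCompute client, and IntelliJ IDEA set up for MaxCompute development (the steps create a MaxCompute Java Module there).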

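Background information

Converting an IPv4 or IPv6 address to a location requires an IP address library. You download the IP address library files and upload them to your MaxCompute project as resources. You then develop a MaxCompute UDF and register it as a function that depends on those files, so that SQL statements can call the function to convert IP addresses to locations.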

Precautions

The IP address library files provided in this topic are intended only for verifying this best practice. Maintain your own IP address library files based on your business requirements.

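Procedure

To convert IPv4 or IPv6 addresses to locations by using a MaxCompute UDF, perform the following steps:

  1. Step 1: Upload the IP address library files
    Upload the IP address library files to your MaxCompute project as resources. The MaxCompute UDF that you create later depends on these resources.
  2. Step 2: Connect to the project
    Connect to the MaxCompute project and create a MaxCompute Java Module.
  3. Step 3: Write the MaxCompute UDF
    Write the MaxCompute UDF code in IntelliJ IDEA.
  4. Step 4: Register the MaxCompute UDF
    Register the MaxCompute UDF as a function.
  5. Step 5: Call the MaxCompute UDF to convert IP addresses to locations
    Call the registered function in SQL statements to convert IP addresses to locations.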

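Step 1: Upload the IP address library files

  1. Download the IP address library files to your local machine, decompress the package to obtain ipv4.txt and ipv6.txt, and place both files in the ...\odpscmd_public\bin directory of the MaxCompute client installation.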


  2. Log on to the MaxCompute client and go to the target MaxCompute project.
  3. Run the add file command to upload ipv4.txt and ipv6.txt to the MaxCompute project as File resources.
    Sample commands:
    add file ipv4.txt -f;
    add file ipv6.txt -f;
    For more information about adding resources, see Add resources.
  4. (For local debugging) Copy ipv4.txt and ipv6.txt to the warehouse/example_project/__resources__ directory of your local project.
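
The UDF code in Step 3 reads each line of ipv4.txt and ipv6.txt as a pipe-delimited record and uses fields 0, 1, 3, and 4 as the start address, end address, city, and province. The following is a minimal sketch of that parsing step only. The class name IpLibraryLineDemo, the IpRange holder, and the sample line are hypothetical and made up for illustration; the real files may contain other values, and the meaning of field 2 is not stated in this topic.

```java
public class IpLibraryLineDemo {

    /** One parsed record: the address range plus its city and province. */
    static final class IpRange {
        final String startIp;
        final String endIp;
        final String city;
        final String province;

        IpRange(String startIp, String endIp, String city, String province) {
            this.startIp = startIp;
            this.endIp = endIp;
            this.city = city;
            this.province = province;
        }
    }

    /**
     * Parses one pipe-delimited line.
     * Returns null when the line has fewer than five fields,
     * which mirrors the length check in the UDF's setup method.
     */
    static IpRange parseLine(String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] f = line.split("\\|", -1);
        if (f.length < 5) {
            return null;
        }
        // Field 2 is skipped because the UDF does not read it.
        return new IpRange(f[0], f[1], f[3], f[4]);
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the real library files.
        String sample = "1.0.0.0|1.0.0.255|X|CityA|ProvinceA";
        IpRange r = parseLine(sample);
        System.out.println(r.startIp + " - " + r.endIp + " -> " + r.city + "," + r.province);
        // A line with too few fields is skipped.
        System.out.println(parseLine("too|few|fields"));
    }
}
```

Running a check like this against a few lines of your own library files before uploading them can reveal records that the UDF would silently skip.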

Step 2: Connect to the project

  1. Connect to the MaxCompute project. For more information, see Manage project connections.
  2. Create a MaxCompute Java Module. For more information, see Create a MaxCompute Java Module.

Step 3: Write the MaxCompute UDF

  1. Create Java classes.
    The MaxCompute UDF code that you write in a later step uses the Java classes created here.
    1. In IntelliJ IDEA, in the Project pane, right-click the source directory of the module (src > main > java) and choose New > Java Class.
      [Figure: Create a Java class]
    2. In the New Java Class dialog box, enter a class name, press Enter, and then enter the code in the code editor.
      Create three Java classes in turn. The class names and the corresponding code are as follows. You can copy the code as is.
      • IpUtils
        package com.aliyun.odps.udf.utils;
        
        import java.math.BigInteger;
        import java.net.Inet4Address;
        import java.net.Inet6Address;
        import java.net.InetAddress;
        import java.net.UnknownHostException;
        import java.util.Arrays;
        
        public class IpUtils {
        
            /**
             * Converts an IP address string to a long.
             * Intended for IPv4 addresses: an IPv6 value does not fit in 64 bits,
             * so longValue() keeps only its low-order 64 bits.
             *
             * @param ipInString the IP address as a string
             * @return the IP address as a long
             */
            public static long StringToLong(String ipInString) {
        
                ipInString = ipInString.replace(" ", "");
                byte[] bytes;
                if (ipInString.contains(":"))
                    bytes = ipv6ToBytes(ipInString);
                else
                    bytes = ipv4ToBytes(ipInString);
                BigInteger bigInt = new BigInteger(bytes);
                return bigInt.longValue();
            }
        
        
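            /**
             * Converts an IP address string to a decimal integer string.
             * The result is zero-padded to 39 digits (the length of 2^128 - 1) so that
             * String.compareTo orders the strings the same way as the numeric values.
             *
             * @param ipInString the IP address as a string
             * @return the IP address as a fixed-width decimal string
             */
            public static String StringToBigIntString(String ipInString) {
        
                ipInString = ipInString.replace(" ", "");
                byte[] bytes;
                if (ipInString.contains(":"))
                    bytes = ipv6ToBytes(ipInString);
                else
                    bytes = ipv4ToBytes(ipInString);
                BigInteger bigInt = new BigInteger(bytes);
                return String.format(java.util.Locale.ROOT, "%039d", bigInt);
            }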
        
            /**
             * Converts an IP address in integer form to its string form.
             *
             * @param ipInBigInt the IP address as an integer
             * @return the IP address as a string
             */
            public static String BigIntToString(BigInteger ipInBigInt) {
                byte[] bytes = ipInBigInt.toByteArray();
                byte[] unsignedBytes = Arrays.copyOfRange(bytes, 1, bytes.length);
                // The copy above drops the leading sign byte
                try {
                    String ip = InetAddress.getByAddress(unsignedBytes).toString();
                    return ip.substring(ip.indexOf('/') + 1).trim();
                } catch (UnknownHostException e) {
                    throw new RuntimeException(e);
                }
            }
        
            /**
             * Converts an IPv6 address to a signed byte[17]: a leading zero sign byte
             * followed by the 16 address bytes.
             * Limitation: a trailing "::" (for example "2001:db8::") is not expanded,
             * because String.split drops trailing empty strings.
             */
            private static byte[] ipv6ToBytes(String ipv6) {
                byte[] ret = new byte[17];
                ret[0] = 0;
                int ib = 16;
                boolean comFlag = false;// set when the address ends with an embedded IPv4 part
                if (ipv6.startsWith(":"))// strip the leading colon
                    ipv6 = ipv6.substring(1);
                String groups[] = ipv6.split(":");
                for (int ig = groups.length - 1; ig > -1; ig--) {// scan the groups from last to first
                    if (groups[ig].contains(".")) {
                        // embedded IPv4 part (mixed notation)
                        byte[] temp = ipv4ToBytes(groups[ig]);
                        ret[ib--] = temp[4];
                        ret[ib--] = temp[3];
                        ret[ib--] = temp[2];
                        ret[ib--] = temp[1];
                        comFlag = true;
                    } else if ("".equals(groups[ig])) {
                        // zero compression ("::"): work out how many groups are missing
                        int zlg = 9 - (groups.length + (comFlag ? 1 : 0));
                        while (zlg-- > 0) {// set those groups to zero
                            ret[ib--] = 0;
                            ret[ib--] = 0;
                        }
                    } else {
                        int temp = Integer.parseInt(groups[ig], 16);
                        ret[ib--] = (byte) temp;
                        ret[ib--] = (byte) (temp >> 8);
                    }
                }
                return ret;
            }
        
            /**
             * Converts an IPv4 address to a signed byte[5]: a leading zero sign byte
             * followed by the 4 address bytes.
             */
            private static byte[] ipv4ToBytes(String ipv4) {
                byte[] ret = new byte[5];
                ret[0] = 0;
                // Find the positions of the dots in the IP address string
                int position1 = ipv4.indexOf(".");
                int position2 = ipv4.indexOf(".", position1 + 1);
                int position3 = ipv4.indexOf(".", position2 + 1);
                // Convert the text between the dots to integers
                ret[1] = (byte) Integer.parseInt(ipv4.substring(0, position1));
                ret[2] = (byte) Integer.parseInt(ipv4.substring(position1 + 1,
                        position2));
                ret[3] = (byte) Integer.parseInt(ipv4.substring(position2 + 1,
                        position3));
                ret[4] = (byte) Integer.parseInt(ipv4.substring(position3 + 1));
                return ret;
            }
        
        
            /**
             * @param ipAdress an IPv4 or IPv6 address string
             * @return 4 for IPv4, 6 for IPv6, 0 if the address is neither
             * @throws Exception if the address cannot be resolved
             */
            public static int isIpV4OrV6(String ipAdress) throws Exception {
                InetAddress address = InetAddress.getByName(ipAdress);
                if (address instanceof Inet4Address)
                    return 4;
                else if (address instanceof Inet6Address)
                    return 6;
                return 0;
            }
        
        
            /*
             * Checks whether an IP address falls within an IP range.
             *
             * ipSection: the IP range, with the start and end addresses separated by '-'
             *
             * ip: the IP address to check
             */
        
            public static boolean ipExistsInRange(String ip, String ipSection) {
        
                ipSection = ipSection.trim();
        
                ip = ip.trim();
        
                int idx = ipSection.indexOf('-');
        
                String beginIP = ipSection.substring(0, idx);
        
                String endIP = ipSection.substring(idx + 1);
        
                return getIp2long(beginIP) <= getIp2long(ip)
                        && getIp2long(ip) <= getIp2long(endIP);
        
            }
        
            public static long getIp2long(String ip) {
        
                ip = ip.trim();
        
                String[] ips = ip.split("\\.");
        
                long ip2long = 0L;
        
                for (int i = 0; i < 4; ++i) {
        
                    ip2long = ip2long << 8 | Integer.parseInt(ips[i]);
        
                }
                return ip2long;
        
            }
        
            public static long getIp2long2(String ip) {
        
                ip = ip.trim();
        
                String[] ips = ip.split("\\.");
        
                long ip1 = Integer.parseInt(ips[0]);
        
                long ip2 = Integer.parseInt(ips[1]);
        
                long ip3 = Integer.parseInt(ips[2]);
        
                long ip4 = Integer.parseInt(ips[3]);
        
                long ip2long = 1L * ip1 * 256 * 256 * 256 + ip2 * 256 * 256 + ip3 * 256
                        + ip4;
        
                return ip2long;
        
            }
        
            public static void main(String[] args) {
                System.out.println(StringToLong("2002:7af3:f3be:ffff:ffff:ffff:ffff:ffff"));
                System.out.println(StringToLong("54.38.72.63"));
            }
        
            private class Invalid{
                private Invalid()
                {
        
                }
            }
        }
        
        
                                                
      • IpV4Obj
        package com.aliyun.odps.udf.objects;
        
        public class IpV4Obj {
            public long startIp ;
            public long endIp ;
            public String city;
            public String province;
        
            public IpV4Obj(long startIp, long endIp, String city, String province) {
                this.startIp = startIp;
                this.endIp = endIp;
                this.city = city;
                this.province = province;
            }
        
            @Override
            public String toString() {
                return "IpV4Obj{" +
                        "startIp=" + startIp +
                        ", endIp=" + endIp +
                        ", city='" + city + '\'' +
                        ", province='" + province + '\'' +
                        '}';
            }
        
            public void setStartIp(long startIp) {
                this.startIp = startIp;
            }
        
            public void setEndIp(long endIp) {
                this.endIp = endIp;
            }
        
            public void setCity(String city) {
                this.city = city;
            }
        
            public void setProvince(String province) {
                this.province = province;
            }
        
            public long getStartIp() {
                return startIp;
            }
        
            public long getEndIp() {
                return endIp;
            }
        
            public String getCity() {
                return city;
            }
        
            public String getProvince() {
                return province;
            }
        }
                                                
      • IpV6Obj
        package com.aliyun.odps.udf.objects;
        
        public class IpV6Obj {
            public String startIp ;
            public String endIp ;
            public String city;
            public String province;
        
            public String getStartIp() {
                return startIp;
            }
        
            @Override
            public String toString() {
                return "IpV6Obj{" +
                        "startIp='" + startIp + '\'' +
                        ", endIp='" + endIp + '\'' +
                        ", city='" + city + '\'' +
                        ", province='" + province + '\'' +
                        '}';
            }
        
            public IpV6Obj(String startIp, String endIp, String city, String province) {
                this.startIp = startIp;
                this.endIp = endIp;
                this.city = city;
                this.province = province;
            }
        
            public void setStartIp(String startIp) {
                this.startIp = startIp;
            }
        
            public String getEndIp() {
                return endIp;
            }
        
            public void setEndIp(String endIp) {
                this.endIp = endIp;
            }
        
            public String getCity() {
                return city;
            }
        
            public void setCity(String city) {
                this.city = city;
            }
        
            public String getProvince() {
                return province;
            }
        
            public void setProvince(String province) {
                this.province = province;
            }
        }
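
The IpUtils class above turns addresses into numbers so that ranges can be compared. The following sketch shows the same idea with JDK calls only, as an illustration rather than a replacement for the listing. It assumes well-formed literal addresses; the class name IpNumberDemo and its method names are hypothetical. It also shows why decimal strings must have equal width before String.compareTo agrees with numeric order.

```java
import java.math.BigInteger;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

public class IpNumberDemo {

    /** Converts a dotted-quad IPv4 string to its unsigned 32-bit value, held in a long. */
    static long ipv4ToLong(String ipv4) {
        String[] parts = ipv4.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + ipv4);
        }
        long value = 0L;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            // Shift the accumulated value left by one byte and append the octet.
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Converts a literal IPv4 or IPv6 address to a non-negative BigInteger. */
    static BigInteger toBigInteger(String ipLiteral) {
        try {
            // For a literal address, getByName only parses the text.
            byte[] raw = InetAddress.getByName(ipLiteral).getAddress();
            // Signum 1 forces a non-negative value, like a leading zero sign byte.
            return new BigInteger(1, raw);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ipLiteral, e);
        }
    }

    /** Formats a value as 39 decimal digits, the width of 2^128 - 1. */
    static String toPaddedDecimal(BigInteger value) {
        return String.format(Locale.ROOT, "%039d", value);
    }

    public static void main(String[] args) {
        System.out.println(ipv4ToLong("54.38.72.63")); // prints 908478527

        BigInteger two = toBigInteger("::2");      // numeric value 2
        BigInteger sixteen = toBigInteger("::10"); // hexadecimal 10, numeric value 16

        // Unpadded decimal strings sort "2" after "16", which is numerically wrong.
        System.out.println(two.toString().compareTo(sixteen.toString()) > 0); // prints true
        // Padded strings of equal width sort in numeric order.
        System.out.println(toPaddedDecimal(two).compareTo(toPaddedDecimal(sixteen)) < 0); // prints true
    }
}
```

Comparing BigInteger values directly with compareTo avoids the width question altogether; the padded-string form is shown because the UDF in this topic stores IPv6 range bounds as strings.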
                                                
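  2. Write the MaxCompute UDF code.
    1. In the Project pane, right-click the source directory of the module (src > main > java) and choose New > MaxCompute Java.
      [Figure: Write a UDF]
    2. In the Create new MaxCompute java class dialog box, click UDF, enter a name in the Name field, press Enter, and then enter the code in the code editor.
      [Figure: Create a UDF]
      In this example, the Java class is named IpLocation. The code is as follows. You can copy the code as is.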
      package com.aliyun.odps.udf.udfFunction;
      
      import com.aliyun.odps.udf.ExecutionContext;
      import com.aliyun.odps.udf.UDF;
      import com.aliyun.odps.udf.UDFException;
      import com.aliyun.odps.udf.utils.IpUtils;
      import com.aliyun.odps.udf.objects.IpV4Obj;
      import com.aliyun.odps.udf.objects.IpV6Obj;
      import java.io.*;
      import java.util.ArrayList;
      import java.util.Comparator;
      import java.util.List;
      import java.util.stream.Collectors;
      
      public class IpLocation extends UDF {
          public static IpV4Obj[] ipV4ObjsArray;
          public static IpV6Obj[] ipV6ObjsArray;
      
          public IpLocation() {
              super();
          }
      
          @Override
          public void setup(ExecutionContext ctx) throws UDFException, IOException {
              //IPV4
              if(ipV4ObjsArray==null)
              {
                  BufferedInputStream bufferedInputStream = ctx.readResourceFileAsStream("ipv4.txt");
      
                  BufferedReader br = new BufferedReader(new InputStreamReader(bufferedInputStream));
                  ArrayList<IpV4Obj> ipV4ObjArrayList=new ArrayList<>();
                  String line = null;
                  while ((line = br.readLine()) != null) {
                      String[] f = line.split("\\|", -1);
                      if(f.length>=5)
                      {
                          long startIp = IpUtils.StringToLong(f[0]);
                          long endIp = IpUtils.StringToLong(f[1]);
                          String city=f[3];
                          String province=f[4];
                          IpV4Obj ipV4Obj = new IpV4Obj(startIp, endIp, city, province);
                          ipV4ObjArrayList.add(ipV4Obj);
                      }
                  }
                  br.close();
                  List<IpV4Obj> collect = ipV4ObjArrayList.stream().sorted(Comparator.comparing(IpV4Obj::getStartIp)).collect(Collectors.toList());
                  // Copy straight from the List; Collectors.toList() does not guarantee an ArrayList
                  ipV4ObjsArray = collect.toArray(new IpV4Obj[0]);
              }
      
              //IPV6
              if(ipV6ObjsArray==null)
              {
                  BufferedInputStream bufferedInputStream = ctx.readResourceFileAsStream("ipv6.txt");
                  BufferedReader br = new BufferedReader(new InputStreamReader(bufferedInputStream));
                  ArrayList<IpV6Obj> ipV6ObjArrayList=new ArrayList<>();
                  String line = null;
                  while ((line = br.readLine()) != null) {
                      String[] f = line.split("\\|", -1);
                      if(f.length>=5)
                      {
                          String startIp = IpUtils.StringToBigIntString(f[0]);
                          String endIp = IpUtils.StringToBigIntString(f[1]);
                          String city=f[3];
                          String province=f[4];
                          IpV6Obj ipV6Obj = new IpV6Obj(startIp, endIp, city, province);
                          ipV6ObjArrayList.add(ipV6Obj);
                      }
                  }
                  br.close();
                  List<IpV6Obj> collect = ipV6ObjArrayList.stream().sorted(Comparator.comparing(IpV6Obj::getStartIp)).collect(Collectors.toList());
                  // Copy straight from the List; Collectors.toList() does not guarantee an ArrayList
                  ipV6ObjsArray = collect.toArray(new IpV6Obj[0]);
              }
      
          }
      
          public String evaluate(String ip){
              if(ip==null||ip.trim().isEmpty()||!(ip.contains(".")||ip.contains(":")))
              {
                  return null;
              }
              int ipV4OrV6=0;
              try {
                  ipV4OrV6= IpUtils.isIpV4OrV6(ip);
              } catch (Exception e) {
                  return null;
              }
              // IPv4 address
              if(ipV4OrV6==4)
              {
                  int i = binarySearch(ipV4ObjsArray, IpUtils.StringToLong(ip));
                  if(i>=0)
                  {
                      IpV4Obj ipV4Obj = ipV4ObjsArray[i];
                      return ipV4Obj.city+","+ipV4Obj.province;
                  }else{
                      return null;
                  }
              }else if(ipV4OrV6==6)// IPv6 address
              {
                  int i = binarySearchIPV6(ipV6ObjsArray, IpUtils.StringToBigIntString(ip));
                  if(i>=0)
                  {
                      IpV6Obj ipV6Obj = ipV6ObjsArray[i];
                      return ipV6Obj.city+","+ipV6Obj.province;
                  }else{
                      return null;
                  }
              }else{// neither a valid IPv4 nor a valid IPv6 address
                  return null;
              }
      
          }
      
      
          @Override
          public void close() throws UDFException, IOException {
              super.close();
          }
      
          private static int binarySearch(IpV4Obj[] array,long ip){
              int low=0;
              int hight=array.length-1;
              while (low<=hight)
              {
                  int middle=(low+hight)/2;
                  if((ip>=array[middle].startIp)&&(ip<=array[middle].endIp))
                  {
                      return middle;
                  }
                  if (ip < array[middle].startIp)
                      hight = middle - 1;
                  else {
                      low = middle + 1;
                  }
              }
              return -1;
          }
      
      
          private static int binarySearchIPV6(IpV6Obj[] array,String ip){
              int low=0;
              int hight=array.length-1;
              while (low<=hight)
              {
                  int middle=(low+hight)/2;
                  if((ip.compareTo(array[middle].startIp)>=0)&&(ip.compareTo(array[middle].endIp)<=0))
                  {
                      return middle;
                  }
                  if (ip.compareTo(array[middle].startIp) < 0)
                      hight = middle - 1;
                  else {
                      low = middle + 1;
                  }
              }
              return -1;
          }
      
          private class Invalid{
              private Invalid()
              {
      
              }
          }
      }
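
The lookup in the UDF above is a binary search over address ranges sorted by start address. The following self-contained sketch isolates that technique on plain long values so that it can be tried without a MaxCompute project. The class name RangeLookupDemo and the three sample ranges are hypothetical; the sketch assumes the ranges are sorted by start address and do not overlap.

```java
public class RangeLookupDemo {

    /**
     * Returns the index of the range that contains ip, or -1 if none does.
     * Each entry of ranges is {start, end}, both inclusive.
     */
    static int findRange(long[][] ranges, long ip) {
        int low = 0;
        int high = ranges.length - 1;
        while (low <= high) {
            // Unsigned shift keeps the midpoint correct even if low + high overflows.
            int middle = (low + high) >>> 1;
            if (ip < ranges[middle][0]) {
                high = middle - 1;   // ip lies before this range
            } else if (ip > ranges[middle][1]) {
                low = middle + 1;    // ip lies after this range
            } else {
                return middle;       // start <= ip <= end
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical ranges, sorted by start address:
        // 1.0.0.0-1.0.0.255, 1.0.1.0-1.0.3.255, 1.0.8.0-1.0.15.255
        long[][] ranges = {
            {16777216L, 16777471L},
            {16777472L, 16778239L},
            {16779264L, 16781311L}
        };
        System.out.println(findRange(ranges, 16777500L)); // 1.0.1.28 -> prints 1
        System.out.println(findRange(ranges, 16778240L)); // 1.0.4.0 falls in a gap -> prints -1
    }
}
```

Because each probe halves the remaining search interval, a lookup takes a number of steps proportional to the logarithm of the number of ranges, which is why the UDF sorts the library once in setup and then searches it for every row.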
                                      
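  3. Prepare data for local debugging.
    1. In the warehouse/example_project/__tables__/wc_in2/p1=2/p2=1/ directory of your local project, open the data file.
    2. Change the values in the last column of the data file to IP addresses from ipv4.txt (pick any three IP addresses from ipv4.txt), and save the file.
  4. Debug the MaxCompute UDF to make sure that the code runs successfully.
    For more information about debugging, see Debug a UDF by running it locally.
    1. Right-click the MaxCompute UDF script that you wrote and select Run.
    2. In the Run/Debug Configurations dialog box, configure the run parameters highlighted in the figure, and click OK.
      [Figure: Specify run information]
      If no error is returned, the code ran successfully and you can continue with the next steps. If an error is reported, resolve it based on the error message in IntelliJ IDEA.
      Note: Fill in the run parameters by referring to the values shown in the figure.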

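Step 4: Register the MaxCompute UDF

  1. Right-click the MaxCompute UDF script that compiled successfully and select Deploy to server….
    [Figure: Deploy to server]
  2. In the Package a jar, submit resource and register function dialog box, configure the parameters.
    For more information about the parameters, see Package, upload, and register.
    [Figure: Generate a JAR package and register the function]
    For Extra resources, you must select the IP address library files ipv4.txt and ipv6.txt that you uploaded in Step 1. In this example, the registered function is named ipv4_ipv6_aton.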

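Step 5: Call the MaxCompute UDF to convert IP addresses to locations

  1. Log on to the MaxCompute client.
  2. Execute SQL SELECT statements that call the MaxCompute UDF to convert IPv4 or IPv6 addresses to locations.
    Sample statements:
    • Convert an IPv4 address to a location
      select ipv4_ipv6_aton('116.11.XX.XX');
      The following result is returned (Beihai City, Guangxi Zhuang Autonomous Region):
      北海市,廣西壯族自治區
    • Convert an IPv6 address to a location
      select ipv4_ipv6_aton('2001:0250:080b:0:0:0:0:0');
      The following result is returned (Baoding City, Hebei Province):
      保定市,河北省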
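
The function returns the city and province joined by a comma, or NULL when the address cannot be matched. If downstream Java code needs the two parts separately, a sketch along the following lines may help. The class name LocationResultDemo, the method splitLocation, and the sample values are hypothetical and only illustrate the "city,province" format produced by the evaluate method.

```java
public class LocationResultDemo {

    /**
     * Splits a "city,province" result into its two parts.
     * Returns null when the input is null, which is what the UDF
     * returns for an address it cannot match.
     */
    static String[] splitLocation(String result) {
        if (result == null) {
            return null;
        }
        // A limit of 2 splits on the first comma only and keeps an empty province.
        String[] parts = result.split(",", 2);
        if (parts.length < 2) {
            // No comma found: treat the whole text as the city.
            return new String[] {parts[0], ""};
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical value in the same shape as the function's output.
        String[] parts = splitLocation("CityA,ProvinceA");
        System.out.println("city=" + parts[0] + ", province=" + parts[1]);
        System.out.println(splitLocation(null));
    }
}
```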