
Access OSS data from ECI

When you use Hadoop, Spark, or similar frameworks to run batch jobs, you can choose Object Storage Service (OSS) as the storage. This topic uses Spark as an example to demonstrate how to upload a file to OSS and access it from Spark.

Prepare data and upload it to OSS

  1. Log on to the OSS console.

  2. Create a bucket. For more information, see Create buckets.

  3. Upload the file to OSS. For more information, see Simple upload.

    After the file is uploaded, record the file's address in the OSS bucket (for example, oss://test***-hust/test.txt) and the OSS endpoint (for example, oss-cn-hangzhou-internal.aliyuncs.com). If you prefer to upload the file programmatically, see the sketch after this list.
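
    For reference, the following is a minimal sketch of a programmatic upload using the OSS Java SDK (the aliyun-sdk-oss dependency). The endpoint, bucket name, object key, and local path are placeholders taken from the examples above; replace them and the credentials with your own values. Note that an -internal endpoint is reachable only from within the same Alibaba Cloud region.

      import com.aliyun.oss.OSS;
      import com.aliyun.oss.OSSClientBuilder;
      import java.io.File;

      public class UploadToOss {
          public static void main(String[] args) {
              // Placeholders: replace with your own endpoint and credentials.
              OSS ossClient = new OSSClientBuilder().build(
                      "oss-cn-hangzhou-internal.aliyuncs.com",
                      "<your AccessKey ID>", "<your AccessKey Secret>");
              try {
                  // Upload the local file as object test.txt in the bucket.
                  ossClient.putObject("test***-hust", "test.txt", new File("/path/to/test.txt"));
              } finally {
                  ossClient.shutdown();
              }
          }
      }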

Read OSS data in a Spark application

  1. Develop the application. The core read and write logic is shown below; a hypothetical completion of the elided "..." step follows the snippet.

    SparkConf conf = new SparkConf().setAppName(WordCount.class.getSimpleName());
    JavaSparkContext sc = new JavaSparkContext(conf);
    // Read the input file from OSS; 250 is the minimum number of partitions.
    JavaRDD<String> lines = sc.textFile("oss://test***-hust/test.txt", 250);
    ...
    // Write the result back to OSS and release the context.
    wordsCountResult.saveAsTextFile("oss://test***-hust/test-result");
    sc.close();
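
    The "..." above stands for the application-specific transformations. For a word count, a hypothetical completion could look like the following sketch (standard Spark Java APIs; additional imports: java.util.Arrays, scala.Tuple2, org.apache.spark.api.java.JavaPairRDD):

      // Hypothetical word-count transformations for the elided step above:
      // split each line into words, pair each word with 1, then sum the counts.
      JavaPairRDD<String, Integer> wordsCountResult = lines
              .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
              .mapToPair(word -> new Tuple2<>(word, 1))
              .reduceByKey(Integer::sum);
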
  2. Configure the OSS information in the application.

    Note

    Replace the OSS endpoint, AccessKey ID, and AccessKey Secret with your actual values.

    • Method 1: Use a static configuration file

      Modify core-site.xml and place it in the resources directory of your application project.

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!--
        Licensed under the Apache License, Version 2.0 (the "License");
        you may not use this file except in compliance with the License.
        You may obtain a copy of the License at
          http://www.apache.org/licenses/LICENSE-2.0
        Unless required by applicable law or agreed to in writing, software
        distributed under the License is distributed on an "AS IS" BASIS,
        WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        See the License for the specific language governing permissions and
        limitations under the License. See accompanying LICENSE file.
      -->
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
          <!-- OSS configuration -->
          <property>
              <name>fs.oss.impl</name>
              <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
          </property>
          <property>
              <name>fs.oss.endpoint</name>
              <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
          </property>
          <property>
              <name>fs.oss.accessKeyId</name>
              <value>{your AccessKey ID}</value>
          </property>
          <property>
              <name>fs.oss.accessKeySecret</name>
              <value>{your AccessKey Secret}</value>
          </property>
          <property>
              <name>fs.oss.buffer.dir</name>
              <value>/tmp/oss</value>
          </property>
          <property>
              <name>fs.oss.connection.secure.enabled</name>
              <value>false</value>
          </property>
          <property>
              <name>fs.oss.connection.maximum</name>
              <value>2048</value>
          </property>
      </configuration>
    • Method 2: Set the configuration dynamically when you submit the application

      Using Spark as an example, set the configuration at submission time as follows (a programmatic Java alternative is sketched after this list):

      hadoopConf:
          # OSS
          "fs.oss.impl": "org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem"
          "fs.oss.endpoint": "oss-cn-hangzhou-internal.aliyuncs.com"
          "fs.oss.accessKeyId": "your AccessKey ID"
          "fs.oss.accessKeySecret": "your AccessKey Secret"
  3. Package the JAR file.

    The packaged JAR file must contain all dependencies. The pom.xml of the application is as follows; running mvn package produces the fat JAR, because the assembly execution is bound to the package phase:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.aliyun.liumi.spark</groupId>
        <artifactId>SparkExampleJava</artifactId>
        <version>1.0-SNAPSHOT</version>

        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.12</artifactId>
                <version>2.4.3</version>
            </dependency>

            <dependency>
                <groupId>com.aliyun.dfs</groupId>
                <artifactId>aliyun-sdk-dfs</artifactId>
                <version>1.0.3</version>
            </dependency>
        </dependencies>

        <build>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>2.6</version>
                    <configuration>
                        <appendAssemblyId>false</appendAssemblyId>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                        <archive>
                            <manifest>
                                <mainClass>com.aliyun.liumi.spark.example.WordCount</mainClass>
                            </manifest>
                        </archive>
                    </configuration>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>assembly</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </project>
  4. Write the Dockerfile.

    # spark base image
    FROM registry.cn-beijing.aliyuncs.com/eci_open/spark:2.4.4
    RUN rm $SPARK_HOME/jars/kubernetes-client-*.jar
    ADD https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.4.2/kubernetes-client-4.4.2.jar $SPARK_HOME/jars
    RUN mkdir -p /opt/spark/jars
    COPY SparkExampleJava-1.0-SNAPSHOT.jar /opt/spark/jars
    # OSS dependency JARs
    COPY aliyun-sdk-oss-3.4.1.jar /opt/spark/jars
    COPY hadoop-aliyun-2.7.3.2.6.1.0-129.jar /opt/spark/jars
    COPY jdom-1.1.jar /opt/spark/jars
    Note

    For the download addresses of the OSS dependency JARs, see Use HDP 2.6 Hadoop to read and write OSS data.

  5. Build the application image.

    docker build -t registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example -f Dockerfile .
  6. Push the image to Alibaba Cloud Container Registry (ACR).

    docker push registry.cn-beijing.aliyuncs.com/liumi/spark:2.4.4-example

After you complete the preceding operations, the Spark application image is ready, and you can use it to deploy the Spark application in a Kubernetes cluster.