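This topic describes how to configure the dependencies for Spark-1.x and provides Spark-1.x examples.
Configure Spark-1.x dependencies
To submit an application through the Spark client provided by MaxCompute, add the following dependencies to the pom.xml file.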
<properties>
  <spark.version>1.6.3</spark.version>
  <cupid.sdk.version>3.3.3-public</cupid.sdk.version>
  <scala.version>2.10.4</scala.version>
  <scala.binary.version>2.10</scala.binary.version>
</properties>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>cupid-sdk</artifactId>
  <version>${cupid.sdk.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>hadoop-fs-oss</artifactId>
  <version>${cupid.sdk.version}</version>
</dependency>
<dependency>
  <groupId>com.aliyun.odps</groupId>
  <artifactId>odps-spark-datasource_${scala.binary.version}</artifactId>
  <version>${cupid.sdk.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-actors</artifactId>
  <version>${scala.version}</version>
</dependency>
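The scopes in the preceding configuration are defined as follows:
- All packages released by the Spark community, such as spark-core and spark-sql, use the provided scope.
- odps-spark-datasource uses the default compile scope.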
WordCount example (Scala)
- Sample code
- How to submit
cd /path/to/MaxCompute-Spark/spark-1.x
mvn clean package
# For how to configure the environment variables and spark-defaults.conf, see Set up a development environment.
cd $SPARK_HOME
bin/spark-submit --master yarn-cluster --class com.aliyun.odps.spark.examples.WordCount \
/path/to/MaxCompute-Spark/spark-1.x/target/spark-examples_2.10-1.0.0-SNAPSHOT-shaded.jar
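The sample code itself is not included in this section. As a rough illustration only, a WordCount job written against the Spark 1.6 RDD API might look like the following sketch. The object name and package match the --class value in the submit command above. The countWords helper and the in-memory input lines are assumptions made for this sketch and are not the code shipped in the MaxCompute-Spark repository.

```scala
package com.aliyun.odps.spark.examples

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {

  // Hypothetical helper: split each line on whitespace and count
  // how many times every word occurs.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  def main(args: Array[String]): Unit = {
    // The master is not set in code; it is supplied by spark-submit
    // (--master yarn-cluster in the command above).
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)
    try {
      // Assumed in-memory input, used only to keep the sketch self-contained.
      val lines = sc.parallelize(Seq("hello spark", "hello maxcompute"))
      countWords(lines).collect().foreach(println)
    } finally {
      // Always release the context, even if the job fails.
      sc.stop()
    }
  }
}
```

Keeping the counting logic in a separate function makes it possible to exercise it against a local SparkContext without changing the entry point that spark-submit invokes.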
MaxCompute table read and write example (Scala)
- Sample code
- How to submit
cd /path/to/MaxCompute-Spark/spark-1.x
mvn clean package
# For how to configure the environment variables and spark-defaults.conf, see Set up a development environment.
cd $SPARK_HOME
bin/spark-submit --master yarn-cluster --class com.aliyun.odps.spark.examples.sparksql.SparkSQL \
/path/to/MaxCompute-Spark/spark-1.x/target/spark-examples_2.10-1.0.0-SNAPSHOT-shaded.jar
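MaxCompute table read and write example (Python)
For Python sample code that reads from and writes to MaxCompute tables, see spark_sql.py.
MaxCompute table read and write example (Java)
For Java sample code that reads from and writes to MaxCompute tables, see JavaSparkSQL.java.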