Contents
- Prerequisites
- The official example
- Three small pitfalls
  - The Maven file
  - An ambiguous reference
  - No master URL set
Prerequisites
See the previous post: https://blog.csdn.net/shuzip/article/details/115606522
The official example
https://spark.apache.org/docs/3.1.1/quick-start.html
/* SimpleApp.java */
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;

public class SimpleApp {
  public static void main(String[] args) {
    String logFile = "YOUR_SPARK_HOME/README.md"; // Should be some file on your system
    SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();
    Dataset<String> logData = spark.read().textFile(logFile).cache();
    long numAs = logData.filter(s -> s.contains("a")).count();
    long numBs = logData.filter(s -> s.contains("b")).count();
    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    spark.stop();
  }
}
Three small pitfalls
The Maven file
In the official example the Spark dependency's scope is provided, because spark-submit supplies the Spark jars at runtime:
<project>
  <groupId>edu.berkeley</groupId>
  <artifactId>simple-project</artifactId>
  <modelVersion>4.0.0</modelVersion>
  <name>Simple Project</name>
  <packaging>jar</packaging>
  <version>1.0</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.1.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
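That scope works in the official flow because spark-submit puts the Spark jars on the classpath for you; the quick-start guide packages and submits the jar like this (YOUR_SPARK_HOME is a placeholder for your Spark install):

mvn package
YOUR_SPARK_HOME/bin/spark-submit --class "SimpleApp" --master local[4] target/simple-project-1.0.jar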
But to run and debug locally (straight from the IDE), the scope has to be changed to compile so the Spark classes land on your own classpath; a profile trick that keeps both options is sketched after this pom:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.shuzip</groupId>
  <artifactId>simple-project</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency> <!-- Spark dependency -->
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.1.1</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
  <properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
  </properties>
</project>
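Instead of editing the scope back and forth, a middle ground (a sketch of my own, not from the official docs) is to parameterize the scope with a property and flip it in a profile: a plain mvn package keeps provided for spark-submit, while mvn -Plocal gives you compile for local runs.

  <properties>
    <!-- default: Spark jars are supplied by spark-submit -->
    <spark.scope>provided</spark.scope>
  </properties>

  <profiles>
    <!-- activate with mvn -Plocal to put Spark on the local classpath -->
    <profile>
      <id>local</id>
      <properties>
        <spark.scope>compile</spark.scope>
      </properties>
    </profile>
  </profiles>

The dependency then declares <scope>${spark.scope}</scope>.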
An ambiguous reference
After fixing the pom and pointing the file path in the code at your own copy, running the example on a Windows machine fails to compile:

java: reference to filter is ambiguous

Dataset.filter is overloaded with both a Java FilterFunction and a Scala Function1, and a bare lambda matches either, so the compiler can't pick one. Cast the lambda explicitly:
long numAs = logData.filter((FilterFunction<String>) s -> s.contains("a")).count();
long numBs = logData.filter((FilterFunction<String>) s -> s.contains("b")).count();
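As an aside, a cast-free alternative (my own sketch, not from the quick-start guide) is to filter with a Column expression; textFile() yields a Dataset<String> whose single column is named value:

import org.apache.spark.sql.functions; // added to the imports for this variant

long numAs = logData.filter(functions.col("value").contains("a")).count();
long numBs = logData.filter(functions.col("value").contains("b")).count();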
No master URL set
With that fix in place, running again surfaces one last problem:
ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
This needs to be set before Spark initializes:
System.setProperty("spark.master", "local");
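Equivalently, you can set the master on the builder itself rather than through a system property:

SparkSession spark = SparkSession.builder()
        .appName("Simple Application")
        .master("local") // or "local[*]" to use all local cores
        .getOrCreate();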
The complete code that finally runs:
/* SimpleApp.java */
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;

public class SimpleApp {
  public static void main(String[] args) {
    System.setProperty("spark.master", "local"); // run Spark locally
    String logFile = "D:\\JAVA_project\\simple-project\\src\\main\\resources\\README.md";
    SparkSession spark = SparkSession.builder().appName("Simple Application").getOrCreate();
    Dataset<String> logData = spark.read().textFile(logFile).cache();
    long numAs = logData.filter((FilterFunction<String>) s -> s.contains("a")).count();
    long numBs = logData.filter((FilterFunction<String>) s -> s.contains("b")).count();
    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    spark.stop();
  }
}
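With compile scope this runs straight from the IDE; if you prefer the command line, the exec-maven-plugin can launch it too, e.g. mvn compile exec:java -Dexec.mainClass=SimpleApp (that invocation is my own suggestion, not part of the original setup).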
Success
The counts print as expected, and from here you can cruise through the official docs at your leisure~