Running the Quick Start Java and Python examples
Running the Python example
When running the Python example, launching pyspark fails with the error env: 'python': No such file or directory. The cause is that there is no python executable on the PATH (newer distributions only ship python3), so the pyspark launcher cannot find it; see:
https://blog.csdn.net/qq_42881421/article/details/88069211
Run the example below. Submitting it with spark-submit works fine, but running #>python SimpleApp.py directly will ask you to install pip.
"""SimpleApp.py"""
from pyspark.sql import SparkSession
logFile = "YOUR_SPARK_HOME/README.md" # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
logData = spark.read.text(logFile).cache()
numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()
print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
spark.stop()
Install pip with the following command: #>apt install python-pip
Then install pyspark with: #>pip install pyspark
Finally, run the example again (#>python SimpleApp.py) and it works.
Running the Java example
Install Maven with apt install maven.
The pom file needs to be configured to compile with JDK 8:
<properties>
  <maven.compiler.target>1.8</maven.compiler.target>
  <maven.compiler.source>1.8</maven.compiler.source>
</properties>
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.6.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
Running this example fails to compile with the error: reference to filter is ambiguous. Dataset.filter is overloaded (it accepts both a Java FilterFunction and a Scala function), so the compiler cannot tell which overload a bare lambda should match.
Fix it by casting the lambda to FilterFunction, as in the code below:
import org.apache.spark.api.java.function.FilterFunction;
long numAs = logData.filter((FilterFunction<String>) s -> s.contains("a")).count();
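For reference, here is a minimal sketch of the complete SimpleApp.java, following the Spark quick start program with the FilterFunction cast applied to both filter calls; the logFile path is a placeholder you should point at a real file on your system.

/* SimpleApp.java */
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;

public class SimpleApp {
  public static void main(String[] args) {
    String logFile = "YOUR_SPARK_HOME/README.md"; // Should be some file on your system
    SparkSession spark = SparkSession.builder().appName("SimpleApp").getOrCreate();
    Dataset<String> logData = spark.read().textFile(logFile).cache();

    // Cast the lambdas to FilterFunction<String> so the overloaded filter(...) call is unambiguous
    long numAs = logData.filter((FilterFunction<String>) s -> s.contains("a")).count();
    long numBs = logData.filter((FilterFunction<String>) s -> s.contains("b")).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    spark.stop();
  }
}

Package it with Maven and run the resulting jar through spark-submit, the same way the Python example is submitted.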