Spark 2.4.5 Official Documentation Notes: Running the Examples on Ubuntu (Part 2)

Running the Quick Start Java and Python examples

Running the Python example

When running the Python example, executing pyspark may fail with the error env: ‘python’: No such file or directory. A fix is described at:

https://blog.csdn.net/qq_42881421/article/details/88069211
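In short, Spark's scripts look for a `python` binary that newer Ubuntu releases no longer ship. A minimal sketch of the fix, assuming `python3` is installed (the package name and symlink below are common Ubuntu conventions, not from the post above):

```shell
# Tell PySpark which interpreter to use instead of the missing 'python'
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

# Alternatively, restore a 'python' binary system-wide (Ubuntu 20.04+):
#   sudo apt install python-is-python3
# or create the symlink by hand:
#   sudo ln -s /usr/bin/python3 /usr/bin/python

echo "$PYSPARK_PYTHON"
```

Putting the two export lines in `$SPARK_HOME/conf/spark-env.sh` makes the setting permanent for all Spark scripts.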

Running the example below with the spark-submit command works fine, but running it directly with #>python SimpleApp.py will ask you to install pip.

"""SimpleApp.py"""
from pyspark.sql import SparkSession

logFile = "YOUR_SPARK_HOME/README.md"  # Should be some file on your system
spark = SparkSession.builder.appName("SimpleApp").getOrCreate()
logData = spark.read.text(logFile).cache()

numAs = logData.filter(logData.value.contains('a')).count()
numBs = logData.filter(logData.value.contains('b')).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))

spark.stop()

Install pip with the following command: #>apt install python-pip

Then install pyspark: #>pip install pyspark

Finally, run the example.
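For reference, a spark-submit invocation for the script can look like this; the Spark install path is an assumption, so adjust it to your own setup:

```shell
# Build the submit command; SPARK_HOME must point at your Spark 2.4.5 install
SPARK_HOME="${SPARK_HOME:-/opt/spark-2.4.5-bin-hadoop2.7}"  # assumed location
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit --master local[4] SimpleApp.py"
echo "$SUBMIT_CMD"
```

local[4] runs Spark locally with four worker threads; once pyspark is installed via pip, plain `python SimpleApp.py` also works.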

Running the Java example

Install Maven with apt install maven.

The pom file must be configured to compile with JDK 8:

<properties>
        <maven.compiler.target>1.8</maven.compiler.target>
        <maven.compiler.source>1.8</maven.compiler.source>
</properties>

<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </pluginManagement>
</build>

Running this example fails with the compile error: reference to filter is ambiguous

This happens because the bare lambda matches more than one filter overload; cast it explicitly to FilterFunction<String>:

import org.apache.spark.api.java.function.FilterFunction;

// The explicit FilterFunction<String> cast resolves the ambiguous overload
long numAs = logData.filter((FilterFunction<String>) s -> s.contains("a")).count();
long numBs = logData.filter((FilterFunction<String>) s -> s.contains("b")).count();
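With the cast in place, the app builds with mvn package and can be handed to spark-submit. The jar name below follows the Spark Quick Start's sample project (simple-project-1.0.jar) and is an assumption; use whatever your pom's artifactId and version produce:

```shell
# Package the app first with 'mvn package', then submit the resulting jar
SPARK_HOME="${SPARK_HOME:-/opt/spark-2.4.5-bin-hadoop2.7}"  # assumed location
JAR="target/simple-project-1.0.jar"                          # assumed artifact name
SUBMIT_CMD="$SPARK_HOME/bin/spark-submit --class SimpleApp --master local[4] $JAR"
echo "$SUBMIT_CMD"
```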


Reposted from blog.csdn.net/penker_zhao/article/details/104517641