Connecting PyCharm to Spark in a virtual machine

1. Open PyCharm

2. Unzip Hadoop on Windows. Make sure the path contains no Chinese characters.

3. Unzip Spark on Windows. Again, make sure the path contains no Chinese characters.
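The path rule from steps 2 and 3 can be checked before going further. The following is a minimal sketch, not part of the original post: the helper name `non_ascii_parts` is hypothetical, and it assumes Python 3.7 or later (for `str.isascii`). It simply lists the path components that contain non-ASCII characters such as Chinese.

```python
def non_ascii_parts(path):
    """Return the path components that contain non-ASCII characters."""
    # Treat both Windows and POSIX separators the same way.
    normalized = path.replace("\\", "/")
    return [part for part in normalized.split("/") if part and not part.isascii()]
```

An empty list means the path is safe under this rule. For example, `non_ascii_parts("D:/ruanjian/spark/spark-2.4.6-bin-hadoop2.7")` returns `[]`.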

4. Configure the Hadoop and Spark environment variables in PyCharm

4.1 Create a new project

4.2 Create a new Python file (testspark.py) in the project

4.3 Add Hadoop to PyCharm

Set the HADOOP_HOME environment variable to the directory where Hadoop was unzipped.
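As an alternative to the PyCharm dialog, the same variable can be set from the script itself, provided this happens before the Spark session is created. This is a rough sketch under stated assumptions: the function name `set_hadoop_home` is hypothetical, and adding Hadoop's `bin` directory to `PATH` is an extra step that is commonly recommended on Windows but is not mentioned in the original post.

```python
import os

def set_hadoop_home(hadoop_dir, env=os.environ):
    """Point HADOOP_HOME at hadoop_dir and put its bin directory first on PATH."""
    env["HADOOP_HOME"] = hadoop_dir
    bin_dir = os.path.join(hadoop_dir, "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    # Assumption: prepending bin helps Windows find the native Hadoop binaries.
    if bin_dir not in path_entries:
        env["PATH"] = os.pathsep.join([bin_dir] + [p for p in path_entries if p])
    return env

# Usage (the path is a placeholder; replace it with your own Hadoop directory):
# set_hadoop_home(r"D:\path\to\hadoop")
```

Passing a plain dictionary as `env` makes it possible to try the function without touching the real environment.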

4.4 Put winutils.exe in Hadoop's bin directory
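A quick way to confirm that the file landed in the right place is to look for it under the Hadoop directory. This is an illustrative sketch only; the helper name `has_winutils` is hypothetical.

```python
import os

def has_winutils(hadoop_home):
    """Return True if winutils.exe exists in the bin directory of hadoop_home."""
    return os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe"))
```

Once HADOOP_HOME is set, calling `has_winutils(os.environ["HADOOP_HOME"])` should return `True`.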

4.5 Add Spark to PyCharm

Set the SPARK_HOME and PYTHONPATH environment variables.
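For PYTHONPATH, the entries that usually matter are Spark's `python` directory and the bundled py4j archive under `python/lib`. The sketch below derives both from a Spark directory. The function name `spark_python_paths` is hypothetical, and the `py4j-*-src.zip` pattern is an assumption about how the archive is named in the Spark distribution.

```python
import glob
import os

def spark_python_paths(spark_home):
    """Return the directories/archives to put on PYTHONPATH for a Spark install."""
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match it with a glob.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips
```

Joining the result with `os.pathsep` gives a value suitable for PYTHONPATH. This is roughly the work that `findspark.init()` takes over in the test script.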

5. Install the plugin

6. Test

6.1 Put the following code into the testspark.py file created in step 4.2

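import findspark
findspark.init()  # locate Spark via SPARK_HOME and make pyspark importable

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()

# Count the words in Spark's README.md.
(spark.sparkContext
    .textFile("file:///D:/ruanjian/spark/spark-2.4.6-bin-hadoop2.7/README.md")
    .flatMap(lambda line: line.split(' '))  # split each line into words
    .map(lambda word: (word, 1))            # pair each word with a count of 1
    .reduceByKey(lambda x, y: x + y)        # sum the counts per word
    .foreach(print))                        # print each (word, count) pair

spark.stop()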
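To see what the Spark pipeline computes without starting Spark, the same word count can be written in plain Python. This is only an illustrative equivalent, and the function name `word_count` is hypothetical. The nested loop plays the role of `flatMap`, and the running totals play the role of `map` followed by `reduceByKey`.

```python
from collections import defaultdict

def word_count(lines):
    """Count words the same way as the Spark job: split each line on single spaces."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.split(' '):  # flatMap: one line becomes many words
            counts[word] += 1         # map + reduceByKey: add 1 per occurrence
    return dict(counts)
```

Because the split is on a single space, consecutive spaces and empty lines produce empty-string "words", which is why the Spark output typically includes an entry for `''`.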

Caution: the README.md path in the code must point to a file that exists on your machine, so change it to match your own Spark directory.

6.2 Install pyspark and findspark
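Both packages are normally installed with pip, for example `pip install pyspark findspark`, or through PyCharm's interpreter settings. To check from Python that the project interpreter can actually see them, a small sketch like the following can be used. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found by the interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `missing_packages(["pyspark", "findspark"])` in the PyCharm console should return an empty list once the installation has succeeded.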

6.3 Run the test

Origin blog.csdn.net/weixin_45955039/article/details/129819543