[PySpark] Submitting PySpark Python code to run on YARN

1. Compress the project files
sudo zip -r project.zip ./*
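
For the archive to be importable later (the PYTHONPATH / sys.path steps below), the packages inside it must sit at the zip root. A quick sanity check with the standard library (a sketch; it only assumes project.zip is in the current directory):

    import zipfile

    # list the top-level entries of project.zip; packages meant to be imported
    # (e.g. project/__init__.py) must appear directly under the archive root
    with zipfile.ZipFile("project.zip") as zf:
        print(sorted({name.split("/")[0] for name in zf.namelist()}))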

2. Configure PYTHONPATH so that it points to that location (where project.zip lives)


3. Create a configuration file conf.py in the project
AI_PLATFORM_SOURCE = r'/usr/project.zip'

2. Referencing the external module in code
# take the module path from conf and make the zipped project importable
import sys
from conf import AI_PLATFORM_SOURCE
sys.path.append(AI_PLATFORM_SOURCE)
from project import settings   # 'project' is the top-level package inside project.zip


Referencing a class packaged inside the zip
import importlib

import_module = "project_class_path.{0}".format(class_name)   # dotted path of the module inside the zip
Module1 = importlib.import_module(import_module, base_dir)
HandlerClass = getattr(Module1, class_name)
# handler = HandlerClass(json.dumps(params))
filename = data_dir + 'feature_filter/' + 'feature_filter.json'
handler = HandlerClass(filename)

res = handler.execute(gai_ss.ss.sparkContext, gai_ss.ss)
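
The snippet above relies on names defined elsewhere in the project (base_dir, data_dir, class_name, gai_ss). A self-contained sketch of the same dynamic-import pattern, with hypothetical module, class and path names:

    import importlib

    from pyspark.sql import SparkSession

    # hypothetical values; in the real project they come from configuration
    class_name = "FeatureFilter"                        # handler class to load at runtime
    import_module = "project.handlers.feature_filter"   # dotted module path inside project.zip
    data_dir = "/usr/data/"

    # resolve the module, then pull the class out of it by name
    module = importlib.import_module(import_module)
    HandlerClass = getattr(module, class_name)

    # build the handler from its JSON config and run it against the Spark session
    filename = data_dir + 'feature_filter/' + 'feature_filter.json'
    handler = HandlerClass(filename)

    ss = SparkSession.builder.appName("demo").getOrCreate()
    res = handler.execute(ss.sparkContext, ss)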


4. Run the program
Package the project's subdirectories with: zip -r gai_platform.zip *

Submit to the cluster:
bin/spark-submit --py-files project.zip project_path/demo.py --master yarn --deploy-mode cluster
bin/spark-submit --py-files project.zip project_path/demo.py --master yarn --deploy-mode client
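
A minimal sketch of what the demo.py driver used in these commands might contain; the package imported from project.zip is an assumption:

    # demo.py -- submitted with --py-files project.zip
    from pyspark.sql import SparkSession

    # modules shipped via --py-files are already on the executors' PYTHONPATH
    from project import settings   # hypothetical package inside project.zip

    def main():
        ss = SparkSession.builder.appName("demo").getOrCreate()
        df = ss.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
        print(df.count())
        ss.stop()

    if __name__ == "__main__":
        main()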

 


spark-submit --py-files hdfs://localhost:8020/user/dp/data/project.zip --master local program_path/demo.py


bin/spark-submit \
    --py-files ... \
    main.py

    --py-files: the module packages that main.py needs, with the .py files packaged together as local *.zip / *.egg files (these can include third-party modules such as numpy and pandas)
    main.py: the script to be executed
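
Besides passing --py-files on the command line, a dependency archive can also be attached from inside the driver with SparkContext.addPyFile; a minimal sketch, assuming project.zip is reachable at the path shown:

    from pyspark.sql import SparkSession

    ss = SparkSession.builder.appName("demo").getOrCreate()

    # ships the archive to the executors and makes it importable,
    # the runtime counterpart of --py-files project.zip
    ss.sparkContext.addPyFile("/usr/project.zip")   # path is an assumption

    import project   # hypothetical top-level package inside the zip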
 

Reprinted from: https://blog.csdn.net/dymkkj/article/details/86006088

Origin: blog.csdn.net/zkq_1986/article/details/95312140