[Big Data] Apache Livy, an open-source REST service for Spark: installation and usage

Installation

Prerequisites: Hadoop (HDFS/YARN), Spark, and related components must already be installed, with their environment variables configured.

1. Download the Livy package

Livy downloads page on the official site

cd /opt
wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip

2. Extract the package

unzip apache-livy-0.7.1-incubating-bin.zip

3. Configuration

  1. Edit livy-env.sh
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
HADOOP_CONF_DIR=/Users/xxx/Documents/software/hadoop-3.3.1/etc/hadoop
SPARK_HOME=/Users/xxx/Documents/software/spark-3.2.1
SPARK_CONF_DIR=/Users/xxx/Documents/software/spark-3.2.1/conf
  2. Edit livy.conf
# Spark master used by Livy sessions
livy.spark.master = yarn
# Deploy mode used by Livy sessions
livy.spark.deploy-mode = cluster
# Use a HiveContext in the REPL by default
livy.repl.enable-hive-context = true
# Enable user impersonation
livy.impersonation.enabled = true
# Session idle timeout
livy.server.session.timeout = 1h
# Thrift server
livy.server.thrift.enabled = true
livy.server.thrift.port = 10002
# Recovery
livy.server.recovery.mode = recovery
livy.server.recovery.state-store = filesystem
livy.server.recovery.state-store.url = hdfs://10.253.128.30:9000/livy/
  3. Configure log4j
cp log4j.properties.template log4j.properties
  4. Copy jersey-core-1.9.jar into Livy's jars directory

4. Start Livy

# change into the Livy directory
cd /opt/apache-livy-0.7.1-incubating-bin
bin/livy-server start

Access the Livy UI

curl http://ip:8998/ui
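Besides opening the UI, you can confirm the REST API itself is up. A minimal sketch: `GET /sessions` should answer with HTTP 200 on a healthy server (`is_http_ok` and `livy_up` are helper names of my own, not Livy commands; substitute your server's host for the argument).

```shell
# is_http_ok: succeed only for an HTTP 200 status code
is_http_ok() { [ "$1" = "200" ]; }

# livy_up: query the REST API; /sessions always exists on a live Livy server.
# curl -o /dev/null -w '%{http_code}' prints only the numeric status code.
livy_up() {
  is_http_ok "$(curl -s -o /dev/null -w '%{http_code}' "http://$1:8998/sessions")"
}

# usage: livy_up 10.253.128.30 && echo "Livy REST API is up"
```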

Livy configuration options

| Option | Default | Description |
| --- | --- | --- |
| livy.server.spark-home | | Spark installation directory |
| livy.spark.master | | Spark master for sessions |
| livy.spark.deploy-mode | | Deploy mode (client or cluster) |
| livy.spark.scala-version | | Scala version of the Spark installation |
| livy.spark.version | | Spark version |
| livy.session.staging-dir | | Staging directory for session resources |
| livy.file.upload.max.size | | Maximum size of uploaded files |
| livy.file.local-dir-whitelist | | Local directories allowed for file URIs |
| livy.repl.enable-hive-context | | Whether the REPL gets a HiveContext |
| livy.environment | | |
| livy.server.host | | Address the server binds to |
| livy.server.port | 8998 | REST/UI port |
| livy.ui.basePath | | Base path of the web UI |
| livy.ui.enabled | | Enable the web UI |
| livy.server.request-header.size | 131072 | Max HTTP request header size (bytes) |
| livy.server.response-header.size | 131072 | Max HTTP response header size (bytes) |
| livy.server.csrf-protection.enabled | false | Enable CSRF protection |
| livy.impersonation.enabled | false | Enable user impersonation |
| livy.superusers | null | Users allowed to impersonate others |
| livy.server.access-control.enabled | false | Enable access control |
| livy.server.access-control.allowed-users | * | Users allowed to access Livy |
| livy.server.access-control.modify-users | null | Users with modify permission |
| livy.server.access-control.view-users | null | Users with view permission |
| livy.keystore | | Keystore file for TLS |
| livy.keystore.password | | Keystore password |
| livy.key-password | | Key password |

Using Livy

livy-session

A livy-session lets you drive a spark-shell over REST, for interactive workloads.

  1. Create a session
curl -XPOST 'http://10.253.128.30:8998/sessions' -H 'Content-Type:application/json' --data '{"kind": "spark"}'
  2. Inspect the session
    http://10.253.128.30:8998/ui

  3. Run code in the session
curl -XPOST 'http://10.253.128.30:8998/sessions/2/statements' -H 'Content-Type:application/json' --data '{"code": "sc.textFile(\"\")"}'

Note: a session only executes submitted requests once its state is idle. While a statement runs, the state changes to busy, and it returns to idle when execution finishes.
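The idle/busy cycle above suggests polling before submitting work. A small sketch, assuming the response JSON contains a `"state":"..."` field as returned by `GET /sessions/{id}` (`session_state` and `wait_for_idle` are hypothetical helper names; the host/port are taken from the examples above):

```shell
# session_state: naive extraction of the "state" field from a session's JSON
session_state() {
  printf '%s' "$1" | sed -n 's/.*"state" *: *"\([^"]*\)".*/\1/p'
}

# wait_for_idle: block until the given session reports state "idle"
wait_for_idle() {
  url="http://10.253.128.30:8998/sessions/$1"
  until [ "$(session_state "$(curl -s "$url")")" = "idle" ]; do
    sleep 2
  done
}

# usage: wait_for_idle 2, then POST to /sessions/2/statements as shown above
```

A real client would also handle the terminal states (dead, error, killed) instead of looping forever; this sketch only shows the happy path.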

livy-batch

A livy-batch handles non-interactive requests; it is the REST equivalent of spark-submit.
Example:

curl -XPOST -H 'Content-Type:application/json' http://10.253.128.30:8998/batches --data '{"conf": {"spark.master": "yarn"}, "file": "hdfs://", "className": "", "name": "", "executorCores": 1, "executorMemory": "512m", "driverCores": 1, "driverMemory": "512m", "queue": "default", "args": ["100"]}'
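For anything beyond a one-liner, the `--data` body is easier to maintain as a heredoc. Two details matter: `args` must be plain JSON (backslash-escaped quotes inside a single-quoted body are sent literally and rejected), and the core counts are integers, not strings. The `file`/`className` values below are hypothetical placeholders, not values from this post:

```shell
# build_batch_payload: emit the JSON body for POST /batches.
# file and className are placeholder examples; point them at your own job.
build_batch_payload() {
cat <<'EOF'
{"conf": {"spark.master": "yarn"},
 "file": "hdfs:///jobs/my-job.jar",
 "className": "com.example.MyJob",
 "name": "my-batch",
 "executorCores": 1, "executorMemory": "512m",
 "driverCores": 1, "driverMemory": "512m",
 "queue": "default",
 "args": ["100"]}
EOF
}

# usage:
# build_batch_payload | curl -XPOST -H 'Content-Type:application/json' \
#   http://10.253.128.30:8998/batches --data @-
```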


Reposted from blog.csdn.net/u013412066/article/details/129793483