K8s Application Notes: Deploying DolphinScheduler and Basic Usage (Part 1)

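1. Introduction to DolphinScheduler

Apache DolphinScheduler is a distributed, easily extensible, open-source workflow scheduling system built around visual DAGs. Aimed at enterprise scenarios, it provides a solution for visually operating tasks, workflows, and the full data-processing lifecycle.

Apache DolphinScheduler aims to untangle complex dependencies between big data tasks and to give applications the relationships between data and the various OPS orchestrations. It addresses the problem that ETL dependencies in data development are intricate and that task health cannot be monitored. DolphinScheduler assembles tasks as a streaming DAG (Directed Acyclic Graph), so task execution status can be monitored in real time, and it supports operations such as retry, recovery from a specified failed node, pause, resume, and kill.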

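2. Goals of this chapter

  • Deploy DolphinScheduler on Kubernetes
  • Use local file storage instead of HDFS or S3
  • Run a simple DolphinScheduler application on Kubernetes (with Python 3 and MySQL data source support, plus workflow orchestration)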

3. Prerequisites

4. Installing Helm

Official Helm documentation: https://helm.sh/docs/intro/install/

4.1 Download the required version

Download page: https://github.com/helm/helm/releases

The version I chose is helm-v3.12.2-linux-amd64.

4.2 Upload to the server and extract

tar -zxvf helm-v3.12.2-linux-amd64.tar.gz
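
If the server has internet access, the archive can also be fetched directly instead of being uploaded by hand. The sketch below is an addition rather than part of the original steps: helm_url is a hypothetical helper name, and it assumes the archive is served from get.helm.sh, the host used by the download links on the Helm releases page.

```shell
# Build the download URL for a Helm release archive.
# helm_url is a hypothetical helper; get.helm.sh is assumed to be the host
# behind the download links on the Helm releases page.
helm_url() {
  version="$1"   # e.g. v3.12.2
  os="$2"        # e.g. linux
  arch="$3"      # e.g. amd64
  printf 'https://get.helm.sh/helm-%s-%s-%s.tar.gz\n' "$version" "$os" "$arch"
}

# Print the URL for the version used in this article.
helm_url v3.12.2 linux amd64
```

To download, pass the result to curl, for example curl -fsSLO "$(helm_url v3.12.2 linux amd64)", then extract it with the tar command above.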

4.3 Move the binary to an executable directory

Locate the helm binary in the extracted directory and move it to the desired destination:

mv linux-amd64/helm /usr/local/bin/helm

4.4 Install from the script

Alternatively, use the installer script:

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
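
Whichever installation method is used, it is worth confirming that the binary is on PATH before continuing. A minimal check, where require_cmd is a hypothetical helper name introduced only for this sketch:

```shell
# Return 0 if the named command is on PATH; otherwise print a message and return 1.
# require_cmd is a hypothetical helper name.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH" >&2
  return 1
}

# Print the Helm client version only when helm is actually installed.
if require_cmd helm; then
  helm version --short
fi
```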

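5. Installing DolphinScheduler

5.1 Download and extract

Download the source package apache-dolphinscheduler-<version>-src.tar.gz from the official download page.

To install a release named dolphinscheduler, the official reference commands are: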

tar -zxvf apache-dolphinscheduler-<version>-src.tar.gz
cd apache-dolphinscheduler-<version>-src/deploy/kubernetes/dolphinscheduler
helm repo add bitnami https://charts.bitnami.com/bitnami
helm dependency update .
helm install dolphinscheduler . --set image.tag=<version>

I chose version 3.1.8, so the actual commands are:

tar -zxvf apache-dolphinscheduler-3.1.8-src.tar.gz
cd apache-dolphinscheduler-3.1.8-src/deploy/kubernetes/dolphinscheduler

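5.2 Change the raw resource address

The original address is https://raw.githubusercontent.com/, which may be unreachable from mainland China. Switch to the mirror address https://raw.gitmirror.com.

Modify the following setting in Chart.yaml:

vi Chart.yaml
# original address
#repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
# mirror address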
repository: https://raw.gitmirror.com/bitnami/charts/archive-full-index/bitnami
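
The same change can be made without opening an editor. The following is a sketch only: use_mirror is a hypothetical helper name, and it assumes the command is run from the chart directory containing Chart.yaml. It keeps a backup copy as Chart.yaml.bak.

```shell
# Point the chart dependency repository at the mirror, keeping a .bak backup.
# use_mirror is a hypothetical helper; the two hosts come from the text above.
use_mirror() {
  sed -i.bak 's#https://raw.githubusercontent.com/#https://raw.gitmirror.com/#g' "$1"
}

# Apply it only when run from the chart directory, then show the result.
if [ -f Chart.yaml ]; then
  use_mirror Chart.yaml
  grep -n 'repository:' Chart.yaml || true
fi
```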

5.3 Use local file storage instead of HDFS and S3

Modify the following settings in values.yaml:

common:
  configmap:
    RESOURCE_STORAGE_TYPE: "NONE"
    RESOURCE_UPLOAD_PATH: "/dolphinscheduler"
    FS_DEFAULT_FS: "file:///"
  fsFileResourcePersistence:
    enabled: true
    accessModes:
    - "ReadWriteMany"
    storageClassName: "-"
    storage: "20Gi"

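Change storageClassName and storage to the actual values for your cluster. Note: the storage class named by storageClassName must support the ReadWriteMany access mode. Also note that, as far as I know, "NONE" normally disables the resource center, and the official FAQ for local file storage sets RESOURCE_STORAGE_TYPE to "HDFS" together with FS_DEFAULT_FS: "file:///", so check the documentation for your version if resource uploads do not work.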

5.4 Deploy

helm dependency update .
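# --create-namespace creates the dolphinscheduler namespace if it does not exist yet
helm install dolphinscheduler . -n dolphinscheduler --create-namespace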

Check the deployment result:

kubectl get pod,svc,pvc -o wide -n dolphinscheduler
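
Pods can take a while to become ready after the install. As a hedged sketch, the function below blocks until the API server rollout completes. It assumes the Deployment is named dolphinscheduler-api, matching the service name used for port-forwarding in 5.5, and the 600s timeout is an arbitrary choice. wait_for_api and the KUBECTL override are hypothetical names introduced here.

```shell
# Wait until the API deployment reports a completed rollout.
# Assumptions: the Deployment is named dolphinscheduler-api and 600s is an
# arbitrary timeout. KUBECTL can be overridden, e.g. KUBECTL=echo for a dry run.
wait_for_api() {
  namespace="$1"
  "${KUBECTL:-kubectl}" rollout status deployment/dolphinscheduler-api \
    -n "$namespace" --timeout=600s
}
```

Run wait_for_api dolphinscheduler after helm install; it returns a non-zero status if the rollout does not finish in time.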

5.5 Access the web UI

Forward the port with port-forward:

kubectl port-forward --address 0.0.0.0 -n dolphinscheduler svc/dolphinscheduler-api 12345:12345

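Other access methods, such as NodePort or Ingress, are left for the reader to explore.

Open the web UI at http://localhost:12345/dolphinscheduler/ui

The default user is admin and the default password is dolphinscheduler123.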
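
Before opening a browser, the forwarded port can be probed from a second terminal. This is a sketch under the assumption that the port-forward above is still running; ui_status and the CURL override are hypothetical names introduced here.

```shell
# Print the HTTP status code returned by the DolphinScheduler UI.
# Assumes the port-forward above is running in another terminal.
# CURL can be overridden, e.g. CURL=echo for a dry run.
ui_status() {
  url="${1:-http://localhost:12345/dolphinscheduler/ui}"
  "${CURL:-curl}" -s -o /dev/null -w '%{http_code}' "$url"
}
```

Calling ui_status with no argument probes the URL from this section; any other URL can be passed as the first argument.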

6. Adding Python 3 and MySQL support

6.1 Building the dolphinscheduler-worker image

Download the MySQL driver mysql-connector-java-8.0.16.jar.

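Directory structure: the build directory contains the Dockerfile, mysql-connector-java-8.0.16.jar, and requirements.txt.

Create a new Dockerfile that adds the MySQL driver and installs Python 3: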

FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler-worker:3.1.8
# Add the MySQL driver
COPY ./mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/libs
# Add a custom requirements.txt
COPY ./requirements.txt /tmp
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-pip && \
    pip3 install --no-cache-dir -r /tmp/requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ && \
    rm -rf /var/lib/apt/lists/*

requirements.txt

Flask_Cors==3.0.10
pandas==1.4.2
PyMySQL==1.0.2
SQLAlchemy==1.4.32
xlwt==1.3.0
xlsxwriter==3.0.3
gunicorn
greenlet
eventlet
gevent
pypinyin
openpyxl

Build a new image that includes the MySQL driver and Python 3:

docker build -t apache/dolphinscheduler-worker:python3-mysql .
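
A quick smoke test of the built image can catch a broken pip install before the image reaches the cluster. The sketch below is an addition: check_worker_image and the DOCKER override are hypothetical names, and the imported modules are the import names of PyMySQL, pandas, and SQLAlchemy from requirements.txt.

```shell
# Check that the custom worker image can import the Python packages it needs.
# check_worker_image is a hypothetical helper; the module list is taken from
# requirements.txt. DOCKER can be overridden, e.g. DOCKER=echo for a dry run.
check_worker_image() {
  image="${1:-apache/dolphinscheduler-worker:python3-mysql}"
  "${DOCKER:-docker}" run --rm --entrypoint python3 "$image" \
    -c 'import pymysql, pandas, sqlalchemy; print("imports ok")'
}
```

The --entrypoint flag replaces the image's normal startup command, so the container only runs the import check and exits.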

Distribute the newly built image to the cluster nodes.
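
One way to distribute an image without a registry is to export it to a tar archive and load it on each node. This is only a sketch of that approach, not necessarily what the author did; export_image and the DOCKER override are hypothetical names.

```shell
# Export an image to a tar archive so it can be copied to other nodes.
# export_image is a hypothetical helper.
# DOCKER can be overridden, e.g. DOCKER=echo for a dry run.
export_image() {
  image="$1"
  archive="$2"
  "${DOCKER:-docker}" save -o "$archive" "$image"
}
```

For example, run export_image apache/dolphinscheduler-worker:python3-mysql worker.tar, copy worker.tar to each node, and load it there with docker load -i worker.tar (or ctr -n k8s.io images import worker.tar on containerd nodes). Pushing to a private registry that the nodes can pull from is an alternative.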

6.2 Building the dolphinscheduler-api image

Create a new Dockerfile that adds the MySQL driver:

FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler-api:3.1.8
# Add the MySQL driver
COPY ./mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/libs

Build a new image that includes the MySQL driver:

docker build -t apache/dolphinscheduler-api:support-mysql .

Distribute the newly built image to the cluster nodes.

6.3 Change PYTHON_HOME

  • In values.yaml, change PYTHON_HOME to /usr/bin/python3
  • Or edit it in Kuboard: dolphinscheduler namespace > Config Center > ConfigMaps > dolphinscheduler-common > Edit
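
If the values.yaml route is taken, the change still has to be applied to the running release. The sketch below rests on several assumptions: the release and namespace are both named dolphinscheduler, the command runs from the chart directory, and the worker is a StatefulSet named dolphinscheduler-worker. apply_python_home and the HELM and KUBECTL overrides are hypothetical names.

```shell
# Apply the edited values.yaml to the existing release, then restart the
# workers so they pick up the new PYTHON_HOME.
# Assumptions: release and namespace are both named dolphinscheduler, the
# command runs from the chart directory, and the worker is a StatefulSet named
# dolphinscheduler-worker. HELM and KUBECTL can be overridden for a dry run.
apply_python_home() {
  "${HELM:-helm}" upgrade dolphinscheduler . -n dolphinscheduler &&
    "${KUBECTL:-kubectl}" rollout restart statefulset/dolphinscheduler-worker \
      -n dolphinscheduler
}
```

The restart is included because environment values taken from a ConfigMap are only read when a pod starts.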

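6.4 Change the image used by dolphinscheduler-worker

Switch the dolphinscheduler-worker workload to the image built in 6.1 (apache/dolphinscheduler-worker:python3-mysql).

6.5 Change the image used by dolphinscheduler-api

Switch the dolphinscheduler-api workload to the image built in 6.2 (apache/dolphinscheduler-api:support-mysql).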
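
The image of a running workload can also be changed from the command line. This is a sketch under stated assumptions: the worker is a StatefulSet named dolphinscheduler-worker, the API server is a Deployment named dolphinscheduler-api, and both live in the dolphinscheduler namespace. set_workload_image and the KUBECTL override are hypothetical names.

```shell
# Point a workload at a custom image from the command line.
# Assumptions: the worker is a StatefulSet and the API server a Deployment,
# named dolphinscheduler-worker and dolphinscheduler-api. "*" targets every
# container in the pod template. KUBECTL can be overridden for a dry run.
set_workload_image() {
  workload="$1"   # e.g. statefulset/dolphinscheduler-worker
  image="$2"      # e.g. apache/dolphinscheduler-worker:python3-mysql
  "${KUBECTL:-kubectl}" set image "$workload" "*=$image" -n dolphinscheduler
}
```

For example: set_workload_image statefulset/dolphinscheduler-worker apache/dolphinscheduler-worker:python3-mysql, followed by set_workload_image deployment/dolphinscheduler-api apache/dolphinscheduler-api:support-mysql.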

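7. Basic usage check

7.1 Log in

7.2 Create a tenant

7.3 Create a project

7.4 Create a data source

7.5 Create a project

7.6 Create a workflow

The workflow definition (JSON) chains three tasks in order: sql, shell, python.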

[{
	"processDefinition": {
		"id": 1,
		"code": 10577001612288,
		"name": "测试",
		"version": 1,
		"releaseState": "OFFLINE",
		"projectCode": 10576969989760,
		"description": "",
		"globalParams": "[]",
		"globalParamList": [],
		"globalParamMap": {},
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"flag": "YES",
		"userId": 1,
		"userName": null,
		"projectName": null,
		"locations": "[{\"taskCode\":10576976496512,\"x\":175,\"y\":216},{\"taskCode\":10576980784256,\"x\":498,\"y\":216},{\"taskCode\":10576997351040,\"x\":834,\"y\":216}]",
		"scheduleReleaseState": null,
		"timeout": 0,
		"tenantId": -1,
		"tenantCode": null,
		"modifyBy": null,
		"warningGroupId": 0,
		"executionType": "PARALLEL"
	},
	"processTaskRelationList": [{
		"id": 1,
		"name": "",
		"processDefinitionVersion": 1,
		"projectCode": 10576969989760,
		"processDefinitionCode": 10577001612288,
		"preTaskCode": 0,
		"preTaskVersion": 0,
		"postTaskCode": 10576976496512,
		"postTaskVersion": 1,
		"conditionType": "NONE",
		"conditionParams": {},
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"operator": 1,
		"operateTime": "2023-08-15 09:33:45"
	}, {
		"id": 2,
		"name": "",
		"processDefinitionVersion": 1,
		"projectCode": 10576969989760,
		"processDefinitionCode": 10577001612288,
		"preTaskCode": 10576976496512,
		"preTaskVersion": 1,
		"postTaskCode": 10576980784256,
		"postTaskVersion": 1,
		"conditionType": "NONE",
		"conditionParams": {},
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"operator": 1,
		"operateTime": "2023-08-15 09:33:45"
	}, {
		"id": 3,
		"name": "",
		"processDefinitionVersion": 1,
		"projectCode": 10576969989760,
		"processDefinitionCode": 10577001612288,
		"preTaskCode": 10576980784256,
		"preTaskVersion": 1,
		"postTaskCode": 10576997351040,
		"postTaskVersion": 1,
		"conditionType": "NONE",
		"conditionParams": {},
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"operator": 1,
		"operateTime": "2023-08-15 09:33:45"
	}],
	"taskDefinitionList": [{
		"id": 1,
		"code": 10576976496512,
		"name": "sql",
		"version": 1,
		"description": "",
		"projectCode": 10576969989760,
		"userId": 1,
		"taskType": "SQL",
		"taskParams": {
			"localParams": [],
			"resourceList": [],
			"type": "MYSQL",
			"datasource": 1,
			"sql": "show tables",
			"sqlType": "0",
			"preStatements": [],
			"postStatements": [],
			"segmentSeparator": "",
			"displayRows": 10
		},
		"taskParamList": [],
		"taskParamMap": null,
		"flag": "YES",
		"taskPriority": "MEDIUM",
		"userName": null,
		"projectName": null,
		"workerGroup": "default",
		"environmentCode": -1,
		"failRetryTimes": 0,
		"failRetryInterval": 1,
		"timeoutFlag": "CLOSE",
		"timeoutNotifyStrategy": null,
		"timeout": 0,
		"delayTime": 0,
		"resourceIds": "",
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"modifyBy": null,
		"taskGroupId": 0,
		"taskGroupPriority": 0,
		"cpuQuota": -1,
		"memoryMax": -1,
		"taskExecuteType": "BATCH",
		"operator": 1,
		"operateTime": "2023-08-15 09:33:45"
	}, {
		"id": 2,
		"code": 10576980784256,
		"name": "shell",
		"version": 1,
		"description": "",
		"projectCode": 10576969989760,
		"userId": 1,
		"taskType": "SHELL",
		"taskParams": {
			"localParams": [],
			"rawScript": "echo \"hello shell\"",
			"resourceList": []
		},
		"taskParamList": [],
		"taskParamMap": null,
		"flag": "YES",
		"taskPriority": "MEDIUM",
		"userName": null,
		"projectName": null,
		"workerGroup": "default",
		"environmentCode": -1,
		"failRetryTimes": 0,
		"failRetryInterval": 1,
		"timeoutFlag": "CLOSE",
		"timeoutNotifyStrategy": null,
		"timeout": 0,
		"delayTime": 0,
		"resourceIds": "",
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"modifyBy": null,
		"taskGroupId": 0,
		"taskGroupPriority": 0,
		"cpuQuota": -1,
		"memoryMax": -1,
		"taskExecuteType": "BATCH",
		"operator": 1,
		"operateTime": "2023-08-15 09:33:45"
	}, {
		"id": 3,
		"code": 10576997351040,
		"name": "python",
		"version": 1,
		"description": "",
		"projectCode": 10576969989760,
		"userId": 1,
		"taskType": "PYTHON",
		"taskParams": {
			"localParams": [],
			"rawScript": "print(\"hello python\")",
			"resourceList": []
		},
		"taskParamList": [],
		"taskParamMap": null,
		"flag": "YES",
		"taskPriority": "MEDIUM",
		"userName": null,
		"projectName": null,
		"workerGroup": "default",
		"environmentCode": -1,
		"failRetryTimes": 0,
		"failRetryInterval": 1,
		"timeoutFlag": "CLOSE",
		"timeoutNotifyStrategy": null,
		"timeout": 0,
		"delayTime": 0,
		"resourceIds": "",
		"createTime": "2023-08-15 09:33:45",
		"updateTime": "2023-08-15 09:33:45",
		"modifyBy": null,
		"taskGroupId": 0,
		"taskGroupPriority": 0,
		"cpuQuota": -1,
		"memoryMax": -1,
		"taskExecuteType": "BATCH",
		"operator": 1,
		"operateTime": "2023-08-15 09:33:45"
	}],
	"schedule": null
}]

7.7 Bring the workflow online and run it

Reposted from blog.csdn.net/ctwy291314/article/details/132273930