2. Airflow Series: Post-Install Problems with Multiple Worker Nodes on K8S

1. Cannot Log In After Uninstalling and Reinstalling

Remember install.sh? I had turned off default-user creation there and also dropped the airflow_plus database. The setting below stops the default user from being created, which is why login fails. Set it to true and re-run the install.

# Disable the create-user job
--set webserver.defaultUser.enabled=false \
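
The fix is just flipping this flag back in install.sh before re-running it (a sketch; all the other helm install flags stay as install.sh already has them):

# re-enable the create-user job so the default admin account is created
--set webserver.defaultUser.enabled=true \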

2. Worker Restarts Causing Task Failures

The cause was a version conflict introduced by the custom dependency installation. Checking the events of the pod running the restarted worker showed the following:

File "/home/airflow/.local/lib/python3.7/site-packages/kombu/entity.py", line 7, in <module>
    from .serialization import prepare_accept_content
File "/home/airflow/.local/lib/python3.7/site-packages/kombu/serialization.py", line 440, in <module>
    for ep, args in entrypoints('kombu.serializers'):  # pragma: no cover
File "/home/airflow/.local/lib/python3.7/site-packages/kombu/utils/compat.py", line 82, in entrypoints
    for ep in importlib_metadata.entry_points().get(namespace, [])
AttributeError: 'EntryPoints' object has no attribute 'get'

Check the dependencies and pin the offending version:

importlib-metadata==4.13.0
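
For context: importlib-metadata 5.0 changed entry_points() to return an EntryPoints tuple with no dict-style .get(), which is exactly what older kombu releases call; 4.13.0 still returns the dict-compatible SelectableGroups. A quick sanity check to run on a worker (my own sketch, not from the original post):

import importlib_metadata

# on importlib-metadata==4.13.0 this is the dict-compatible SelectableGroups,
# so kombu's entry_points().get(...) call works; on >=5.0 it raises AttributeError
eps = importlib_metadata.entry_points()
print(type(eps).__name__)
print(eps.get('kombu.serializers', []))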

3. Installing Dependencies Dynamically

  • For a single worker node, you can run an install routine before each task. It works, but it is clumsy and not very practical:
import logging
import os

log = logging.getLogger(__name__)

def install():
    log.info("begin install requirements")
    # install the pinned requirements shipped in the git-synced DAG repo
    os.system("pip install -r /opt/airflow/dags/repo/dags/requirements.txt")
    # force-reinstall the in-house utility package to pick up new releases
    os.system("pip install -I my_utils")
    log.info("finish install requirements")

Import this at the top of the file executed by the DAG's first task, so the install log shows up in the WebServer task log (a PythonOperator variant is sketched after this list):

import sys
# make the git-synced repo importable, then run the installer at import time
sys.path.insert(0, '/opt/airflow/dags/repo')
import dags.install as install
install.install()
  • For multiple worker nodes this breaks down: consecutive tasks in a pipeline may land on different workers, so there is no clean in-DAG solution. The simplest workaround is to install the dependencies manually on every worker.
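
As promised above, here is a sketch of the PythonOperator variant; the DAG id and schedule are illustrative, and only the install helper and repo path come from the setup above. Note it shares the single-worker limitation described in the second bullet, since it only installs on the worker that runs the task:

from datetime import datetime
import sys

from airflow import DAG
from airflow.operators.python import PythonOperator

# make the git-synced repo importable (same path as above)
sys.path.insert(0, '/opt/airflow/dags/repo')
import dags.install as install

with DAG(
    dag_id="dag_install_example",  # hypothetical DAG id
    start_date=datetime(2023, 3, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # run the installer as an explicit first task so it gets its own task log and retries
    install_deps = PythonOperator(
        task_id="install_requirements",
        python_callable=install.install,
    )
    # downstream tasks chain after it, e.g. install_deps >> extract >> load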

4. Triggerer Stuck in Starting and Scheduled Runs Not Firing

{triggerer_job.py:101} INFO - Starting the triggerer
[2023-03-17T13:47:30.947+0000] {triggerer_job.py:348} ERROR - Triggerer's async thread was blocked for 0.36 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines.
[2023-03-17T22:15:17.394+0000] {triggerer_job.py:348} ERROR - Triggerer's async thread was blocked for 0.27 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines.

This appears to be a version problem. I changed the version to 2.2.1, cleared the database, ran uninstall, and reinstalled, which fixed it (this one took frustratingly long to track down). Note that the version must go in values.yaml; remove the version settings from install.sh, otherwise the values.yaml ones do not take effect:

airflowVersion: 2.2.1
defaultAirflowTag: 2.2.1
config:
  core:
    dags_folder: /opt/airflow/dags/repo/dags
    hostname_callable: airflow.utils.net.get_host_ip_address
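
A sketch of the reinstall cycle (assuming the release and namespace are both named airflow and install.sh wraps helm install -f values.yaml):

# remove the old release; this does not drop the external metadata database
helm uninstall airflow -n airflow

# clear the metadata database by hand as described above, then reinstall;
# install.sh must no longer set the version via --set, or values.yaml is ignored
./install.sh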

5. Ignoring Files with .airflowignore

The .airflowignore file is used to skip non-DAG files during DAG discovery. It belongs in the dags_folder, which is configured in values.yaml; mine is /opt/airflow/dags/repo/dags, so the file goes there.

For example, to ignore the non-DAG .py files under merge:

jh/merge/*

Note that in Airflow 2.2 each .airflowignore line is an unanchored regular expression, so jh/merge/* also matches a file named jh/merge_dag.py, and that DAG silently disappears from the WebServer. We solved this by standardizing on a dag_ filename prefix.
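
A sketch of the resulting .airflowignore under /opt/airflow/dags/repo/dags (the jh/merge path is from above; the trailing-slash variant is my own suggestion, not from the original post):

# each line is an unanchored regex in Airflow 2.2
# keeping the trailing slash prevents it from also matching jh/merge_dag.py
jh/merge/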

6. Mount Problems

I tried mounting a cephfs directory into the worker nodes. The mount itself succeeded, but the workers then blocked indefinitely, so I gave up on staging intermediate tables as CSV files and wrote them to ClickHouse instead.

7. Scheduler 401 Permission Problem

Make sure the Scheduler deployment's executor label is CeleryKubernetesExecutor; any other value causes this problem.
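
In the official Helm chart the executor is chosen by the top-level executor value in values.yaml, which I assume is also where the deployment's executor label comes from; a minimal sketch:

# values.yaml -- the chart default is CeleryExecutor
executor: CeleryKubernetesExecutor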

8. Scheduler and Triggerer Health Check Problems

Liveness probe failed: No alive jobs found

Modify the liveness probe configuration in values.yaml and reinstall:

scheduler:
  livenessProbe:
    command: ["bash", "-c", "airflow jobs check --job-type SchedulerJob --allow-multiple --limit 100"]
triggerer:
  livenessProbe:
    command: ["bash", "-c", "airflow jobs check --job-type TriggererJob --allow-multiple --limit 100"]
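
To confirm the probe command itself works before reinstalling, it can be run by hand inside the scheduler pod (namespace and deployment name are assumptions):

kubectl exec -n airflow deploy/airflow-scheduler -- \
  airflow jobs check --job-type SchedulerJob --allow-multiple --limit 100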

I hope this helps. Feel free to follow my WeChat official account, 算法小生.

Reposted from blog.csdn.net/SJshenjian/article/details/129643119