How to resolve path issues with the airflow run and backfill commands on Kubernetes
Versions
- Airflow: 1.10.7
- Kubernetes: 1.14.9
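Setup
Airflow is configured to use the Kubernetes executor, and normal operation works fine.
Data and logs are accessed through EFS volumes defined with PersistentVolume and PersistentVolumeClaim specs.
I have the following k8s spec, which I want to use to run a backfill job: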
apiVersion: v1
kind: Pod
metadata:
  name: backfill-test
  namespace: airflow
spec:
  serviceAccountName: airflow-service-account
  volumes:
    - name: airflow-dags
      persistentVolumeClaim:
        claimName: airflow-dags
    - name: airflow-logs
      persistentVolumeClaim:
        claimName: airflow-logs
  containers:
    - name: somename
      image: myimage
      volumeMounts:
        - name: airflow-dags
          mountPath: /usr/local/airflow/dags
          readOnly: true
        - name: airflow-logs
          mountPath: /usr/local/airflow/logs
          readOnly: false
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: KubernetesExecutor
        - name: AIRFLOW__KUBERNETES__NAMESPACE
          value: airflow
        - name: AIRFLOW__CORE__DAGS_FOLDER
          value: dags
        - name: AIRFLOW__CORE__BASE_LOG_FOLDER
          value: logs
        # - name: AIRFLOW__KUBERNETES__DAGS_VOLUME_MOUNT_POINT
        #   value: /usr/local/airflow/dags
        - name: AIRFLOW__KUBERNETES__DAGS_VOLUME_SUBPATH
          value: dags
        - name: AIRFLOW__KUBERNETES__LOGS_VOLUME_SUBPATH
          value: logs
        - name: AIRFLOW__KUBERNETES__DAGS_VOLUME_CLAIM
          value: airflow-dags
        - name: AIRFLOW__KUBERNETES__LOGS_VOLUME_CLAIM
          value: airflow-logs
        - name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY
          value: someimage_uri
        - name: AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG
          value: latest
        - name: AIRFLOW__KUBERNETES__WORKER_SERVICE_ACCOUNT_NAME
          value: airflow-service-account
        - name: AIRFLOW_HOME
          value: usr/local/airflow
      # command: ["sleep","1h"]
      command: ["airflow", "backfill",
        "my_dag",
        # # "--subdir", ".",
        # "--local",
        "--task_regex", "my_task_task",
        "--start_date", "2020-07-01T00:00:00",
        "--end_date", "2020-08-01T00:00:00"]
  restartPolicy: Never
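Several of the path-valued settings in the spec above are written without a leading slash. Whether that is related to the problem described next is not confirmed anywhere in this post, but it is easy to list them. The following is a minimal Python sketch, assuming the three values are copied exactly from the spec; `find_relative_paths` is a hypothetical helper, not part of Airflow. The `*_VOLUME_SUBPATH` settings are left out on purpose because a subpath is expected to be relative.

```python
import posixpath

# Path-valued settings copied verbatim from the Pod spec above.
path_settings = {
    "AIRFLOW_HOME": "usr/local/airflow",
    "AIRFLOW__CORE__DAGS_FOLDER": "dags",
    "AIRFLOW__CORE__BASE_LOG_FOLDER": "logs",
}


def find_relative_paths(settings):
    """Return the sorted names of settings whose value is not an absolute POSIX path."""
    return sorted(
        name for name, value in settings.items() if not posixpath.isabs(value)
    )


relative = find_relative_paths(path_settings)
for name in relative:
    # A relative value is resolved against whatever base the consumer uses.
    print(f"{name}={path_settings[name]!r} is a relative path")
```

All three settings are reported, because none of the values starts with `/`.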
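Problem
The problem appears to be a path issue that occurs when tasks are added to the queue.
When I run the initial command, the CLI finds the DAG and its associated tasks: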
airflow@backfill-test:~$ airflow run my_dag my_task 2020-07-01T01:15:00+00:00 --local --raw --force
[2020-08-27 23:14:42,038] {__init__.py:51} INFO - Using executor KubernetesExecutor
[2020-08-27 23:14:42,040] {dagbag.py:403} INFO - Filling up the DagBag from /usr/local/airflow/dags
Running %s on host %s <TaskInstance: my_dag.my_task 2020-07-01T01:15:00+00:00 [failed]> backfill-test
However, the task is added to the queue with a strange path. The log from the actual task execution attempt is below.
[2020-08-27 23:14:43,019] {taskinstance.py:867} INFO - Starting attempt 3 of 2
[2020-08-27 23:14:43,019] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-08-27 23:14:43,043] {taskinstance.py:887} INFO - Executing <Task(PostgresOperator): my_task> on 2020-07-01T01:15:00+00:00
[2020-08-27 23:14:43,046] {standard_task_runner.py:52} INFO - Started process 191 to run task
[2020-08-27 23:14:43,085] {logging_mixin.py:112} INFO - [2020-08-27 23:14:43,085] {dagbag.py:403} INFO - Filling up the DagBag from /usr/local/airflow/dags/usr/local/airflow/my_dag.py
[2020-08-27 23:14:53,006] {logging_mixin.py:112} INFO - [2020-08-27 23:14:53,006] {local_task_job.py:103} INFO - Task exited with return code 1
Adding --subdir to the initial command does not actually propagate to the task queue, and it results in the same log output.
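The doubled path in the task log, /usr/local/airflow/dags/usr/local/airflow/my_dag.py, has the shape you get when a path without a leading slash is joined onto the DAGs folder. The Python sketch below only reproduces that shape. It does not claim to show how Airflow builds the path internally, and `relative_dag_file` is a hypothetical value chosen to match the log, not something taken from Airflow's code.

```python
import posixpath

# DAGs folder as reported by the first "Filling up the DagBag" log line.
dags_folder = "/usr/local/airflow/dags"

# Hypothetical relative file path (no leading slash), chosen to match the log.
relative_dag_file = "usr/local/airflow/my_dag.py"

# Joining a relative path appends it to the base directory.
joined = posixpath.join(dags_folder, relative_dag_file)
print(joined)

# Joining an absolute path discards the base directory instead.
absolute = posixpath.join(dags_folder, "/usr/local/airflow/dags/my_dag.py")
print(absolute)
```

If this reading is right, the path handed to the queued task is being treated as relative somewhere, which would be worth checking against the relative values in the Pod spec.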