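Ansible Playbook Task Retries
By default, when a task fails on a host, Ansible stops running tasks on that host and removes it from the play. Task retries let Ansible run a failed task again, so the task only ends once its success condition is met or its retries or timeout are exhausted.
Sometimes a failure is only temporary. For example, a freshly deployed service may take 10 seconds to start, or it may take 15, and the following tasks must wait until it is up. With retries, Ansible can check every 5 seconds whether the service has started and continue only after the check passes.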
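As a rough sketch of that service check, the playbook below polls a health endpoint every 5 seconds. The URL `http://localhost:8080/health` and the retry count are assumptions made for illustration, not values from a real deployment.

```yaml
---
- name: Wait for a service before continuing
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Check the health endpoint until it answers
      ansible.builtin.uri:
        url: "http://localhost:8080/health"  # hypothetical endpoint
        status_code: 200
      register: health
      until: health is succeeded  # stop retrying once the request succeeds
      retries: 12                 # retries allowed after the first attempt
      delay: 5                    # seconds to wait between attempts

    - name: Continue once the service is up
      ansible.builtin.debug:
        msg: "Service answered after {{ health.attempts }} attempt(s)"
```

If the endpoint never answers, the first task fails after the last retry and the play stops for that host, so the second task never runs against a service that is down.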
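The playbook below uses the shell module to generate a random number for each loop item. The task succeeds when the number is divisible by 3. Otherwise it runs again, with up to 10 retries per item.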
---
- name: Test playbook with loop + until
  hosts: localhost
  gather_facts: false
  vars:
    users:
      - alice
      - bob
      - charlie
  tasks:
    - name: Simulate task that succeeds randomly per item
      shell: |
        echo "Processing {{ item }}"
        if [ $((RANDOM % 3)) -eq 0 ]; then
          echo "OK: {{ item }}"
        else
          echo "FAIL: {{ item }}"
          exit 1
        fi
      register: result
      until: result.stdout is search("OK")
      retries: 10
      delay: 1
      timeout: 10
      loop: "{{ users }}"
    - name: Show result per user
      debug:
        msg: "{{ item.item }} => {{ item.stdout }}"
      loop: "{{ result.results }}"
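The table below lists the optional parameters related to task retries:
| Parameter | Type | Default | Purpose | Applies to |
|---|---|---|---|---|
| `until` | expression | none | Success condition for the task; retrying stops as soon as it is met | each loop item |
| `retries` | integer | 3 (when `until` is set) | Maximum number of retries after the first attempt | each loop item |
| `delay` | seconds | 5 | Wait time between retries | each loop item |
| `timeout` | seconds | none (no limit) | Maximum execution time of a single attempt; the attempt is aborted on timeout | each attempt |
| `poll` | integer | value of `DEFAULT_POLL_INTERVAL` (15 unless changed) | Polling interval in seconds, normally used with `async`; `0` means do not wait for the result | async tasks |
| `async` | seconds | none | Runs the task asynchronously and sets its maximum runtime | the whole task |
| `until` + `retries` + `delay` | used together | none | Implements "try until success" logic | each loop item or task |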
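The `async` and `poll` parameters combine naturally with `until`. The sketch below starts a job in the background and then polls it until it finishes. The `sleep 30` command is only a stand-in for a real long-running job, and the timings and the variable names `job` and `job_status` are illustrative assumptions.

```yaml
---
- name: Sketch of async + poll combined with until
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Start a long-running job in the background
      ansible.builtin.command: sleep 30  # stand-in for a slow job
      async: 60   # allow the job to run for at most 60 seconds
      poll: 0     # do not wait here; move on to the next task
      register: job

    - name: Wait for the background job to finish
      ansible.builtin.async_status:
        jid: "{{ job.ansible_job_id }}"
      register: job_status
      until: job_status.finished
      retries: 20  # retries allowed after the first check
      delay: 3     # seconds between checks
```

With `poll: 0` the first task returns immediately, so other tasks could be placed between the two to do useful work while the job runs.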
Running the loop + until playbook above gives the following output:
[root@remote-host ~]# ansible-playbook /tmp/test.yml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [Test playbook with loop + until] ************************************************************************************************************************************
TASK [Simulate task that succeeds randomly per item] **********************************************************************************************************************
FAILED - RETRYING: [localhost]: Simulate task that succeeds randomly per item (10 retries left).
changed: [localhost] => (item=alice)
FAILED - RETRYING: [localhost]: Simulate task that succeeds randomly per item (10 retries left).
changed: [localhost] => (item=bob)
FAILED - RETRYING: [localhost]: Simulate task that succeeds randomly per item (10 retries left).
FAILED - RETRYING: [localhost]: Simulate task that succeeds randomly per item (9 retries left).
FAILED - RETRYING: [localhost]: Simulate task that succeeds randomly per item (8 retries left).
FAILED - RETRYING: [localhost]: Simulate task that succeeds randomly per item (7 retries left).
changed: [localhost] => (item=charlie)
TASK [Show result per user] ***********************************************************************************************************************************************
ok: [localhost] => (item={'changed': True, 'stdout': 'Processing alice\nOK: alice', 'stderr': '', 'rc': 0, 'cmd': 'echo "Processing alice"\nif [ $((RANDOM % 3)) -eq 0 ]; then\n echo "OK: alice"\nelse\n echo "FAIL: alice"\n exit 1\nfi\n', 'start': '2025-12-28 22:00:34.733149', 'end': '2025-12-28 22:00:34.738718', 'delta': '0:00:00.005569', 'msg': '', 'invocation': {'module_args': {'_raw_params': 'echo "Processing alice"\nif [ $((RANDOM % 3)) -eq 0 ]; then\n echo "OK: alice"\nelse\n echo "FAIL: alice"\n exit 1\nfi\n', '_uses_shell': True, 'expand_argument_vars': True, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': ['Processing alice', 'OK: alice'], 'stderr_lines': [], 'failed': False, 'attempts': 2, 'item': 'alice', 'ansible_loop_var': 'item'}) => {
"msg": "alice => Processing alice\nOK: alice"
}
ok: [localhost] => (item={'changed': True, 'stdout': 'Processing bob\nOK: bob', 'stderr': '', 'rc': 0, 'cmd': 'echo "Processing bob"\nif [ $((RANDOM % 3)) -eq 0 ]; then\n echo "OK: bob"\nelse\n echo "FAIL: bob"\n exit 1\nfi\n', 'start': '2025-12-28 22:00:36.377115', 'end': '2025-12-28 22:00:36.381956', 'delta': '0:00:00.004841', 'msg': '', 'invocation': {'module_args': {'_raw_params': 'echo "Processing bob"\nif [ $((RANDOM % 3)) -eq 0 ]; then\n echo "OK: bob"\nelse\n echo "FAIL: bob"\n exit 1\nfi\n', '_uses_shell': True, 'expand_argument_vars': True, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': ['Processing bob', 'OK: bob'], 'stderr_lines': [], 'failed': False, 'attempts': 2, 'item': 'bob', 'ansible_loop_var': 'item'}) => {
"msg": "bob => Processing bob\nOK: bob"
}
ok: [localhost] => (item={'changed': True, 'stdout': 'Processing charlie\nOK: charlie', 'stderr': '', 'rc': 0, 'cmd': 'echo "Processing charlie"\nif [ $((RANDOM % 3)) -eq 0 ]; then\n echo "OK: charlie"\nelse\n echo "FAIL: charlie"\n exit 1\nfi\n', 'start': '2025-12-28 22:00:41.923074', 'end': '2025-12-28 22:00:41.928383', 'delta': '0:00:00.005309', 'msg': '', 'invocation': {'module_args': {'_raw_params': 'echo "Processing charlie"\nif [ $((RANDOM % 3)) -eq 0 ]; then\n echo "OK: charlie"\nelse\n echo "FAIL: charlie"\n exit 1\nfi\n', '_uses_shell': True, 'expand_argument_vars': True, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': ['Processing charlie', 'OK: charlie'], 'stderr_lines': [], 'failed': False, 'attempts': 5, 'item': 'charlie', 'ansible_loop_var': 'item'}) => {
"msg": "charlie => Processing charlie\nOK: charlie"
}
PLAY RECAP ****************************************************************************************************************************************************************
localhost : ok=2 changed=1 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
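Ansible Playbook Loop Control
Ansible uses `loop_control` to customize how `loop` behaves.
Tracking loop progress
`index_var` names a variable that holds the index of the current iteration, starting at 0.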
---
- name: test loop control
  hosts: localhost
  tasks:
    - name: print var with index
      ansible.builtin.debug:
        msg: "{{ item }} with index {{ index_id }}"
      loop:
        - Linux
        - Mac
        - Windows
      loop_control:
        index_var: index_id
Output:
[root@remote-host ansible]# ansible-playbook test.yml
PLAY [test loop control] ***********************************************************************************************
TASK [Gathering Facts] *************************************************************************************************
ok: [localhost]
TASK [print var with index] ********************************************************************************************
ok: [localhost] => (item=Linux) => {
"msg": "Linux with index 0"
}
ok: [localhost] => (item=Mac) => {
"msg": "Mac with index 1"
}
ok: [localhost] => (item=Windows) => {
"msg": "Windows with index 2"
}
PLAY RECAP *************************************************************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
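Because `index_var` is zero-based, a human-readable progress counter needs a `+ 1`. A minimal sketch, assuming the loop items are kept in a hypothetical variable named `systems` so that the total count is available:

```yaml
---
- name: Sketch of a progress counter built on index_var
  hosts: localhost
  gather_facts: false
  vars:
    systems:  # hypothetical variable holding the loop items
      - Linux
      - Mac
      - Windows
  tasks:
    - name: Print progress as current/total
      ansible.builtin.debug:
        msg: "[{{ index_id + 1 }}/{{ systems | length }}] {{ item }}"
      loop: "{{ systems }}"
      loop_control:
        index_var: index_id
```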
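Special loop variables
When `extended: true` is set under `loop_control`, Ansible provides a set of special variables with information about the running loop:
| Variable | Type | Description |
|---|---|---|
| `ansible_loop.allitems` | list | All items in the loop |
| `ansible_loop.index` | int | Number of the current iteration (starting at 1) |
| `ansible_loop.index0` | int | Number of the current iteration (starting at 0) |
| `ansible_loop.revindex` | int | Number of iterations left until the end (starting at 1) |
| `ansible_loop.revindex0` | int | Number of iterations left until the end (starting at 0) |
| `ansible_loop.first` | bool | `true` if this is the first iteration |
| `ansible_loop.last` | bool | `true` if this is the last iteration |
| `ansible_loop.length` | int | Total number of items in the loop |
| `ansible_loop.previtem` | any | Item from the previous iteration (undefined during the first iteration) |
| `ansible_loop.nextitem` | any | Item from the next iteration (undefined during the last iteration) |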
Here is a simple playbook; you can try the other variables yourself.
---
- name: test loop control
  hosts: localhost
  tasks:
    - name: print var with index
      ansible.builtin.debug:
        msg: |-
          If first: {{ ansible_loop.first }}
          If last: {{ ansible_loop.last }}
          Current index: {{ ansible_loop.index }}
          Current index0: {{ ansible_loop.index0 }}
      loop:
        - Linux
        - Mac
        - Windows
      loop_control:
        extended: true
With `extended: true`, every iteration also carries a reference to the full loop data (`ansible_loop.allitems`), which uses more memory. Set `extended_allitems: false` to leave out `ansible_loop.allitems`.
loop_control:
  extended: true
  extended_allitems: false
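`ansible_loop.previtem` and `ansible_loop.nextitem` are not defined at the edges of the loop, so it is safer to read them through the `default` filter. A minimal sketch, where the `(none)` placeholder text is an arbitrary choice:

```yaml
---
- name: Sketch of previtem and nextitem
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Show the neighbours of the current item
      ansible.builtin.debug:
        msg: >-
          {{ ansible_loop.previtem | default('(none)') }}
          -> {{ item }}
          -> {{ ansible_loop.nextitem | default('(none)') }}
      loop:
        - Linux
        - Mac
        - Windows
      loop_control:
        extended: true  # required for the ansible_loop variables
```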
Referencing the outer loop variable in nested loops
Let's go straight to the playbooks:
[root@remote-host ansible]# cat include_tasks.yml
- name: print other var and system var
  ansible.builtin.debug:
    msg: "{{ item }} and {{ system_var }}"
  loop:
    - a
    - b
    - c
[root@remote-host ansible]# cat test.yml
---
- name: test loop control
  hosts: localhost
  tasks:
    - name: include tasks
      ansible.builtin.include_tasks: include_tasks.yml
      loop:
        - Linux
        - Mac
        - Windows
      loop_control:
        loop_var: system_var
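Here the playbook (test.yml) includes a task file (include_tasks.yml) through `ansible.builtin.include_tasks`.
- `test.yml` loops over `Linux`, `Mac`, and `Windows`, and includes `include_tasks.yml` on each iteration.
- `include_tasks.yml` has its own loop over `a`, `b`, and `c` and prints the variable values.
To print the value from the `test.yml` loop inside `include_tasks.yml`, use `loop_var`. It gives the loop variable a new name, so the outer value is not shadowed by the inner loop's default `item`. The output below shows the result: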
[root@remote-host ansible]# ansible-playbook test.yml
PLAY [test loop control] ***********************************************************************************************
TASK [Gathering Facts] *************************************************************************************************
ok: [localhost]
TASK [include tasks] ***************************************************************************************************
included: /root/ansible/include_tasks.yml for localhost => (item=Linux)
included: /root/ansible/include_tasks.yml for localhost => (item=Mac)
included: /root/ansible/include_tasks.yml for localhost => (item=Windows)
TASK [print other var and system var] **********************************************************************************
ok: [localhost] => (item=a) => {
"msg": "a and Linux"
}
ok: [localhost] => (item=b) => {
"msg": "b and Linux"
}
ok: [localhost] => (item=c) => {
"msg": "c and Linux"
}
TASK [print other var and system var] **********************************************************************************
ok: [localhost] => (item=a) => {
"msg": "a and Mac"
}
ok: [localhost] => (item=b) => {
"msg": "b and Mac"
}
ok: [localhost] => (item=c) => {
"msg": "c and Mac"
}
TASK [print other var and system var] **********************************************************************************
ok: [localhost] => (item=a) => {
"msg": "a and Windows"
}
ok: [localhost] => (item=b) => {
"msg": "b and Windows"
}
ok: [localhost] => (item=c) => {
"msg": "c and Windows"
}
PLAY RECAP *************************************************************************************************************
localhost : ok=7 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
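Looping over all hosts
#hosts (2):
# master1
# worker1
---
- name: Test playbook
  hosts: localhost
  gather_facts: false
  tasks:
    # Inventory lookup: every host that matches the pattern 'all'
    - name: loop inventory
      debug:
        msg: "{{ item }}"
      loop: "{{ query('inventory_hostnames', 'all') }}"
    # Magic variable: members of the 'all' group
    - name: loop inventory
      debug:
        msg: "{{ item }}"
      loop: "{{ groups['all'] }}"
    # Magic variable: hosts in the current batch of this play
    # (just localhost here, because the play targets localhost)
    - name: loop inventory
      debug:
        msg: "{{ item }}"
      loop: "{{ ansible_play_batch }}"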
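A common reason to loop over hosts is to read each host's variables through `hostvars`. The sketch below prints the connection address of every inventory host and falls back to the inventory name when `ansible_host` is not set. It assumes an inventory like the one above and is not tied to any particular environment.

```yaml
---
- name: Sketch of reading per-host variables while looping
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Show the connection address of every inventory host
      ansible.builtin.debug:
        # ansible_host is optional in the inventory, hence the default()
        msg: "{{ item }} -> {{ hostvars[item].ansible_host | default(item) }}"
      loop: "{{ groups['all'] }}"
```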