Heat Stack 创建失败


问题记录

环境:OSP10 1Controller + 4Compute SR-IOV

现象:创建 Stack 时,出现 504 Gateway 错误。

[sqg04@controller-0 ~]$ heat stack-show vnfcs_knife_stack 
...
| stack_status          | CREATE_FAILED                                            |
| stack_status_reason   | Resource CREATE failed: ResourceInError:                 |
|                       | resources.oam_node_in_redundant.resources.server: Went   |
|                       | to status ERROR due to "Message: Exceeded maximum        |
|                       | number of retries. Exceeded max scheduling attempts 3    |
|                       | for instance 950e7b96-b071-4b2c-8007-8f5a8174bd10. Last  |
|                       | exception: 504 Gateway Time-out: The server didn't       |
|                       | respond in time. (HTTP N/A), Code: 500"                  |
...

只含有网络的 Stack 能够创建成功,怀疑是存储的问题。

解决方案

Step1. 按 OpenStack 官方文档创建 Demo Stack

官方文档参考:[Create Stack](https://docs.openstack.org/project-install-guide/orchestration/ocata/launch-instance.html)

[heat-admin@controller-0 ~]$ source overcloudrc
[heat-admin@controller-0 ~]$ openstack image create "cirros" --file cirros-0.3.4-x86_64-disk.img --disk-format qcow2 --container-format bare --public
''
[heat-admin@controller-0 ~]$ openstack image list
+--------------------------------------+-----------------------+--------+
| ID                                   | Name                  | Status |
+--------------------------------------+-----------------------+--------+
| d8508e06-e2a4-4f7c-9812-6737063aa9c8 | robot_test_image      | active |
| 0f2c805b-eda6-4f14-b57e-614d7c58ea63 | 174_upgrade_main      | active |
| 12183805-f185-471d-94a1-7cefdacab617 | 174_Image_for_Upgrade | active |
| 38ed2488-d76c-4a5e-8045-b2c058a7daa9 | knife_ding            | active |
| acf243cb-0a66-4cbd-80ee-e709961a7554 | 5gbts_4.3113.652      | active |
+--------------------------------------+-----------------------+--------+

发现创建 Cirros Image 失败。

Step2. 尝试重启所有 Swift 服务

[heat-admin@controller-0 ~]$ for i in $(systemctl | grep swift | awk '{print $1}'); do sudo systemctl restart $i; done

Step3. 再次上传 Image 成功

[heat-admin@controller-0 ~]$ openstack image create "cirros" --file cirros-0.3.4-x86_64-disk.img --disk-format qcow2 --container-format bare --public --debug
+------------------+------------------------------------------------------------------------------+
| Field            | Value                                                                        |
+------------------+------------------------------------------------------------------------------+
| checksum         | ee1eca47dc88f4879d8a229cc70a07c6                                             |
| container_format | bare                                                                         |
| created_at       | 2018-12-20T02:19:51Z                                                         |
| disk_format      | qcow2                                                                        |
| file             | /v2/images/b4543214-06a4-44fc-bd63-b94c565253a1/file                         |
| id               | b4543214-06a4-44fc-bd63-b94c565253a1                                         |
| min_disk         | 0                                                                            |
| min_ram          | 0                                                                            |
| name             | cirros                                                                       |
| owner            | e51c3027da254fa191aa42f0c46cdccc                                             |
| properties       | direct_url='swift+config://ref1/glance/b4543214-06a4-44fc-bd63-b94c565253a1' |
| protected        | False                                                                        |
| schema           | /v2/schemas/image                                                            |
| size             | 13287936                                                                     |
| status           | active                                                                       |
| tags             |                                                                              |
| updated_at       | 2018-12-20T02:19:53Z                                                         |
| virtual_size     | None                                                                         |
| visibility       | public                                                                       |
+------------------+------------------------------------------------------------------------------+

Step4. 创建 Stack 成功

[heat-admin@controller-0 ~]$ openstack stack list
+--------------------------------------+---------------------------------------+-----------------+----------------------+--------------+
| ID                                   | Stack Name                            | Stack Status    | Creation Time        | Updated Time |
+--------------------------------------+---------------------------------------+-----------------+----------------------+--------------+
| 11fe9f53-6e09-49a6-ae99-95a30956518c | CBAM-e6b45c9d95d04635b4bd54de19301e30 | CREATE_COMPLETE | 2018-12-20T02:23:20Z | None         |
+--------------------------------------+---------------------------------------+-----------------+----------------------+--------------+
[heat-admin@controller-0 ~]$ openstack stack show 11fe9f53-6e09-49a6-ae99-95a30956518c
+-----------------------+-----------------------------------------+
| Field                 | Value                                   |
+-----------------------+-----------------------------------------+
| id                    | 11fe9f53-6e09-49a6-ae99-95a30956518c    |
| stack_name            | CBAM-e6b45c9d95d04635b4bd54de19301e30   |
| description           | Template for create all VNFCs           |
| creation_time         | 2018-12-20T02:23:20Z                    |
| updated_time          | None                                    |
| stack_status          | CREATE_COMPLETE                         |
| stack_status_reason   | Stack CREATE completed successfully     |
...

分析总结

该问题其实已经出现很多次了,从最初 Donna 发现 Image 无法上传删除到后来的 Instance 无法创建,仔细一想都是与存储相关的。原本计划按照官方文档使用 Heat Template 来创建一个 Instance,结果上传 Image 就出现了错误。另外在定位过程中,彩虹说只包含网络的 Stack 能够创建成功,那就没跑了。用一条命令 for i in $(systemctl | grep swift | awk '{print $1}'); do sudo systemctl restart $i; done 重启了所有 Swift 服务(该环境使用 Swift 来存储 Image),问题果然迎刃而解。

但是仔细想想,重启服务这个方案,治标不治本。一套健康的环境,理论上是不会出现这些问题的。想到的第一个方案就是用其他方案来代替 Swift。但是和健哥讨论之后,这是客户对环境的需求,无法修改。那就要从 Swift 本身来着手。在重启 Swift 服务后查看状态,发现部分服务有如下错误:

Dec 20 10:18:40 controller-0.localdomain systemd[1]: Started OpenStack Object Storage (swift) -  Object Updater.
Dec 20 10:18:40 controller-0.localdomain systemd[1]: Starting OpenStack Object Storage (swift) - Object Updater...
Dec 20 10:18:40 controller-0.localdomain liberasurecode[771222]: liberasurecode_instance_create: dynamic linking error libJerasure.so.2: cannot open shared object file: No such file or directory
Dec 20 10:18:40 controller-0.localdomain liberasurecode[771222]: liberasurecode_instance_create: dynamic linking error libJerasure.so.2: cannot open shared object file: No such file or directory
Dec 20 10:18:40 controller-0.localdomain liberasurecode[771222]: liberasurecode_instance_create: dynamic linking error libisal.so.2: cannot open shared object file: No such file or directory
Dec 20 10:18:40 controller-0.localdomain liberasurecode[771222]: liberasurecode_instance_create: dynamic linking error libshss.so.1: cannot open shared object file: No such file or directory

错误显示无法找到共享文件,但是 Swift 功能正常。猜测这可能与 Swift 部署在 Controller 节点上,并且只有一个 Controller 节点有关。如果有多个 Controller 节点,或者 Swift 部署在多个 Compute 节点上,就会有共享文件来进行分布式存储。

另外,在 /var/log/swift/swift.log 文件内也没有找到任何 ERROR,也没有其他思路,问题暂时中断。


发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注