3-TECS OpenStack (V7.23.40) Fault Information Collection
Version: V7.23.40
ZTE CORPORATION
ZTE Plaza, Keji Road South, Hi-Tech Industrial Park,
Nanshan District, Shenzhen, P.R.China
Postcode: 518057
Tel: +86-755-26771900
URL: http://support.zte.com.cn
E-mail: support@zte.com.cn
LEGAL INFORMATION
Copyright 2024 ZTE CORPORATION.
The contents of this document are protected by copyright laws and international treaties. Any reproduction or
distribution of this document or any portion of this document, in any form by any means, without the prior written
consent of ZTE CORPORATION is prohibited. Additionally, the contents of this document are protected by
contractual confidentiality obligations.
All company, brand and product names are trade or service marks, or registered trade or service marks, of ZTE
CORPORATION or of their respective owners.
This document is provided as is, and all express, implied, or statutory warranties, representations or conditions are
disclaimed, including without limitation any implied warranty of merchantability, fitness for a particular purpose,
title or non-infringement. ZTE CORPORATION and its licensors shall not be liable for damages resulting from the
use of or reliance on the information contained herein.
ZTE CORPORATION or its licensors may have current or pending intellectual property rights or applications
covering the subject matter of this document. Except as expressly provided in any written license between ZTE
CORPORATION and its licensee, the user of this document shall not acquire any license to the subject matter
herein.
ZTE CORPORATION reserves the right to upgrade or make technical change to this product without further notice.
Users may visit the ZTE technical support website http://support.zte.com.cn to inquire for related information.
Any embedded software delivered together with this product of ZTE must be used only as a component of this
product. If this product is discarded, the licenses for the embedded software become void and must not be
transferred. ZTE will provide technical support for the embedded software of this product.
Revision History
Chapter 1 Fault Symptom Identification
Description
For the impact and symptoms of the two types of faults, refer to the following.

Service-related fault symptoms
Impact:
The services of all NEs are abnormal.
Some service modules of some NEs are abnormal.
A single NE or module is faulty.
Symptoms:
Services cannot be read or written properly.
The links between internal modules of an NE are broken, or the link between an NE and the peer end is broken.
VMs are restarted improperly.

Resource pool-related fault symptoms
Impact:
All nodes in the resource pool are abnormal.
An error occurs on a specific compute node.
A compute node in a module (under a pair of TORs) is abnormal.
The compute domain and administrative domain are not affected, but the storage domain is abnormal.
Compute nodes are not affected, and the ...
Symptoms:
A compute node reports a node port down alarm, a node fault alarm, or a network disconnection alarm.
The VM reports a kernel fault or port down alarm.
The storage pool reports a degrade or unavailable alarm, or a disk unavailable alarm.
A hardware alarm related to memory, CPU, power supply, or temperature is raised.
To collect operation information, you need to check the on-site environment and find out what operations were
performed on your devices and the surrounding devices within one week before the fault occurs.
Infrastructure Check
Check whether there are engineering operations one week before the fault occurs, such
as server firmware upgrade, operating system upgrade, TECS version upgrade, patch
installation, compute node expansion, storage node expansion, disk array expansion, and
networking reconstruction.
Check whether any data is modified one hour before the fault occurs, such as process restart, service killing,
file deletion, and configuration modification.
Check whether the bearer and data communication equipment (such as EOR, TOR, DCGW, and CE) performs
engineering operations or has a fault one week before the fault occurs.
In accordance with the on-site situation, determine whether the fault is caused by construction.
For example, a cable in the system is disconnected by mistake.
Based on alarm management and the status of indicators on boards, determine the
operational status of the system, especially the status of internal cable connections.
Check the environment in the equipment room to see whether there is any environmental problem, including
temperature, humidity, air conditioning, cabinet voltage and current, and power supply to the shelves
(including servers, disk arrays, and switches).
Check whether the supporting NEs (such as SDNC and third-party storage devices) perform
engineering operations or have faults one week before the fault occurs.
Collection Method
1. Log in to the Provider, and select DevOps Mgmt→Alarm Management from the menu bar.
The Alarm Management page is displayed.
2. Click Current Alarm, and click Export to export the current alarms. See Figure 3-1.
3. Click History Alarm, select a period, and click Export to export the TECS historical alarms.
See Figure 3-2.
4. Click Notification, select the occurrence time, and click Export to export the notifications.
See Figure 3-3.
Collection Method
For the controller node, directly log in to the corresponding node, and download the OS logs as needed to the
local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component
logs from the corresponding group of controller nodes.
For a compute node, copy the OS logs as needed to the controller node, and then download
them from the controller node.
For how to log in to the node and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
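For example, assuming the OS logs of interest are the /var/log/messages files and /home/tecs/oslog/ on the controller node is used as a temporary collection directory (both paths are placeholders, not fixed by this document), the copy and download flow described above might look as follows:

# On the compute node: copy the OS logs to the controller node.
scp /var/log/messages* root@<controller-node-ip>:/home/tecs/oslog/

# On the local PC: download the collected logs from the controller node through SFTP.
sftp root@<controller-node-ip>
sftp> get -r /home/tecs/oslog
sftp> bye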
Collection Method
1. Log in to the Provider, and select DevOps Mgmt→Log Management→System Log. The
System Log page is displayed.
2. Select the time range for the logs to be retrieved and click Query to query the Provider logs
within the time range. Click AllDownload to download all Provider logs. See Figure 3-6.
Collection Method
For the controller node, directly log in to the corresponding node, and download the Nova logs as needed to the
local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component
logs from the corresponding group of controller nodes.
For a compute node, copy the Nova logs as needed to the controller node, and then
download them from the controller node.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
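For example, assuming the Nova logs are stored under /var/log/nova/ and /home/tecs/novalog/ on the controller node is used as a temporary directory (both are assumptions for this sketch; the actual paths may differ in your TECS version):

# On the compute node: copy the nova-compute logs to the controller node.
scp /var/log/nova/nova-compute.log root@<controller-node-ip>:/home/tecs/novalog/

# On the local PC: download the logs from the controller node through SFTP.
sftp root@<controller-node-ip>
sftp> get -r /home/tecs/novalog
sftp> bye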
Collection Method
Log in to the controller node directly, and download the Cinder logs as needed to the local computer through
SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the
corresponding group of controller nodes.
For how to log in to the node, refer to 4 Logging In to a Node Through CLI.
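For example, assuming the Cinder logs are stored under /var/log/cinder/ on the controller node (the actual path may differ in your TECS version), the download might look like this:

# On the local PC: download the Cinder logs from the controller node through SFTP.
sftp root@<controller-node-ip>
sftp> get -r /var/log/cinder
sftp> bye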
Collection Method
For the controller node, log in to the controller node directly, and download the Neutron logs as needed to the
local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component
logs from the corresponding group of controller nodes.
For the compute node, copy the Neutron logs as needed to the controller node, and then
download them from the controller node.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
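For example, for the compute-node part of the procedure, assuming the Neutron agent logs are stored under /var/log/neutron/ and /home/tecs/neutronlog/ on the controller node is used as a temporary directory (both are assumptions):

# On the compute node: copy the Neutron agent logs to the controller node.
scp /var/log/neutron/*.log root@<controller-node-ip>:/home/tecs/neutronlog/

# On the local PC: download the logs from the controller node through SFTP.
sftp root@<controller-node-ip>
sftp> get -r /home/tecs/neutronlog
sftp> bye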
Collection Method
For the controller node, log in to the controller node directly, and download the Zabbix logs as needed to the
local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component
logs from the corresponding group of controller nodes.
For how to log in to the node, refer to 4 Logging In to a Node Through CLI.
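For example, assuming the Zabbix logs are stored under /var/log/zabbix/ on the controller node (an assumption; the actual path may differ on site):

# On the local PC: download the Zabbix logs from the controller node through SFTP.
sftp root@<controller-node-ip>
sftp> get -r /var/log/zabbix
sftp> bye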
Collection Method
Copy the DVS logs as needed to the controller node, log in to the controller node, and download the logs to the
local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component
logs from the corresponding group of controller nodes.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
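For example, assuming the DVS logs are placed in a directory such as /var/log/dvs/ on the compute node (an assumption; the actual path depends on the DVS version deployed on site) and /home/tecs/dvslog/ is used as a temporary directory on the controller node:

# On the compute node: copy the DVS logs to the controller node.
scp -r /var/log/dvs root@<controller-node-ip>:/home/tecs/dvslog/

# On the local PC: download the logs from the controller node through SFTP.
sftp root@<controller-node-ip>
sftp> get -r /home/tecs/dvslog
sftp> bye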
Collection Method
1. Check whether the message queue is blocked. Log in to the controller node through SSH
and run the rabbitmqctl list_queues | awk '$2>10' command.
2. Collect logs: Copy the RabbitMQ logs as needed to the controller node, and log in to
the controller node to download the logs to the local computer through SFTP. In the
case of multiple groups of controller nodes, get the OpenStack component logs from the
corresponding group of controller nodes.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
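The queue check in step 1 lists every queue with more than 10 pending messages; an empty result indicates that no queue is blocked. The following sketch shows the whole procedure, assuming the RabbitMQ logs are stored under /var/log/rabbitmq/ and /home/tecs/rabbitmqlog/ is used as a temporary directory (both paths are assumptions):

# On the controller node: list queues with more than 10 pending messages.
rabbitmqctl list_queues | awk '$2>10'

# Copy the RabbitMQ logs to the controller node if they reside on another node,
# then download them from the controller node to the local PC through SFTP.
scp /var/log/rabbitmq/*.log root@<controller-node-ip>:/home/tecs/rabbitmqlog/
sftp root@<controller-node-ip>
sftp> get -r /home/tecs/rabbitmqlog
sftp> bye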
Chapter 4 Logging In to a Node Through CLI
This chapter describes the command line operations on the controller node or compute node of
the TECS OpenStack system for collecting logs.
Steps
1. Start the SSH Client tool and enter the IP address and username of the node for login, for
example, vtu.
2. Some commands require higher user privileges. Because vtu is a common user, some operation commands
cannot be executed with it. In this case, you can switch to another user.
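For example, assuming the root password is available on site, switch to the root user with the su command; the prompt then changes as shown below:

[vtu@host-2025-10-93-131--85 ~]$ su - root
Password: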
[root@host-2025-10-93-131--85 ~]#
3. In the scenario where only the controller node is reachable, you cannot directly log in to
the compute node through the SSH protocol. In this case, you can indirectly access the
compute node through the controller node.
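For example, assuming the compute node's internal management IP address is reachable from the controller node (the address below is a placeholder):

# On the controller node: log in to the compute node over the internal management network.
ssh vtu@<compute-node-internal-ip>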
4. Run the following command to copy logs from the compute node to the controller node.
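The exact command depends on the logs to be collected. As a sketch, assuming the OS message logs on the compute node are required and /home/tecs/collect/ on the controller node is used as a temporary directory (both are placeholders):

# On the compute node: copy the logs to the controller node.
scp /var/log/messages* root@<controller-node-ip>:/home/tecs/collect/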
5. Run the ctrlinfo command on the first group of controller nodes to query the controller node
groups that each service runs on.
====ctrlvm info====
[vm_hosts]
host_names: NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02,NFV-D-OPENLAB-01A-2C210-JE08-M
-SRV-03,NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01
host_ips: 2409:8086:8412:b::202,2409:8086:8412:b::203,2409:8086:8412:b::201
[default_controller]
ctrlvm_names: NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03-VM,NFV-D-OPENLAB-01A-2C210-
JE08-M-SRV-01-VM,NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02-VM
ctrlvm_ips: 2409:8086:8412:b::103,2409:8086:8412:b::101,2409:8086:8412:b::102
roles: mariadb,common-agent,controller_heat,common-server,memcached,ntp_server,
provider,swift,amqp,dns_server,freezer_client,controller_telemetry,tacker,
controller_glance,controller_camellia,controller_keystone,freezer_server,
cinder_volume,controller_cinder,mongodb,controller_neutron,controller_nova,
controller_ironic,reverse_proxy,controller_barbican
====ctrlhost info====
'name': u'NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02'}
'name': u'NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03'}
'name': u'NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01'}
[root@NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM tecs]#
6. Go to the target controller node group, and run the crm_mon -1 command on this group of
controller nodes to query the specific controller node corresponding to the target service.
Stack: corosync
Version: 1.1.20.40311-5.el7-f2d0cbc
3 Nodes configured