
TECS OpenStack

Tulip Elastic Cloud System OpenStack


Fault Information Collection

Version: V7.23.40

ZTE CORPORATION
ZTE Plaza, Keji Road South, Hi-Tech Industrial Park,
Nanshan District, Shenzhen, P.R.China
Postcode: 518057
Tel: +86-755-26771900
URL: http://support.zte.com.cn
E-mail: support@zte.com.cn
LEGAL INFORMATION
Copyright © 2024 ZTE CORPORATION.
The contents of this document are protected by copyright laws and international treaties. Any reproduction or distribution of this document or any portion of this document, in any form by any means, without the prior written consent of ZTE CORPORATION is prohibited. Additionally, the contents of this document are protected by contractual confidentiality obligations.
All company, brand and product names are trade or service marks, or registered trade or service marks, of ZTE CORPORATION or of their respective owners.
This document is provided as is, and all express, implied, or statutory warranties, representations or conditions are disclaimed, including without limitation any implied warranty of merchantability, fitness for a particular purpose, title or non-infringement. ZTE CORPORATION and its licensors shall not be liable for damages resulting from the use of or reliance on the information contained herein.
ZTE CORPORATION or its licensors may have current or pending intellectual property rights or applications covering the subject matter of this document. Except as expressly provided in any written license between ZTE CORPORATION and its licensee, the user of this document shall not acquire any license to the subject matter herein.
ZTE CORPORATION reserves the right to upgrade or make technical changes to this product without further notice.
Users may visit the ZTE technical support website http://support.zte.com.cn to inquire about related information.
The ultimate right to interpret this product resides in ZTE CORPORATION.

Statement on the Use of Third-Party Embedded Software:
If third-party embedded software such as Oracle, Sybase/SAP, Veritas, Microsoft, VMware, and Redhat is delivered together with this product of ZTE, the embedded software must be used only as a component of this product. If this product is discarded, the licenses for the embedded software become void as well and must not be transferred. ZTE will provide technical support for the embedded software of this product.

Revision History

Revision No. Revision Date Revision Reason


R1.0 2024-01-20 First edition

Serial Number: SJ-20240124113225-027

Publishing Date: 2024-01-20 (R1.0)


Contents
1 Fault Symptom Identification................................................................................1
2 Operating Environment Collection....................................................................... 3
3 Information Collection........................................................................................... 4
3.1 Collecting Alarm Information............................................................................................... 4
3.2 Collecting Operation Logs................................................................................................... 6
3.3 Collecting OS Logs............................................................................................................. 6
3.4 Collecting Daisy Logs..........................................................................................................7
3.5 Collecting Provider Logs..................................................................................................... 8
3.6 Collecting Nova Logs.......................................................................................................... 8
3.7 Collecting Cinder Logs........................................................................................................ 9
3.8 Collecting Neutron Logs...................................................................................................... 9
3.9 Collecting Zabbix Logs...................................................................................................... 10
3.10 Collecting DVS Logs....................................................................................................... 10
3.11 Collecting RabbitMQ Logs...............................................................................................11
4 Logging In to a Node Through CLI.................................................................... 12

Chapter 1
Fault Symptom
Identification
Description

Symptoms of faults are classified into the following two types:


 Service related fault symptoms
 Resource pool related fault symptoms

Fault Scope and Symptoms

For the impact and symptoms of the two types of faults, refer to the following.

Service related fault symptoms
Scope:
 The services of all NEs are abnormal.
 Some service modules of some NEs are abnormal.
 A single NE or module is faulty.
Symptom:
 Services cannot be read or written properly.
 The links between internal modules of an NE are broken, or the link between an NE and the peer end is broken.
 VMs are restarted improperly.

Resource pool related fault symptoms
Scope:
 All nodes in the resource pool are abnormal.
 An error occurs on a specific compute node.
 A compute node in a module (under a pair of TORs) is abnormal.
 The compute domain and administrative domain are not affected, but the storage domain is abnormal.
 Compute nodes are not affected, and the management system is abnormal.
Symptom:
 A compute node reports a node port down alarm, a node fault alarm, or a network disconnection alarm.
 The VM reports a kernel fault or port down alarm.
 The storage pool reports a degrade or unavailable alarm, or a disk unavailable alarm.
 A hardware alarm related to memory, CPU, power supply, or temperature is raised.



Chapter 2
Operating Environment
Collection
Description

To collect operating environment information, check the on-site environment and learn how your devices and the surrounding devices were operated within one week before the fault occurred.

Operating Environment Checks

Infrastructure Check

 Check whether any engineering operations were performed within one week before the fault occurred, such as server firmware upgrade, operating system upgrade, TECS version upgrade, patch installation, compute node expansion, storage node expansion, disk array expansion, and networking reconstruction.
 Check whether any data was modified within one hour before the fault occurred, such as process restart, service killing, file deletion, and configuration modification.
 Check whether any bearer or data communication equipment (such as EOR, TOR, DCGW and CE) underwent engineering operations or had a fault within one week before the fault occurred.

Auxiliary Device Check

 Based on the on-site situation, determine whether the fault is caused by construction, for example, a cable in the system being disconnected by mistake.
 Based on alarm management and the status of indicators on boards, determine the operational status of the system, especially the status of internal cable connections.
 Check the environment in the equipment room for any environmental problem, including problems with temperature, humidity, air conditioning, cabinet voltage, current, and power supply to the shelf (including servers, disk arrays, and switches).
 Check whether the supporting NEs (such as SDNC and third-party storage devices) underwent engineering operations or had faults within one week before the fault occurred.



Chapter 3
Information Collection
Table of Contents
Collecting Alarm Information........................................................................................................ 4
Collecting Operation Logs............................................................................................................ 6
Collecting OS Logs...................................................................................................................... 6
Collecting Daisy Logs...................................................................................................................7
Collecting Provider Logs.............................................................................................................. 8
Collecting Nova Logs................................................................................................................... 8
Collecting Cinder Logs................................................................................................................. 9
Collecting Neutron Logs...............................................................................................................9
Collecting Zabbix Logs...............................................................................................................10
Collecting DVS Logs.................................................................................................................. 10
Collecting RabbitMQ Logs......................................................................................................... 11

3.1 Collecting Alarm Information


Collection Content

 Current alarms: all alarms not cleared


 History alarms: alarms cleared within one day
 Notifications: notifications reported within one day

Collection Method

1. Log in to the Provider, and select DevOps Mgmt→Alarm Management from the menu bar.
The Alarm Management page is displayed.
2. Click Current Alarm, and click Export to export the current alarms. See Figure 3-1.


Figure 3-1 Exporting Current Alarms

3. Click History Alarm, select a period, and click Export to export the TECS historical alarms.
See Figure 3-2.

Figure 3-2 Exporting Historical Alarms

4. Click Notification, select the occurrence time, and click Export to export the notifications.
See Figure 3-3.

Figure 3-3 Exporting Notifications


3.2 Collecting Operation Logs


Collection Content

Log                               Path                            Node
Operation logs on the Provider    Log Management page             Faulty node
Command execution records         /var/log/tfg/shellrecord.log    Faulty node

Collection Method

1. Get the operation logs on the Provider.


Log in to the Provider, and select DevOps Mgmt→Log Management→Operation Log. The Operation Log tab is displayed. Click Export to export the operation logs. See Figure 3-4.

Figure 3-4 Exporting Operation Logs

2. Get the command execution records on the back-end node.


 For the controller node, log in to the controller node directly, and download the operation logs as needed to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
 For a compute node, copy the operation logs as needed to the controller node, and then download them from the controller node.
For how to log in to the back end and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
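
For example, a minimal command sketch for step 2 on a compute node (the controller node IP is a placeholder and should be adapted to the actual environment):

# On the compute node: copy the command execution records to the controller node
scp /var/log/tfg/shellrecord.log tecs@<controller node IP>:/home/tecs/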

3.3 Collecting OS Logs


Collection Content

Log                                           Path                 Node
OS startup and operation logs                 /var/log/messages    Faulty node
Logs related to memory/CPU hardware errors    /var/log/mcelog      Faulty node

Collection Method

 For the controller node, directly log in to the corresponding node, and download the OS logs as needed to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
 For a compute node, copy the OS logs as needed to the controller node, and then download them from the controller node.
For how to log in to the node and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
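
For example, once on the controller node, the OS logs can be fetched to the local computer through SFTP roughly as follows (the IP address and login user are placeholders, and the login user is assumed to have read permission on the files; otherwise copy them to /home/tecs first as described in 4 Logging In to a Node Through CLI):

sftp tecs@<controller node IP>
sftp> get /var/log/messages
sftp> get /var/log/mcelog
sftp> bye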

3.4 Collecting Daisy Logs


Collection Content

Log Path Node

Daisy logs Commands Controller node

Collection Method

1. Log in to the primary Daisy node.


2. Enter the Daisy container and run the following commands:
 docker ps
 docker-manage enter <Daisy container name>
 cd /var/log/Daisy
 tar -czvf Daisy_all.tar.gz *.*
 exit
3. Copy Daisy_all.tar.gz from inside the container to the /home/tecs directory outside the container by running the following command: docker cp <Daisy container name>:/var/log/Daisy/Daisy_all.tar.gz /home/tecs/. See Figure 3-5.

Figure 3-5 Daisy Container


4. Log in to the primary Daisy node and download /home/tecs/Daisy_all.tar.gz to the local computer through SFTP.
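
The preceding steps can be combined into the following command sequence for reference (the container name comes from the docker ps output; <Daisy container name> is a placeholder):

docker ps                                      # find the Daisy container name
docker-manage enter <Daisy container name>     # enter the Daisy container
cd /var/log/Daisy
tar -czvf Daisy_all.tar.gz *.*                 # package all Daisy logs
exit                                           # leave the container
docker cp <Daisy container name>:/var/log/Daisy/Daisy_all.tar.gz /home/tecs/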

3.5 Collecting Provider Logs


Collection Content

Log Path Node

Provider logs O&M Log page Controller node

Collection Method

1. Log in to the Provider, and select DevOps Mgmt→Log Management→System Log. The
System Log page is displayed.
2. Select the time range for the logs to be retrieved and click Query to query the Provider logs within the time range. Click All Download to download all Provider logs. See Figure 3-6.

Figure 3-6 Exporting Provider Logs

3.6 Collecting Nova Logs


Collection Content

Log Path Node

Nova scheduler logs /var/log/nova/nova-scheduler.log Controller node

Nova API request logs /var/log/nova/nova-api.log Controller node

Nova conductor logs /var/log/nova/nova-conductor.log Controller node

VM lifecycle management logs /var/log/nova/nova-compute.log Compute node

VM/libvirtd logs /var/log/libvirt/libvirtd.log Compute node


Collection Method

 For the controller node, directly log in to the corresponding node, and download the Nova logs as needed to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
 For a compute node, copy the Nova logs as needed to the controller node, and then download them from the controller node.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
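
For example, a sketch for packaging the compute-node Nova and libvirt logs before copying them (the archive name and controller node IP are placeholders):

tar -czvf nova_compute_logs.tar.gz /var/log/nova/nova-compute.log /var/log/libvirt/libvirtd.log
scp nova_compute_logs.tar.gz tecs@<controller node IP>:/home/tecs/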

3.7 Collecting Cinder Logs


Collection Content

Log Path Node

Cinder API request logs /var/log/cinder/cinder-api.log Controller node

Cinder volume logs /var/log/cinder/cinder-volume.log Controller node

Cinder scheduler logs /var/log/cinder/cinder-scheduler.log Controller node

Collection Method

Log in to the controller node directly, and download the Cinder logs as needed to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
For how to log in to the node, refer to 4 Logging In to a Node Through CLI.

3.8 Collecting Neutron Logs


Collection Content

Log                                    Path                                           Node
Neutron API request logs               /var/log/neutron/server.log                    Controller node
Open vSwitch related operation logs    /var/log/neutron/openvswitch-agent.log         Compute node
SR-IOV related operation logs          /var/log/neutron/sriov-nic-switch-agent.log    Compute node


Collection Method

 For the controller node, log in to the controller node directly, and download the Neutron logs as needed to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
 For the compute node, copy the Neutron logs as needed to the controller node, and then download them from the controller node.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.

3.9 Collecting Zabbix Logs


Collection Content

Log Path Node

Zabbix server logs /var/log/zabbix/zabbix_server.log Controller node

Collection Method

For the controller node, log in to the controller node directly, and download the Zabbix logs as needed to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
For how to log in to the node, refer to 4 Logging In to a Node Through CLI.

3.10 Collecting DVS Logs


Collection Content

Log                   Path                                    Node
OVS operation logs    /var/log/openswitch/ovs-vswitchd.log    Compute node
DVS operation logs    /var/log/dvs/dvs-vswitchd.log           Compute node

Collection Method

Copy the DVS logs as needed to the controller node, log in to the controller node, and download the logs to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.


3.11 Collecting RabbitMQ Logs


Collection Content

 Message queue usage


 RabbitMQ logs (refer to the following table)

Log                        Path                                       Node
RabbitMQ operation logs    /var/log/rabbitmq/rabbit@{hostname}.log    Controller node

Collection Method

1. Check whether the message queue is blocked. Log in to the controller node through SSH and run the rabbitmqctl list_queues | awk '$2>10' command (see the example after this section).
2. Collect logs: copy the RabbitMQ logs as needed to the controller node, and log in to the controller node to download the logs to the local computer through SFTP. In the case of multiple groups of controller nodes, get the OpenStack component logs from the corresponding group of controller nodes.
For how to log in to the nodes and how to run the copy command, refer to 4 Logging In to a
Node Through CLI.
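
The queue check in step 1 prints every queue whose pending message count (the second output column) exceeds 10; an empty result normally indicates that the message queue is not blocked. A sketch of the check and of packaging the logs (the host name in the log file name is a placeholder):

rabbitmqctl list_queues | awk '$2>10'    # list queues with more than 10 pending messages
tar -czvf rabbitmq_logs.tar.gz /var/log/rabbitmq/rabbit@<hostname>.log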



Chapter 4
Logging In to a Node
Through CLI
Abstract

This chapter describes the command line operations on the controller node or compute node of
the TECS OpenStack system for collecting logs.

Steps

1. Start the SSH client tool, and enter the IP address and the username (for example, vtu) of the node to log in.
2. Some commands require higher user privileges. Because vtu is a common user, some operation commands cannot be executed with it. You can switch to another user, for example, root.

[vtu@host-2025-10-93-131--85 ~]$ su - root

Password: //Enter the root user password.

Last login: Mon May 18 17:05:39 CST 2020 on pts/16

[root@host-2025-10-93-131--85 ~]#

Logs on a Compute Node

3. In the scenario where only the controller node is reachable, you cannot directly log in to
the compute node through the SSH protocol. In this case, you can indirectly access the
compute node through the controller node.

[vtu@host-2025-10-93-131--85 ~]$ ssh vtu@host-2025-10-93-131--87

// host-2025-10-93-131--87 is the compute node that you can log in to through host-2025-10-93-131--85.

4. Run the following command to copy logs from the compute node to the controller node:

scp <log files> tecs@<controller node IP>:/home/tecs
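
For example, to copy the Nova compute log from the compute node to the controller node in the preceding login scenario (the file is illustrative only):

scp /var/log/nova/nova-compute.log tecs@host-2025-10-93-131--85:/home/tecs/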

Logs on One of the Controller Nodes

5. Run the ctrlinfo command on the first group of controller nodes to query the controller node
groups that each service runs on.


[root@NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM tecs]# ctrlinfo
====ctrlvm info====
[vm_hosts]
host_names: NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02,NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03,NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01
host_ips: 2409:8086:8412:b::202,2409:8086:8412:b::203,2409:8086:8412:b::201
[default_controller]
ctrlvm_names: NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03-VM,NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM,NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02-VM
ctrlvm_ips: 2409:8086:8412:b::103,2409:8086:8412:b::101,2409:8086:8412:b::102
roles: mariadb,common-agent,controller_heat,common-server,memcached,ntp_server,provider,swift,amqp,dns_server,freezer_client,controller_telemetry,tacker,controller_glance,controller_camellia,controller_keystone,freezer_server,cinder_volume,controller_cinder,mongodb,controller_neutron,controller_nova,controller_ironic,reverse_proxy,controller_barbican
====ctrlhost info====
{'ip': u'2409:8086:8412:b::202', 'id': u'35447b56-114a-4d7a-aff4-ccab76318e81', 'name': u'NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02'}
{'ip': u'2409:8086:8412:b::203', 'id': u'8ec9f316-eb98-4994-8579-4ee2d6e48d7a', 'name': u'NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03'}
{'ip': u'2409:8086:8412:b::201', 'id': u'30657365-d3e5-4fba-9dc2-ab306fcf3107', 'name': u'NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01'}
[root@NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM tecs]#

6. Go to the target controller node group, and run the crm_mon -1 command on this group of controller nodes to query the specific controller node corresponding to the target service.

[root@NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM tecs]# crm_mon -1
Last updated: Thu Apr 15 14:39:22 2021
Last change: Wed Feb 24 17:59:05 2021 via crmd on NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM
Stack: corosync
Current DC: NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03-VM (1) - partition with quorum
Version: 1.1.20.40311-5.el7-f2d0cbc
3 Nodes configured
105 Resources configured
Online: [ NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-02-VM NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-03-VM ]
nginx_tecsclient_ip (ocf::heartbeat:IPaddr2): Started NFV-D-OPENLAB-01A-2C210-JE08-M-SRV-01-VM
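
To locate the controller node on which a specific service resource is started, the crm_mon output can be filtered, for example (the service keyword is illustrative):

crm_mon -1 | grep -i cinder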

